Tuesday, October 31, 2017

Load datasets from azure blob storage into Pandas dataframe

Hi,

In this post, I am sharing how to work and load data sets that are stored in Azure blob storage into Pandas data frame.

I have the full code posted in Azure notebooks. This code snippet is useful to use in any Jupyter notebook while working on your data pipeline while developing Machine Learning models.

I have exported a data set into a csv file and stored it into an Azure blob storage so i can use it into my notebooks.



Python code snippet:

import pandas as pd
import time
# import azure sdk packages
from azure.storage.blob import BlobService

def readBlobIntoDF(storageAccountName, storageAccountKey, containerName, blobName, localFileName):    
    # get an instance of blob service 
    blob_service = BlobService(account_name=storageAccountName, account_key= storageAccountKey)
    # save file content into local file name
    blob_service.get_blob_to_path(CONTAINERNAME,blobName,localFileName)
    # load local csv file into a dataframe    
    dataframe_blobdata = pd.read_csv(localFileName, header=0)
    
    return dataframe_blobdata

STORAGEACCOUNTNAME= 'STORAGE_ACCOUNT_NAME'
STORAGEACCOUNTKEY= 'STORAGE_KEY'    
CONTAINERNAME= 'CONTAINER_NAME'
BLOBNAME= 'BLOB_NAME.csv'
LOCALFILENAME = 'FILE_NAME-csv-local'

# load blob file into pandas dataframe
tmp = readBlobIntoDF(STORAGEACCOUNTNAME,STORAGEACCOUNTKEY,CONTAINERNAME,BLOBNAME, LOCALFILENAME)
tmp.head()


The full code snippet is posted in Azure Notebook here.

Enjoy!