Mostafa's Blog: 2017

Tuesday, November 14, 2017

How to convert epoch time to datetime in Pandas

Hi,

While working with IoT data, I needed to convert UNIX epoch timestamps into human-readable date-times instead of raw numbers counted from the 1970 reference date.

My data is in a pandas DataFrame. The screenshot below shows the "createdTime" column, which holds the epoch time:

Here is the code segment that converts UNIX epoch time into a date-time. Note that fromtimestamp() expects seconds, so the division by 1e3 assumes the stored values are in milliseconds:

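import datetime

# fromtimestamp() expects seconds; dividing by 1e3 converts milliseconds to seconds
convert = lambda x: datetime.datetime.fromtimestamp(x / 1e3)
ds['ts'] = ds['createdTime'].apply(convert)
ds.head()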

This code generates the expected output, which the screenshot below shows.
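As an aside, the same conversion can be done without a row-by-row lambda. The following is a minimal sketch, assuming the "createdTime" values are epoch milliseconds (the sample frame and its single value are made up for illustration). Unlike fromtimestamp(), which returns local time, pd.to_datetime yields UTC-based timestamps.

```python
import pandas as pd

# Hypothetical sample frame; 'createdTime' is assumed to hold epoch milliseconds
ds = pd.DataFrame({"createdTime": [1508258340299]})

# Vectorized conversion; unit="ms" tells pandas how to interpret the integers
ds["ts"] = pd.to_datetime(ds["createdTime"], unit="ms")
print(ds.head())
```

If the column actually holds seconds, unit="s" would be the value to use instead.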

Hope this helps!

Enable Jupyter notebook in Anaconda Navigator

Hi,

I installed the latest conda runtime (the Anaconda 3 x64 distribution) on 64-bit Windows.

When I click on a target environment in Anaconda Navigator, I see that the "Open with IPython" and "Open with Jupyter Notebook" options are not enabled.

The question is how to enable them. I found that installing the Jupyter notebook package for a conda environment makes it accessible through the Navigator tool.

Follow the steps below:

1) Select any of the available environments and click "Open Terminal".
2) Type the following command:

conda install nb_conda

3) This will install the notebook packages for Jupyter. Once it is completed, the Jupyter notebook will be available for all environments.

4) To verify that this works properly, click on the Jupyter link; this will open the notebook.

5) Write some code to make sure it works with no issues.

import pandas as pd
s = pd.Series([1508258340299])
pd.to_datetime(s, unit='ms')  # the sample value is in epoch milliseconds

The code executed with no issues:

Hope this helps!

Thursday, November 09, 2017

Error downloading files from secure sites in .NET apps 4.6.2

Hi All,

I was upgrading a .NET application's runtime from version 4.5 to the latest one, 4.6.2. After that, the application threw an error at the step that downloads a zip file from a secure site.

The app throws the following error:

{"The request was aborted: Could not create SSL/TLS secure channel."} when trying to download file

The error was thrown by the DownloadFile method of the .NET WebClient class, in this code snippet:

C# code:

using (WebClient wc = new WebClient())
{
    wc.DownloadFile(ssl_url, downloadedFilePath);
}

The destination is a secure site that uses SSL. After searching and trying different options, I found that the solution is to enable TLS 1.1 and TLS 1.2 before calling the DownloadFile method.

Here is the modified code snippet:

// Allow TLS 1.1 and TLS 1.2 before opening the connection
System.Net.ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
using (WebClient wc = new WebClient())
{
    wc.DownloadFile(ssl_url, downloadedFilePath);
}

Hope this helps!

Tuesday, October 31, 2017

Load datasets from azure blob storage into Pandas dataframe

Hi,

In this post, I am sharing how to load data sets that are stored in Azure blob storage into a Pandas data frame.

I have the full code posted in Azure Notebooks. This code snippet is useful in any Jupyter notebook when you are working on a data pipeline or developing Machine Learning models.

I have exported a data set into a csv file and stored it in Azure blob storage so I can use it in my notebooks.

Python code snippet:

import pandas as pd
# import azure sdk packages (BlobService comes from the legacy azure-storage SDK)
from azure.storage.blob import BlobService

def readBlobIntoDF(storageAccountName, storageAccountKey, containerName, blobName, localFileName):
    # get an instance of blob service
    blob_service = BlobService(account_name=storageAccountName, account_key=storageAccountKey)
    # save file content into local file name (use the containerName argument, not the global)
    blob_service.get_blob_to_path(containerName, blobName, localFileName)
    # load local csv file into a dataframe
    dataframe_blobdata = pd.read_csv(localFileName, header=0)

    return dataframe_blobdata

STORAGEACCOUNTNAME= 'STORAGE_ACCOUNT_NAME'
STORAGEACCOUNTKEY= 'STORAGE_KEY'    
CONTAINERNAME= 'CONTAINER_NAME'
BLOBNAME= 'BLOB_NAME.csv'
LOCALFILENAME = 'FILE_NAME-csv-local'

# load blob file into pandas dataframe
tmp = readBlobIntoDF(STORAGEACCOUNTNAME,STORAGEACCOUNTKEY,CONTAINERNAME,BLOBNAME, LOCALFILENAME)
tmp.head()

The full code snippet is posted in Azure Notebook here.
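For readers on a newer SDK, here is a hedged sketch of the same idea. It assumes the azure-storage-blob v12 package, where BlobServiceClient replaces the BlobService class used above; the function names csv_bytes_to_df and read_blob_into_df are hypothetical, and the sample CSV bytes are made up. It also skips the local file by parsing the downloaded bytes in memory.

```python
import io
import pandas as pd

def csv_bytes_to_df(data: bytes) -> pd.DataFrame:
    """Parse raw CSV bytes into a DataFrame without writing a local file."""
    return pd.read_csv(io.BytesIO(data), header=0)

def read_blob_into_df(connection_string: str, container_name: str, blob_name: str) -> pd.DataFrame:
    """Download a blob with the v12 SDK (assumed installed) and load it as CSV."""
    # Imported lazily so the parsing helper works even without the Azure SDK
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    blob_client = service.get_blob_client(container=container_name, blob=blob_name)
    data = blob_client.download_blob().readall()
    return csv_bytes_to_df(data)

# Local check of the parsing step with made-up CSV content
sample = csv_bytes_to_df(b"id,value\n1,10\n2,20\n")
print(sample.head())
```

Calling read_blob_into_df requires a real storage connection string, so only the parsing helper is exercised here.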

Enjoy!

Tuesday, June 06, 2017

Setup Remote Desktop for Raspberry Pi with no need for an external display

Hi,

If you are wondering how to set up remote desktop access to a Raspberry Pi, this article is for you. I will walk through installing the required packages so that you can connect by remote desktop from your Windows machine or any other remote machine.

Steps to configure remote desktop on the Raspberry Pi:

1) Connect to your Raspberry Pi using PuTTY.
2) Open a terminal window.
3) We are going to install the xrdp package to configure RDP on the Pi. Before installing xrdp, we must first install the tightvncserver package. Installing tightvncserver also removes the RealVNC server software that ships with newer versions of Raspbian, since xrdp with tightvncserver will not work if RealVNC is installed.

$ sudo apt install -y tightvncserver
$ sudo apt install -y xrdp

4) Now install the Samba package, which lets Windows machines reach the Pi by its hostname instead of only by its IP address.

$ sudo apt install -y samba

5) Open the remote desktop tool in Windows or your host OS, enter the name or IP of your Pi, and hit connect.

With that, you can connect to any remote Pi or Linux-based IoT device from your computer, so there is no need to connect the device to an external screen.

[Update 03/16/2018]

If you have Raspbian Jessie or a newer version, you can use remote desktop without installing xrdp, since Raspbian now has a built-in "VNC Server" that we can use to connect to any Raspberry Pi. To enable remote desktop, follow these steps:

1) From the terminal window, execute the following:

sudo raspi-config

2) This will open a configuration menu; select "Interfacing Options".
3) Then select "VNC Server" to enable the VNC server.
4) This will install all VNC components on the Pi. After about 30 seconds, you will be told that the VNC server is enabled.
5) Press Enter to select the OK button.
6) Scroll down to Finish and press Enter.
7) Reboot the Pi.
8) Download the VNC client on your machine (Windows, Mac or Linux) from here.
9) Open the VNC client viewer and type the hostname or the IP of your Pi.
10) You are in!

Enjoy!

Tuesday, April 25, 2017

Linear Regression Algorithms in Scikit-Learn

Hi,

While working with different regression algorithms in the scikit-learn library, I wanted to share some important tips for telling apart the major linear regression algorithms in the Machine Learning space.

Below is a table that compares four linear regression algorithms:

The general idea of Gradient Descent (GD) is to tweak parameters iteratively in order to minimize a cost function.

Batch and Stochastic Gradient Descent: at each step, Batch GD computes the gradients based on the full training dataset, while Stochastic GD computes them based on just one instance.

Mini-Batch Gradient Descent: at each step, it computes the gradients based on small random sets of instances called mini-batches.
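To make the difference concrete, here is a minimal NumPy sketch (not scikit-learn's implementation) in which the three variants differ only in how many instances feed each gradient step. The function name gradient_descent, the learning rate, the epoch count, and the noise-free toy data are all assumptions chosen for illustration.

```python
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.3, epochs=500, seed=0):
    """Minimize mean squared error for a linear model y ~ X @ theta.

    batch_size == len(X) -> Batch GD
    batch_size == 1      -> Stochastic GD
    anything in between  -> Mini-Batch GD
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        order = rng.permutation(m)              # reshuffle instances every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # gradient of the MSE cost computed on the current batch only
            grad = 2.0 / len(idx) * Xb.T @ (Xb @ theta - yb)
            theta -= lr * grad
    return theta

# Toy data: y = 4 + 3x with no noise; the first column of X is the bias term
rng = np.random.default_rng(42)
x = rng.random(100)
X = np.c_[np.ones(100), x]
y = 4 + 3 * x

theta_batch = gradient_descent(X, y, batch_size=len(X))
theta_sgd = gradient_descent(X, y, batch_size=1)
theta_mini = gradient_descent(X, y, batch_size=16)
print(theta_batch, theta_sgd, theta_mini)
```

On this noise-free data all three should end up near the true parameters; on noisy data the stochastic and mini-batch variants typically keep bouncing around the minimum unless the learning rate is reduced over time.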

There are more linear regression algorithms in sklearn that are not covered in this blog post; you can find them here: http://scikit-learn.org/stable/modules/sgd.html#regression

Hope this helps!

Sunday, April 23, 2017

What is the difference between estimators vs transformers vs predictors in sklearn?

Hi All,

While working on Machine Learning projects with the scikit-learn library, I wanted to highlight some important and fundamental concepts that every ML ninja needs to be aware of. In this post I am highlighting a few concepts that differentiate estimators, transformers and predictors when building machine learning solutions using sklearn.

1) Estimators: Any object that can estimate some parameters based on a dataset is called an estimator. The estimation itself is performed by calling the fit() method.
This method takes one dataset as a parameter (or two in the case of supervised learning algorithms, where the second contains the labels). Any other parameter needed to guide the estimation process is called a hyperparameter and must be set as an instance variable.

For example: I would like to estimate the mean, median or most frequent value of a column in my dataset.

This is a cheat sheet of sklearn estimators. You can find the up-to-date version here.

2) Transformers: A transformer transforms a dataset. It does so through its transform() method, which returns the transformed dataset. Some estimators can also transform a dataset.

For example: the Imputer class in sklearn is both an estimator and a transformer. You can call its fit_transform() method, which estimates and transforms a dataset in one step.

Python code:

from sklearn.preprocessing import Imputer

imputer = Imputer(strategy="mean") #estimate mean value for dataset columns

imputer.fit(mydataset) # Imputer as an estimator

imputer.fit_transform(mydataset) # Imputer as a transformer and estimator (Combined two steps)

3) Predictors: A predictor makes predictions for a given dataset. A predictor class has a predict() method that takes new instances of a dataset and returns the corresponding predictions. It also has a score() method that measures the quality of the predictions for a given test dataset.

For example: LinearRegression, SVM, Decision Tree, etc. are predictors.

You can combine estimators, transformers and predictors as building blocks of a pipeline in sklearn. This lets developers chain a sequence of transformers followed by a final estimator or predictor. This concept is called composition in Machine Learning.
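The three roles and their composition can be sketched as follows. This is an illustration under assumptions: it uses SimpleImputer from sklearn.impute, which replaced the Imputer class in newer scikit-learn releases, and the tiny dataset is made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Made-up data: one feature with a missing value, and a numeric target
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([2.0, 4.0, 5.0, 8.0])

# Estimator + transformer: fit() learns the column mean, transform() fills the gap
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
print(imputer.statistics_)  # the learned parameter: the per-column mean

# Pipeline: a transformer followed by a final predictor
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("regress", LinearRegression()),
])
model.fit(X, y)
predictions = model.predict(np.array([[3.0], [np.nan]]))
print(predictions, model.score(X, y))
```

Calling fit() on the pipeline runs fit_transform() on each transformer in order and then fit() on the last step, which is why the raw X with its missing value can be passed in directly.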

Hope this helps

Friday, April 07, 2017

How to configure X2GO Client on Data Science Virtual Machine

Hi,

While trying to connect to a newly provisioned Data Science Virtual Machine in Azure, I ran into a few challenges starting a session in the X2GO client app.

The Data Science Virtual Machine (DSVM) image makes it easy to get started doing data science in minutes, without having to install and configure each of the tools individually.
This virtual machine contains: CentOS, Microsoft R Developer edition, the Anaconda Python distribution, standalone Spark, CNTK, Rattle and XGBoost, in addition to other tools. Check out this article for the full details of this VM.

X2GO provides a client tool for Windows users to remote into Linux VMs. You can install this tool from here.

After you install this tool and try to connect to the DSVM VM, you will get this error:

unable to start startkde

To solve this problem, follow these steps:

1) Connect to the VM using any Linux client tool, such as PuTTY.
2) After you log in to the VM, execute the following command:

sudo yum install @kde

This will install new packages and upgrade existing ones on the VM. The VM will prompt you to accept installing all required packages and upgrades.

3) This command will take a few seconds to complete. The screenshot below shows the terminal once the installation has finished.

4) Return to the X2GO client and log in using your username and password.

5) You will be able to connect to the DSVM machine successfully.

Hope this helps!

Wednesday, April 05, 2017

How to install Keras on Windows 10 with 64 bit

Hi,

I was trying to install the Keras library on a 64-bit Windows 10 machine. Since I use Anaconda to manage Python packages on my machine, the first thing I tried was to install the package from the Anaconda command line by executing the following command:

conda install keras

I got the following error:

PackageNotFoundError: Package missing in current win-64 bit channels:
- keras

To fix this issue, follow these steps:

1) Check the latest Keras package on the Anaconda website by visiting this link:
https://anaconda.org/search?q=keras

2) Select the Keras library from the list, then copy the command displayed on the website:

conda install -c conda-forge keras=2.0.2

3) Run this command in the Anaconda command prompt window.

4) The Keras library is now installed and you will be able to start deep learning with Keras!

Enjoy!

Monday, March 27, 2017

How to install and run Jupyter from your local computer for python development

Hi,

If you are planning to program in Python from your local computer, the best development environment for coding, instruction and data visualization is Jupyter Notebook.

I really like working with Jupyter Notebook (aka IPython Notebook) for coding Python and R programs.

A lot of us download ipynb files and look at them to use in our applications. Instead of copying and pasting code into a Python console window, Jupyter Notebook provides a more interactive way to write code in Python and tons of other languages.

If you have a bunch of ipynb files and would like to install and start working with Jupyter, follow the steps below:

1) Open a command prompt window and run the command below:

pip install jupyter notebook

2) After the installation is complete, navigate to the folder where you keep your ipynb files.

3) Run Jupyter Notebook by executing the following command in that folder:

jupyter notebook

4) A new browser window will open, listing the files in the folder, where you can view existing ipynb files or create new ones. Jupyter usually runs on port 8888, so the URL for the notebook looks like: http://localhost:8888/

Enjoy!

Wednesday, March 01, 2017

How to set storage account connection string in Azure Functions

Hi,

I was developing an Azure Function App that connects to Azure blob storage. After setting up the binding for my blob storage account, I got the following error message when running my function app:

The error message in the screenshot above suggests three options to fix this. I will walk through how to implement the first option as one of the available solutions. The first solution is to set the connection string name in the appsettings.json file so it looks like this.

appsettings.json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "",
    "AzureWebJobsDashboard": "",
    "AzureWebJobsmofunctions_STORAGE": "DefaultEndpointsProtocol=https;AccountName=mofunctions;AccountKey=KEY"
  }
}

function.json (just the section where I define my blob binding info)

{
  "type": "blob",
  "name": "iBlob",
  "path": "mydata/file1.csv",
  "connection": "mofunctions_STORAGE",
  "direction": "in"
}

NOTE:
Notice that the "connection" value in function.json is the suffix that follows the AzureWebJobs prefix in the key name in the appsettings.json file.
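The naming convention in the note can be checked with a few lines of Python. This is only an illustration of the convention described in this post, not the Functions runtime's actual lookup code; the helper setting_key_for is hypothetical, and the two dictionaries mirror the JSON shown above.

```python
import json

# Same content as the appsettings.json file shown above
APP_SETTINGS = json.loads("""
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "",
    "AzureWebJobsDashboard": "",
    "AzureWebJobsmofunctions_STORAGE": "DefaultEndpointsProtocol=https;AccountName=mofunctions;AccountKey=KEY"
  }
}
""")

# Same content as the blob binding section of function.json
BINDING = {
    "type": "blob",
    "name": "iBlob",
    "path": "mydata/file1.csv",
    "connection": "mofunctions_STORAGE",
    "direction": "in",
}

def setting_key_for(binding: dict, prefix: str = "AzureWebJobs") -> str:
    """Hypothetical helper: build the settings key from the binding's connection name."""
    return prefix + binding["connection"]

key = setting_key_for(BINDING)
connection_string = APP_SETTINGS["Values"][key]
print(key)
```

If the key built this way is missing from the settings file, the lookup raises a KeyError, which is a quick way to spot a mismatch between the two files.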

Once you set this, press F5 and you will be able to connect and read blob contents from your Azure storage account.

Enjoy!

Thursday, January 26, 2017

Mashing RDDs in Apache Spark from an RDBMS perspective

Hi,

Happy new year! This is my first post of 2017. 2016 was an amazing year for me, with lots of work, projects and achievements. I am looking forward to 2017.

I am writing this blog post to cover the standard techniques for joining data with Resilient Distributed Datasets (RDDs) in Apache Spark.

I would like to share some insights on working with multiple RDDs in Spark the way we work with tables in relational database management systems.

Apache Spark supports joins on RDDs, so you can implement all the kinds of joins that we know from an RDBMS. Below I list how you would implement each of them on this platform.

Apache Spark join transformations:

1) join: This is equivalent to an inner join in an RDBMS. It returns a new pair RDD whose elements contain all possible pairs of values from the first and second RDDs that share the same key. Keys that exist in only one of the two RDDs produce no elements in the resulting RDD.

2) leftOuterJoin: This is equivalent to a left outer join in an RDBMS. The resulting RDD also contains elements for the keys that don't exist in the second RDD.

3) rightOuterJoin: This is equivalent to a right outer join in an RDBMS. The resulting RDD also contains elements for the keys that don't exist in the first RDD.

4) fullOuterJoin: This is equivalent to a full outer join in an RDBMS. The resulting RDD contains elements for all keys that exist in either RDD.

If the RDDs contain duplicate keys, these keys will be joined multiple times.
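The semantics of the four joins can be emulated on plain Python lists of (key, value) pairs, which makes them easy to try without a cluster. This is a sketch of the behavior described above, not Spark code: the function join_pairs and the sample data are made up, and in PySpark the corresponding calls would be rdd1.join(rdd2), rdd1.leftOuterJoin(rdd2), rdd1.rightOuterJoin(rdd2) and rdd1.fullOuterJoin(rdd2).

```python
from collections import defaultdict

def _group(pairs):
    """Group (key, value) pairs into {key: [values]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def join_pairs(left, right, how="inner"):
    """Emulate pair-RDD joins; a missing side is represented by None."""
    l, r = _group(left), _group(right)
    if how == "inner":
        keys = l.keys() & r.keys()
    elif how == "left":
        keys = l.keys()
    elif how == "right":
        keys = r.keys()
    elif how == "full":
        keys = l.keys() | r.keys()
    else:
        raise ValueError("unknown join type: " + how)
    result = []
    for key in sorted(keys):
        # duplicate keys yield every combination of left and right values
        for lv in l.get(key, [None]):
            for rv in r.get(key, [None]):
                result.append((key, (lv, rv)))
    return result

left = [("a", 1), ("b", 2), ("b", 3)]
right = [("b", "x"), ("c", "y")]

inner = join_pairs(left, right, "inner")
left_outer = join_pairs(left, right, "left")
right_outer = join_pairs(left, right, "right")
full_outer = join_pairs(left, right, "full")
print(inner)
```

Key "b" appears twice on the left, so it is joined twice, which matches the note on duplicate keys.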

Hope this helps!