Mostafa's Blog: April 2017

Tuesday, April 25, 2017

Linear Regression Algorithms in Scikit-Learn

Hi,

While working with different regression algorithms in the scikit-learn library, I would like to share some important tips for telling apart the major linear regression algorithms in the Machine Learning space.

The table below compares four linear regression algorithms:

The general idea of Gradient Descent (GD) is to tweak parameters iteratively in order to minimize a cost function.

Batch and Stochastic Gradient Descent differ in how much data they use at each step: Batch GD computes the gradients over the full training dataset, while Stochastic GD computes them from just one instance.

Mini-Batch Gradient Descent sits between the two: at each step it computes the gradients on small random sets of instances called mini-batches.
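To make the difference concrete, here is a minimal NumPy sketch, not scikit-learn's own implementation. The helper name gradient_descent, the synthetic data, and the learning rates are all assumptions chosen for illustration. The only thing that changes between the three variants is the batch size.

```python
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit linear regression weights by gradient descent on the MSE cost.

    batch_size == len(X) -> Batch GD
    batch_size == 1      -> Stochastic GD
    anything in between  -> Mini-Batch GD
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        order = rng.permutation(m)  # reshuffle the instances every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the MSE cost computed on the current batch only
            grad = 2.0 / len(idx) * Xb.T @ (Xb @ theta - yb)
            theta -= lr * grad
    return theta

# Synthetic data: y = 4 + 3x + noise (values chosen only for illustration)
rng = np.random.default_rng(42)
x = rng.random(200)
X = np.c_[np.ones(200), x]  # prepend a bias column of ones
y = 4 + 3 * x + rng.normal(0, 0.1, 200)

theta_batch = gradient_descent(X, y, batch_size=len(X), epochs=2000)
theta_sgd = gradient_descent(X, y, batch_size=1, lr=0.01, epochs=50)
theta_mini = gradient_descent(X, y, batch_size=20, epochs=200)

print("Batch GD:     ", theta_batch)
print("Stochastic GD:", theta_sgd)
print("Mini-Batch GD:", theta_mini)
```

All three runs should end up close to the true parameters (4 and 3). The stochastic and mini-batch versions take noisier steps, but each step is much cheaper than a full pass over the data.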

There are more linear regression algorithms in sklearn that are not covered in this blog post. You can find them here: http://scikit-learn.org/stable/modules/sgd.html#regression

Hope this helps!

Sunday, April 23, 2017

What is the difference between estimators vs transformers vs predictors in sklearn?

Hi All,

While working on Machine Learning projects with the scikit-learn library, I would like to highlight some important, fundamental concepts that every ML ninja needs to be aware of. In this post I explain the difference between estimators, transformers, and predictors when building machine learning solutions with sklearn.

1) Estimators: Any object that can estimate some parameters based on a dataset is called an estimator. The estimation itself is performed by calling the fit() method.
This method takes the dataset as its parameter (or two parameters, the data and the labels, for supervised learning algorithms). Any other parameter needed to guide the estimation process is called a hyperparameter and must be set as an instance variable.

For example: I would like to estimate the mean, median, or most frequent value of a column in my dataset.

This is a cheat sheet of sklearn estimators. You can find the up-to-date version here.

2) Transformers: A transformer transforms a dataset. You call its transform() method and it returns the transformed dataset. Some estimators can also transform a dataset.

For example: the Imputer class in sklearn is both an estimator and a transformer. You can call its fit_transform() method, which estimates the parameters and transforms the dataset in one step.

Python code:

# Note: scikit-learn 0.20+ replaces Imputer with sklearn.impute.SimpleImputer
from sklearn.preprocessing import Imputer

imputer = Imputer(strategy="mean")  # estimate the mean value of each column
imputer.fit(mydataset)  # Imputer as an estimator
imputer.fit_transform(mydataset)  # Imputer as an estimator and a transformer (the two steps combined)
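On recent scikit-learn versions the Imputer class no longer exists. The sketch below shows the same idea with its replacement, SimpleImputer from sklearn.impute. The small mydataset array is made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical dataset with one missing value in each column
mydataset = np.array([[1.0, 10.0],
                      [np.nan, 20.0],
                      [3.0, np.nan]])

imputer = SimpleImputer(strategy="mean")

# Estimator step: fit() learns the mean of each column
imputer.fit(mydataset)
print(imputer.statistics_)

# Transformer step: transform() fills the missing values with those means
filled = imputer.transform(mydataset)

# fit_transform() combines the two steps
same = SimpleImputer(strategy="mean").fit_transform(mydataset)
print(filled)
```

The learned means are stored in the statistics_ attribute after fit(), which is what makes the object an estimator as well as a transformer.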

3) Predictors: A predictor makes predictions for a given dataset. A predictor class has a predict() method that takes new instances and returns the corresponding predictions. It also has a score() method that measures the quality of the predictions on a given test dataset.

For example: LinearRegression, SVM, Decision Tree, etc. are predictors.
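As a small sketch of the predictor interface, here is LinearRegression used with fit(), predict(), and score(). The tiny dataset follows y = 2x + 1 and is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data that follows y = 2x + 1
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = 2 * X_train.ravel() + 1

predictor = LinearRegression()
predictor.fit(X_train, y_train)  # estimate the model parameters

# predict() takes new instances and returns the corresponding predictions
X_test = np.array([[4.0], [5.0]])
y_test = np.array([9.0, 11.0])
y_pred = predictor.predict(X_test)

# score() measures prediction quality on a test set (R^2 for regressors)
test_score = predictor.score(X_test, y_test)
print(y_pred, test_score)
```

For regressors, score() returns the coefficient of determination R^2. Classifiers return mean accuracy instead.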

You can combine estimators, transformers, and predictors as building blocks in an sklearn pipeline. A pipeline chains a sequence of transformers followed by a final estimator or predictor, so the whole chain can be used as a single object. This concept is called composition in Machine Learning.
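A minimal sketch of such a pipeline might look like this. The step names ("impute", "scale", "regress") and the toy data are assumptions for illustration. SimpleImputer is the current replacement for the Imputer class.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with one missing feature value (made up for illustration)
X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # transformer
    ("scale", StandardScaler()),                 # transformer
    ("regress", LinearRegression()),             # final predictor
])

# fit() runs fit_transform() on each transformer, then fit() on the last step
model.fit(X, y)

# predict() and score() send the data through the transformers first
predictions = model.predict(np.array([[6.0]]))
r2 = model.score(X, y)
print(predictions, r2)
```

Because the pipeline exposes the same fit(), predict(), and score() methods as its final step, it can be used anywhere a single predictor is expected.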

Hope this helps

Friday, April 07, 2017

How to configure X2GO Client on Data Science Virtual Machine

Hi,

While trying to connect to a newly provisioned Data Science Virtual Machine in Azure, I ran into a few problems starting a session in the X2GO client app.

The Data Science Virtual Machine (DSVM) VM image makes it easy to get started doing data science in minutes, without having to install and configure each of the tools individually.
This virtual machine contains CentOS, Microsoft R Developer edition, the Anaconda Python distribution, standalone Spark, CNTK, Rattle, and XGBoost, in addition to other tools. Check out this article for the full details of this VM.

The X2GO client lets Windows users open a remote desktop session to Linux VMs. You can install this tool from here.

After you install this tool and try to connect to the DSVM VM, you will get this error:

unable to start startkde

To solve this problem, follow these steps:

1) Connect to the VM over SSH using a client tool such as PuTTY.
2) After you log in to the VM, execute the following command:

sudo yum install @kde

This will install the KDE packages and upgrade existing packages on the VM. You will be prompted to accept the installation of all required packages and upgrades.

3) This command will take a few seconds to complete. The screenshot below shows the output once this step has finished.

4) Return to the X2GO client and log in using your username and password.

5) You will now be able to open a remote desktop session to the DSVM.

Hope this helps!

Wednesday, April 05, 2017

How to install Keras on 64-bit Windows 10

Hi,

I was trying to install the Keras library on a 64-bit Windows 10 machine. Since I use Anaconda to manage Python packages on my machine, the first thing I tried was to install the package from the Anaconda command line by executing the following command:

conda install keras

I got the following error:

PackageNotFoundError: Package missing in current win-64 bit channels:
- keras

To fix this issue, follow these steps:

1) Check the latest Keras package from Anaconda website by visiting this link:
https://anaconda.org/search?q=keras

2) Select Keras library from the list, then copy the displayed command from the website:

conda install -c conda-forge keras=2.0.2

3) Run this command in the Anaconda command prompt window.

4) The Keras library is now installed, and you can start deep learning with Keras!
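If you want to confirm the installation from Python, one possible check is sketched below. The helper name installed_version is hypothetical, and the check assumes Python 3.8 or later (for importlib.metadata) and that the package was installed with its metadata. It reads the installed version without importing Keras itself.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version of a package, or None if it is not found."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("keras")
if version is None:
    print("Keras was not found in this environment.")
else:
    print("Keras version:", version)
```

Run it from the same Anaconda environment in which you executed the conda install command.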

Enjoy!