Wednesday, December 09, 2015

Get Started with Apache Storm Resources in HDInsight

Hi All,

I'd like to share some useful resources for getting started with Apache Storm in this blog post.
Apache Storm is a distributed real-time computation system that allows engineers to process streams of data at scale.

Apache Storm is one of the major components of the Hadoop ecosystem; engineers use it to process incoming data streams and load the results into Hadoop.

Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

Imagine you need to process an endless source of data (such as the Facebook news feed or a Twitter feed), handling this large volume of information and then storing it in Hadoop. In this case, you build a Storm application and specify its topology by defining the sources of information (spouts) and the steps that process each chunk of data (bolts).

Every Storm application contains a topology, a set of spouts and bolts, and a specification file for the topology.
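
To make the spout, bolt, and topology idea more concrete, here is a minimal sketch in plain Python. It does not use the Storm API at all: the names `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical and exist only for this illustration, and everything runs in a single process on a small in-memory list rather than on a distributed, endless stream.

```python
from collections import Counter
from typing import Iterable, Iterator

# NOTE: plain-Python illustration only. None of these names come from the
# Storm API; a real topology runs distributed across a cluster.


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout: the source of the stream. Emits one tuple (a sentence) at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterator[str]) -> Iterator[str]:
    """Bolt: consumes sentences and emits one lowercase word per tuple."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterator[str]) -> Counter:
    """Bolt: consumes words and keeps a count per word."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
    return counts


def run_topology(sentences: Iterable[str]) -> Counter:
    """Topology: the wiring that connects the spout to the bolts."""
    return count_bolt(split_bolt(sentence_spout(sentences)))


if __name__ == "__main__":
    feed = ["Storm processes streams", "Storm scales"]
    print(run_topology(feed))
```

The point of the sketch is the shape of the dataflow: one source feeding a chain of small processing steps. In a real Storm application the same wiring is declared in the topology, and Storm takes care of distributing the spouts and bolts across the cluster.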

I compiled some useful resources for getting started and working with Apache Storm:

1) Apache Storm main website:
http://storm.apache.org/index.html 

2) HDInsight Hadoop documentation in Azure:
https://azure.microsoft.com/en-us/documentation/services/hdinsight/ 

3) SCP.NET, get started with building .NET apps in C# for Storm:
https://github.com/hdinsight/hdinsight-storm-examples/blob/master/SCPNet-GettingStarted.md

4) Power of Storm with examples:
https://github.com/hdinsight/hdinsight-storm-examples/blob/master/README.md

5) edX free online course (Implementing Real-Time Analytics with Hadoop in Azure HDInsight):
https://www.edx.org/course/implementing-real-time-analytics-hadoop-microsoft-dat202-2x

Hope this helps.
