Hi All,
I am writing this blog post to share some foundational Apache Spark topics for those just getting started with the framework.
Apache Spark is an open-source parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications.
The Spark processing engine is built for speed, ease of use, and sophisticated analytics. Spark's in-memory computation capabilities make it a good choice for iterative algorithms in machine learning and graph computation.
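To make the in-memory point concrete, here is a minimal PySpark sketch, assuming a local installation (pip install pyspark); the dataset and column names are invented for illustration. Caching a DataFrame means later passes over it read from memory instead of recomputing:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CachingDemo").getOrCreate()

# Toy dataset: the numbers 0..999,999 and their squares.
df = spark.range(1000000).selectExpr("id", "id * id AS square")

# cache() keeps the DataFrame in memory after the first action,
# so repeated passes (as in iterative algorithms) skip recomputation.
df.cache()
print(df.count())                           # first pass fills the cache
print(df.filter("square % 2 = 0").count())  # served from memory

spark.stop()
```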
You can write applications in Python, Scala, and R on Spark clusters. HDInsight ships with out-of-the-box notebooks (tools/dev IDEs) that allow data scientists to write Spark programs using:
1) Python, via Jupyter (which also supports R, Julia, and Scala)
2) Scala, via Zeppelin
The most important libraries for Apache Spark (short sketches of each follow this list):
1) Spark SQL: a module for structured data manipulation using SQL or the DataFrame API.
2) Spark Streaming: a module for building stream-processing apps the same way you write batch jobs. It supports Java, Scala, and Python.
3) MLlib: Spark's machine learning library.
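Here is a short Spark SQL sketch showing the same query through the DataFrame API and through plain SQL; the table data is invented for the example:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SparkSQLDemo").getOrCreate()

# Toy data; names and ages are made up.
people = spark.createDataFrame(
    [("Ana", 34), ("Bo", 28), ("Cy", 41)], ["name", "age"])

# The same query via the DataFrame API...
people.filter(people.age > 30).select("name").show()

# ...and via plain SQL against a temporary view.
people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```

For streaming, "write it like a batch job" is exactly how the newer Structured Streaming API works. Below is a minimal word-count sketch, assuming something is writing lines to localhost:9999 (for example: nc -lk 9999):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamingDemo").getOrCreate()

# Read lines from the socket as an unbounded DataFrame.
lines = (spark.readStream
              .format("socket")
              .option("host", "localhost")
              .option("port", 9999)
              .load())

# Batch-style transformations, applied continuously to the stream.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Print the running counts to the console until stopped.
query = (counts.writeStream
               .outputMode("complete")
               .format("console")
               .start())
query.awaitTermination()
```

And a tiny MLlib sketch: fitting a logistic regression on a toy two-feature dataset. The data and parameters are illustrative only, not a real workload:

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("MLlibDemo").getOrCreate()

# Four toy rows: a label and a two-feature vector each.
train = spark.createDataFrame([
    (0.0, Vectors.dense([0.0, 1.1])),
    (1.0, Vectors.dense([2.0, 1.0])),
    (0.0, Vectors.dense([0.5, 1.3])),
    (1.0, Vectors.dense([1.8, 0.9])),
], ["label", "features"])

# Fit a logistic regression and inspect predictions on the training set.
model = LogisticRegression(maxIter=10).fit(train)
model.transform(train).select("label", "prediction").show()
```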
Some useful links and resources for Apache Spark:
1) Apache Spark homepage: https://spark.apache.org/
2) Learning Python:
3) Learning Scala:
HDInsight Apache Spark provides tons of tools out of the box; check out this link to see why you would use Apache Spark on HDInsight in Azure:
Hope this helps! Enjoy crushing streams of data...