Apache Spark 2.x – Data processing - Getting Started - Spark Modules

Spark Modules

In earlier versions of Spark, the core API sat at the bottom and all the higher-level modules worked on top of it. Examples of core API operations are map, reduce, join, and groupByKey. With Spark 2, however, Data Frames and Spark SQL have become the core module.

  • Core - Transformations and Actions - APIs such as map, reduce, join, and filter. They typically work on RDDs.
  • Spark SQL and Data Frames - APIs and a Spark SQL interface for batch processing on top of Data Frames or Data Sets (the typed Data Set API is not available for Python)
  • Structured Streaming - APIs and a Spark SQL interface for stream data processing on top of Data Frames
  • Machine Learning Pipelines - Machine Learning data pipelines to apply Machine Learning algorithms on top of Data Frames
  • GraphX - APIs for graph processing

We can build applications in different programming languages such as Scala, Python, Java, and R, leveraging the Spark APIs of the above-mentioned modules.
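
To make the core operations named above more concrete, here is a minimal sketch in plain Python of what map, filter, reduce, groupByKey, and join compute. It does not use Spark: the lists stand in for RDD contents, and the helper names group_by_key and inner_join, as well as the sample data, are hypothetical and chosen only for illustration. In Spark itself, map, filter, groupByKey, and join are lazy transformations and reduce is an action, whereas everything below runs eagerly on local data.

```python
from collections import defaultdict
from functools import reduce

# Local stand-ins for the contents of two pair RDDs (plain lists, not Spark objects).
orders = [("alice", 30), ("bob", 20), ("alice", 50), ("carol", 10)]
cities = [("alice", "Austin"), ("bob", "Boston")]


def group_by_key(pairs):
    """Collect all values that share a key, similar in spirit to groupByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def inner_join(left, right):
    """Pair left and right values for every matching key, similar in spirit to join."""
    right_groups = group_by_key(right)
    return [
        (key, (left_value, right_value))
        for key, left_value in left
        for right_value in right_groups.get(key, [])
    ]


# map: transform every element (here, double each amount).
doubled = list(map(lambda kv: (kv[0], kv[1] * 2), orders))

# filter: keep only the elements that satisfy a predicate.
large = list(filter(lambda kv: kv[1] >= 20, orders))

# reduce: combine all elements into a single value (here, the total amount).
total = reduce(lambda a, b: a + b, (amount for _, amount in orders))

# groupByKey: gather the amounts per customer.
grouped = group_by_key(orders)

# join: match orders with cities on the customer name; "carol" has no city and is dropped.
joined = inner_join(orders, cities)

print(doubled)
print(large)
print(total)
print(grouped)
print(joined)
```

On a real cluster the same operations run in parallel across partitions, and the order of grouped or joined results is not guaranteed in the way it is for these local lists.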
