In earlier versions of Spark, the core API sat at the bottom and all the higher-level modules were built on top of it. Examples of core APIs are map, reduce, join, groupByKey, etc. But with Spark 2, Data Frames and Spark SQL have become the core module.
- Core - Transformations and Actions - APIs such as map, reduce, join, filter, etc. They typically work on RDDs
- Spark SQL and Data Frames - APIs and Spark SQL interface for batch processing on top of Data Frames or Datasets (Datasets are not available in Python)
- Structured Streaming - APIs and Spark SQL interface for stream data processing on top of Data Frames
- Machine Learning Pipelines - Machine Learning data pipelines to apply Machine Learning algorithms on top of Data Frames
- GraphX - APIs for graph data processing
- We can build applications using different programming languages such as Scala, Python, Java, and R, leveraging the Spark APIs of the above-mentioned modules.
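To make the core Transformations and Actions concrete, here is a minimal sketch of the map/groupByKey/reduce semantics (a word count) using plain Python collections as a stand-in for RDDs, so it runs without a Spark installation. The sample lines and variable names are illustrative, not from any Spark API.

```python
from functools import reduce

# Sample "dataset": a plain Python list standing in for an RDD of lines.
lines = ["spark makes big data simple", "big data with spark"]

# map/flatMap-like transformation: split each line into words and flatten.
words = [w for line in lines for w in line.split()]

# groupByKey-like step: pair each word with 1, then aggregate per key.
pairs = [(w, 1) for w in words]
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

# reduce-like action: total word count across the whole dataset.
total = reduce(lambda a, b: a + b, (n for _, n in pairs))

print(counts)  # per-word occurrence counts
print(total)   # total number of words
```

In real Spark code the same shape appears as `rdd.flatMap(...).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, with the transformations evaluated lazily and only the action triggering computation.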