Apache Spark 2.x – Data Processing – Getting Started – Spark Framework

Spark Framework

Let us understand the execution modes as well as different components of the Spark Framework. Also, we will recap some important aspects of YARN.

Execution Modes

Following are the different execution modes supported by Spark.

  • Local (for development)
  • Standalone (for development)
  • Mesos
  • YARN
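The execution mode is selected via the `--master` option of `spark-submit`. As a sketch, the application jar and class names below are hypothetical placeholders:

```
# Local mode (development), using 2 cores
spark-submit --master local[2] --class example.WordCount app.jar

# YARN mode, running the driver inside the cluster
spark-submit --master yarn --deploy-mode cluster --class example.WordCount app.jar
```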

As our cluster uses YARN, let us recap some important aspects of YARN.

  • YARN uses a Master (Resource Manager) and Slave (Node Managers) architecture.
  • YARN primarily takes care of resource management and scheduling of tasks.
  • For each YARN application, there will be an Application Master and a set of containers created to process the data.
  • We can plug different distributed frameworks into YARN, such as MapReduce, Spark, etc.
  • Spark creates executors to process the data; these executors run in containers that the per-job Application Master negotiates with the Resource Manager.

Execution Framework

Let us understand Spark execution by running a word count program using RDDs. The key components involved are:

  • Driver Program
  • Spark Context
  • Executors
  • Executor Cache
  • Executor Tasks
  • Job
  • Stage
  • Task (Executor Tasks)
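The RDD word count dataflow can be sketched locally in plain Python, without a Spark cluster. This mimics the `flatMap`, `map`, and `reduceByKey` transformations with standard-library equivalents; the `lines` input is a hypothetical stand-in for an RDD of text lines:

```python
from collections import Counter
from itertools import chain

# Hypothetical input playing the role of an RDD of lines
lines = ["big data with spark", "spark runs on yarn", "big data"]

# flatMap: split each line into individual words
words = list(chain.from_iterable(line.split() for line in lines))

# map: pair each word with a count of 1
pairs = [(word, 1) for word in words]

# reduceByKey: sum the counts per word
counts = Counter()
for word, one in pairs:
    counts[word] += one

print(dict(counts))  # e.g. {'big': 2, 'data': 2, 'with': 1, 'spark': 2, ...}
```

In actual Spark, each of these steps is a transformation on an RDD, and calling an action such as `collect()` triggers a job, which the driver breaks into stages and tasks executed on the executors.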

Learn Spark 1.6.x or Spark 2.x on our state-of-the-art big data labs

  • Click here for access to a state-of-the-art 13-node Hadoop and Spark cluster