Apache Spark 2.x – Data processing - Getting Started - Develop Simple Application

Simple Application

Let us start with a simple application, written using pyspark, to understand the details of the Spark architecture.

  • As we have multiple versions of Spark on our lab and we are exploring Spark 2, we need to export SPARK_MAJOR_VERSION=2
  • For this demo, we will disable dynamic allocation by setting spark.dynamicAllocation.enabled to false.
  • Launch pyspark on YARN with dynamic allocation disabled (also use spark.ui.port to specify a unique port).
  • Develop a simple word count program by reading data from /public/randomtextwriter/part-m-00000
  • Save output to /user/training
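
The steps above can be sketched roughly as follows. This is a minimal sketch, not the definitive program: it assumes the input file can be read as plain text lines, that it is run inside the pyspark shell (where `sc` is already defined), and that the output goes to a new subdirectory of /user/training. The port number 12345, the directory name `wordcount_output`, and the helper names `tokenize`, `format_pair`, and `word_count` are all illustrative assumptions.

```python
# Launching the shell (run in a terminal, not in Python). The port 12345 is a
# hypothetical value; pick any port that is free on your gateway node:
#   export SPARK_MAJOR_VERSION=2
#   pyspark --master yarn \
#     --conf spark.dynamicAllocation.enabled=false \
#     --conf spark.ui.port=12345

from operator import add

# Input path comes from the steps above.
INPUT_PATH = "/public/randomtextwriter/part-m-00000"
# Hypothetical subdirectory; saveAsTextFile fails if the directory already exists.
OUTPUT_PATH = "/user/training/wordcount_output"


def tokenize(line):
    """Split a line into words on whitespace, dropping empty tokens."""
    return line.split()


def format_pair(pair):
    """Render a (word, count) tuple as one tab-separated output line."""
    word, count = pair
    return "{}\t{}".format(word, count)


def word_count(sc, input_path, output_path):
    """Count occurrences of each word in input_path and save them to output_path."""
    lines = sc.textFile(input_path)
    counts = (lines.flatMap(tokenize)          # one record per word
                   .map(lambda word: (word, 1))  # pair each word with a count of 1
                   .reduceByKey(add))            # sum the counts per word
    counts.map(format_pair).saveAsTextFile(output_path)


# Inside the pyspark shell, `sc` already exists, so the call would be:
# word_count(sc, INPUT_PATH, OUTPUT_PATH)
```

Keeping `tokenize` and `format_pair` as plain functions makes them easy to try out locally before running the job on the cluster.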

Using this application, let us walk through the Spark framework.
