Let us start with a simple application to understand the details of the Spark architecture using pyspark.
- As we have multiple versions of Spark in our lab and we are exploring Spark 2, we need to export SPARK_MAJOR_VERSION=2.
- For this demo, we will disable dynamic allocation by setting spark.dynamicAllocation.enabled to false.
- Launch pyspark on YARN with dynamic allocation disabled (also set spark.ui.port to specify a unique port).
- Develop a simple word count program that reads data from /public/randomtextwriter/part-m-00000.
- Save the output to /user/training.
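The word count step in the list above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes it is pasted into a pyspark shell where `sc` is already defined, and that the input file can be read line by line with `sc.textFile`. The helper names (`to_pairs`, `format_record`, `run_word_count`) and the `wordcount` subdirectory under /user/training are hypothetical, introduced only for illustration; a subdirectory is used because `saveAsTextFile` fails when the target directory already exists.

```python
from operator import add

# Input path comes from the steps above.
INPUT_PATH = "/public/randomtextwriter/part-m-00000"
# Assumption: "wordcount" is a hypothetical subdirectory under /user/training.
# saveAsTextFile raises an error if the output directory already exists.
OUTPUT_PATH = "/user/training/wordcount"


def to_pairs(line):
    """Split a line on whitespace and emit one (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]


def format_record(pair):
    """Turn a (word, count) pair into a tab-separated output line."""
    word, count = pair
    return "{}\t{}".format(word, count)


def run_word_count(sc, input_path=INPUT_PATH, output_path=OUTPUT_PATH):
    """Count words in input_path and write the result to output_path.

    sc is the SparkContext that the pyspark shell provides.
    """
    lines = sc.textFile(input_path)                    # RDD of lines
    counts = lines.flatMap(to_pairs).reduceByKey(add)  # RDD of (word, count)
    counts.map(format_record).saveAsTextFile(output_path)
```

In the pyspark shell, calling `run_word_count(sc)` triggers a single job. `reduceByKey` introduces a shuffle, so the job should appear as two stages in the Spark UI on the port chosen at launch, which is useful when walking through the framework.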
Using this application, let us go through the Spark framework.