Move data between HDFS and Spark - pyspark

Originally published at: http://www.itversity.com/topic/move-data-between-hdfs-and-spark-pyspark/

Introduction

As part of this topic we will cover the highlighted certification topic from this list:
- Load data from HDFS and store results back to HDFS using Spark
- Join disparate datasets together using Spark
- Calculate aggregate statistics (e.g., average or sum) using Spark
- Filter data into a smaller dataset using Spark
- Write a query that produces ranked or…