Joining disparate data sets using pyspark

Originally published at: http://www.itversity.com/topic/joining-disparate-data-sets-using-pyspark/

Introduction to joining data sets

As part of this topic we will cover the highlighted certification topic:

- Load data from HDFS and store results back to HDFS using Spark
- Join disparate datasets together using Spark
- Calculate aggregate statistics (e.g., average or sum) using Spark
- Filter data into a smaller dataset using Spark
- Write a query…