Filtering data using pyspark

Originally published at: http://www.itversity.com/topic/filtering-data-using-pyspark/

Introduction to filtering data sets

As part of this topic, filtering of data sets using pyspark is covered.

- Load data from HDFS and store results back to HDFS using Spark
- Join disparate datasets together using Spark
- Calculate aggregate statistics (e.g., average or sum) using Spark
- Filter data into a smaller dataset using Spark
- Write a…
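
As a rough illustration of the load, filter, and store steps listed above, here is a minimal sketch using the RDD API. The record layout (comma-separated lines with a status in the fourth field), the status values, the helper names, and both paths are assumptions made for this example and do not come from the original topic.

```python
def parse_status(line):
    """Return the status field of a comma-separated record."""
    # Assumed layout (hypothetical): order_id,order_date,customer_id,order_status
    return line.split(",")[3]


def is_complete(line):
    """Predicate for RDD.filter: keep only COMPLETE or CLOSED records."""
    return parse_status(line) in ("COMPLETE", "CLOSED")


def filter_orders(sc, input_path, output_path):
    """Load text data, keep matching records, and write the result back."""
    orders = sc.textFile(input_path)        # RDD with one element per line
    completed = orders.filter(is_complete)  # transformation, evaluated lazily
    completed.saveAsTextFile(output_path)   # action, triggers the actual job
    return completed


def main():
    # Imported here so the helper functions above can be used without Spark.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("FilterOrders")
    sc = SparkContext(conf=conf)
    try:
        # Both paths are placeholders; substitute real HDFS locations.
        filter_orders(sc,
                      "/hypothetical/input/orders",
                      "/hypothetical/output/orders_completed")
    finally:
        sc.stop()
```

To try it on a cluster, call main() from a script submitted with spark-submit. Note that saveAsTextFile fails if the output directory already exists, so the target path has to be new or removed beforehand.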