HDPCD - Joining data sets


Originally published at: http://www.itversity.com/topic/hdpcd-joining-data-sets-scala/

Introduction

In this topic we will look at the different joins that Spark supports.

All types of joins work on paired RDDs, and every join requires 2 paired RDDs. The available join APIs are join, leftOuterJoin, rightOuterJoin and fullOuterJoin. The output is a paired RDD of key and value, where the value is a nested tuple holding the values from both data sets.

Understand the concept behind joins: Joining orders…
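To make the shape of the join output concrete, here is a minimal sketch of the four join APIs on two small paired RDDs. The sample data, the names `JoinSketch`, `orders` and `orderItems`, and the `local[2]` master are assumptions made up for illustration and are not taken from the original topic. It assumes Spark 1.3 or later, where the pair RDD functions are available without importing `SparkContext._`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinSketch {
  // Hypothetical sample data keyed by order id: (order_id, order_status)
  def orders(sc: SparkContext): RDD[(Int, String)] =
    sc.parallelize(Seq((1, "COMPLETE"), (2, "PENDING"), (3, "CLOSED")))

  // Hypothetical sample data keyed by order id: (order_id, item_subtotal)
  def orderItems(sc: SparkContext): RDD[(Int, Double)] =
    sc.parallelize(Seq((1, 299.98), (2, 129.99), (4, 49.98)))

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch runs without a cluster
    val conf = new SparkConf().setAppName("JoinSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val o = orders(sc)
    val oi = orderItems(sc)

    // Inner join: only keys present on both sides (1 and 2)
    val inner: RDD[(Int, (String, Double))] = o.join(oi)

    // Left outer join: every key from orders; the right value is an Option,
    // so key 3 carries None
    val left: RDD[(Int, (String, Option[Double]))] = o.leftOuterJoin(oi)

    // Right outer join: every key from orderItems; the left value is an Option,
    // so key 4 carries None
    val right: RDD[(Int, (Option[String], Double))] = o.rightOuterJoin(oi)

    // Full outer join: keys from either side; both values are Options
    val full: RDD[(Int, (Option[String], Option[Double]))] = o.fullOuterJoin(oi)

    // Collect to the driver and sort by key only to get a stable print order
    Seq(inner, left, right, full).foreach { rdd =>
      rdd.collect().sortBy(_._1).foreach(println)
      println("-----")
    }

    sc.stop()
  }
}
```

In every case the result is again a paired RDD: the key is the join key and the value is a nested tuple with one element from each data set. The outer joins wrap the side that may be missing in `Option`, which is why the declared types differ between the four results.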