Hello Everyone… I still see a lot of people getting confused on this so I’ll try to explain using a simple example. Consider 2 Spark RDDs to be joined together…
Say, rdd1.first is of the form (Int, Int, Float) = (1,957,299.98), while rdd2.first is of the form (Int, Int) = (25876,1), and the join is supposed to take place on the 1st field of both RDDs.
scala> rdd1.join(rdd2) — results in an error:
:**: error: value join is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]
Both the RDDs should be in the form of a Key-Value pair.
Here, rdd1 – being in the form of (1,957,299.98), a flat 3-tuple – does not obey this rule, while rdd2 – which is in the form of (25876,1), a 2-tuple that Spark treats as (key, value) – does.
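As a plain-Scala analogy (no Spark needed), the shape difference looks like this — a flat 3-tuple has no key/value split, while a 2-tuple does, and restructuring the 3-tuple nests the trailing fields under the key:

```scala
object PairShapeDemo {
  def main(args: Array[String]): Unit = {
    // rdd1's records are flat 3-tuples: no (key, value) structure
    val rec1: (Int, Int, Float) = (1, 957, 299.98f)

    // rdd2's records are 2-tuples, which Spark treats as (key, value)
    val rec2: (Int, Int) = (25876, 1)

    // Restructure rec1 into (key, value) form: key = 1st field,
    // value = the remaining two fields as a nested tuple
    val rec1KV: (Int, (Int, Float)) = (rec1._1, (rec1._2, rec1._3))

    println(rec1KV) // (1,(957,299.98))
  }
}
```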
Convert rdd1's records from (1,957,299.98) to Key-Value pairs of the form (1,(957,299.98)) before joining with rdd2, as shown below:
scala> val rdd1KV = rdd1.map { case (k, a, b) => (k, (a, b)) } – restructured into (key, value) form
res**: (Int, (Int, Float)) = (1,(957,299.98))
scala> val joinedRDD = rdd1KV.join(rdd2) – join successful
joinedRDD: org.apache.spark.rdd.RDD[(Int, ((Int, Float), Int))] = MapPartitionsRDD at join …
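That result type, RDD[(Int, ((Int, Float), Int))], falls out of how an inner join pairs up values under matching keys. Here is a plain-Scala sketch of that semantics on ordinary collections (illustrative only — the sample right-side values are made up, and this is not Spark's actual implementation):

```scala
object JoinSemanticsDemo {
  def main(args: Array[String]): Unit = {
    val left: Seq[(Int, (Int, Float))] = Seq((1, (957, 299.98f)))
    // Hypothetical sample data for the right side
    val right: Seq[(Int, Int)] = Seq((1, 42), (2, 7))

    // Inner join on the key: keep only keys present on both sides,
    // pairing each left value with each matching right value
    val joined: Seq[(Int, ((Int, Float), Int))] =
      for {
        (k, lv)  <- left
        (k2, rv) <- right
        if k == k2
      } yield (k, (lv, rv))

    println(joined) // List((1,((957,299.98),42)))
  }
}
```

Note how the left value (Int, Float) and the right value Int end up nested together as ((Int, Float), Int) — exactly the shape Spark reports for joinedRDD.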
By the way, join is a member of org.apache.spark.rdd.PairRDDFunctions, not of RDD itself — Spark makes it available on key-value RDDs through an implicit conversion. In older Spark versions you need import org.apache.spark.SparkContext._ for that conversion to be in scope in your Eclipse project (or whichever IDE you run your code from); in newer versions the implicits are picked up automatically.
Article also on my blog @ https://tips-to-code.blogspot.com/2018/08/apache-spark-error-resolution-value.html