Need Help! Getting Exception for Joins: "value join is not a member of org.apache.spark.rdd.RDD"


#1

Hello Everyone,

I’m following the video linked below to work on joins in Spark, but for some reason I keep getting the exception below:

scala> val ordersJoin = orders.join(orderItems)
:30: error: value join is not a member of org.apache.spark.rdd.RDD[(Array[String], Int, String)]
val ordersJoin = orders.join(orderItems)
^

Video link on joins:

Have you seen this error, and did you resolve it? If so, please share your input under this topic. Any help is much appreciated!

Thanks!
Venkat


#2

Could you please copy-paste your complete code?


#3

Please check that both RDDs are in (K,V) and (K,W) format so that you get the result as (K,(V,W)).

From the error, it looks like the orders RDD is in the format (Array[String], Int, String), which the join method does not accept:

value join is not a member of org.apache.spark.rdd.RDD[(Array[String], Int, String)]
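For example, with the orders and orderItems RDDs from the original post, you could key both on the order id before joining. This is only a sketch: it assumes both RDDs still hold the raw comma-separated lines of the retail_db files from the video, so the field positions may differ in your data.

// Key both RDDs on the order id so that join() becomes available.
val ordersKV = orders.map { line =>
  val f = line.split(",")
  (f(0).toInt, f(3))                 // (order_id, order_status)
}
val orderItemsKV = orderItems.map { line =>
  val f = line.split(",")
  (f(1).toInt, f(4).toFloat)         // (order_id, order_item_subtotal)
}
val ordersJoin = ordersKV.join(orderItemsKV)   // RDD[(Int, (String, Float))]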

Thanks
Nitesh


#4

Hello everyone… I still see a lot of people getting confused by this, so I’ll try to explain using a simple example. Consider two Spark RDDs to be joined together…

Say rdd1.first is of the form (Int, Int, Float) = (1,957,299.98), while rdd2.first is something like (Int, Int) = (25876,1), and the join is supposed to take place on the first field of each RDD.
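To reproduce this shape in the spark-shell, here is a minimal sketch; sc.parallelize on the sample values above is just a convenient way to build small test RDDs:

val rdd1 = sc.parallelize(Seq((1, 957, 299.98f)))   // 3-element tuples: not a pair RDD
val rdd2 = sc.parallelize(Seq((25876, 1)))          // 2-element tuples: a (K, V) pair RDD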

scala> rdd1.join(rdd2)   // results in an error
:**: error: value join is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]

REASON

Both the RDDs should be in the form of a Key-Value pair.

Here, rdd1, being of the form (1,957,299.98), does not obey this rule, while rdd2, which is of the form (25876,1), does.
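This rule comes straight from the method’s signature: join is declared on pair RDDs only, so the compiler simply cannot find it on an RDD of 3-element tuples (signature as in the Spark API docs):

// Declared in org.apache.spark.rdd.PairRDDFunctions[K, V],
// i.e. available only when the RDD's elements are pairs (K, V):
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]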

RESOLUTION

Convert the output of the first RDD from (1,957,299.98) to a key-value pair of the form (1,(957,299.98)) before joining it with rdd2, as shown below:

scala> val rdd1KV = rdd1.map(x => (x._1, (x._2, x._3)))   // modified RDD, keyed on the first field

scala> rdd1KV.first
res**: (Int, (Int, Float)) = (1,(957,299.98))

scala> val joinedRDD = rdd1KV.join(rdd2)   // join successful
joinedRDD: org.apache.spark.rdd.RDD[(Int, ((Int, Float), Int))] = MapPartitionsRDD[67] at join …
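Note the nested shape of the result: (K, ((V1, V2), W)). If you would rather have flat columns, a quick follow-up map (just a sketch) unpacks the nesting:

scala> val flat = joinedRDD.map { case (k, ((a, b), c)) => (k, a, b, c) }
flat: org.apache.spark.rdd.RDD[(Int, Int, Float, Int)] = MapPartitionsRDD[…] at map …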

By the way, join is a member of org.apache.spark.rdd.PairRDDFunctions, not of RDD itself; it becomes available through an implicit conversion once your RDD holds key-value pairs. On older Spark versions you have to bring that conversion into scope explicitly in your Eclipse or IDE project, wherever you want to run your code.
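For instance, here is a minimal standalone sketch (the app name, master and sample data are illustrative; the explicit import is only needed on Spark versions before 1.3, where the pair-RDD implicits were not yet in scope automatically):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD implicits (needed on Spark < 1.3)

object JoinExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JoinExample").setMaster("local[*]"))
    val left  = sc.parallelize(Seq((1, "a"), (2, "b")))
    val right = sc.parallelize(Seq((1, 10), (3, 30)))
    left.join(right).collect().foreach(println)   // prints (1,(a,10))
    sc.stop()
  }
}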

Article also on my blog @ https://tips-to-code.blogspot.com/2018/08/apache-spark-error-resolution-value.html

Thanks,
Vishal.