Multiple columns as value in paired RDD


#1

Can we have multiple columns as the value when creating a (K, V) pair?

Example:

val orderPairedRDD2 = orders.map(order => {
  val o = order.split(",")
  (o(0).toInt, o(1), o(2))
})

val orderitemsPairedRDD = orderItems.map(oi => {
  (oi.split(",")(1).toInt, oi)
})

When I join them, it throws an error:
val x=orderitemsPairedRDD.join(orderPairedRDD2)

:35: error: type mismatch;
found : org.apache.spark.rdd.RDD[(Int, String, String)]
required: org.apache.spark.rdd.RDD[(Int, ?)]
val x=orderitemsPairedRDD.join(orderPairedRDD2)

whereas when I take a single column, or the whole line, as the value it doesn't throw an error:

Example:

val orderPairedRDD2 = orders.map(order => {
  val o = order.split(",")
  (o(0).toInt, order)
})
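The root cause is that `join` is only defined on RDDs of two-element tuples, i.e. `RDD[(K, V)]`. A three-element tuple like `(Int, String, String)` is not a key/value pair, so the compiler rejects it; nesting the extra columns inside a single value tuple gives `RDD[(Int, (String, String))]`, which joins fine. A minimal sketch of the two shapes, using plain Scala collections in place of RDDs and invented sample CSV lines (order_id,order_date,customer_id):

```scala
object PairShape {
  // Three-element tuples: NOT a (K, V) pair, so join would reject this shape.
  def toTriples(lines: Seq[String]): Seq[(Int, String, String)] =
    lines.map { line =>
      val o = line.split(",")
      (o(0).toInt, o(1), o(2))
    }

  // Nest the extra columns into one value tuple: this IS a (K, V) pair.
  def toPairs(lines: Seq[String]): Seq[(Int, (String, String))] =
    lines.map { line =>
      val o = line.split(",")
      (o(0).toInt, (o(1), o(2)))
    }
}
```

For instance, `PairShape.toPairs(Seq("1,2013-07-25,11599"))` gives `Seq((1, ("2013-07-25", "11599")))`, matching the `RDD[(Int, (String, String))]` shape that `join` accepts.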




#2

Sorry, the answer was already there in the 90th video of CCA 175 (Scala):

val orderitemstrial = orderItems.map(oi => {
  val o = oi.split(",")
  (o(1).toInt, (o(2).toInt, o(4).toFloat))
})

Not deleting the query, as I struggled a bit with this.
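Once both sides are (K, V)-shaped, `join` produces `RDD[(K, (V1, V2))]`: the key once, then the two values nested. A plain-Scala imitation of an inner join (no Spark needed; the keys and values below are invented for illustration) shows the resulting shape:

```scala
object JoinShape {
  // Naive inner join over plain sequences, mirroring the output type of
  // Spark's pair-RDD join: one entry per matching key pair.
  def innerJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] =
    for {
      (k, a)  <- left
      (k2, b) <- right
      if k == k2
    } yield (k, (a, b))
}
```

For example, joining an order-items pair `(1, (2, 299.98f))` with an orders pair `(1, ("2013-07-25", "CLOSED"))` yields `(1, ((2, 299.98f), ("2013-07-25", "CLOSED")))`, mirroring `RDD[(Int, ((Int, Float), (String, String)))]`.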


#3