combineByKey in Scala to PySpark



(('1374724800000', 'CLOSED'), (299.98, 1))
(('1374724800000', 'PENDING_PAYMENT'), (199.99, 2))
(('1374724800000', 'PENDING_PAYMENT'), (250.0, 2))
(('1374724800000', 'PENDING_PAYMENT'), (129.99, 2))
(('1374724800000', 'CLOSED'), (49.98, 4))
(('1374724800000', 'CLOSED'), (299.95, 4))
(('1374724800000', 'CLOSED'), (150.0, 4))
(('1374724800000', 'CLOSED'), (199.92, 4))
(('1374724800000', 'COMPLETE'), (299.98, 5))
(('1374724800000', 'COMPLETE'), (299.95, 5))
(('1374724800000', 'COMPLETE'), (99.96, 5))
(('1374724800000', 'COMPLETE'), (299.98, 5))
(('1374724800000', 'COMPLETE'), (129.99, 5))
(('1374724800000', 'COMPLETE'), (199.99, 7))
(('1374724800000', 'COMPLETE'), (299.98, 7))

Help required to convert the Scala code below to PySpark.
combineByKey(
  // createCombiner: start the sum with the first amount and a set holding the first id
  (x: (Double, Int)) => (x._1, Set(x._2)),
  // mergeValue: within a partition, add the amount and add the id to the set
  (x: (Double, Set[Int]), y: (Double, Int)) => (x._1 + y._1, x._2 + y._2),
  // mergeCombiners: across partitions, add the sums and union the id sets
  (x: (Double, Set[Int]), y: (Double, Set[Int])) => (x._1 + y._1, x._2 ++ y._2)
)
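
For reference, here is a minimal PySpark sketch of an equivalent combineByKey call, assuming the input RDD holds ((date, status), (amount, id)) pairs like the sample above. The lambdas take the place of the typed Scala functions, and a Python set stands in for Scala's Set[Int]; the data and app name are just illustrative.

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "combineByKeyDemo")

    # Sample pairs mirroring the data shown above:
    # key = (date, status), value = (amount, id)
    data = [
        (('1374724800000', 'CLOSED'), (299.98, 1)),
        (('1374724800000', 'PENDING_PAYMENT'), (199.99, 2)),
        (('1374724800000', 'CLOSED'), (49.98, 4)),
        (('1374724800000', 'COMPLETE'), (299.98, 5)),
        (('1374724800000', 'COMPLETE'), (199.99, 7)),
    ]
    rdd = sc.parallelize(data)

    result = rdd.combineByKey(
        # createCombiner: start the sum with the first amount and a set holding the first id
        lambda v: (v[0], {v[1]}),
        # mergeValue: within a partition, add the amount and add the id to the set
        lambda acc, v: (acc[0] + v[0], acc[1] | {v[1]}),
        # mergeCombiners: across partitions, add the sums and union the id sets
        lambda a, b: (a[0] + b[0], a[1] | b[1])
    )

    for record in result.collect():
        print(record)

Each output record pairs the key with (total amount, set of distinct ids), matching what the Scala version produces.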

