PySpark - Using RDD - reduceByKey(), aggregateByKey() - Multiple Aggregations on the Same RDD

Can we do multiple aggregations in one pass on the same RDD, something similar to the SQL below?

SELECT ORDER_ID, COUNT(ORDER_ITEM_ID), MIN(ORDER_ITEM_ID_SUBTOTAL), MAX(ORDER_ITEM_ID_SUBTOTAL), AVG(ORDER_ITEM_PRICE)
FROM ORDER_ITEMS
GROUP BY ORDER_ID

The examples we usually see for aggregateByKey() perform two aggregations (SUM and COUNT), but I would like some guidance on whether we can do several different aggregations for the same key. Could you please explain with a full example?
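One way this could be sketched: since aggregateByKey() accepts any accumulator type, a tuple can carry several running aggregates at once. The sketch below assumes (hypothetically) that the RDD holds (order_id, (subtotal, price)) pairs; the field names mirror the SQL above and are not taken from a specific dataset.

```python
# Sketch: multiple aggregations per key with a tuple accumulator.
# Accumulator layout: (count, min_subtotal, max_subtotal, sum_price)
zero = (0, float("inf"), float("-inf"), 0.0)

def seq_op(acc, value):
    """Fold one (subtotal, price) record into the per-key accumulator."""
    subtotal, price = value
    count, mn, mx, total = acc
    return (count + 1, min(mn, subtotal), max(mx, subtotal), total + price)

def comb_op(a, b):
    """Merge two partial accumulators coming from different partitions."""
    return (a[0] + b[0], min(a[1], b[1]), max(a[2], b[2]), a[3] + b[3])

def finalize(acc):
    """Turn the raw accumulator into (count, min, max, avg_price)."""
    count, mn, mx, total = acc
    return (count, mn, mx, total / count)

# With a SparkContext `sc` and a list `pairs` of (order_id, (subtotal, price)),
# the full pipeline would look like:
# result = (sc.parallelize(pairs)
#             .aggregateByKey(zero, seq_op, comb_op)
#             .mapValues(finalize))
```

Because seq_op and comb_op are plain functions, the same tuple can be extended with further slots (e.g. a SUM of subtotals) without changing the overall pattern.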

Kind Regards,
Lakshminarayanan
