aggregateByKey() in PySpark


#1

Hi,
I am struggling to understand the concept of aggregateByKey(). I always get stuck on the second lambda function, because I feel the first lambda function alone is enough to produce the desired result. If anyone could help me with this, that would be a great help.
Is there any scenario that can only be solved by using aggregateByKey()? If reduceByKey() can be used as an alternative, I am relieved.
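
To make the question concrete, here is a minimal sketch of what the two functions do, written in plain Python so it runs without a Spark cluster. The helper aggregate_by_key, the sample partitions, and the per-key average use case are all assumptions made for illustration; they are not Spark's implementation. The point it tries to show is that the first function folds a single value into an accumulator inside one partition, while the second merges two accumulators coming from different partitions, so the two take different argument types whenever the accumulator is not the same type as the values.

```python
def aggregate_by_key(partitions, zero, seq_op, comb_op):
    """Hypothetical pure-Python emulation of aggregateByKey semantics.

    `partitions` is a list of lists of (key, value) pairs, standing in
    for the partitions of an RDD.
    """
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            # First function (seq_op): fold one VALUE into the
            # accumulator for this key, within a single partition.
            acc[key] = seq_op(acc.get(key, zero), value)
        per_partition.append(acc)

    merged = {}
    for acc in per_partition:
        for key, partial in acc.items():
            # Second function (comb_op): merge two ACCUMULATORS that
            # were built independently on different partitions.
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


# Assumed sample data: two partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 4)],
    [("a", 3), ("a", 5), ("b", 6)],
]

zero = (0, 0)                                          # (running sum, running count)
seq_op = lambda acc, v: (acc[0] + v, acc[1] + 1)       # accumulator + value
comb_op = lambda x, y: (x[0] + y[0], x[1] + y[1])      # accumulator + accumulator

totals = aggregate_by_key(partitions, zero, seq_op, comb_op)
averages = {k: s / n for k, (s, n) in totals.items()}

# With a real RDD of pairs, the corresponding PySpark call would be:
#   rdd.aggregateByKey((0, 0), seq_op, comb_op)
```

In this sketch seq_op cannot stand in for comb_op, because it expects a plain number as its second argument rather than a (sum, count) tuple. As far as I can tell, the same result is reachable with reduceByKey() if the values are first reshaped, for example with mapValues(lambda v: (v, 1)), so aggregateByKey() looks like a convenience for the case where the result type differs from the value type rather than the only possible solution.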

Thanks,
Aparna.