Hello guys,
I am unable to understand how aggregateByKey() is used in the example below:
https://gist.github.com/tmcgrath/dd8a0f5fb19201deb65f
Even though it explains things in a simple way, I cannot follow it properly because it is written in Scala.
Could someone please explain the same example using PySpark? It would be very helpful, as I am stuck.
Note: I understand how map(), reduce(), and reduceByKey() work, but I am finding it hard to understand the semantics and implementation of aggregateByKey().
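To make the question concrete, here is a minimal plain-Python sketch of what I think aggregateByKey(zeroValue, seqFunc, combFunc) does. It is only my assumption of the semantics, not the Spark implementation and not the code from the gist: the helper aggregate_by_key, the sample data, and the two-partition split are all hypothetical, and it computes a (sum, count) pair per key. Please correct me if this picture is wrong.

```python
def aggregate_by_key(partitions, zero_value, seq_func, comb_func):
    """Hypothetical stand-in for rdd.aggregateByKey, for illustration only."""
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            # Within a partition, each key starts from zero_value and
            # seq_func folds one value at a time into the accumulator.
            acc[key] = seq_func(acc.get(key, zero_value), value)
        per_partition.append(acc)

    merged = {}
    for acc in per_partition:
        for key, partial in acc.items():
            # Across partitions, comb_func merges two accumulators of the same key.
            merged[key] = comb_func(merged[key], partial) if key in merged else partial
    return merged


# Made-up data, split into two pretend partitions.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5)],
]

result = aggregate_by_key(
    partitions,
    (0, 0),                                   # zeroValue: (sum, count)
    lambda acc, v: (acc[0] + v, acc[1] + 1),  # seqFunc: accumulator + one value
    lambda x, y: (x[0] + y[0], x[1] + y[1]),  # combFunc: accumulator + accumulator
)
print(result)  # {'a': (8, 3), 'b': (7, 2)}

# I assume the PySpark call has the same shape (sc is an existing SparkContext):
# sc.parallelize(data).aggregateByKey((0, 0), seq_func, comb_func).collect()
```

If that is right, the part I find confusing is that the accumulator type, here a (sum, count) tuple, can differ from the value type, which is why two functions are needed instead of the single one in reduceByKey().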
Thanks
Gautham P