More Spark Concepts and Core APIs

Originally published at:

As part of this lesson, we will cover:

- Data locality
- Number of tasks in each stage
- RDD partitions in each stage
- Determining the number of tasks when generating shuffled RDDs
- The numTasks parameter
- mapPartitions
- coalesce and repartition
- Developing applications:
  - Card count by suit
  - Revenue per product for a given month
- Accumulators
- Broadcast variables
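The card count by suit application is an aggregation by key. The following is a minimal plain-Python sketch of that pattern, not Spark code. It assumes a hypothetical input line format `color|suit|rank` (for example `BLACK|SPADE|2`) and models an RDD as a plain list of partitions. Each partition is first reduced to partial counts, which is the job a hand-written `mapPartitions` or the map-side combine of `reduceByKey` would do. The partial counts are then merged by key, which is the work done after the shuffle.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical record format: "color|suit|rank", e.g. "BLACK|SPADE|2".
# An "RDD" is modeled here as a plain list of partitions (lists of lines).

def count_suits_in_partition(lines: Iterable[str]) -> Counter:
    """Partial counts for one partition (analogous to map-side combining)."""
    partial = Counter()
    for line in lines:
        suit = line.split("|")[1]  # second field is the suit
        partial[suit] += 1
    return partial

def card_count_by_suit(partitions: List[List[str]]) -> dict:
    """Merge per-partition counts, mimicking the reduce side of reduceByKey."""
    total = Counter()
    for partition in partitions:
        total.update(count_suits_in_partition(partition))  # merge key by key
    return dict(total)

if __name__ == "__main__":
    sample = [
        ["BLACK|SPADE|2", "RED|HEART|5", "BLACK|SPADE|K"],
        ["RED|DIAMOND|3", "RED|HEART|A"],
    ]
    print(card_count_by_suit(sample))
```

In Spark, the same job is usually written as a `map` to `(suit, 1)` pairs followed by `reduceByKey`. That transformation already merges values locally within each partition before the shuffle, so a manual `mapPartitions` step is mostly useful when the per-partition logic is more complex.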
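The revenue per product for a given month application combines a filter, an aggregation by key, and a lookup against a small reference table. The following is a plain-Python sketch of that logic under stated assumptions, not the lesson's actual code. The record shape `(order_date "YYYY-MM-DD", product_id, subtotal)`, the `"YYYY-MM"` month format, and the function name `revenue_per_product` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical joined record: (order_date "YYYY-MM-DD", product_id, subtotal).
OrderItem = Tuple[str, int, float]

def revenue_per_product(items: List[OrderItem], month: str,
                        product_names: Dict[int, str]) -> Dict[str, float]:
    """Sum subtotals per product for orders placed in `month` ("YYYY-MM")."""
    revenue: Dict[int, float] = defaultdict(float)
    for order_date, product_id, subtotal in items:
        if order_date[:7] == month:  # keep only the requested month
            revenue[product_id] += subtotal
    # Resolve ids to names with a small in-memory lookup table. In Spark, a
    # small table like this is a typical candidate for a broadcast variable.
    return {product_names.get(pid, str(pid)): round(total, 2)
            for pid, total in revenue.items()}

if __name__ == "__main__":
    sample = [
        ("2014-01-05", 1, 10.0),
        ("2014-01-20", 1, 5.5),
        ("2014-02-01", 1, 100.0),
        ("2014-01-07", 2, 3.25),
    ]
    print(revenue_per_product(sample, "2014-01", {1: "Widget", 2: "Gadget"}))
```

Keeping the product lookup as a small dictionary, rather than joining two large datasets, mirrors the broadcast-variable approach: each task reads a local copy of the table instead of shuffling it.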