Determine number of tasks for later stages

Originally published at: http://www.itversity.com/topic/determine-number-of-tasks-for-later-stages/

Most of the time, the default number of tasks for later stages is not suitable. Let us understand the criteria for customizing the number of tasks for transformations that start a new stage.

The following transformations, which go through shuffling, take an optional parameter called numTasks: groupByKey, reduceByKey, aggregateByKey, sortByKey, join, cartesian, cogroup.

If numTasks is not passed, they inherit number of tasks from…