Aggregating Data (totals)

Hi,

After running “orderItemsReduce = orderItemsMap.reduce(lambda rev1, rev2: rev1 + rev2)”, I am not getting the desired output; instead, it only shows the log lines below. I am following the same steps as demonstrated in the video. Could you please help me get the desired output?
Thanks in advance.
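For reference, these are the steps I ran before the reduce (a sketch from memory; the HDFS path and the field position of order_item_subtotal are what I believe the video uses, so please treat them as assumptions):

# Load order_items and extract order_item_subtotal (5th comma-separated field)
orderItems = sc.textFile("/public/retail_db/order_items")
orderItemsMap = orderItems.map(lambda oi: float(oi.split(",")[4]))

# Sum the subtotals to get total revenue
orderItemsReduce = orderItemsMap.reduce(lambda rev1, rev2: rev1 + rev2)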

output:

17/02/06 08:25:05 INFO spark.SparkContext: Starting job: reduce at :1
17/02/06 08:25:05 INFO scheduler.DAGScheduler: Got job 12 (reduce at :1) with 4 output partitions
17/02/06 08:25:05 INFO scheduler.DAGScheduler: Final stage: ResultStage 12 (reduce at :1)
17/02/06 08:25:05 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/02/06 08:25:05 INFO scheduler.DAGScheduler: Missing parents: List()
17/02/06 08:25:05 INFO scheduler.DAGScheduler: Submitting ResultStage 12 (PythonRDD[20] at reduce at :1), which has no missing parents
17/02/06 08:25:05 INFO storage.MemoryStore: Block broadcast_16 stored as values in memory (estimated size 5.5 KB, free 926.3 KB)
17/02/06 08:25:05 INFO storage.MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 3.5 KB, free 929.8 KB)
17/02/06 08:25:05 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on 192.168.37.130:34447 (size: 3.5 KB, free: 530.2 MB)
17/02/06 08:25:05 INFO spark.SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1006
17/02/06 08:25:05 INFO scheduler.DAGScheduler: Submitting 4 missing tasks from ResultStage 12 (PythonRDD[20] at reduce at :1)
17/02/06 08:25:05 INFO cluster.YarnScheduler: Adding task set 12.0 with 4 tasks
17/02/06 08:25:06 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
17/02/06 08:25:07 INFO spark.ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 3)
17/02/06 08:25:08 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 4)
17/02/06 08:25:16 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (quickstart.cloudera:38083) with ID 8
17/02/06 08:25:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 12.0 (TID 33, quickstart.cloudera, partition 0,NODE_LOCAL, 2185 bytes)
17/02/06 08:25:16 INFO spark.ExecutorAllocationManager: New executor 8 has registered (new total is 1)
17/02/06 08:25:16 INFO storage.BlockManagerMasterEndpoint: Registering block manager quickstart.cloudera:54097 with 530.3 MB RAM, BlockManagerId(8, quickstart.cloudera, 54097)
17/02/06 08:25:16 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on quickstart.cloudera:54097 (size: 3.5 KB, free: 530.3 MB)
17/02/06 08:25:17 INFO storage.BlockManagerInfo: Added broadcast_13_piece0 in memory on quickstart.cloudera:54097 (size: 23.3 KB, free: 530.3 MB)
17/02/06 08:25:21 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 12.0 (TID 34, quickstart.cloudera, partition 1,NODE_LOCAL, 2185 bytes)
17/02/06 08:25:21 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 12.0 (TID 33) in 4748 ms on quickstart.cloudera (1/4)
17/02/06 08:25:21 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 12.0 (TID 35, quickstart.cloudera, partition 2,NODE_LOCAL, 2185 bytes)
17/02/06 08:25:21 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 12.0 (TID 34) in 587 ms on quickstart.cloudera (2/4)
17/02/06 08:25:22 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 12.0 (TID 36, quickstart.cloudera, partition 3,NODE_LOCAL, 2185 bytes)
17/02/06 08:25:22 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 12.0 (TID 35) in 341 ms on quickstart.cloudera (3/4)
17/02/06 08:25:22 INFO scheduler.DAGScheduler: ResultStage 12 (reduce at :1) finished in 16.509 s
17/02/06 08:25:22 INFO scheduler.DAGScheduler: Job 12 finished: reduce at :1, took 16.549121 s
17/02/06 08:25:22 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 12.0 (TID 36) in 313 ms on quickstart.cloudera (4/4)
17/02/06 08:25:22 INFO cluster.YarnScheduler: Removed TaskSet 12.0, whose tasks have all completed, from pool

Please check the link below.


@N_Chakote Thanks a lot for the suggestion. It worked after running “orderItemsMap.reduce(lambda rev1, rev2: rev1 + rev2)” without the assignment.
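In case it helps anyone else: the pyspark shell only echoes the value of an expression that is not assigned to a variable, so my earlier line most likely did compute the total, it just was never printed. A small sketch of both ways to see the result (assuming orderItemsMap is the RDD of subtotals from the step above):

# Option 1: call reduce without assigning, so the shell echoes the result
orderItemsMap.reduce(lambda rev1, rev2: rev1 + rev2)

# Option 2: keep the assignment and print the variable explicitly
orderItemsReduce = orderItemsMap.reduce(lambda rev1, rev2: rev1 + rev2)
print(orderItemsReduce)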

Thanks again.