Can someone explain why this error is displayed when I apply aggregateByKey to the data given above the command?
Can you try the command below:
new_value = new_data.aggregateByKey((0,0.0), lambda acc, value: (acc[0]+value, acc[1]+1), lambda total1, total2: (total1[0]+total2[0], total1[1]+total2[1]))
Yes, this worked.
The question now is this: the syntax in the Spark programming guide is given as
aggregateByKey(zeroValue)(seqOp, combOp, [numTasks])
Here there is no comma between (zeroValue) and the rest of the command.
Then how come our command worked with a ','?
That's a good question. It's a syntax difference between the Python and Scala APIs. The syntax quoted in your reply above is the Scala form, where the zero value sits in its own parameter list, so there is no comma between it and the rest of the command. In Python, the zero value is simply the first argument in a single argument list, so it is separated from the rest by a comma.
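To make the difference concrete, here is a rough pure-Python sketch of the two call shapes. It does not use Spark at all: `aggregate_by_key`, `aggregate_by_key_curried`, and the sample `partitions` data are hypothetical stand-ins made up for illustration, and the real aggregateByKey may differ in its details. The sketch only shows that the zero value, the per-partition function, and the merge function play the same roles whether they are passed in one argument list or two.

```python
def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    """Hypothetical stand-in for the Python-style call: one argument list."""
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            # Within a partition, each key starts from the zero value
            # and folds in its values with seq_op.
            acc[key] = seq_op(acc.get(key, zero_value), value)
        per_partition.append(acc)

    merged = {}
    for acc in per_partition:
        for key, partial in acc.items():
            # Across partitions, partial results are merged with comb_op.
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


def aggregate_by_key_curried(partitions, zero_value):
    """Hypothetical Scala-style shape: the zero value is fixed by a first call,
    which returns a function that takes the two operations."""
    def with_ops(seq_op, comb_op):
        return aggregate_by_key(partitions, zero_value, seq_op, comb_op)
    return with_ops


# Made-up sample data: two partitions of (key, value) pairs.
partitions = [[("a", 1), ("b", 4)], [("a", 3), ("a", 5)]]

# Same functions as in the command above: accumulate (sum, count) per key.
seq_op = lambda acc, value: (acc[0] + value, acc[1] + 1)
comb_op = lambda total1, total2: (total1[0] + total2[0], total1[1] + total2[1])

# Python-style: everything separated by commas in one call.
flat = aggregate_by_key(partitions, (0, 0.0), seq_op, comb_op)

# Scala-style: zero value in its own parentheses, then the operations.
curried = aggregate_by_key_curried(partitions, (0, 0.0))(seq_op, comb_op)

print(flat)
print(curried)
```

Both calls produce the same (sum, count) pair per key; only the way the arguments are grouped differs.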
Thank you very much for your input, sir.