Sum of Fields using Transformations

Hello All,

Could you please let me know how to sum a particular column using transformations in PySpark?

If you take the order_items table from the Cloudera VM, here is how a record looks:

(u'168481,67405,19,1,124.99,124.99')

So now, if I have to sum the fifth element (order subtotal), how can I achieve this using a transformation? It's easy using the reduce function, but how do I do it with a transformation?

@pramodvspk @ravi.tejarockon @itversity Please help with the question. It's a little urgent.

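# Give every record the same key so that reduceByKey adds up all the values.
# Index 4 is the fifth field (order subtotal); indexing starts at 0.
for i in sc.textFile("order_items").map(lambda x: ("total", float(x.split(",")[4]))).reduceByKey(lambda x, y: x + y).collect():
    print(i)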
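In case it helps, here is the same idea split into small pieces. This is only a sketch: the helper names `to_keyed_subtotal` and `sum_subtotals` are made up for illustration, and it assumes the subtotal is the fifth comma-separated field (index 4) and that `sc` is an existing SparkContext, as in the pyspark shell.

```python
def to_keyed_subtotal(line):
    """Turn one order_items line into a ("total", subtotal) pair.

    Assumption: the subtotal is the fifth comma-separated field (index 4).
    """
    fields = line.split(",")
    return ("total", float(fields[4]))


def sum_subtotals(sc, path):
    """Sum the subtotal column with reduceByKey.

    `sc` is assumed to be an existing SparkContext; `path` is the input location.
    """
    pairs = sc.textFile(path).map(to_keyed_subtotal)       # transformation
    totals = pairs.reduceByKey(lambda x, y: x + y)         # transformation
    return totals.collect()                                # action: runs the job


# The parsing step can be checked without Spark:
print(to_keyed_subtotal("168481,67405,19,1,124.99,124.99"))
```

In the pyspark shell you would call `sum_subtotals(sc, "order_items")`. Nothing is computed until `collect()` is called, because `map` and `reduceByKey` are both lazy transformations.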


Thanks so much @srikb88. It was so easy, but I could not recollect that I need to have the same key for all the records for reduceByKey.
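For anyone else who forgets why the key has to be the same, here is a small plain-Python sketch of the behaviour. The `reduce_by_key` function below is a hypothetical local stand-in written for illustration, not Spark's actual implementation, and the sample pairs are made-up values.

```python
def reduce_by_key(pairs, func):
    """Local stand-in for RDD.reduceByKey: combine values that share a key."""
    merged = {}
    for key, value in pairs:
        # First value for a key is stored as-is; later ones are combined with func.
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def add(x, y):
    return x + y


# Made-up sample pairs for illustration.
same_key = [("total", 1.0), ("total", 2.5), ("total", 3.0)]
distinct_keys = [("168481", 1.0), ("168482", 2.5), ("168483", 3.0)]

print(reduce_by_key(same_key, add))       # one entry holding the overall sum
print(reduce_by_key(distinct_keys, add))  # three entries, nothing combined
```

With one shared key, every value lands in the same group and gets added together. With a different key per record, each group holds a single value, so nothing is summed.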