In PySpark practice 75, I have the following code and I am stuck here:
[paslechoix@gw03 ~]$ hdfs dfs -cat p90_order_items/*
order_items = sc.textFile("p90_order_items")
Now, what I need to do is:
- convert all the fields to integers or floats;
- calculate the sum of rev, which is field #4.
Can someone tell me how to do this in Python?
Thank you very much.
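A minimal sketch of one way to do both steps. It assumes the file is comma-delimited, every field is numeric, and "field #4" means zero-based index 4 (the fifth column). The helper names parse_line, sum_revenue_local and sum_revenue_rdd are hypothetical, not part of PySpark. Each field is tried as an int first and falls back to float. The RDD version only uses the standard RDD.map and RDD.sum methods.

```python
from typing import Iterable, List, Union

Number = Union[int, float]


def to_number(field: str) -> Number:
    """Convert one text field to int if possible, otherwise to float."""
    try:
        return int(field)
    except ValueError:
        # Fields such as "10.5" are not valid ints, so fall back to float.
        return float(field)


def parse_line(line: str, sep: str = ",") -> List[Number]:
    """Split one line and convert every field to a number.

    ASSUMPTION: fields are comma-separated and all of them are numeric.
    """
    return [to_number(f) for f in line.split(sep)]


def sum_revenue_local(lines: Iterable[str], rev_index: int = 4) -> Number:
    """Plain-Python version of the sum, handy for checking parse_line.

    ASSUMPTION: "field #4" means zero-based index 4.
    """
    return sum(parse_line(line)[rev_index] for line in lines)


def sum_revenue_rdd(rdd, rev_index: int = 4) -> Number:
    """Same computation on a Spark RDD of text lines, such as the result
    of sc.textFile("p90_order_items"). Uses only RDD.map and RDD.sum."""
    return rdd.map(parse_line).map(lambda fields: fields[rev_index]).sum()
```

In the pyspark shell, after order_items = sc.textFile("p90_order_items"), calling sum_revenue_rdd(order_items) returns the total. If "#4" instead means the fourth column counting from 1, pass rev_index=3. Because the values are floats, the total may show small rounding noise, so round(total, 2) can help when printing it.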