This is my code:
from pyspark.sql import Row
orders = sc.textFile("/public/retail_db/orders")
# Split each record once; toDF() (capital "DF") converts the RDD of Rows to a DataFrame
ordersMap = orders.map(lambda x: x.split(",")).map(
    lambda f: Row(order_id=int(f[0]), order_date=f[1], order_customer_id=int(f[2]), order_status=f[3])
).toDF()