Exercise 02 - Get details of inactive customers

pyspark
spark-shell
apache-spark
scala
python

#22

Just wondering here: orders = sc.textFile("Location") will read the file from the Hadoop directory.
The problem statement is to read from the local directory. I guess we have to use Python's open command here, which gives a list of lines, and then convert that list into an RDD.

# Read the file from the local file system (this runs on the driver), then distribute it
orders = open("/data/retail_db/orders/part-00000").readlines()
ordersRDD = sc.parallelize(orders)
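
One thing to watch with the open approach: readlines() keeps the trailing "\n" on every line, so the last field of each record ends with a newline after a split(","), and equality checks on it can fail. Below is a rough sketch of a variant that strips the newlines and closes the file. The helper names read_local_lines and to_rdd are made up for illustration, and sc is assumed to be the SparkContext that the pyspark shell provides.

```python
def read_local_lines(path):
    """Read a file from the local file system and return its lines
    without the trailing newline characters."""
    with open(path) as f:  # the file is closed when the block exits
        return [line.rstrip("\n") for line in f]


def to_rdd(sc, lines):
    """Distribute an in-memory list of lines as an RDD.

    `sc` is assumed to be an existing SparkContext (hypothetical helper).
    """
    return sc.parallelize(lines)
```

In the pyspark shell this would be used as ordersRDD = to_rdd(sc, read_local_lines("/data/retail_db/orders/part-00000")). Another option that may work is sc.textFile("file:///data/retail_db/orders/part-00000"), but then the file has to exist at that path on every worker node, so it is mostly useful in local mode.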


#23

Yeah. There are a total of 1736 customers named Smith, Mary in the customer database, each with a different customer id.