Data Frames: Lecture 194

Spark is unable to find the path below on HDFS:

ordersDF = spark.read.json('/Users/itversity/Research/data/retail_db_json/orders')

pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://nn01.itversity.com:8020/Users/itversity/Research/data/retail_db_json/orders;'

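The directory is incorrect. The path has no scheme, so Spark resolves it against HDFS (note the hdfs://nn01.itversity.com:8020 prefix in the error), and /Users/itversity/Research/data/retail_db_json/orders does not exist there.

Use this path instead: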

ordersDB = spark.read.json('/public/retail_db_json/orders/')
for i in ordersDB.take(5): print(i)

Row(order_customer_id=11599, order_date=u'2013-07-25 00:00:00.0', order_id=1, order_status=u'CLOSED')
Row(order_customer_id=256, order_date=u'2013-07-25 00:00:00.0', order_id=2, order_status=u'PENDING_PAYMENT')
Row(order_customer_id=12111, order_date=u'2013-07-25 00:00:00.0', order_id=3, order_status=u'COMPLETE')
Row(order_customer_id=8827, order_date=u'2013-07-25 00:00:00.0', order_id=4, order_status=u'CLOSED')
Row(order_customer_id=11318, order_date=u'2013-07-25 00:00:00.0', order_id=5, order_status=u'COMPLETE')
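
In case it helps anyone hitting the same error: the AnalysisException shows the path with hdfs://nn01.itversity.com:8020 in front of it, which suggests that a path without a scheme is looked up on the cluster's default file system. Below is a rough sketch of that resolution in plain Python 3, not Spark's actual code. The helper name qualify and the DEFAULT_FS constant are made up for illustration; the default file system value is simply copied from the error message.

```python
from urllib.parse import urlparse

# Assumption: the cluster's default file system, copied from the error message.
DEFAULT_FS = "hdfs://nn01.itversity.com:8020"


def qualify(path, default_fs=DEFAULT_FS):
    """Hypothetical helper: approximate how a scheme-less path is resolved."""
    if urlparse(path).scheme:
        # The path names its own file system (hdfs://, file://, ...); keep it as is.
        return path
    # No scheme: prepend the default file system.
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# The path from the question ends up pointing at HDFS, where it does not exist.
print(qualify("/Users/itversity/Research/data/retail_db_json/orders"))
# The path from the answer resolves to HDFS as well, but that directory is present.
print(qualify("/public/retail_db_json/orders/"))
```

The first call prints the same hdfs:// location that appears in the exception, which is why a directory from a local machine cannot be found by the cluster.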

Thanks for sharing the JSON location.
It works fine for me now.