Not able to get the output using the Map function with RDD

pyspark

#1

Please see the attached screenshot. I have been stuck on this error for two days straight now, and I'm not sure whether the problem is with my commands or with the environment. Any help is greatly appreciated.

The error says: `org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nn01.itversity.com:8020/user/vbhaarat90/public/retail_db/order_items`

Thanks
Bhaarat!


#2

@bhaarat_pachori

Can you send the full code? It would help us resolve your issue.

Regards,
Sunil Abhishek


#3

Hi Sunil

```python
orderItems = sc.textFile("public/retail_db/order_items")
orderItemsMap = orderItems.map(lambda o: (int(o.split(",")[1]), float(o.split(",")[4])))
for i in orderItemsMap.take(10): print(i)
```

These three lines are the entire code. I see the error after running the **_for loop_**.
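Why the error only shows up at the loop: in Spark, `textFile` and `map` are lazy transformations, so nothing is read from HDFS until an action such as `take(10)` runs. That is the point where the missing input path is reported. The parsing logic itself can be checked without Spark. Below is a minimal plain-Python sketch, assuming the usual `retail_db` `order_items` layout in which field 1 is the order id and field 4 is the subtotal (the same indexes the lambda uses). `parse_order_item` and the sample lines are hypothetical, not taken from the actual dataset.

```python
def parse_order_item(line):
    """Turn one CSV line of order_items into (order_id, subtotal).

    Assumes field 1 is the order id and field 4 is the subtotal,
    matching the indexes used in the original lambda.
    """
    fields = line.split(",")  # split once instead of twice per record
    return int(fields[1]), float(fields[4])


# Hypothetical sample lines standing in for the HDFS file contents.
sample_lines = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
]

# Plain-Python stand-in for orderItems.map(parse_order_item).take(10)
for pair in map(parse_order_item, sample_lines[:10]):
    print(pair)
```

In PySpark the same function can be passed directly, as in `orderItems.map(parse_order_item)`. A named function is easier to test on its own than an inline lambda.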


#4

Hi Sunil

Thanks for helping. I figured out the problem: it was the path of the dataset. I was missing the leading forward slash, so the path was resolved relative to my HDFS home directory (`/user/vbhaarat90`) instead of the root directory.
Wrong path: `orderItems = sc.textFile("public/retail_db/order_items")`
Correct path: `orderItems = sc.textFile("/public/retail_db/order_items")`
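Here is a rough illustration of why the missing slash matters. A path without a leading `/` is treated as relative and is typically qualified against the user's HDFS working directory, which defaults to `/user/<username>`. The sketch below mimics that behavior in plain Python. `qualify_hdfs_path` is a hypothetical helper, not a Hadoop API. The namenode address and user name are taken from the error message above.

```python
import posixpath

# Namenode address as shown in the error message.
NAMENODE = "hdfs://nn01.itversity.com:8020"


def qualify_hdfs_path(path, user, namenode=NAMENODE):
    """Hypothetical helper mimicking how a path is fully qualified.

    Assumes relative paths resolve against the default working
    directory /user/<user>, and absolute paths start at the HDFS root.
    """
    if "://" in path:
        return path  # already fully qualified, leave untouched
    if not path.startswith("/"):
        # Relative path: prepend the user's home directory.
        path = posixpath.join("/user", user, path)
    return namenode + path


print(qualify_hdfs_path("public/retail_db/order_items", "vbhaarat90"))
print(qualify_hdfs_path("/public/retail_db/order_items", "vbhaarat90"))
```

The first call reproduces the path from the error message. The second points at the shared `/public` directory. Before calling `textFile`, running `hdfs dfs -ls /public/retail_db/order_items` is a quick way to confirm that the path exists.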

Thanks
Bhaarat


#5