Ending up with an error when preparing data to perform set operations


#1
orders = sc.textFile("/public/retail_db/orders")

orders201312 = orders.filter(lambda oi: oi.split(",")[1][:7] == "2013-12").map(lambda oi: (oi.split(",")[1], oi))
orders201401 = orders.filter(lambda oi: oi.split(",")[1][:7] == "2014-01").map(lambda oi: (oi.split(",")[1], oi))

orderitems = sc.textFile("/public/retail_db/order_items")
orderitemsmap = orderitems.map(lambda oi:(oi.split(",")[1],oi))

# Every step above returns data. Is anything wrong with the cluster?
# I am facing an error with the lines of code below:
orders201312join = orders201312.join(orderitemsmap)
orders201401join = orders201401.join(orderitemsmap)

ERROR:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/2.5.0.0-1245/spark/python/pyspark/rdd.py", line 1318, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty



#2

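I was able to identify the fault. The original statements keyed each order by field 1 (the order date), which never matches the order id that orderitemsmap uses as its key, so the join returned nothing.

If I rewrite these two statements to key each order by field 0 (the order id) instead, I am able to fetch results: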

orders201312 = orders.filter(lambda oi: oi.split(",")[1][:7] == "2013-12").map(lambda oi: (oi.split(",")[0], oi))
orders201401 = orders.filter(lambda oi: oi.split(",")[1][:7] == "2014-01").map(lambda oi: (oi.split(",")[0], oi))
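
To see why the key choice matters, here is a minimal plain-Python sketch (no Spark needed) that mimics the join on a handful of records. The sample records are made up for illustration and only follow the comma-separated layout used above; `key_by` and `inner_join` are hypothetical helpers standing in for `map` and `join`, not Spark APIs.

```python
# Made-up sample records in the same comma-separated layout (not real retail_db rows).
orders = [
    "101,2013-12-01 00:00:00.0,5001,COMPLETE",
    "102,2013-12-02 00:00:00.0,5002,PENDING",
    "103,2014-01-05 00:00:00.0,5003,CLOSED",
]
order_items = [
    "1,101,957,1,299.98,299.98",
    "2,101,1073,1,199.99,199.99",
    "3,103,502,2,100.0,50.0",
]


def key_by(records, field):
    """Hypothetical helper: build (key, record) pairs from one comma-separated field."""
    return [(r.split(",")[field], r) for r in records]


def inner_join(left, right):
    """Hypothetical helper: plain-Python stand-in for an inner join on the pair key."""
    return [(lk, (lv, rv)) for lk, lv in left for rk, rv in right if lk == rk]


# Same filter as in the posts: keep orders placed in December 2013.
dec_orders = [r for r in orders if r.split(",")[1][:7] == "2013-12"]

# Order items are keyed by field 1, the id of the order they belong to.
items_by_order_id = key_by(order_items, 1)

# Keying orders by field 1 (the date) never matches an order id: empty result.
joined_on_date = inner_join(key_by(dec_orders, 1), items_by_order_id)

# Keying orders by field 0 (the order id) lines the keys up.
joined_on_id = inner_join(key_by(dec_orders, 0), items_by_order_id)

print(len(joined_on_date))  # 0
print(len(joined_on_id))    # 2
```

With the date as key, no pair on the left shares a key with any pair on the right, which is the same situation that leaves the joined RDD empty and makes first() raise the ValueError.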


#3