Error 'PipelinedRDD' object is not iterable


#1

Hi Sir,

I am getting the error 'PipelinedRDD' object is not iterable while running the following code:

orders = sc.textFile(r"/public/retail_db/orders/")
ordersMap = orders.mapPartitions(lambda part:map(lambda rec:(rec.split(",")[3],1),list(part)))
ordersReduce = ordersMap.reduceByKey(lambda tot,val: tot+val)
for i in ordersReduce: print(i)
ordersReduceMap = ordersReduce.map(lambda rec : rec[0]+"\t"+str(rec[1]))
for i in ordersReduceMap: print(i)

Please help me resolve the error.

Thanks,
Krishna Teja


#2

@tkrish23

You are iterating over the RDD itself. An RDD is distributed across the cluster, so you cannot loop over it directly on the driver; you first have to bring records to the driver with take or collect. Here is the updated code. take(10) returns the first 10 records as a regular Python list, which you can then iterate.

from pyspark import SparkConf, SparkContext
import sys

conf = SparkConf().setAppName("Orders Join OrderItems").setMaster(sys.argv[1])
sc = SparkContext(conf=conf)

inputPath = sys.argv[2]
orders = sc.textFile(inputPath + "/orders")

ordersMap = orders.mapPartitions(lambda part: map(lambda rec: (rec.split(",")[3], 1), list(part)))
ordersReduce = ordersMap.reduceByKey(lambda tot, val: tot+val)

for i in ordersReduce.take(10):
    print(i)

ordersReduceMap = ordersReduce.map(lambda rec: rec[0]+"\t"+str(rec[1]))

for i in ordersReduceMap.take(10):
    print(i)
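As a side note, you can verify the counting logic locally without Spark. The sketch below is a plain-Python equivalent of what the map/reduceByKey pair computes (counting occurrences of the 4th comma-separated field); the sample order records are made up for illustration:

```python
# Hypothetical order records (id, date, customer_id, status)
lines = [
    "1,2013-07-25,11599,CLOSED",
    "2,2013-07-25,256,PENDING_PAYMENT",
    "3,2013-07-25,12111,CLOSED",
]

counts = {}
for rec in lines:
    status = rec.split(",")[3]                  # the key, as in the map step
    counts[status] = counts.get(status, 0) + 1  # same as the tot + val reducer

print(counts)  # {'CLOSED': 2, 'PENDING_PAYMENT': 1}
```

This can help confirm the per-record parsing is right before running the full Spark job.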

#3

Thanks @Varun_Upadhyay1 for solving the issue.