Error 'PipelinedRDD' object is not iterable


Hi Sir,

I am getting the error 'PipelinedRDD' object is not iterable while running the following code:

orders = sc.textFile(r"/public/retail_db/orders/")
ordersMap = orders.mapPartitions(lambda part:map(lambda rec:(rec.split(",")[3],1),list(part)))
ordersReduce = ordersMap.reduceByKey(lambda tot,val: tot+val)
for i in ordersReduce: print(i)
ordersReduceMap = ordersReduce.map(lambda rec: rec[0]+"\t"+str(rec[1]))
for i in ordersReduceMap: print(i)

Please help me resolve this error.

Krishna Teja



You are iterating over the RDD itself. An RDD is not directly iterable; you have to use take or collect to bring records to the driver before iterating over them.
Here is the updated code. You can use take(10) to get the first 10 records.

from pyspark import SparkConf, SparkContext
import sys

conf = SparkConf().setAppName("Orders Join OrderItems").setMaster(sys.argv[1])
sc = SparkContext(conf=conf)

inputPath = sys.argv[2]
orders = sc.textFile(inputPath + "/orders")

ordersMap = orders.mapPartitions(lambda part: map(lambda rec: (rec.split(",")[3], 1), list(part)))
ordersReduce = ordersMap.reduceByKey(lambda tot, val: tot+val)

for i in ordersReduce.take(10):
    print(i)

ordersReduceMap = ordersReduce.map(lambda rec: rec[0]+"\t"+str(rec[1]))

for i in ordersReduceMap.take(10):
    print(i)

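For reference, the aggregation this job performs (count of orders per status, field 3 of each CSV line) can be sketched in plain Python without Spark. The `lines` list below is a hypothetical sample in the orders file format, not real data from /public/retail_db:

```python
from collections import Counter

# Hypothetical sample lines: order_id,order_date,customer_id,order_status
lines = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-25 00:00:00.0,12111,COMPLETE",
    "4,2013-07-25 00:00:00.0,8827,CLOSED",
]

# Equivalent of the map + reduceByKey above: one (status, 1) pair per
# record, summed per key
statusCounts = Counter(rec.split(",")[3] for rec in lines)

for status, count in statusCounts.items():
    print(status + "\t" + str(count))
```

This makes it easy to check the expected output of the Spark job on a small sample before running it on the full dataset.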

Thanks @Varun_Upadhyay1 for solving the issue.