ITversity lab - Map function issue

Hi,
I am practicing the retail_db example on spark-shell. The map function worked for the orders table but throws an exception for the order_items table. Below are the commands I am running:

val orderItemsRDD = sc.textFile("file://" + url + "/order_items")
val orderItemsMap = orderItemsRDD.map(rec => (rec.split(",")(1).toInt, rec.split(",")(4).toFloat))

I have printed the orderItemsRDD rows and there is no issue there, so the data is loaded properly. The second command also gets executed, but when I run foreach(println) it throws the exception below:

17/03/25 19:13:42 ERROR Executor: Exception in task 0.0 in stage 9.0 (TID 9)
java.lang.ArrayIndexOutOfBoundsException: 1
at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:31)
at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:31)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1335)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1335)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1882)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1882)

It is an ArrayIndexOutOfBoundsException. Check the indexes you are accessing, i.e. whether (4) is a valid index for every record or not.
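For example, something like this (a rough check, using the orderItemsRDD you loaded above) should surface any lines that do not have enough comma-separated fields:

// any line with fewer than 5 comma-separated fields will fail on rec.split(",")(4)
orderItemsRDD.filter(rec => rec.split(",").length < 5).take(10).foreach(println)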

Do a count on the RDD; maybe the orderItemsRDD is empty.
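For example:

// a count of 0 would mean the path given to sc.textFile matched nothing
orderItemsRDD.count()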

Yes, it is empty. It throws the same exception; that's what the issue is. The same command works on my local machine with the same set of data.

If it's empty, you cannot perform those operations on it. You are doing a split and parsing.

Maybe on your local machine the RDD has data.

Try this:
val orderItemsRDD = sc.textFile("hdfs://" + url + "/order_items")
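You could also list which files are actually being picked up from that folder, for example (a rough check using the same url as above; wholeTextFiles reads full file contents, so only do this on small data):

// prints the path of every file Spark reads from that directory
sc.wholeTextFiles("file://" + url + "/order_items").keys.collect().foreach(println)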

The issue was due to a derby.log file in the data folder; I hadn't realized it was there. It worked when I deleted that file.
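For anyone who hits the same thing: a stray non-CSV file like derby.log in the input directory produces lines without commas, so rec.split(",")(4) blows up. A rough way to guard against that (a sketch, using the same variables as above) is to filter out short lines before mapping:

val orderItemsRDD = sc.textFile("file://" + url + "/order_items")
// keep only lines that have at least the 5 fields the map below expects
val orderItemsClean = orderItemsRDD.filter(rec => rec.split(",").length >= 5)
val orderItemsMap = orderItemsClean.map(rec => (rec.split(",")(1).toInt, rec.split(",")(4).toFloat))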
Anyway, thanks for the input.