Getting java.lang.ArrayIndexOutOfBoundsException


#1

Team,

I am trying to execute a very simple operation in Spark and I am getting an ArrayIndexOutOfBoundsException.

Code:

val a = sc.textFile("path")

Sample data (the file is tab-delimited, though the tabs may display as spaces here):
state constituency candidate_name sex age category partyname partysymbol general postal total pct_of_total_votes pct_of_polled_votes totalvoters
Andhra Pradesh Adilabad GODAM NAGESH M 49 ST TRS Car 425762 5085 430847 31.07931864 40.81807244 1386282
Andhra Pradesh Adilabad NARESH M 37 ST INC Hand 257994 1563 259557 18.72324679 24.59020587 1386282
Andhra Pradesh Adilabad RAMESH RATHOD M 48 ST TDP Bicycle 182879 1319 184198 13.28719553 17.45075933 1386282
Andhra Pradesh Adilabad RATHOD SADASHIV M 55 ST BSP Elephant 94363 57 94420 6.81102402 8.945269201 1386282
Andhra Pradesh Adilabad NETHAWATH RAMDAS M 44 ST IND Auto- Rickshaw 41028 4 41032 2.959859538 3.88733622 1386282
Andhra Pradesh Adilabad PAWAR KRISHNA M 33 ST IND Bat 5051 4 5055 0.364644423 0.478906331 1386282
Andhra Pradesh Adilabad BANKA SAHADEV M 53 ST IND Gas Cylinder 4780 7 4787 0.345312137 0.453516243 1386282
Andhra Pradesh Adilabad MOSALI CHINNAIAH M 40 ST IND Almirah 8842 17 8859 0.639047467 0.839294004 1386282
Andhra Pradesh Adilabad None of the Above NOTA NOTA 17021 63 17084 1.232361092 1.618523396 1386282
Andhra Pradesh Peddapalle

val b = a.map(x => x.split("\t")(1))

Error:

scala> b.take(10)
18/09/10 23:41:27 INFO SparkContext: Starting job: take at <console>:32
18/09/10 23:41:27 INFO DAGScheduler: Got job 9 (take at <console>:32) with 1 output partitions
18/09/10 23:41:27 INFO DAGScheduler: Final stage: ResultStage 9 (take at <console>:32)
18/09/10 23:41:27 INFO DAGScheduler: Parents of final stage: List()
18/09/10 23:41:27 INFO DAGScheduler: Missing parents: List()
18/09/10 23:41:27 INFO DAGScheduler: Submitting ResultStage 9 (MapPartitionsRDD[9] at map at <console>:29), which has no missing parents
18/09/10 23:41:27 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 3.4 KB, free 510.3 MB)
18/09/10 23:41:27 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 2013.0 B, free 510.3 MB)
18/09/10 23:41:27 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on 172.16.1.113:45155 (size: 2013.0 B, free: 511.1 MB)
18/09/10 23:41:27 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:1008
18/09/10 23:41:27 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 9 (MapPartitionsRDD[9] at map at <console>:29)
18/09/10 23:41:27 INFO YarnScheduler: Adding task set 9.0 with 1 tasks
18/09/10 23:41:29 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
18/09/10 23:41:31 INFO YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (wn05.itversity.com:44837) with ID 11
18/09/10 23:41:31 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 19, wn05.itversity.com, partition 0,RACK_LOCAL, 2171 bytes)
18/09/10 23:41:31 INFO ExecutorAllocationManager: New executor 11 has registered (new total is 1)
18/09/10 23:41:31 INFO BlockManagerMasterEndpoint: Registering block manager wn05.itversity.com:45250 with 511.1 MB RAM, BlockManagerId(11, wn05.itversity.com, 45250)
18/09/10 23:41:31 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on wn05.itversity.com:45250 (size: 2013.0 B, free: 511.1 MB)
18/09/10 23:41:32 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on wn05.itversity.com:45250 (size: 30.8 KB, free: 511.1 MB)
18/09/10 23:41:32 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 19, wn05.itversity.com): java.lang.ArrayIndexOutOfBoundsException: 1
at $line51.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:29)
at $line51.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:29)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1335)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1335)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1857)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1857)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/09/10 23:41:32 INFO TaskSetManager: Starting task 0.1 in stage 9.0 (TID 20, wn05.itversity.com, partition 0,RACK_LOCAL, 2171 bytes)
18/09/10 23:41:32 INFO TaskSetManager: Lost task 0.1 in stage 9.0 (TID 20) on executor wn05.itversity.com: java.lang.ArrayIndexOutOfBoundsException (1) [duplicate 1]
18/09/10 23:41:32 INFO TaskSetManager: Starting task 0.2 in stage 9.0 (TID 21, wn05.itversity.com, partition 0,RACK_LOCAL, 2171 bytes)
18/09/10 23:41:32 INFO TaskSetManager: Lost task 0.2 in stage 9.0 (TID 21) on executor wn05.itversity.com: java.lang.ArrayIndexOutOfBoundsException (1) [duplicate 2]
18/09/10 23:41:32 INFO TaskSetManager: Starting task 0.3 in stage 9.0 (TID 22, wn05.itversity.com, partition 0,RACK_LOCAL, 2171 bytes)
18/09/10 23:41:32 INFO TaskSetManager: Lost task 0.3 in stage 9.0 (TID 22) on executor wn05.itversity.com: java.lang.ArrayIndexOutOfBoundsException (1) [duplicate 3]
18/09/10 23:41:32 ERROR TaskSetManager: Task 0 in stage 9.0 failed 4 times; aborting job
18/09/10 23:41:32 INFO YarnScheduler: Removed TaskSet 9.0, whose tasks have all completed, from pool
18/09/10 23:41:32 INFO YarnScheduler: Cancelling stage 9
18/09/10 23:41:32 INFO DAGScheduler: ResultStage 9 (take at <console>:32) failed in 5.032 s
18/09/10 23:41:32 INFO DAGScheduler: Job 9 failed: take at <console>:32, took 5.037124 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 4 times, most recent failure: Lost task 0.3 in stage 9.0 (TID 22, wn05.itversity.com): java.lang.ArrayIndexOutOfBoundsException: 1
... (same executor stack trace as above; 25 frames elided)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1421)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1420)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1420)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1642)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:622)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1831)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1844)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1857)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1335)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
at org.apache.spark.rdd.RDD.take(RDD.scala:1309)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
at $iwC$$iwC$$iwC.<init>(<console>:45)
at $iwC$$iwC.<init>(<console>:47)
at $iwC.<init>(<console>:49)
at <init>(<console>:51)
at .<init>(<console>:55)
at .<clinit>()
at .<init>(<console>:7)
at .<clinit>()
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
... (same executor stack trace as above; 25 frames elided)

scala> 18/09/10 23:41:43 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 11
18/09/10 23:41:43 INFO ExecutorAllocationManager: Removing executor 11 because it has been idle for 10 seconds (new desired total will be 0)
18/09/10 23:41:43 INFO YarnClientSchedulerBackend: Disabling executor 11.
18/09/10 23:41:43 INFO DAGScheduler: Executor lost: 11 (epoch 0)
18/09/10 23:41:43 INFO BlockManagerMasterEndpoint: Trying to remove executor 11 from BlockManagerMaster.
18/09/10 23:41:43 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(11, wn05.itversity.com, 45250)
18/09/10 23:41:43 INFO BlockManagerMaster: Removed 11 successfully in removeExecutor
18/09/10 23:41:43 INFO YarnScheduler: Executor 11 on wn05.itversity.com killed by driver.
18/09/10 23:41:43 INFO ExecutorAllocationManager: Existing executor 11 has been removed (new total is 0)
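If I understand the error correctly, split("\t")(1) throws ArrayIndexOutOfBoundsException: 1 for any line that contains no tab at all (for example an empty or truncated line), because the split then yields a single-element array. A quick check along these lines (a sketch; a is the RDD loaded above) should show whether such lines exist:

// Count lines by their number of tab-separated fields; any line with
// fewer than 2 fields makes split("\t")(1) go out of bounds.
a.map(line => line.split("\t").length).countByValue().foreach(println)

// Peek at a few of the offending lines, if any.
a.filter(line => line.split("\t").length < 2).take(5).foreach(println)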




#2

Can you paste the path where the file exists?


#3

Hi Sunil,

Thanks, I have fixed it; the problem was in my code.
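In case it helps anyone else who hits this: the exception came from indexing into the result of split("\t") on lines that had no second field. A guard along these lines avoids it (a sketch assuming tab-delimited input, not necessarily the exact code I used):

// split("\t", -1) keeps trailing empty fields, and the filter drops any
// row without a second field before we index into it.
val b = a.map(_.split("\t", -1)).filter(_.length > 1).map(_(1))
b.take(10).foreach(println)

Alternatively, a.flatMap(_.split("\t").lift(1)) drops the short rows in a single step, since lift returns an Option.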

Thanks again for your help.

Sathya


#4