Basic Filter failing in Spark REPL Session - Eshwar


#1

I am running a basic transformation in the Spark REPL to pull the fourth field out of each order record, but every attempt fails with a ClassNotFoundException for the REPL-generated anonymous function class. Full session log below:

scala> val orderRDD = sc.textFile("/public/retail_db/orders")

18/04/22 22:07:01 INFO MemoryStore: Block broadcast_22 stored as values in memory (estimated size 338.9 KB, free 1090.9 KB)
18/04/22 22:07:01 INFO MemoryStore: Block broadcast_22_piece0 stored as bytes in memory (estimated size 28.4 KB, free 1119.3 KB)
18/04/22 22:07:01 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on 172.16.1.109:57000 (size: 28.4 KB, free: 457.8 MB)
18/04/22 22:07:01 INFO SparkContext: Created broadcast 22 from textFile at <console>:32
orderRDD: org.apache.spark.rdd.RDD[String] = /public/retail_db/orders MapPartitionsRDD[11] at textFile at <console>:32

scala> orderRDD.map(order => order.split(",")(3)).take(10).foreach(println)

18/04/22 22:07:15 INFO FileInputFormat: Total input paths to process : 1
18/04/22 22:07:15 INFO SparkContext: Starting job: take at <console>:35
18/04/22 22:07:15 INFO DAGScheduler: Got job 20 (take at <console>:35) with 1 output partitions
18/04/22 22:07:15 INFO DAGScheduler: Final stage: ResultStage 20 (take at <console>:35)
18/04/22 22:07:15 INFO DAGScheduler: Parents of final stage: List()
18/04/22 22:07:15 INFO DAGScheduler: Missing parents: List()
18/04/22 22:07:15 INFO DAGScheduler: Submitting ResultStage 20 (MapPartitionsRDD[12] at map at <console>:35), which has no missing parents
18/04/22 22:07:15 INFO MemoryStore: Block broadcast_23 stored as values in memory (estimated size 3.3 KB, free 1122.6 KB)
18/04/22 22:07:15 INFO MemoryStore: Block broadcast_23_piece0 stored as bytes in memory (estimated size 1964.0 B, free 1124.5 KB)
18/04/22 22:07:15 INFO BlockManagerInfo: Added broadcast_23_piece0 in memory on 172.16.1.109:57000 (size: 1964.0 B, free: 457.8 MB)
18/04/22 22:07:15 INFO SparkContext: Created broadcast 23 from broadcast at DAGScheduler.scala:1008
18/04/22 22:07:15 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 20 (MapPartitionsRDD[12] at map at <console>:35)
18/04/22 22:07:15 INFO YarnScheduler: Adding task set 20.0 with 1 tasks
18/04/22 22:07:15 INFO TaskSetManager: Starting task 0.0 in stage 20.0 (TID 36, wn04.itversity.com, partition 0,RACK_LOCAL, 2167 bytes)
18/04/22 22:07:15 INFO BlockManagerInfo: Added broadcast_23_piece0 in memory on wn04.itversity.com:40515 (size: 1964.0 B, free: 1247.2 MB)
18/04/22 22:07:15 WARN TaskSetManager: Lost task 0.0 in stage 20.0 (TID 36, wn04.itversity.com): java.lang.ClassNotFoundException: $line100.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

18/04/22 22:07:15 INFO TaskSetManager: Starting task 0.1 in stage 20.0 (TID 37, wn04.itversity.com, partition 0,RACK_LOCAL, 2167 bytes)
18/04/22 22:07:15 INFO TaskSetManager: Lost task 0.1 in stage 20.0 (TID 37) on executor wn04.itversity.com: java.lang.ClassNotFoundException ($line100.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 1]
18/04/22 22:07:15 INFO TaskSetManager: Starting task 0.2 in stage 20.0 (TID 38, wn04.itversity.com, partition 0,RACK_LOCAL, 2167 bytes)
18/04/22 22:07:15 INFO TaskSetManager: Lost task 0.2 in stage 20.0 (TID 38) on executor wn04.itversity.com: java.lang.ClassNotFoundException ($line100.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 2]
18/04/22 22:07:15 INFO TaskSetManager: Starting task 0.3 in stage 20.0 (TID 39, wn04.itversity.com, partition 0,RACK_LOCAL, 2167 bytes)
18/04/22 22:07:15 INFO BlockManagerInfo: Removed broadcast_21_piece0 on 172.16.1.109:57000 in memory (size: 863.0 B, free: 457.8 MB)
18/04/22 22:07:15 INFO BlockManagerInfo: Removed broadcast_21_piece0 on wn04.itversity.com:40515 in memory (size: 863.0 B, free: 1247.2 MB)
18/04/22 22:07:15 INFO ContextCleaner: Cleaned accumulator 20
18/04/22 22:07:15 INFO BlockManagerInfo: Removed broadcast_20_piece0 on 172.16.1.109:57000 in memory (size: 1967.0 B, free: 457.8 MB)
18/04/22 22:07:15 INFO BlockManagerInfo: Removed broadcast_20_piece0 on wn04.itversity.com:40515 in memory (size: 1967.0 B, free: 1247.2 MB)
18/04/22 22:07:15 INFO TaskSetManager: Lost task 0.3 in stage 20.0 (TID 39) on executor wn04.itversity.com: java.lang.ClassNotFoundException ($line100.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 3]
18/04/22 22:07:15 ERROR TaskSetManager: Task 0 in stage 20.0 failed 4 times; aborting job
18/04/22 22:07:15 INFO YarnScheduler: Removed TaskSet 20.0, whose tasks have all completed, from pool
18/04/22 22:07:15 INFO YarnScheduler: Cancelling stage 20
18/04/22 22:07:15 INFO DAGScheduler: ResultStage 20 (take at <console>:35) failed in 0.042 s
18/04/22 22:07:15 INFO DAGScheduler: Job 20 failed: take at <console>:35, took 0.054025 s
18/04/22 22:07:15 INFO ContextCleaner: Cleaned accumulator 19
18/04/22 22:07:15 INFO BlockManagerInfo: Removed broadcast_18_piece0 on 172.16.1.109:57000 in memory (size: 1964.0 B, free: 457.8 MB)
18/04/22 22:07:15 INFO BlockManagerInfo: Removed broadcast_18_piece0 on wn04.itversity.com:40515 in memory (size: 1964.0 B, free: 1247.2 MB)
18/04/22 22:07:15 INFO ContextCleaner: Cleaned accumulator 18
18/04/22 22:07:15 INFO BlockManagerInfo: Removed broadcast_17_piece0 on 172.16.1.109:57000 in memory (size: 1882.0 B, free: 457.8 MB)
18/04/22 22:07:15 INFO BlockManagerInfo: Removed broadcast_17_piece0 on wn04.itversity.com:40515 in memory (size: 1882.0 B, free: 1247.2 MB)
18/04/22 22:07:15 INFO ContextCleaner: Cleaned accumulator 17
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 4 times, most recent failure: Lost task 0.3 in stage 20.0 (TID 39, wn04.itversity.com): java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
... (executor-side stack trace identical to the WARN trace above)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1421)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1420)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1420)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1642)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:622)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1856)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1869)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1882)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1335)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
at org.apache.spark.rdd.RDD.take(RDD.scala:1309)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC.<init>(<console>:50)
at $iwC.<init>(<console>:52)
at <init>(<console>:54)
at .<init>(<console>:58)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
... (same executor-side stack trace as above)


#2

It worked after creating a new Spark REPL session. Thanks.
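
For anyone hitting the same error: a ClassNotFoundException on a REPL-generated class such as $line100.$read$...$anonfun$1 means the executors could not load the closure the shell just compiled, which typically points to a stale or broken REPL session rather than a problem with the code itself. The usual remedy, as above, is to exit and start a fresh shell and re-run the same commands. A minimal sketch, assuming the lab's standard spark-shell invocation on YARN (adjust the launch options to your environment):

$ spark-shell --master yarn

scala> val orderRDD = sc.textFile("/public/retail_db/orders")
scala> orderRDD.map(order => order.split(",")(3)).take(10).foreach(println)

If the fresh session prints the ten extracted fields, the original failure was session state, not the transformation.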


#3