Not able to process correct queries. Bug in Spark


#1

sqlContext.sql("select order_status, count(*) from orders group by order_status").show()

I'm getting an error from Spark that looks unrelated to my query. I'm running the above command, which I believe is correct, and I get this error every time. It is wasting a lot of my time; please help me resolve it as soon as possible.

18/06/06 14:57:27 INFO ParseDriver: Parsing command: select order_status, count(*) from orders group by order_status
18/06/06 14:57:28 INFO ParseDriver: Parse Completed
18/06/06 14:57:28 INFO FileInputFormat: Total input paths to process : 1
18/06/06 14:57:28 INFO SparkContext: Starting job: show at <console>:26
18/06/06 14:57:28 INFO DAGScheduler: Registering RDD 6 (show at <console>:26)
18/06/06 14:57:28 INFO DAGScheduler: Got job 0 (show at <console>:26) with 1 output partitions
18/06/06 14:57:28 INFO DAGScheduler: Final stage: ResultStage 1 (show at <console>:26)
18/06/06 14:57:28 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/06/06 14:57:28 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/06/06 14:57:28 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[6] at show at <console>:26), which has no missing parents
18/06/06 14:57:28 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 11.2 KB, free 376.2 KB)
18/06/06 14:57:28 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.5 KB, free 381.8 KB)
18/06/06 14:57:28 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.1.113:35389 (size: 5.5 KB, free: 511.1 MB)
18/06/06 14:57:28 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1008
18/06/06 14:57:28 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[6] at show at <console>:26)
18/06/06 14:57:28 INFO YarnScheduler: Adding task set 0.0 with 2 tasks
18/06/06 14:57:28 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, wn03.itversity.com, partition 0,NODE_LOCAL, 2156 bytes)
18/06/06 14:57:28 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, wn01.itversity.com, partition 1,NODE_LOCAL, 2156 bytes)
18/06/06 14:57:28 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on wn03.itversity.com:38167 (size: 5.5 KB, free: 1247.2 MB)
18/06/06 14:57:28 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on wn01.itversity.com:46688 (size: 5.5 KB, free: 1247.2 MB)
18/06/06 14:57:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on wn03.itversity.com:38167 (size: 28.4 KB, free: 1247.2 MB)
18/06/06 14:57:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on wn01.itversity.com:46688 (size: 28.4 KB, free: 1247.2 MB)
18/06/06 14:57:30 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, wn01.itversity.com): java.lang.NumberFormatException: For input string: "2014-02-23 00:00:00.0"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:30)
at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:29)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:512)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/06/06 14:57:30 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 2, wn01.itversity.com, partition 1,NODE_LOCAL, 2156 bytes)
18/06/06 14:57:30 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2) on executor wn01.itversity.com: java.lang.NumberFormatException (For input string: "2014-02-23 00:00:00.0") [duplicate 1]
18/06/06 14:57:30 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 3, wn02.itversity.com, partition 1,NODE_LOCAL, 2156 bytes)
18/06/06 14:57:30 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, wn03.itversity.com): java.lang.NumberFormatException: For input string: "2013-07-25 00:00:00.0"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:30)
at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:29)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:512)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/06/06 14:57:30 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 4, wn03.itversity.com, partition 0,NODE_LOCAL, 2156 bytes)

18/06/06 14:57:30 INFO DAGScheduler: ShuffleMapStage 0 (show at <console>:26) failed in 2.027 s
18/06/06 14:57:30 INFO DAGScheduler: Job 0 failed: show at <console>:26, took 2.080838 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, wn01.itversity.com): java.lang.NumberFormatException: For input string: "2013-07-25 00:00:00.0"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:29)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:512)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)





#2

This is happening in the lab environment.


#3

scala> 18/06/06 14:57:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on wn02.itversity.com:38823 (size: 28.4 KB, free: 1247.2 MB)
18/06/06 14:57:31 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID 3, wn02.itversity.com): TaskKilled (killed intentionally)
18/06/06 14:57:31 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool

These are the last lines of the error output.


#4

You need to go through the error more carefully; your query is failing with the message above. The java.lang.NumberFormatException: For input string: "2014-02-23 00:00:00.0" shows that toInt is being called on the order_date field, most likely in the map function you used when building and registering the orders table.
Make sure you are in the right database and that the table structure is correct: order_date should be typed as a string (or timestamp), not an integer.
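A quick way to confirm this is to print the schema Spark has registered for the table and to look at a few raw rows before re-running the query. A minimal sketch for the Spark 1.x shell follows; the input path is an assumption, so point it at wherever your orders data actually lives:

// Check the types Spark has registered for the orders table
sqlContext.table("orders").printSchema()

// Inspect a few raw lines to see which field holds the date string
sc.textFile("/public/retail_db/orders").take(3).foreach(println)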

Please note that this is not a bug in Spark.
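For reference, here is a minimal sketch of registering the table with order_date kept as a string. The case class fields and the four-column comma-separated layout are assumptions based on a typical retail_db orders file, so adjust them to your data:

// Keep orderDate as a String so toInt is never applied to it
case class Order(orderId: Int, orderDate: String, customerId: Int, status: String)

import sqlContext.implicits._

val orders = sc.textFile("/public/retail_db/orders").
  map(_.split(",")).
  map(a => Order(a(0).toInt, a(1), a(2).toInt, a(3))).
  toDF()

orders.registerTempTable("orders")
sqlContext.sql("select order_status, count(*) from orders group by order_status").show()

With orderDate left as a String, the aggregation runs without the NumberFormatException.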