sqlContext.read.json command not working

apache-spark

#1

When I execute either of the commands below, it throws an error:

val ordersDF = sqlContext.read.json("/public/retail_db_json/orders")
or
sqlContext.load("/public/retail_db_json/orders")

Both commands throw the error below:
17/11/18 17:26:21 INFO JSONRelation: Listing hdfs://nn01.itversity.com:8020/public/retail_db_json/orders on driver
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:

org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
$iwC$$iwC.<init>(<console>:15)
$iwC.<init>(<console>:24)
<init>(<console>:26)

But the file is present in HDFS at /public/retail_db_json/orders.

How do I rectify this error?


#2

@rajeshvaasudevan Execute it again; I am not getting any error.

I suspect the problem is that you ran sc.stop() before running the above command. I have checked both cases, and I get the same error when I run the above command after running sc.stop().
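
For example, this reproduces it in spark-shell (a minimal sketch, using the same path as in your post):

scala> sc.stop()   // stops the SparkContext that backs sqlContext
scala> val ordersDF = sqlContext.read.json("/public/retail_db_json/orders")
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.

Once sc is stopped, any method on it (and on the sqlContext built on top of it) fails this way. The simplest fix is to exit and relaunch spark-shell so that a fresh sc and sqlContext are created for you.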




Expected Result:
scala> val ordersDF = sqlContext.read.json("/public/retail_db_json/orders")
17/11/20 10:55:52 INFO JSONRelation: Listing hdfs://nn01.itversity.com:8020/public/retail_db_json/orders on driver
17/11/20 10:55:53 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 344.6 KB, free 344.6 KB)
17/11/20 10:55:53 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 28.4 KB, free 373.0 KB)
17/11/20 10:55:53 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.1.100:50305 (size: 28.4 KB, free: 511.1 MB)
17/11/20 10:55:53 INFO SparkContext: Created broadcast 0 from json at <console>:28
17/11/20 10:55:53 INFO FileInputFormat: Total input paths to process : 1
17/11/20 10:55:53 INFO SparkContext: Starting job: json at <console>:28
17/11/20 10:55:53 INFO DAGScheduler: Got job 0 (json at <console>:28) with 2 output partitions
17/11/20 10:55:53 INFO DAGScheduler: Final stage: ResultStage 0 (json at <console>:28)
17/11/20 10:55:53 INFO DAGScheduler: Parents of final stage: List()
17/11/20 10:55:53 INFO DAGScheduler: Missing parents: List()
17/11/20 10:55:53 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at json at <console>:28), which has no missing parents
17/11/20 10:55:53 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.4 KB, free 377.4 KB)
17/11/20 10:55:53 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 379.9 KB)
17/11/20 10:55:53 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.1.100:50305 (size: 2.5 KB, free: 511.1 MB)
17/11/20 10:55:53 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1008
17/11/20 10:55:53 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at json at <console>:28)
17/11/20 10:55:53 INFO YarnScheduler: Adding task set 0.0 with 2 tasks
17/11/20 10:55:54 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, wn02.itversity.com, partition 0,RACK_LOCAL, 2211 bytes)
17/11/20 10:55:54 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, wn03.itversity.com, partition 1,RACK_LOCAL, 2211 bytes)
17/11/20 10:55:54 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on wn03.itversity.com:56447 (size: 2.5 KB, free: 511.1 MB)
17/11/20 10:55:54 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on wn02.itversity.com:54963 (size: 2.5 KB, free: 511.1 MB)
17/11/20 10:55:54 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on wn02.itversity.com:54963 (size: 28.4 KB, free: 511.1 MB)
17/11/20 10:55:54 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on wn03.itversity.com:56447 (size: 28.4 KB, free: 511.1 MB)
17/11/20 10:55:55 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1831 ms on wn02.itversity.com (1/2)
17/11/20 10:55:55 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 1833 ms on wn03.itversity.com (2/2)
17/11/20 10:55:55 INFO DAGScheduler: ResultStage 0 (json at <console>:28) finished in 1.863 s
17/11/20 10:55:55 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/11/20 10:55:56 INFO DAGScheduler: Job 0 finished: json at <console>:28, took 2.368305 s
ordersDF: org.apache.spark.sql.DataFrame = [order_customer_id: bigint, order_date: string, order_id: bigint, order_status: string]

scala> ordersDF.first()
17/11/20 10:56:09 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 126.1 KB, free 506.0 KB)
17/11/20 10:56:09 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 28.3 KB, free 534.3 KB)
17/11/20 10:56:09 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.16.1.100:50305 (size: 28.3 KB, free: 511.1 MB)
17/11/20 10:56:09 INFO SparkContext: Created broadcast 2 from first at <console>:31
17/11/20 10:56:09 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 344.6 KB, free 878.9 KB)
17/11/20 10:56:09 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 28.4 KB, free 907.3 KB)
17/11/20 10:56:09 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 172.16.1.100:50305 (size: 28.4 KB, free: 511.0 MB)
17/11/20 10:56:09 INFO SparkContext: Created broadcast 3 from first at <console>:31
17/11/20 10:56:09 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 172.16.1.100:50305 in memory (size: 28.3 KB, free: 511.1 MB)
17/11/20 10:56:09 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 172.16.1.100:50305 in memory (size: 2.5 KB, free: 511.1 MB)
17/11/20 10:56:09 INFO BlockManagerInfo: Removed broadcast_1_piece0 on wn03.itversity.com:56447 in memory (size: 2.5 KB, free: 511.1 MB)
17/11/20 10:56:09 INFO FileInputFormat: Total input paths to process : 1
17/11/20 10:56:09 INFO SparkContext: Starting job: first at <console>:31
17/11/20 10:56:09 INFO DAGScheduler: Got job 1 (first at <console>:31) with 1 output partitions
17/11/20 10:56:09 INFO DAGScheduler: Final stage: ResultStage 1 (first at <console>:31)
17/11/20 10:56:09 INFO DAGScheduler: Parents of final stage: List()
17/11/20 10:56:09 INFO DAGScheduler: Missing parents: List()
17/11/20 10:56:09 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[9] at first at <console>:31), which has no missing parents
17/11/20 10:56:09 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 6.0 KB, free 752.0 KB)
17/11/20 10:56:10 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.4 KB, free 755.4 KB)
17/11/20 10:56:10 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 172.16.1.100:50305 (size: 3.4 KB, free: 511.1 MB)
17/11/20 10:56:10 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1008
17/11/20 10:56:10 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[9] at first at <console>:31)
17/11/20 10:56:10 INFO YarnScheduler: Adding task set 1.0 with 1 tasks
17/11/20 10:56:10 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, wn02.itversity.com, partition 0,RACK_LOCAL, 2211 bytes)
17/11/20 10:56:10 INFO BlockManagerInfo: Removed broadcast_1_piece0 on wn02.itversity.com:54963 in memory (size: 2.5 KB, free: 511.1 MB)
17/11/20 10:56:10 INFO ContextCleaner: Cleaned accumulator 1
17/11/20 10:56:10 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 172.16.1.100:50305 in memory (size: 28.4 KB, free: 511.1 MB)
17/11/20 10:56:10 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on wn02.itversity.com:54963 (size: 3.4 KB, free: 511.1 MB)
17/11/20 10:56:10 INFO BlockManagerInfo: Removed broadcast_0_piece0 on wn02.itversity.com:54963 in memory (size: 28.4 KB, free: 511.1 MB)
17/11/20 10:56:10 INFO BlockManagerInfo: Removed broadcast_0_piece0 on wn03.itversity.com:56447 in memory (size: 28.4 KB, free: 511.1 MB)
17/11/20 10:56:10 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on wn02.itversity.com:54963 (size: 28.4 KB, free: 511.1 MB)
17/11/20 10:56:10 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 299 ms on wn02.itversity.com (1/1)
17/11/20 10:56:10 INFO DAGScheduler: ResultStage 1 (first at <console>:31) finished in 0.299 s
17/11/20 10:56:10 INFO YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/11/20 10:56:10 INFO DAGScheduler: Job 1 finished: first at <console>:31, took 0.307235 s
res0: org.apache.spark.sql.Row = [11599,2013-07-25 00:00:00.0,1,CLOSED]


#3

I am getting the same error today when I try to write a Parquet file to HDFS. I haven't run sc.stop(). I keep getting this error along with many log lines, but I don't know how to proceed. I spent half a day repeating the same thing: initializing the Spark context, running the same code over and over again, and getting the same error.
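
For reference, the write I am attempting looks roughly like this (df and the output path here are placeholders, not my real names):

scala> df.write.parquet("/user/example/retail_db_parquet/orders")   // df is an existing DataFrame; path is illustrative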

I have copied the log below:

java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
$iwC$$iwC.<init>(<console>:15)
$iwC.<init>(<console>:24)
<init>(<console>:26)
.<init>(<console>:30)
.<clinit>(<console>)
.<init>(<console>:7)
.<clinit>(<console>)
$print(<console>)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
The currently active SparkContext was created at:
(No active SparkContext.)

    at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1343)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:126)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)