Scala command not working


#1

Hi, I am practicing the steps from the Udemy course (CCA 175 Certification using Scala); I am at Section 5, video 61. When I try to run 'orders.first' at the scala prompt, I get the error below and no results are fetched. Please help.

scala> orders.first
18/02/14 11:55:08 INFO FileInputFormat: Total input paths to process : 1
18/02/14 11:55:08 INFO SparkContext: Starting job: first at :31
18/02/14 11:55:08 INFO DAGScheduler: Got job 0 (first at :31) with 1 output partitions
18/02/14 11:55:08 INFO DAGScheduler: Final stage: ResultStage 0 (first at :31)
18/02/14 11:55:08 INFO DAGScheduler: Parents of final stage: List()
18/02/14 11:55:08 INFO DAGScheduler: Missing parents: List()
18/02/14 11:55:08 INFO DAGScheduler: Submitting ResultStage 0 (/public/retail_db/orders MapPartitionsRDD[1] at textFile at :28), which has no missing parents
18/02/14 11:55:08 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.2 KB, free 368.1 KB)
18/02/14 11:55:08 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1881.0 B, free 370.0 KB)
18/02/14 11:55:08 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.1.109:53557 (size: 1881.0 B, free: 457.8 MB)
18/02/14 11:55:08 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1008
18/02/14 11:55:08 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (/public/retail_db/orders MapPartitionsRDD[1] at textFile at :28)
18/02/14 11:55:08 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
18/02/14 11:55:08 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, wn02.itversity.com, partition 0,NODE_LOCAL, 2167 bytes)
18/02/14 11:55:09 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, wn02.itversity.com): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_1_piece0 of broadcast_1
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1212)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_1_piece0 of broadcast_1
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:137)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:175)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1205)
… 11 more

Thanks,
Anand


#2

Before running the above command, I ran these steps:

spark-shell --master yarn --conf spark.ui.port=12752
import org.apache.spark.{SprakConf,SparkContext}
val conf=new SparkConf().setAppName("Daily Revenue").setMaster("yarn-client")
val sc=new SparkContext(conf)


#3

Please help to resolve the issue.

Thanks,
Anand


#4

@anandversity I found a typo.

Try the code below and let me know the status.

import org.apache.spark.{SparkConf,SparkContext}
val conf=new SparkConf().setAppName("Daily Revenue").setMaster("yarn-client")
val sc=new SparkContext(conf)

#5

Hi Balu,
Below are all the commands I used, but no luck.

spark-shell --master yarn \
  --conf spark.ui.port=12586 \
  --num-executors 1 \
  --executor-memory 512M
import org.apache.spark.{SparkConf,SparkContext}
val conf=new SparkConf().setAppName("Daily Revenue").setMaster("yarn-client")
val sc=new SparkContext(conf)
sc.getConf.getAll.foreach(println)
val orders=sc.textFile("/public/retail_db/orders")
orders.first

After I entered the 'orders.first' command above, I got the error message below:

    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_1_piece0 of broadcast_1
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1212)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_1_piece0 of broadcast_1
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:137)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:175)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1205)
… 11 more


#6

Hi,

I resolved the issue; it is working fine now. The problem was that a SparkContext was already running (spark-shell starts one automatically) while I was creating another one with the commands above. I ran 'sc.stop' first, then re-ran 'val orders…', and it worked.
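For reference, a sketch of the full working sequence inside spark-shell (assuming the same YARN cluster and dataset path as above; spark-shell already binds a live SparkContext to `sc`, so the existing one must be stopped before a new one is created, otherwise tasks can fail with "Failed to get broadcast_…" errors):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark-shell has already created a SparkContext named `sc`;
// stop it before constructing a replacement, or the two
// contexts conflict on the executors.
sc.stop

// Build a fresh context with the desired settings.
val conf = new SparkConf().setAppName("Daily Revenue").setMaster("yarn-client")
val sc = new SparkContext(conf)

// Now RDD operations run against the new context.
val orders = sc.textFile("/public/retail_db/orders")
orders.first
```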

Thanks,
Anand

