Need help with spark-submit


#1

Hi there,

I tried to write the word count example with the CLI on the labs. Below is the code and output.

scala> import org.apache.spark.SparkConf, org.apache.spark.SparkContext
object WordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf().
      setAppName("word count").
      setMaster("local")
    val sc = new SparkContext(conf)
    val inputpath = "/user/stgvenu9/json_files/learn.txt"
    val outputpath = "/user/stgvenu9/spark_output"
    val wc = sc.textFile(inputpath).
      flatMap(rec => rec.split(",")).
      map(rec => (rec, 1)).
      reduceByKey((acc, value) => acc + value)
    wc.saveAsTextFile(outputpath)
  }
}

defined module WordCount

Now the question is, how do I submit this on the labs to get the output?
Thanks for the help.


#2
  1. /home/<YOUR_USER_ID>/scalacode/wordcount ==> this is your base directory.

  2. Create the directory src/main/scala inside /home/<YOUR_USER_ID>/scalacode/wordcount (use mkdir -p src/main/scala).

  3. Inside src/main/scala, create a file called WordCount.scala, copy your program (above) into it, and save with ESC then :x.

  4. View your WordCount.scala program using vi src/main/scala/WordCount.scala (make sure there are no typos).

  5. In the base directory, create a file called build.sbt:
    vi build.sbt
    name := "wordcount" // application name
    version := "1.0" // output app version
    scalaVersion := "2.10.6" // Scala version

    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.3"

    Save your build.sbt using ESC then :x
    (MAKE SURE THERE ARE NO TYPOS HERE)
    5.1. Use $ sbt console (to validate that your build.sbt is correct)

  6. Create the jar file of your program using sbt package:
    /home/YOUR_USER_ID/scalacode/wordcount/src/main/scala $ sbt package (make sure you are in this directory)
    This means your code is in the src/main/scala directory inside your base directory.

  7. You should see the jar file created, with [success] in green.

  8. Run $ sbt run (OR)

  9. Copy this .jar file onto your cluster using scp, then submit it with spark-submit to run it on the cluster:
    $ scp /home/<YOUR_USER_ID>/scalacode/wordcount/target/scala-2.10/*.jar .
    . = your current directory

  10. Log in to the cluster and submit using spark-submit with --class, --master, and other parameters as needed.
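
The layout and build steps above can be sketched as one shell session. This is a minimal sketch under stated assumptions, not a definitive recipe: the BASE variable is a name introduced here for illustration, the build.sbt contents mirror step 5, and the sbt command is left commented out because it needs sbt installed and network access. sbt reads build.sbt from the directory it is started in, so the sketch runs it from the base directory.

```shell
# Base directory for the project; BASE is a name chosen for this sketch.
BASE="${BASE:-$HOME/scalacode/wordcount}"

# Step 2: create the standard sbt source layout.
mkdir -p "$BASE/src/main/scala"

# Step 5: write build.sbt (sbt comments use //, not #).
cat > "$BASE/build.sbt" <<'EOF'
name := "wordcount"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.3"
EOF

# Step 3: the WordCount program from post #1 goes in this file.
echo "Put your source at: $BASE/src/main/scala/WordCount.scala"

# Steps 6-7: build from the base directory, where build.sbt lives.
# Uncomment once sbt is installed:
# (cd "$BASE" && sbt package)

# With the name, version, and scalaVersion above, sbt names the jar like this:
echo "Expected jar: $BASE/target/scala-2.10/wordcount_2.10-1.0.jar"
```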

Please post if you are able to get your result with this.

Hope this helps.
Thanks
Venkat


#3

Hello Venkat,

Thank you very much for the detailed explanation & quick response. It helped a lot.

I’m able to execute the code with sbt run. A few things I corrected:

1. I removed 2.10 in the libraryDependencies line; then sbt console was successful.
2. I ran the sbt command in the directory where build.sbt is present, i.e. /home/YOUR_USER_ID/scalacode/wordcount, instead of /home/YOUR_USER_ID/scalacode/wordcount/src/main/scala (where the source code is present).
3. I used nn01.itversity.com:8020 in the input & output paths to fix exceptions.
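
Point 3 can be made concrete. When a job runs without the cluster's Hadoop configuration on its classpath (for example via sbt run with a local master), a bare path such as /user/... may resolve against the local filesystem; that is one plausible cause of the exceptions mentioned, though it is an assumption, since the thread does not show the stack trace. A minimal sketch of how the fully qualified URIs are formed, with variable names invented for illustration:

```shell
# NameNode host and port as quoted in point 3; the variable names are illustrative.
NAMENODE="nn01.itversity.com:8020"

# Paths from the WordCount program in post #1.
INPUT_PATH="/user/stgvenu9/json_files/learn.txt"
OUTPUT_PATH="/user/stgvenu9/spark_output"

# A fully qualified HDFS URI has the form hdfs://<host>:<port><absolute path>.
INPUT_URI="hdfs://$NAMENODE$INPUT_PATH"
OUTPUT_URI="hdfs://$NAMENODE$OUTPUT_PATH"

echo "input:  $INPUT_URI"
echo "output: $OUTPUT_URI"

# Optional check, attempted only where the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls "$INPUT_URI" || echo "could not list $INPUT_URI"
fi
```

These strings are what would replace the values of inputpath and outputpath in the program. saveAsTextFile fails if the output directory already exists, so remove it or pick a new path before rerunning.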

Could you please elaborate on point #9?
1. Where should I copy the jar file?
2. When you say log in to the cluster, do you mean launching spark-shell? Within spark-shell, should I set any directory before running spark-submit? The reason I ask is that it does not recognise the spark-submit command after launching spark-shell.

scala> spark-submit
<console>:26: error: not found: value spark
spark-submit
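
The error above is what the Scala REPL reports when it parses spark-submit as the expression "spark minus submit": spark-submit is an operating-system command, not something to type at the scala> prompt. Leave spark-shell (with :quit) and run it from the Linux prompt instead. A hedged sketch follows; the jar location, the variable names, and the yarn-client master are assumptions for illustration, not values confirmed in this thread.

```shell
# Run at the Linux prompt ($), not at scala>.
APP_CLASS="WordCount"                    # the object defined in WordCount.scala
APP_JAR="$HOME/wordcount_2.10-1.0.jar"   # assumed location of the jar built by sbt package
MASTER="yarn-client"                     # assumption; Spark 1.6 also accepts e.g. local[*]

# Dry run: print the command. Remove the leading "echo" to actually submit.
echo spark-submit --class "$APP_CLASS" --master "$MASTER" "$APP_JAR"
```

Because the program calls setMaster("local") on its SparkConf, that value takes precedence over --master; remove the setMaster call if you want the command line to choose the master.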


#4

@venu:

If you are using gw01.itversity.com, copy the jar to your own directory on the cluster, then run your program using spark-shell (as you are already doing on gw01.itversity.com).

Thanks
Venkat


#5

Venkat,

Sorry, I couldn’t follow that. Please elaborate if you don’t mind.


#6

Venkat,

I got this one. Thank you!!


#7

@venu:

My apologies. I’m just seeing your msg.
Great. Carry on.
Thanks
Venkat


#8

Thanks for the detailed info, Venkat.
I am stuck at the copying step: “Copy this .jar file on to your cluster using SCP then submit using spark-submit to run this on cluster
$ scp /home/<YOUR_USER_ID>/scalacode/wordcount/target/scala-2.10/*.jar .
. = your current directory”

My jar is stored under /home/userid/helloworld/target/scala-2.10/sample.jar . I am trying to copy this to my current directory /user/userid/Jars.

scp /home/userid/helloworld/target/scala-2.10/sample.jar /user/userid/Jars.

I am getting an error.

Can you help?


#9

@HYm:
I guess the “.” should not be there in the path “/user/userid/Jars.” [I’m sure your path is /user/userid/Jars].
It must be /user/userid/Jars (OR) just a dot (.) [ . = your current dir ].

scp /home/userid/helloworld/target/scala-2.10/sample.jar /user/userid/Jars OR

scp /home/userid/helloworld/target/scala-2.10/sample.jar .
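
For reference, a short sketch of the copy forms that tend to get mixed up at this step. It is an assumption that /user/userid/Jars is an HDFS directory (paths under /user often are on Hadoop clusters); if so, neither cp nor scp can write to it, and hdfs dfs -put is the tool to use. The scratch directory and empty file below are stand-ins so the local part of the sketch runs anywhere.

```shell
# Stand-in for the real jar, created in a scratch directory for this sketch.
WORK=$(mktemp -d)
mkdir -p "$WORK/target/scala-2.10" "$WORK/Jars"
: > "$WORK/target/scala-2.10/sample.jar"

# 1. Both paths on the same machine: a plain cp is enough.
cp "$WORK/target/scala-2.10/sample.jar" "$WORK/Jars/"
ls "$WORK/Jars"

# 2. Copy to another machine: scp needs user@host: before the remote path.
#    (userid and gw01.itversity.com are the names used in this thread.)
# scp target/scala-2.10/sample.jar userid@gw01.itversity.com:/home/userid/

# 3. Copy into HDFS (only if the destination really is an HDFS directory):
# hdfs dfs -mkdir -p /user/userid/Jars
# hdfs dfs -put target/scala-2.10/sample.jar /user/userid/Jars/
```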

Hope this helps.
Thanks
Venkat