Unable to run SparkPi on Yarn Cluster


#1

Hi Team,
I am trying to run the SparkPi program on a YARN cluster; below is my command.
Can you please help me figure out:

  1. Why I am not able to execute this.
  2. Where to look for logs and how to debug.

    FYI: I am able to run regular .py files.
    [mpremchand78@gw02 spark-client]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2G --executor-cores 4 --queue default lib/spark-examples*.jar 10
    Below is the log trace:
    18/08/27 05:17:24 INFO Client: Application report for application_1533622723243_10134 (state: RUNNING)
    18/08/27 05:17:25 INFO Client: Application report for application_1533622723243_10134 (state: RUNNING)
    18/08/27 05:17:26 INFO Client: Application report for application_1533622723243_10134 (state: FINISHED)
    18/08/27 05:17:26 INFO Client:
    client token: N/A
    diagnostics: Exception was thrown 5 time(s) from Reporter thread.
    ApplicationMaster host: 172.16.1.108
    ApplicationMaster RPC port: 0
    queue: default
    start time: 1535361429893
    final status: FAILED
    tracking URL: http://rm01.itversity.com:19288/proxy/application_1533622723243_10134/
    user: mpremchand78
    Exception in thread "main" org.apache.spark.SparkException: Application application_1533622723243_10134 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1143)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1194)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    18/08/27 05:17:26 INFO ShutdownHookManager: Shutdown hook called
    18/08/27 05:17:26 INFO ShutdownHookManager: Deleting directory /tmp/spark-f67895c7-064d-429e-8562-aa35d5be8c87

#2

@Prem1,

You can find the logs via the tracking URL.
Use the command below:

spark-submit --master yarn --deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--driver-memory 2G \
--executor-memory 2G \
--executor-cores 4 \
/usr/hdp/current/spark-client/lib/spark-examples-1.6.3.2.6.5.0-292-hadoop2.7.3.2.6.5.0-292.jar
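If the tracking URL is not reachable from your browser, the same logs can usually be pulled from the gateway with the YARN CLI once the application finishes (this assumes log aggregation is enabled on the cluster):

```shell
# Fetch the aggregated container logs for the failed run.
# The application id is the one printed in your log trace.
yarn logs -applicationId application_1533622723243_10134
```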

#3

Hi Sravan,
Thanks for your reply. I am doing the same thing, but I don't know where the mistake is.
I tried the command you gave me and got the failure below.

[mpremchand78@gw02 bin]$ spark-submit --master yarn --deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--driver-memory 2G \
--executor-memory 2G \
--executor-cores 4 \
/usr/hdp/current/spark-client/lib/spark-examples-1.6.3.2.6.5.0-292-hadoop2.7.3.2.6.5.0-292.jar

Log:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.16.1.107
ApplicationMaster RPC port: 0
queue: default
start time: 1535438841530
final status: UNDEFINED
tracking URL: http://rm01.itversity.com:19288/proxy/application_1533622723243_10650/
user: mpremchand78
18/08/28 02:47:39 INFO Client: Application report for application_1533622723243_10650 (state: RUNNING)
18/08/28 02:47:40 INFO Client: Application report for application_1533622723243_10650 (state: RUNNING)
18/08/28 02:47:41 INFO Client: Application report for application_1533622723243_10650 (state: RUNNING)
18/08/28 02:47:42 INFO Client: Application report for application_1533622723243_10650 (state: RUNNING)
18/08/28 02:47:43 INFO Client: Application report for application_1533622723243_10650 (state: FINISHED)
18/08/28 02:47:43 INFO Client:
client token: N/A
diagnostics: Exception was thrown 5 time(s) from Reporter thread.
ApplicationMaster host: 172.16.1.107
ApplicationMaster RPC port: 0
queue: default
start time: 1535438841530
final status: FAILED
tracking URL: http://rm01.itversity.com:19288/proxy/application_1533622723243_10650/
user: mpremchand78
Exception in thread "main" org.apache.spark.SparkException: Application application_1533622723243_10650 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1143)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1194)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/08/28 02:47:43 INFO ShutdownHookManager: Shutdown hook called
18/08/28 02:47:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-646cbc87-7430-4fd2-8d6a-b534bcd4aa4d


#4

@Prem1,

It is working fine. You have to use a maximum of 2 executor cores.

Use the command below:

spark-submit --master yarn --deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--driver-memory 2G \
--executor-memory 2G \
--executor-cores 2 \
/usr/hdp/current/spark-client/lib/spark-examples-1.6.3.2.6.5.0-292-hadoop2.7.3.2.6.5.0-292.jar

#5

Thanks Sravan, I was able to run it in both cluster and client modes.
However, I did not see the output when running in cluster mode.


#6

You should see the output when using the command below.

    spark-submit --master yarn-client \
    --class org.apache.spark.examples.SparkPi \
    --driver-memory 2G \
    --executor-memory 2G \
    --executor-cores 2 \
    /usr/hdp/current/spark-client/lib/spark-examples-1.6.3.2.6.5.0-292-hadoop2.7.3.2.6.5.0-292.jar

#7

Hi Vinodnerella,
Thanks. Client mode is fine; I was referring to cluster mode:
"However I did not see the output when run in cluster mode"


#8

Hi Team,
Please confirm whether it will work in cluster mode or not.
Thanks,
Prem


#9

Can someone reply to this question, please?


#10

When the program is run in YARN cluster mode, the driver itself runs inside a YARN container on the cluster: it initializes the SparkContext and requests resources from the ResourceManager, while the actual computation runs on the executors. Anything printed by the driver or the executors therefore goes to the container logs on the cluster, not to the console of the gateway node you submitted from.

Since the example jar only prints its result inside those containers, you can't see it on the console in cluster mode. If you want to verify the Pi result, you can write the program yourself and save the output to a file (for example on HDFS), or use the collect API to bring the data back to the driver program.
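For the stock SparkPi jar, one way to see the result after a cluster-mode run (again assuming YARN log aggregation is enabled) is to grep the driver container's stdout out of the aggregated logs:

```shell
# SparkPi prints "Pi is roughly ..." on the driver's stdout; in cluster mode
# that stdout ends up in the ApplicationMaster container's log.
yarn logs -applicationId application_1533622723243_10650 | grep "Pi is roughly"
```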

