What should be the setMaster() for running spark-submit?

spark-submit

#1

I want to run spark-submit (as described in https://kaizen.itversity.com/setup-development-environment-intellij-and-scala-big-data-hadoop-and-spark/). I have already created the jar and run it successfully in local mode.
I moved the same jar to the lab and am using the command format below.
What should the value of setMaster() be (instead of local) to run the application on Spark 2?

spark-submit --class retail_db.GetRevenuePerOrder <PATH_TO_JAR> <INPUT_PATH> <OUTPUT_PATH>
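A common pattern (a sketch, not necessarily the course's exact code) is to avoid hardcoding the master in setMaster() at all, and instead take it from the command line so the same jar runs both locally and on the cluster. The app name and the order_items field indices below are assumptions for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GetRevenuePerOrder {
  def main(args: Array[String]): Unit = {
    // Take the master from the first argument instead of hardcoding it,
    // e.g. "local[*]" on a laptop, "yarn" on the lab cluster.
    val conf = new SparkConf()
      .setAppName("Get Revenue Per Order")
      .setMaster(args(0))

    val sc = new SparkContext(conf)

    // args(1) = input path, args(2) = output path
    val orderItems = sc.textFile(args(1))
    val revenuePerOrder = orderItems
      .map { line =>
        val fields = line.split(",")
        (fields(1).toInt, fields(4).toFloat) // assumed: (order_id, subtotal)
      }
      .reduceByKey(_ + _)
      .map(rec => rec._1 + "," + rec._2)

    revenuePerOrder.saveAsTextFile(args(2))
    sc.stop()
  }
}
```

With this shape, the master becomes just another program argument; alternatively, dropping setMaster() entirely lets spark-submit's --master option decide where the job runs.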


#2

Following up on my question above: I created the jar, ran it successfully in local mode, and moved it to the lab.

I then tried passing yarn-client as the master:

spark-submit --class retail_db.GetRevenuePerOrder /home/swatimishra/swati/spark2demo_2.12-0.1.jar yarn-client /user/swatimishra/retail_db/order_items/part-00000 /user/swatimishra/retail_db_output/GetRevenuePerOrder
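For reference, spark-submit takes the master through its own --master option; everything after the jar path is passed as arguments to the main class, so in the command above yarn-client is being consumed as an application argument. If the intent is YARN in client mode, a command of this shape (paths copied from above, and assuming the application no longer hardcodes setMaster("local")) would be closer:

```shell
spark-submit \
  --class retail_db.GetRevenuePerOrder \
  --master yarn \
  --deploy-mode client \
  /home/swatimishra/swati/spark2demo_2.12-0.1.jar \
  /user/swatimishra/retail_db/order_items/part-00000 \
  /user/swatimishra/retail_db_output/GetRevenuePerOrder
```

Note that the yarn-client master string is deprecated in Spark 2; the documented form is --master yarn together with --deploy-mode client or cluster.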

but it fails with the following error:
19/01/27 08:34:41 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.16.1.108
ApplicationMaster RPC port: 0
queue: default
start time: 1548596076924
final status: UNDEFINED
tracking URL: http://rm01.itversity.com:19288/proxy/application_1540458187951_47349/
user: swatimishra
19/01/27 08:34:41 INFO YarnClientSchedulerBackend: Application application_1540458187951_47349 has started running.
19/01/27 08:34:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42641.
19/01/27 08:34:41 INFO NettyBlockTransferService: Server created on gw02.itversity.com:42641
19/01/27 08:34:41 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/01/27 08:34:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, gw02.itversity.com, 42641, None)
19/01/27 08:34:41 INFO BlockManagerMasterEndpoint: Registering block manager gw02.itversity.com:42641 with 366.3 MB RAM, BlockManagerId(driver, gw02.itversity.com, 42641, None)
19/01/27 08:34:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, gw02.itversity.com, 42641, None)
19/01/27 08:34:41 INFO BlockManager: external shuffle service port = 7447
19/01/27 08:34:41 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, gw02.itversity.com, 42641, None)
19/01/27 08:34:42 INFO EventLoggingListener: Logging events to hdfs:/spark2-history/application_1540458187951_47349
19/01/27 08:34:42 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
19/01/27 08:34:42 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
19/01/27 08:34:47 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
19/01/27 08:34:47 ERROR SparkHadoopWriter: Aborting job job_20190127083442_0005.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, wn04.itversity.com,
executor 1): java.io.IOException: unexpected exception type
at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1582)
at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1154)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1810)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)

Please help me solve this problem, as it is a blocker for me right now.


#3

@Swati_Mishra I'm able to see the output at the given HDFS path.