Deploy-mode in Spark

Hi @itversity, I am using AWS to run my Spark program. What is the best choice of --deploy-mode (cluster | client)? Which one has more advantages, and why?

Hemant K Rout

In YARN client mode, the driver program runs on the machine from which the application is launched, for example your laptop. In cluster mode, the driver program runs on one of the nodes in the cluster.

In client mode, you can interact with the application. With .collect() and other actions, results can be viewed in the REPL. But if you close the REPL or shut down the laptop, the driver is killed, and the whole application goes down with it.

In cluster mode, you start the job and forget about it, since the driver runs on one of the nodes in the cluster, not on the laptop. Using .saveAsTextFile() or another output format, you can store the results on HDFS and view them later.
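
To make the difference concrete, here is a rough sketch of how the two launches might look. It only builds the spark-submit command line and does not run anything. The --master and --deploy-mode flags are standard spark-submit options; the script name my_job.py, the HDFS output path, and the helper function spark_submit_command are made up for illustration, so adjust them to your own setup.

```python
import shlex


def spark_submit_command(deploy_mode, app_path, app_args=()):
    """Build a spark-submit argument list for YARN in the given deploy mode.

    spark_submit_command is a hypothetical helper for this example,
    not part of Spark itself.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        app_path,
        *app_args,
    ]


if __name__ == "__main__":
    # my_job.py and the output path are placeholder names (assumptions).
    for mode in ("client", "cluster"):
        argv = spark_submit_command(mode, "my_job.py", ["hdfs:///user/me/output"])
        # shlex.join renders the list as a shell-safe command string.
        print(shlex.join(argv))
```

The only thing that changes between the two commands is the value passed to --deploy-mode. With client, the driver stays on the machine where you typed the command; with cluster, YARN starts the driver on a cluster node, so the job keeps running after you disconnect.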


thank you so much!!!