PySpark app running in yarn driver mode without utilizing the cluster capacity

pyspark
apache-spark

#1

I’m running a word count app in PySpark as shown in the ITVersity video here: https://youtu.be/ckxyJHpC0Qw?list=PLf0swTFhTI8qtIYxVoPOjA2fYzBFiNMue&t=5915

Here is the spark-submit command I tried running:
spark-submit --master yarn \
  --deploy-mode client \
  --conf spark.ui.port=24100 \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 5 \
  --executor-memory 1024M \
  --executor-cores 2 \
  src/main/python/WordCount.py local /public/randomtextwriter /user/lingaraj/wordcount-dir

Although I have specified the number of executors (5), executor cores, and executor memory in this command, the application runs only in the driver locally, without any worker nodes being assigned the requested resources. I'm running this on gw02.itversity.com. I'm unable to upload a screenshot of the Executors section of the Spark UI because it throws an error, so I'm copying its contents here instead. I hope it helps in understanding the issue.

| Executor ID | Address | Status | RDD Blocks | Storage Memory | Disk Used | Cores | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time (GC Time) | Input | Shuffle Read | Shuffle Write |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| driver | gw02.itversity.com:44833 | Active | 0 | 0.0 B / 384.1 MB | 0.0 B | 1 | 2 | 0 | 11 | 13 | 2.3 min (0.3 s) | 1.4 GB | 0.0 B | 58.4 MB |

I don't know whether I'm missing something in the command. Please guide me on how to get the application to use the cluster resources.

Thanks
Lingaraj


#2

Can you run it with yarn-client as the first script argument and let us know?

src/main/python/wordcount.py yarn-client /public/randomtextwriter /user/annapurnachinta/python3/
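
For context, the course's WordCount.py most likely reads the execution mode from its first command-line argument and sets it on the SparkConf. Here is a minimal sketch of that pattern (an assumption about the script's structure, not its exact contents):

import sys
from pyspark import SparkConf, SparkContext

if __name__ == "__main__":
    # First argument is the execution mode, e.g. "local" or "yarn-client".
    # A master set directly on the SparkConf via setMaster() takes precedence
    # over the --master flag passed to spark-submit.
    conf = SparkConf().setAppName("Word Count").setMaster(sys.argv[1])
    sc = SparkContext(conf=conf)

    # Second and third arguments are the HDFS input and output paths.
    counts = sc.textFile(sys.argv[2]) \
        .flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)
    counts.saveAsTextFile(sys.argv[3])

Because of that precedence, passing local as the first argument forces the whole job to run inside the driver process on the gateway, no matter what --num-executors, --executor-cores, and --executor-memory are set to.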


#3

Yes, that worked. I'm sorry, I didn't notice that; I was focusing only on the 'master' parameter in the spark-submit command.

Thank you, Annapurna, for the quick help.
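
For reference, this is the original command with yarn-client as the first script argument (use a fresh output directory if the previous run already created one, since saveAsTextFile fails when the path exists):

spark-submit --master yarn \
  --deploy-mode client \
  --conf spark.ui.port=24100 \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 5 \
  --executor-memory 1024M \
  --executor-cores 2 \
  src/main/python/WordCount.py yarn-client /public/randomtextwriter /user/lingaraj/wordcount-dir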

