I’m running a word count app in PySpark as shown in the ITVersity video here: https://youtu.be/ckxyJHpC0Qw?list=PLf0swTFhTI8qtIYxVoPOjA2fYzBFiNMue&t=5915
Here is the spark-submit command I tried running:
spark-submit --master yarn \
  src/main/python/WordCount.py local /public/randomtextwriter /user/lingaraj/wordcount-dir
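For completeness, the resource options I mention below were passed roughly like this (only --num-executors 5 is the value I actually used; the executor-cores and executor-memory values here are placeholders):

```shell
# Hypothetical full command; --executor-cores and --executor-memory
# values are placeholders, not my exact settings.
spark-submit \
  --master yarn \
  --num-executors 5 \
  --executor-cores 2 \
  --executor-memory 2g \
  src/main/python/WordCount.py local /public/randomtextwriter /user/lingaraj/wordcount-dir
```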
Though I have specified the number of executors (5), executor cores, and executor memory in this command, the application runs only on the driver, locally; no executors are allocated on the worker nodes. I'm running this on gw02.itversity.com. I'm unable to upload a screenshot of the Executors section of the Spark UI because it throws an error, so I'm copying its contents here. I hope it helps in understanding the issue.
Executor ID: driver
Address: gw02.itversity.com:44833
Status: Active
RDD Blocks: 0
Storage Memory: 0.0 B / 384.1 MB
Disk Used: 0.0 B
Cores: 1
Active Tasks: 2
Failed Tasks: 0
Complete Tasks: 11
Total Tasks: 13
Task Time (GC Time): 2.3 min (0.3 s)
Input: 1.4 GB
Shuffle Read: 0.0 B
Shuffle Write: 58.4 MB
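Looking at the command again, I wonder whether the first argument, `local`, is the culprit: if WordCount.py reads the first CLI argument as the environment and calls setMaster() itself, that would override --master yarn from spark-submit. This is a minimal sketch of that pattern, assuming the script works this way (choose_master and the mapping here are my guesses, not the actual course code):

```python
import sys

def choose_master(env: str) -> str:
    """Hypothetical mapping from the environment argument to a Spark master URL."""
    return "local[*]" if env == "local" else "yarn"

if __name__ == "__main__":
    # First CLI argument selects the environment, e.g. "local" or "prod".
    env = sys.argv[1] if len(sys.argv) > 1 else "local"
    master = choose_master(env)
    # If the script then does something like:
    #   conf = SparkConf().setAppName("Word Count").setMaster(master)
    # the master set here wins over --master yarn on the spark-submit line.
    print(master)
```

If that is the case, passing an argument other than `local` (or removing the setMaster() call so spark-submit controls the master) should make the application run on YARN.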
I don't know whether I'm missing something in the command. Please guide me on how to get the application to run on YARN with the requested executors.