Not able to run pyspark script in yarn mode - "No module named pyspark"



Hi All,
I am trying to run a pyspark script in yarn mode but I am getting the below error. Has anyone successfully executed a pyspark script in yarn mode on the itversity cluster?

Job aborted due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent failure: Lost task 4.3 in stage 0.0 (TID 25, org.apache.spark.SparkException:
Error from python worker:
/usr/bin/python: No module named pyspark

Note: I am able to launch the pyspark shell in yarn mode; I am facing this issue only while running the pyspark script.

Command used to run the pyspark script:
/usr/hdp/current/spark2-client/bin/spark-submit /home/pavan_na/

Below is the content of the script:

#!/usr/bin/env pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.executor.instances", "14") \
    .getOrCreate()
data = spark.sparkContext.parallelize([1, 2, 3, 4, 5, 6, 7])
trans1 = data.map(lambda x: x / 1)
trans2 = trans1.map(lambda x: x + 1)
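For reference, the two transformations in the script amount to dividing each element by 1 and then adding 1. As a purely illustrative sanity check, the same logic can be run locally with plain Python lists, no Spark required:

```python
# Local sanity check of the script's two map transformations,
# using plain Python lists instead of an RDD.
data = [1, 2, 3, 4, 5, 6, 7]
trans1 = [x / 1 for x in data]    # division by 1 (yields floats in Python 3)
trans2 = [x + 1 for x in trans1]  # increment each element
print(trans2)  # [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

If this works locally but fails on the cluster, the problem is with how the script is launched, not with the transformations themselves.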


@itversity It would be great if you could check. Is there any issue with the yarn/spark setup on the cluster? I am using Spark 2.


It is working fine for me on the itversity cluster. Here are the commands that I used:

Launch the pyspark shell using the below command:
/usr/hdp/current/spark2-client/bin/pyspark --conf spark.ui.port=22325 --master yarn-client

And then you can run the commands that you used.
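For the spark-submit case from the original question, a hedged sketch of the equivalent submission in yarn client mode follows. The script path below is a placeholder, not the actual path from the question; the port and client paths are taken from the commands above. Note that spark-submit sets up the executors' PYTHONPATH so the pyspark module can be found, which running the script directly with /usr/bin/python would not do:

```shell
# Hypothetical submission command; replace the script path with your own.
/usr/hdp/current/spark2-client/bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.ui.port=22325 \
  /home/pavan_na/your_script.py
```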


@vinodnerella Thanks a lot for sharing your feedback. As mentioned in my initial question, I am able to launch the pyspark shell in yarn mode and execute that bunch of commands individually. But when I put the same commands in a pyspark script and try to execute it using spark-submit, I get the mentioned error. Can you please try running a pyspark script in yarn mode? Thanks.

Pavan A


Hi Pavan,

Please see the below post; it may be helpful! I am also a beginner in Python and PySpark.