Not able to run pyspark script in yarn mode - "No module named pyspark"

pyspark
apache-spark

#1

Hi All,
I am trying to run a pyspark script in yarn mode but I am getting the below error. Has anyone successfully executed a pyspark script in yarn mode on the itversity cluster?

Job aborted due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent failure: Lost task 4.3 in stage 0.0 (TID 25, wn03.itversity.com): org.apache.spark.SparkException:
Error from python worker:
/usr/bin/python: No module named pyspark
PYTHONPATH was:
/hdp01/hadoop/yarn/local/filecache/2518/spark2-hdp-yarn-archive.tar.gz/spark-core_2.11-2.0.0.2.5.0.0-1245.jar
java.io.EOFException

Note: I am able to launch the pyspark shell in yarn mode; I am facing this issue only while running the pyspark script through spark-submit.

Command used to run the pyspark script:
/usr/hdp/current/spark2-client/bin/spark-submit /home/pavan_na/pySparkCheckDebug.py
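For reference, the master and deploy mode can also be passed on the spark-submit command line instead of inside the script; a variant of the same command (just a sketch, using the standard spark-submit options) would be:
/usr/hdp/current/spark2-client/bin/spark-submit --master yarn --deploy-mode client /home/pavan_na/pySparkCheckDebug.py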

Below are the contents of the script:

#!/usr/bin/env pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("yarn") \
    .appName("exception_check2") \
    .config("spark.executor.instances", "14") \
    .getOrCreate()

data = spark.sparkContext.parallelize([1, 2, 3, 4, 5, 6, 7])
trans1 = data.map(lambda x: x / 1)
trans2 = trans1.map(lambda x: x + 1)
trans2.collect()
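If the job runs, trans2.collect() should simply return [2, 3, 4, 5, 6, 7, 8] (or the float equivalents, depending on the Python version), so the script is only meant as a minimal check that the executors can import pyspark.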


#2

@itversity It would be great if you could check whether there is any issue with the yarn/spark setup on the cluster. I am using Spark 2.


#3

It is working fine for me on the itversity cluster. Here is the command that I used:

Launch the pyspark shell using the below command:
/usr/hdp/current/spark2-client/bin/pyspark --conf spark.ui.port=22325 --master yarn-client

And then you can run the commands that you used.
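For example, once the shell is up (where sc is the SparkContext the shell creates for you), the same transformations from the script can be tried interactively:
data = sc.parallelize([1, 2, 3, 4, 5, 6, 7])
trans1 = data.map(lambda x: x / 1)
trans2 = trans1.map(lambda x: x + 1)
trans2.collect()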


#4

@vinodnerella Thanks a lot for sharing your feedback. As mentioned in my initial question, I am able to launch the pyspark shell in yarn mode and to execute those commands individually. But when I put the same commands in a pyspark script and try to execute it using spark-submit, I get the mentioned error. Can you please try running a pyspark script (not the shell) in yarn mode? Thanks.

Regards,
Pavan A


#5

Hi Pavan,

Please see the below post; it may be helpful. I am also a beginner in Python and pySpark.