spark-submit failing with "Input path does not exist"

Hi,

I am getting an "Input path does not exist" error. The error output and my sample.py file are below. Could you please let me know what is wrong with the path?

I have data in the directory: /user/rajeshv28/sqoop_import/departments

[rajeshv28@gw01 ~]$ spark-submit --master yarn --conf "spark.ui.port=10101" sample.py
Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
Traceback (most recent call last):
  File "/home/rajeshv28/sample.py", line 6, in <module>
    for line in dataRDD.collect():
  File "/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
  File "/usr/hdp/2.5.0.0-1245/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.5.0.0-1245/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nn01.itversity.com:8020/user/rajeshv28/sqoop_import/departments
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)

Here is my sample.py file:

from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("pyspark")
sc = SparkContext(conf=conf)
# Read the departments data as an RDD of text lines.
dataRDD = sc.textFile("/user/rajeshv28/sqoop_import/departments")
# Print each record on the driver.
for line in dataRDD.collect():
    print(line)
# Write the records back out as text files.
dataRDD.saveAsTextFile("/user/rajeshv28/pyspark/departmentsTesting")

Use this:

print dataRDD.collect()

instead of:

for line in dataRDD.collect():
    print(line)

Abhi284,

Thank you for your response. I changed it as per your suggestion, but I am getting the same error. It looks like it is not recognizing the input file path.

Thanks,
Rajesh
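
One way to narrow this down, as a minimal sketch: the error shows that the scheme-less path was resolved against HDFS (hdfs://nn01.itversity.com:8020), so it may help to confirm that the directory really exists there and not only on the gateway's local filesystem. The helper below wraps the standard `hdfs dfs -test -e` command. The names `hdfs_path_exists` and `describe_input_path` and the `run` parameter are hypothetical, introduced here only for illustration, and the sketch assumes the `hdfs` client is on the PATH of the machine running it.

```python
import subprocess


def hdfs_path_exists(path, run=subprocess.call):
    """Return True when `hdfs dfs -test -e <path>` exits with status 0.

    `run` is a hypothetical hook (defaults to subprocess.call) so the
    check can be exercised without a Hadoop client installed.
    """
    # `-test -e` exits with 0 if the path exists and non-zero otherwise.
    return run(["hdfs", "dfs", "-test", "-e", path]) == 0


def describe_input_path(path, run=subprocess.call):
    """Build a short human-readable message about the input path."""
    if hdfs_path_exists(path, run=run):
        return "found in HDFS: " + path
    return "missing in HDFS: " + path
```

In sample.py this could be called before sc.textFile, for example print(describe_input_path("/user/rajeshv28/sqoop_import/departments")). If it reports the path as missing, running hdfs dfs -ls /user/rajeshv28/sqoop_import on the gateway would show what is actually there.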