Unable to create dataframe using local file

productsDF = spark.read.csv('/data/retail_db/products')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/readwriter.py", line 441, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://nn01.itversity.com:8020/data/retail_db/products;'

Could you please reply with the command you used to launch the PySpark shell, so that we can help you better?

export SPARK_MAJOR_VERSION=2
pyspark2 --master yarn --conf spark.ui.port=12901

ordersDF = spark.read.csv('/data/retail_db/orders')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/readwriter.py", line 441, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://nn01.itversity.com:8020/data/retail_db/orders;'

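This is not a lab issue. Because you are running in YARN mode, a path given without a scheme is looked up in HDFS (note the hdfs://nn01.itversity.com:8020 prefix in the error message), not on the local file system. You have to give the HDFS path of the data set instead.
Please refer to the link below to get the HDFS paths for the data sets: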
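
To illustrate why the local path fails, here is a rough sketch of how a path without a scheme ends up pointing at HDFS. The helper qualify() is hypothetical and only mimics the behaviour for absolute paths; it is not part of the PySpark API. The default file system value is copied from the error message above, and the HDFS location in the last comment is a placeholder, not the real location of the data set.

```python
from urllib.parse import urlparse

# Assumption: default file system as shown in the error message in this thread.
DEFAULT_FS = "hdfs://nn01.itversity.com:8020"


def qualify(path, default_fs=DEFAULT_FS):
    """Hypothetical helper: return `path` with an explicit scheme.

    Roughly mimics how an absolute path without a scheme is resolved
    against the cluster's default file system.
    """
    if urlparse(path).scheme:
        # Already explicit, e.g. hdfs://... or file:///...
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# The path from the question is looked up in HDFS, not on local disk.
print(qualify("/data/retail_db/orders"))

# A path with an explicit scheme is left untouched.
print(qualify("file:///data/retail_db/orders"))


def read_csv_from_hdfs(spark, hdfs_path):
    """Read a CSV data set from a location that exists in HDFS.

    `spark` is the SparkSession provided by the pyspark shell.
    """
    return spark.read.csv(hdfs_path)


# In the pyspark shell (placeholder path, replace with the real HDFS location):
# ordersDF = read_csv_from_hdfs(spark, "/<hdfs location of retail_db>/orders")
```

You can check that a location exists before reading it with hdfs dfs -ls followed by the path. A file:// path is usually not a reliable alternative in YARN mode, since the executors run on other nodes and would each need the file at that same local path.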