Input path does not exist: hdfs://nn01.itversity.com:8020

apache-spark

#1

orders.take(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/2.5.0.0-1245/spark/python/pyspark/rdd.py", line 1267, in take
    totalParts = self.getNumPartitions()
  File "/usr/hdp/2.5.0.0-1245/spark/python/pyspark/rdd.py", line 356, in getNumPartitions
    return self._jrdd.partitions().size()
  File "/usr/hdp/2.5.0.0-1245/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.5.0.0-1245/spark/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/usr/hdp/2.5.0.0-1245/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o228.partitions.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nn01.itversity.com:8020/data/retail_db/orders/part-00000

Directory location…
[ashoo1234@gw03 retail_db]$ ls -ltr /data/retail_db/orders/part-00000
-rw-r--r-- 1 root root 2999944 Feb 20 2017 /data/retail_db/orders/part-00000


#2

That path is on the local file system of the gateway node. In a cluster environment, Spark reads from HDFS, so we need to point it at data that is on HDFS. Please use the path below, where the data is available in HDFS:

hadoop fs -ls /public/retail_db/orders/
Found 1 items
-rw-r--r--   3 hdfs hdfs    2999944 2016-12-19 03:52 /public/retail_db/orders/part-00000
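To see why the original path failed, note that Spark qualifies a bare path like `/data/retail_db/orders` against the cluster's default filesystem (`fs.defaultFS` from core-site.xml), which on this cluster is `hdfs://nn01.itversity.com:8020` — so the local gateway directory was never consulted. Here is a minimal sketch of that resolution logic (an illustration, not Spark's actual source; `resolve_path` and `DEFAULT_FS` are names invented for this example):

```python
from urllib.parse import urlparse

# Assumption: this is the value of fs.defaultFS on the itversity cluster,
# taken from the error message in post #1.
DEFAULT_FS = "hdfs://nn01.itversity.com:8020"

def resolve_path(path, default_fs=DEFAULT_FS):
    """Qualify a path with the default filesystem if it has no URI scheme."""
    if urlparse(path).scheme:      # already qualified, e.g. file:// or hdfs://
        return path
    return default_fs + path       # bare paths inherit fs.defaultFS

# A bare path resolves to HDFS, not to the gateway's local disk:
print(resolve_path("/data/retail_db/orders"))
# -> hdfs://nn01.itversity.com:8020/data/retail_db/orders

# A file:// prefix forces the local filesystem instead:
print(resolve_path("file:///data/retail_db/orders"))
# -> file:///data/retail_db/orders
```

This is also why `sc.textFile("file:///data/retail_db/orders")` would read the local copy, while the unqualified path must exist on HDFS.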

#3

Thanks, this worked once the HDFS path was provided.

Could you please also set the logging level to ERROR instead of INFO?


#4

It is set to "INFO, console". For applications, the default root logger is "INFO, console", which logs all messages at level INFO and above to the console's stderr.
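If you want to reduce the logging noise yourself, there are two standard options. Within a running PySpark session you can call `sc.setLogLevel("ERROR")` on the SparkContext. To make it the default for all sessions, change the root logger in Spark's log4j configuration (copy `log4j.properties.template` to `log4j.properties` in the Spark `conf` directory first if it does not exist):

```properties
# conf/log4j.properties — raise the root logger threshold from INFO to ERROR
log4j.rootCategory=ERROR, console
```

Note that `sc.setLogLevel` only affects the current application, while the log4j.properties change applies cluster-wide for anyone using that Spark configuration, so on a shared cluster the per-session call is usually the safer choice.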