Error while reading a text file in Spark?

Command: val myFile = sc.textFile("/home/user/shyamlesh/wordcount.txt")

scala> val myFile= sc.textFile("/home/user/shyamlesh/wordcount.txt")
myFile: org.apache.spark.rdd.RDD[String] = /home/user/shyamlesh/wordcount.txt MapPartitionsRDD[1] at textFile at <console>:27

scala> myFile.count()
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nn01.itversity.com:8020/home/user/shyamlesh/wordcount.txt
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1953)
at org.apache.spark.rdd.RDD.count(RDD.scala:1164)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
at $iwC$$iwC$$iwC.<init>(<console>:43)
at $iwC$$iwC.<init>(<console>:45)
at $iwC.<init>(<console>:47)
at <init>(<console>:49)
at .<init>(<console>:53)
at .<clinit>(<console>)
at .<init>(<console>:7)

@shyamlesh -
The gateway node (local) path for your user is /home/shyamlesh.
The HDFS path for your user is /user/shyamlesh.

A bare path is resolved against the cluster's default filesystem, which is why the error shows hdfs://nn01.itversity.com:8020/home/user/shyamlesh/wordcount.txt. To read the file from the gateway's local filesystem, use the file:// scheme:

val myFile = sc.textFile("file:///home/shyamlesh/wordcount.txt")
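Note that file:// reads only succeed on a cluster if the file exists at the same path on every worker node, so copying the file into HDFS first is usually more reliable. A minimal sketch using the Hadoop FileSystem API from the same spark-shell session (paths assumed from this thread):

scala> import org.apache.hadoop.fs.{FileSystem, Path}
scala> val fs = FileSystem.get(sc.hadoopConfiguration)   // handle to the cluster's default filesystem (HDFS here)
scala> // copy the gateway-local file into the user's HDFS home directory
scala> fs.copyFromLocalFile(new Path("/home/shyamlesh/wordcount.txt"), new Path("/user/shyamlesh/wordcount.txt"))
scala> sc.textFile("/user/shyamlesh/wordcount.txt").count()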


The local path did not work, so I tried the HDFS path and it worked :smiley:

scala> val m1 = sc.textFile("/user/shyamlesh/deckofcards.txt")
m1: org.apache.spark.rdd.RDD[String] = /user/shyamlesh/deckofcards.txt MapPartitionsRDD[7] at textFile at <console>:27

scala> m1.count()
res3: Long = 52
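In case it helps anyone else: you can check what a path resolves to before counting, using the Hadoop FileSystem API from the same shell (a quick sketch; the path is the one from above):

scala> import org.apache.hadoop.fs.{FileSystem, Path}
scala> val fs = FileSystem.get(sc.hadoopConfiguration)   // the cluster's default filesystem
scala> fs.exists(new Path("/user/shyamlesh/deckofcards.txt"))   // true if the HDFS path is valid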

Many thanks to you :slight_smile:

For a local path, use the format:
file:///home/username

For an HDFS path, use either format:
/user/username or hdfs://nn01.itversity.com:8020/user/username
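For example, assuming the files from this thread are in place, all of these should work from spark-shell:

scala> // local file on the gateway node (must exist at the same path on every worker node in a cluster)
scala> sc.textFile("file:///home/shyamlesh/wordcount.txt").count()
scala> // HDFS, short form resolved against the default filesystem
scala> sc.textFile("/user/shyamlesh/deckofcards.txt").count()
scala> // HDFS, fully qualified with the namenode URI
scala> sc.textFile("hdfs://nn01.itversity.com:8020/user/shyamlesh/deckofcards.txt").count()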
