Unable to view RDD contents using manager.collect().foreach(println)

Created the file locally with touch EmployeeManager.csv and put the content below into it:

EmployeeManager.csv
E01,Vishnu
E02,Satyam
E03,Shiv
E04,Sundar
E05,John
E06,Pallavi
E07,Tanvir
E08,Shekhar
E09,Vinod
E10,Jitendra

Then copied it to HDFS using:

hdfs dfs -put /home/tarunkumard/hadoopexam/EmployeeManager.csv /user/tarunkumard/spark1
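A quick sanity check after the put, run inside spark-shell, is to ask the Hadoop FileSystem API whether the exact path exists (a minimal sketch, assuming the default filesystem is the cluster HDFS):

import org.apache.hadoop.fs.{FileSystem, Path}

// build a FileSystem handle from the same Hadoop config spark-shell already uses
val fs = FileSystem.get(sc.hadoopConfiguration)

// returns true only if this exact path exists on the default filesystem
fs.exists(new Path("/user/tarunkumard/spark1/EmployeeManager.csv"))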

Then started spark-shell and executed:

val manager = sc.textFile("/user/tarunkumard/sparkl/EmployeeManager.csv")
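Note that the stack trace below references managerPairRDD rather than manager; that RDD isn't shown in the post, but it was presumably derived from manager along these lines (a hypothetical reconstruction):

// hypothetical: split each "id,name" line into an (id, name) pair
val managerPairRDD = manager.map { line =>
  val fields = line.split(",")
  (fields(0), fields(1))
}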

But when I executed collect().foreach(println) on the RDD, I got the error below on the Spark console:

scala> managerPairRDD.collect().foreach(println)
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nn01.itversity.com:8020/user/tarunkumard/sparkl/EmployeeManager.csv
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1953)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:934)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
at org.apache.spark.rdd.RDD.collect(RDD.scala:933)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)

I verified that EmployeeManager.csv exists at /user/tarunkumard/spark1.
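One way to catch a look-alike typo such as sparkl vs spark1 (a lowercase l and the digit 1 are nearly identical in many fonts) is to list the parent directory from spark-shell and compare the names character by character, again via the Hadoop FileSystem API (a sketch):

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)

// print every entry under the home directory, one name per line
fs.listStatus(new Path("/user/tarunkumard/")).foreach(status => println(status.getPath.getName))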

@Tarun_Das - In the reference below you are using sparkl; is the dataset present in spark1?

val manager = sc.textFile("/user/tarunkumard/sparkl/EmployeeManager.csv")

Yes, the dataset is present in the spark1 directory: /user/tarunkumard/spark1/EmployeeManager.csv

spark1 must have been wrongly typed as sparkl (a lowercase l in place of the digit 1); the error trace shows the sparkl path reaching HDFS, so the typo was in the actual sc.textFile call rather than only in this post.
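For completeness, with the directory name corrected to spark1 the original commands should work as intended:

// same commands as above, with the digit 1 instead of a lowercase l
val manager = sc.textFile("/user/tarunkumard/spark1/EmployeeManager.csv")

// should print the ten rows, E01,Vishnu through E10,Jitendra
manager.collect().foreach(println)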

Closing this post as solved; it was my mistake in the name reference.
