Spark-shell not finding files on LFS

apache-spark

#1

Hi,
I am trying to create an RDD from data I saved on the local file system (/home/sachinji84/EMP1.csv).

I did

spark-shell --master local

val a1 = sc.textFile("/home/sachinji84/EMP1.csv")

a1.collect()
It gives me the error below:

Input path does not exist: hdfs://nn01.itversity.com:8020/home/sachinji84/EMP1.csv

Why is it searching HDFS when the dataset is on the local file system?


#2

Hi @sachin_K,

Give the path as shown below:

val a1 = sc.textFile("file:///home/sachinji84/EMP1.csv")


#3

@Sravan_Kumar
Thanks, it worked!
But I could not understand it. Can you please explain a bit?


#4

Hi @sachin_K,
Spark supports loading files from the local file system, but it requires that the file be available at the same path on all nodes in the cluster. When a path has no scheme, Spark (via Hadoop) resolves it against the cluster's default file system, which on this cluster is HDFS; that is why your unqualified path was looked up on hdfs://nn01.itversity.com:8020. The file:// prefix forces it to read from the local file system instead.
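
For illustration, a minimal sketch of both variants in spark-shell (the /user/sachinji84/ HDFS path below is only illustrative, not where your file actually lives):

// No scheme: resolved against fs.defaultFS, i.e. hdfs://nn01.itversity.com:8020
val fromHdfs = sc.textFile("/user/sachinji84/EMP1.csv")

// file:// scheme: read from the local file system of the machine running the tasks.
// In --master local mode everything runs on one machine, so the file only needs to exist there.
val fromLocal = sc.textFile("file:///home/sachinji84/EMP1.csv")
fromLocal.take(5).foreach(println)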


#5

Hi @Sravan_Kumar,
Since you solved my issue, I thought I would check with you.
I have a few questions about the HDPCD Spark certification environment.

  1. In Labs.ITversity, sc.parallelize was shown to create an RDD from the local file system. During the certification, is that the best way to create an RDD, or is it better to copy the data to HDFS and simply use sc.textFile()?

#6

@sachin_K,

You can use either of them.
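
For reference, a minimal sketch of both approaches, assuming EMP1.csv is at the local path from earlier in this thread (the /user/sachinji84/ HDFS destination is only an example):

// Option 1: read the local file on the driver and parallelize it into an RDD
import scala.io.Source
val lines = Source.fromFile("/home/sachinji84/EMP1.csv").getLines().toList
val rdd1 = sc.parallelize(lines)

// Option 2: copy the file to HDFS first (from a regular shell, not spark-shell):
//   hdfs dfs -put /home/sachinji84/EMP1.csv /user/sachinji84/EMP1.csv
// then read it with textFile, letting the default file system (HDFS) resolve the path
val rdd2 = sc.textFile("/user/sachinji84/EMP1.csv")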