HDFS path does not exist with SparkSession object when spark master is set as LOCAL

I am trying to load a dataset into Hive table using Spark.
But when I try to load the file from HDFS directory to Spark, I get the exception:

org.apache.spark.sql.AnalysisException: Path does not exist: file:/home/cloudera/partfile;

These are the steps I performed before loading the file:

val wareHouseLocation = "file:${system:user.dir}/spark-warehouse"
val sparkSession = SparkSession.builder.master("spark://localhost:7077")
    .appName("SparkHive")
    .enableHiveSupport()
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .config("hive.metastore.warehouse.dir", "/user/hive/warehouse")
    .config("spark.sql.warehouse.dir", wareHouseLocation)
    .getOrCreate()
import sparkSession.implicits._
val partf = sparkSession.read.textFile("partfile")

Exception for the statement:

val partf = sparkSession.read.textFile("partfile")
org.apache.spark.sql.AnalysisException: Path does not exist: file:/home/cloudera/partfile;

But the file is present in my HDFS home directory:

hadoop fs -ls
Found 1 items
-rw-r--r--   1 cloudera cloudera         58 2017-06-30 02:23 partfile

My Spark version is 2.0.2.
Could anyone tell me how to fix it?

@bobbysidhartha the path should start with "file://"; try adding the "//":

val wareHouseLocation = "file://${system:user.dir}/spark-warehouse"

@ashok_singamaneni
Tried that, and it resulted in the same error. Is there anything else I need to correct?

@bobbysidhartha
I was looking at a different thing before.
Spark is looking in the local directory (LFS) - /home/cloudera/ -
but you are expecting it to read from the HDFS directory - /user/cloudera/.

Please try this:

val partf = sparkSession.read.textFile("hdfs:///partfile")
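One thing to keep in mind: an hdfs:/// path is absolute from the HDFS root, and your hadoop fs -ls shows the file in your HDFS home directory, so you will likely need the full path (a guess based on your listing above):

val partf = sparkSession.read.textFile("hdfs:///user/cloudera/partfile")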

@ashok_singamaneni

I tried it like this:
val partfile = spark.read.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/partfile")
and getting the exception:

java.io.IOException: Incomplete HDFS URI, no host: hdfs://quickstart.cloudera:8020/user/cloudera/partfile

I also tried this:

val partfile = spark.read.textFile("hdfs:///partfile")

That also resulted in the same exception. Is there any other way I can fix it?

@bobbysidhartha try the below:
hdfs://quickstart:8020/user/cloudera/partfile

reference: https://stackoverflow.com/questions/37056897/exception-in-thread-main-java-io-ioexception-incomplete-hdfs-uri-no-host-hd
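If you are not sure which host and port to use, you can ask Hadoop what it considers the default filesystem from inside the spark-shell (just a diagnostic; this assumes your session picked up the cluster configuration):

val defaultFs = sparkSession.sparkContext.hadoopConfiguration.get("fs.defaultFS")
println(defaultFs) // on the quickstart VM this typically prints hdfs://quickstart.cloudera:8020

Whatever this prints is the host:port that a fully qualified hdfs:// URI should use.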

@ashok_singamaneni
I referred to it just now and am getting the exception:

Caused by: org.apache.derby.iapi.error.StandardException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@6ba6ec73, see the next exception for details.
  at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
  at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
  ... 144 more
Caused by: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database /home/cloudera/metastore_db.

I changed it to:
val partfile = spark.read.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/partfile")

I am getting this new exception because of the above line. I copied hive-site.xml to Spark's conf directory, but the issue still exists.

@bobbysidhartha
Remove ".cloudera" from "quickstart.cloudera:8020"; just try using "hdfs://quickstart:8020/user/cloudera/partfile".
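Also, about the Derby error you pasted: "Another instance of Derby may have already booted the database /home/cloudera/metastore_db" usually means another Spark or Hive session is still holding the embedded metastore open - Derby allows only one process at a time. You can check for a lingering session with:

ps -ef | grep spark-shell

and exit it before retrying.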

I tried it just now. Getting the exception:

java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx-----

The exceptions are getting weirder and weirder.

@bobbysidhartha
I know it is frustrating, but these are configuration issues.

Please try this on your VM:
hdfs dfs -chmod 777 /tmp/hive
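If you want to double-check what Spark itself sees, you can also inspect the directory's permissions from inside the spark-shell with the Hadoop FileSystem API (just a read-only check, assuming your session picked up the cluster configuration):

import org.apache.hadoop.fs.{FileSystem, Path}
val fs = FileSystem.get(sparkSession.sparkContext.hadoopConfiguration)
println(fs.getFileStatus(new Path("/tmp/hive")).getPermission) // should print rwxrwxrwx after the chmod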

I tried your command: hdfs dfs -chmod 777 /tmp/hive
I am getting a permission error. I even tried it with sudo. I am still getting the same message below:
changing permissions of '/tmp/hive': Permission denied. user=cloudera is not the owner of inode=hive

Also, if I run the statement:

val partfile = spark.read.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/partfile")

for the second time, I get the exception:

Caused by: org.apache.derby.iapi.error.StandardException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@5218511, see the next exception for details.
  at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
  at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
  ... 142 more
Caused by: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database /home/cloudera/metastore_db.

I checked for background processes, and I could see only these:

cloudera  9622  9607  0 06:45 pts/1    00:00:00 bash /usr/lib/spark/bin/spark-shell
cloudera  9626  9622  1 06:45 pts/1    00:01:04 /usr/java/jdk1.7.0_67-cloudera/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Dscala.usejavacp=true -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name Spark shell spark-shell
cloudera 11838  9402  0 07:57 pts/0    00:00:00 grep spark-shell

Anything else I can do here?

@itversity Can you please help? I don't see any other options, as I don't have the VM with me.
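Also, @bobbysidhartha, one thing stands out in that ps output: process 9626 is a spark-shell that is still running, and a Hive-enabled spark-shell holds the Derby lock on /home/cloudera/metastore_db for as long as it is open. Exiting that shell (or, if it is stuck, kill 9626) before starting a new session should clear the "Another instance of Derby may have already booted" error.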

@bobbysidhartha Can you please do "ls -ltr" on the /tmp folder to see the permissions and owner of the /tmp/hive folder?
If you can delete /tmp/hive, that should work too.
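If both the chmod and the delete are denied for the cloudera user, they may need to run as the HDFS superuser - plain sudo only changes the local user, not the user HDFS checks permissions against. On the quickstart VM that would typically be:

sudo -u hdfs hdfs dfs -chmod 777 /tmp/hive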

@ashok_singamaneni There is no /tmp/hive/ to delete. Please check the image below.