Hortonworks Spark does not read from HDFS when pyspark is started with YARN

During the Hortonworks Spark developer exam, I ran into a puzzling problem. If I start pyspark with YARN, it does not read the data in HDFS. The code below says the directory does not exist.

pyspark --master yarn
rdd = sc.textFile("HDFS path")

But later, after wasting many minutes, I found that the code below works when pyspark is started without --master yarn.
rdd = sc.textFile("HDFS path")

The one above works fine. Why does it not work with YARN?


Which version of Spark is being launched when you start pyspark with YARN?
You can get this information from shell verbose logs or
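As a rough sketch of one way to check this from inside the running shell: a SparkContext exposes `version` and `master` attributes, so a small helper can report both. The helper name `describe_context` is made up for illustration and is not part of Spark.

```python
def describe_context(sc):
    """Return a one-line summary of the Spark version and master URL.

    `sc` is assumed to be the SparkContext that the pyspark shell creates
    (any object with `version` and `master` attributes works as well).
    `describe_context` is a hypothetical helper, not a Spark API.
    """
    return "Spark {0} running with master '{1}'".format(sc.version, sc.master)
```

In the pyspark shell you would call `print(describe_context(sc))` once after starting with `--master yarn` and once after starting without it, then compare the two lines.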


It might be that that specific version of Spark is not configured on YARN in the itversity labs.

• HDP 2.4.0
• Spark 1.6
• Scala 2.10.5
• Python 2.7.6 (pyspark)