BigDataLabs -- Unable to read local text file

apache-spark
scala

#1

Hello @itversity,

I am unable to read data from my local text file.

I created a text file under my home directory and am trying to read it with the two sets of commands below, but I am unable to read it or create an RDD.
Local directory location: /home/dkothari/

commands used:

  1. First, I tried the following commands (a cleaned-up version follows after this list):
    val data = scala.io.Source.fromFile("/home/dkothari/emp.txt").getLines.toList
    val dataRDD = sc.parallelize(data)
    dataRDD.first <-- command to read data
  2. Second, I tried these commands:
    val data1 = sc.textFile("file:///home/dkothari/emp.txt")
    data1.first <-- command to read data
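
For reference, here is approach 1 again as a self-contained sketch, with the file handle closed after reading (same logic as above; as I understand it, this read happens on the driver, so the file only needs to exist on the machine running the shell):

    val source = scala.io.Source.fromFile("/home/dkothari/emp.txt")
    val data = try source.getLines.toList finally source.close()
    val dataRDD = sc.parallelize(data)
    dataRDD.first // should print the first line of emp.txt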

Below is the error message (this is just the start of it; let me know if you need the full message):

18/03/09 06:44:05 INFO FileInputFormat: Total input paths to process : 1
18/03/09 06:44:05 INFO SparkContext: Starting job: first at :30
18/03/09 06:44:05 INFO DAGScheduler: Got job 31 (first at :30) with 1 output partitions
18/03/09 06:44:05 INFO DAGScheduler: Final stage: ResultStage 37 (first at :30)
18/03/09 06:44:05 INFO DAGScheduler: Parents of final stage: List()
18/03/09 06:44:05 INFO DAGScheduler: Missing parents: List()
18/03/09 06:44:05 INFO DAGScheduler: Submitting ResultStage 37 (file:///home/dkothari/emp.txt MapPartitionsRDD[54] at textFile at :27), which has no missing parents
18/03/09 06:44:05 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:150)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:150)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:150)
at org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:173)
at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:34)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)


#2

Dear Divyakot,

Please let us know how you got the Scala prompt. Did you use:

a) spark-shell --master yarn --conf spark.ui.port=11111
OR
b) spark-shell --conf spark.ui.port=11111

To access the LFS (local file system), we need to open the Spark shell in local mode.
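
For example, something like this (a sketch; the UI port is just an example, and in local mode the file only has to exist on the machine where the shell is running):

    spark-shell --master local[*] --conf spark.ui.port=11111

    scala> val data1 = sc.textFile("file:///home/dkothari/emp.txt")
    scala> data1.first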

Can you please check and confirm?

Raj.


#3

Hello Raj,

I launched the Spark shell with the command below, i.e., in YARN mode.
spark-shell --master yarn --conf spark.ui.port=11111
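
In the meantime, as a workaround while staying in YARN mode, I am thinking of copying the file to HDFS and reading it from there (a sketch, assuming my HDFS home directory is /user/dkothari):

    hdfs dfs -put /home/dkothari/emp.txt /user/dkothari/emp.txt

    scala> val data1 = sc.textFile("/user/dkothari/emp.txt")
    scala> data1.first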

Question: Do we need to do the same on the Cloudera cluster as well, at the time of certification?

Thanks