Rdd.take() not working

#1

Hi,

rdd.take() is not working.
However, the same code snippet works with another connection on the same cluster.
Below are the command and the error:

spark-shell --master yarn --conf spark.ui.port=12345

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

val conf = new SparkConf().setMaster("yarn-client").setAppName("retail data for display")
val sc = new SparkContext(conf)

val rddRetailData = sc.textFile("/public/retail_db/orders/part-00000")

rddRetailData.first()

error:

Please help.

Thanks and Regards,
Sabby

#2

There might have been a temporary issue.

Also, once you launch spark-shell, you need to first stop the existing SparkContext before creating one programmatically.
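
For example, a minimal sketch of that sequence inside spark-shell (reusing the code from the first post; the app name is just illustrative):

// spark-shell already gives you a SparkContext as sc, so stop it first
sc.stop

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

// now it is safe to create a context programmatically
val conf = new SparkConf().setMaster("yarn-client").setAppName("retail data for display")
val sc = new SparkContext(conf)

val rddRetailData = sc.textFile("/public/retail_db/orders/part-00000")
rddRetailData.first()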

#3

Hi,

Thanks a lot…
Adding sc.stop before creating the new conf solved the issue.
Really appreciate the timely response.

Thanks and Regards,
Sabby

#4

Hi,

I have another problem.
Whenever I filter data from an RDD in spark-shell, I get a ClassNotFoundException.
I have restarted my PC, altered the number of executors, and tried again after some more time, but nothing seems to work.

Given below is my code:
spark-shell --master yarn --conf spark.ui.port=12335

//sc.stop

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

val conf = new SparkConf().setMaster("yarn-client").setAppName("first filtered data")
val sc = new SparkContext(conf)

val ordersRDD = sc.textFile("/public/retail_db/orders/part-00000")

// filter the RDD: keep only orders whose status (4th comma-separated field) is COMPLETE
val filteredOrders = ordersRDD.filter(order => order.split(",")(3) == "COMPLETE")

filteredOrders.take(10).foreach(println)

I tried to add sc.stop, but that also didn't work.

I also found exactly the same kind of issue in the forum. Given below is the link to that issue:

I have been trying this since yesterday and have spent almost half a day on it.
Please help me out.

Thanks and Regards,
Sabby

#5

You are running these things on our cluster, so restarting your PC will not make any difference :slight_smile:

The problem is that you are using spark-shell improperly. You should not create conf and sc like that inside the shell. If you want to create the context yourself, develop the code in an IDE and then use spark-submit to submit the application.
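
As a rough sketch of that workflow (the class name and jar name below are only placeholders, not something from this thread):

spark-submit --master yarn \
  --class RetailOrdersFilter \
  --conf spark.ui.port=12335 \
  retail-orders_2.10-1.0.jar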

If I run the code below directly in spark-shell, without stopping and recreating the context, it works fine.
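
That is, something along these lines, using the sc that spark-shell already provides (a sketch based on the code in the post above):

val ordersRDD = sc.textFile("/public/retail_db/orders/part-00000")

// keep only orders whose status (4th comma-separated field) is COMPLETE
val filteredOrders = ordersRDD.filter(order => order.split(",")(3) == "COMPLETE")

filteredOrders.take(10).foreach(println)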
