Error while using toDF() function to convert RDD to DataFrame

Hi there,
Could you please help me resolve this issue? While using the toDF() operation to convert an RDD to a DataFrame in Scala IDE, it shows the error below:

val orderdd = sc.textFile("/home/cloudera/Documents/retail_db/orders.txt")

val orderD = orderdd.map(rec => {
  val a = rec.split(",")
  orders(a(0).toInt, a(1), a(2).toInt, a(3))
})

val orderDataFrame = orderD.toDF()
value toDF is not a member of org.apache.spark.rdd.RDD[orders]

It works in the Scala CLI, though. I have already added the import statements for SQL and SparkContext.

Thank you so much!

You have to paste the complete code along with the contents of build.sbt or pom.xml.

Hi Manisha,

I think you need to add these lines before you create 'orderdd'. Also, you need to have the spark-sql library added to your pom.xml or build.sbt.

val sqlContext= new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
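
In case it helps, the spark-sql entry in build.sbt would look something like this (the version here is only an assumption; keep it in line with your spark-core version):

// build.sbt (sketch): spark-sql must match the Scala and Spark versions used by spark-core
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "1.6.2"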

Thanks,
Ashok

Hi Ashok,

Many thanks for your quick response.

I have already added those lines to my code, and the Spark SQL dependency is in build.sbt and pom.xml. Below is the code.

case class orders(orders_id: Int,orders_date: String, orders_cust_id: Int, orders_Status: String);

case class order_item(order_item_id:Int , order_item_order_id: Int, order_item_prod_id:Int,
order_item_quantity:Int, order_item_subtotal:Float, order_item_price: Float)

val conf = new SparkConf().setAppName("AvgRev_df").setMaster("local");
val sc = new SparkContext(conf);

val sqlcontext = new SQLContext(sc);

import sqlcontext.implicits._;

val orderdd = sc.textFile("/home/cloudera/Documents/retail_db/orders.txt")

val orderDF = orderdd.map(rec => {
  val a = rec.split(",")
  orders(a(0).toInt, a(1), a(2).toInt, a(3))
})

val orderDataFrame = orderDF.toDF()

value toDF is not a member of
org.apache.spark.rdd.RDD[orders]

Hi there,

Below is my code:

build.sbt:
name := "scala-spark-app"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "1.6.2"
libraryDependencies += "com.typesafe" % "config" % "1.2.1"


val conf = new SparkConf().setAppName("AvgRev_df").setMaster("local");
val sc = new SparkContext(conf);

val sqlcontext = new SQLContext(sc);

import sqlcontext.implicits._;
val orderdd = sc.textFile("/home/cloudera/Documents/retail_db/orders.txt")

val orderDF = orderdd.map(rec => {
  val a = rec.split(",")
  orders(a(0).toInt, a(1), a(2).toInt, a(3))
})

val orderDataFrame = orderDF.toDF()

case class orders(orders_id: Int,orders_date: String, orders_cust_id: Int, orders_Status: String);

case class order_item(order_item_id:Int , order_item_order_id: Int, order_item_prod_id:Int,
order_item_quantity:Int, order_item_subtotal:Float, order_item_price: Float)

@ManishaSaxena : There could be only one reason: you might have defined the case classes in the same method where you are converting to a DataFrame. Try to define the case classes outside of the main method and check.
Please read more at this link: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Scala-Error-value-toDF-is-not-a-member-of-org-apache-spark/td-p/29878

1- Import implicits:
Note that this should be done only after an instance of org.apache.spark.sql.SQLContext is created. It should be written as:
val sqlContext= new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

2- Move the case class outside of the method:
The case class that defines the schema of the DataFrame should be declared outside of the method that needs it. You can read more about it here:
https://issues.scala-lang.org/browse/SI-6649
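
Putting both points together, a minimal sketch of the corrected layout would look like the following (the object name AvgRevDF is just a placeholder; the path and field names are taken from the code above):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// case class defined at the top level of the file, not inside main
case class orders(orders_id: Int, orders_date: String, orders_cust_id: Int, orders_Status: String)

object AvgRevDF {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AvgRev_df").setMaster("local")
    val sc = new SparkContext(conf)

    // create the SQLContext first, then import its implicits
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val orderdd = sc.textFile("/home/cloudera/Documents/retail_db/orders.txt")
    val orderDF = orderdd.map { rec =>
      val a = rec.split(",")
      orders(a(0).toInt, a(1), a(2).toInt, a(3))
    }

    // toDF() now resolves, because the implicits are in scope
    // and the case class is visible outside the method
    val orderDataFrame = orderDF.toDF()
    orderDataFrame.show()
  }
}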

Hi Ashok,

Thank you so much!

Now the error is gone. You are right, it was my mistake: I had defined the case class within the main method.


@ManishaSaxena cool that you got it right! :grinning: