Spark - DF - registerTempTable & printScehema not working


#1

Both registerTempTable & printScehema showing error in lab…Please help me in this regard…i even tried registerAsTable.

scala> ordersDF.printschema()
:32: error: value printschema is not a member of Unit
ordersDF.printschema()

ordersDF.registerTempTable(“orders”)
:32: error: value registerTempTable is not a member of Unit
ordersDF.registerTempTable(“orders”)

ordersDF.registerAsTable
:32: error: value registerAsTable is not a member of Unit
ordersDF.registerAsTable


#2

can you please paste your entire query.


#3

@Fayaz

Ex:
val dF = sqlContext.sql(“select * from employeeTable”).collect()

Remove collect() function from your statement

val dF = sqlContext.sql(“select * from employeeTable”)
Now it should work
dF.show
dF.count
dF.printSchema


#4

@nikhil23 @venkateshm thanks for the replies bro, kindly see my below code that I tried and it did not work as expected…whether I suppose to use anyother api instead of registerTempTable() …Please help since I am not able to proceed further…Thank you…

val ordersRDD = sc.textFile("/public/retail_db/orders")
val ordersDF = ordersRDD.
  map(order => {
    (order.split(",")(0).toInt, order.split(",")(1), order.split(",")(2).toInt, order.split(",")(3))
  }).
  toDF("order_id", "order_date", "order_customer_id", "order_status")
 
ordersDF.registerTempTable("orders")
sqlContext.sql("select order_status, count(1) count_by_status from orders group by order_status").show()

#5

A DataFrame is a distributed collection of structured data organized into named columns.
We need schema to work with DF.
First you have to create case class and map it to your RDD.

case class Orders(order_id: Int, order_date: String, order_customer_id: Int, order_status: String)

val ordersRDD = sc.textFile("/public/retail_db/orders")
val ordersDF = ordersRDD.map(order => {
Orders(order.split(",")(0).toInt, order.split(",")(1), order.split(",")(2).toInt, order.split(",")(3))
}).toDF()

ordersDF.registerTempTable(“orders”)
sqlContext.sql(“select order_status, count(1) count_by_status from orders group by order_status”).show()

Results

scala> case class Orders(order_id: Int, order_date: String, order_customer_id: Int, order_status: String)
defined class Orders

scala> val ordersRDD = sc.textFile("/public/retail_db/orders")
ordersRDD: org.apache.spark.rdd.RDD[String] = /public/retail_db/orders MapPartitionsRDD[1] at textFile at :24

scala> :paste
// Entering paste mode (ctrl-D to finish)

val ordersDF = ordersRDD.map(order => {
Orders(order.split(",")(0).toInt, order.split(",")(1), order.split(",")(2).toInt, order.split(",")(3))
}).toDF()

// Exiting paste mode, now interpreting.

ordersDF: org.apache.spark.sql.DataFrame = [order_id: int, order_date: string … 2 more fields]

scala> ordersDF
res1: org.apache.spark.sql.DataFrame = [order_id: int, order_date: string … 2 more fields]

scala> ordersDF.printSchema
root
|-- order_id: integer (nullable = false)
|-- order_date: string (nullable = true)
|-- order_customer_id: integer (nullable = false)
|-- order_status: string (nullable = true)

// To register as table (Note Spark 1 -> ordersDF.registerTempTable(“orders”))
scala> ordersDF.createOrReplaceTempView(“orders”)

// spark 1: sqlContext.sql(“select order_status, count(1) count_by_status from orders group by order_status”).show()

scala> spark.sqlContext.sql(“select order_status,count(1) from orders group by order_status”).show()
±--------------±-------+
| order_status|count(1)|
±--------------±-------+
|PENDING_PAYMENT| 15030|
| COMPLETE| 22899|
| ON_HOLD| 3798|
| PAYMENT_REVIEW| 729|
| PROCESSING| 8275|
| CLOSED| 7556|
|SUSPECTED_FRAUD| 1558|
| PENDING| 7610|
| CANCELED| 1428|
±--------------±-------+


#6

Hi Fayaz,

There seems to be to be a typo (double quotes are not proper) in the lines toDF(“order_id”, “order_date”, “order_customer_id”, “order_status”) , ordersDF.registerTempTable(“orders”) and sqlContext.sql(“select order_status, count(1) count_by_status from orders group by order_status”).show() .Use proper double quotes.You should be able to create Temp Table and then run queries against it

Code below:

val ordersDF = ordersRDD.map(order => {
| (order.split(",")(0).toInt, order.split(",")(1), order.split(",")(2).toInt, order.split(",")(3))
| }).toDF(“order_id”, “order_date”, “order_customer_id”, “order_status”)

ordersDF.registerTempTable(“orders”)
sqlContext.sql(“select order_status, count(1) count_by_status from orders group by order_status”).show()


#7

Your code is running fine for me. Can you elaborate what is the issue you are facing with screenshots?


#8

error2

@dgadiraju
nothing worked for me…Below is my code and i have attached my snapshot of my error.

val orders = sc.textFile("/public/retail_db/orders")

case class Orders(order_id:Int,order_date:String,order_product_id:Int,order_status:String)

val ordersDF = orders.
map(rec => Orders(rec.split(",")(0).toInt,rec.split(",")(1),rec.split(",")(2).toInt,rec.split(",")(3))).toDF().show()

ordersDF.registerTempTable(“orders”)

ordersDF.printSchema()


#9

finally it worked me guys…

sofar i have been using as below :

val ordersDF = orders.map(order => {
(order.split(",")(0).toInt, order.split(",")(1), order.split(",")(2).toInt, order.split(",")(3))
}).toDF(“order_id”, “order_date”, “order_customer_id”, “order_status”).show()

but if i remove the show() and use it separetly like below…all the things worked for me…is it strange or expected behavior ?

val ordersDF = orders.map(order => {
(order.split(",")(0).toInt, order.split(",")(1), order.split(",")(2).toInt, order.split(",")(3))
}).toDF(“order_id”, “order_date”, “order_customer_id”, “order_status”)

ordersDF.show()
ordersDF.printScheme()
ordersDF.registerTempTable(“orders”)
sqlContext.sql(“select * from orders limit 5”).show()


#10

big thanks to everyone who spent some time and helped me :slight_smile:


#11

show does not return data frame, it returns nothing/Unit. It’s behavior is to just display first 20 records or what ever number is passed.

You have not used properly. As part of the code you have pasted which was tested by me does not have show. So instead of pasting which was not working for you, you just pasted my code :slight_smile: