Create a case class in Scala from a DataFrame

apache-spark
scala

#1

I have a Hive table with the following fields:
id name age gender level salary

I retrieved the data from Hive in Spark (Scala) and created a new DataFrame with the following fields:

id name gender level salary

Then I created a new Hive table from Spark.

Now, the question is how to create a case class for the new DataFrame and perform some operations on it, for example: if level is > 1, give a salary hike of 3000.

After performing the operations, create the table in Hive.


#2

There is no need to create a case class in this case.

You just have to use filter on top of the DataFrame and use the saveAsTable API, which is part of df.write, to create a fresh Hive table. If you want to insert into an existing table, use insertInto, which is also part of df.write (a short sketch appears at the end of this answer).
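Applied to the question's scenario, here is a minimal sketch (assuming the source Hive table is named employee, the new table is named employee_with_hike, and sqlContext is a HiveContext; the table names are illustrative, the column names follow the question's schema):

import sqlContext.implicits._
import org.apache.spark.sql.functions.when

// Read the Hive table into a DataFrame (table name is hypothetical).
val employees = sqlContext.table("employee")
// Give a salary hike of 3000 where level > 1, otherwise keep salary unchanged.
val withHike = employees.withColumn("salary",
  when($"level" > 1, $"salary" + 3000).otherwise($"salary"))
// Save the result as a new Hive table.
withHike.write.saveAsTable("employee_with_hike")

No case class is needed at any point; the column operations happen directly on the DataFrame.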

Here is an example that creates a new Hive table in the Hive database itversity for completed orders:

import sqlContext.implicits._ // needed for the $"column" syntax

val orders = sqlContext.read.json("/public/retail_db_json/orders")
val completedOrders = orders.filter($"order_status" === "COMPLETE")
completedOrders.write.saveAsTable("itversity.completed_orders")

The code was tested on our state-of-the-art Big Data cluster using Spark 1.6.2.
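If the target table already exists, here is a sketch of appending into it with insertInto instead (assuming itversity.completed_orders was created beforehand with a matching schema):

// Append the rows into the existing Hive table.
completedOrders.write.mode("append").insertInto("itversity.completed_orders")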
