Creating DataFrame

apache-spark

#1

Can we directly use rdd.toDF() on an RDD to create a DataFrame, or do we need to define case classes first and then create the DataFrame? Which one is preferred, and why?

Adding both code templates for reference:

val ordersRDD = sc.textFile("/public/retail_db/orders")
val ordersDF = ordersRDD.map(order => {
  val o = order.split(",")  // split each line once instead of once per field
  (o(0).toInt, o(1), o(2).toInt, o(3))
}).toDF("order_id", "order_date", "order_customer_id", "order_status")
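One thing worth noting about this first template: toDF() on an RDD of tuples only works because of an implicit conversion. In spark-shell it is in scope automatically, but in a standalone application you have to import it yourself. A minimal sketch, assuming Spark 2.x and a SparkSession named spark (both are assumptions, not part of the original snippet):

// Standalone sketch (assumption: Spark 2.x); spark-shell does this setup for you.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("orders").getOrCreate()
import spark.implicits._  // brings the RDD-to-DataFrame conversions (toDF) into scope

val ordersRDD = spark.sparkContext.textFile("/public/retail_db/orders")
val ordersDF = ordersRDD.map { order =>
  val o = order.split(",")
  (o(0).toInt, o(1), o(2).toInt, o(3))
}.toDF("order_id", "order_date", "order_customer_id", "order_status")

ordersDF.printSchema()  // column names come from toDF; types are inferred from the tuple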

case class Orders(orderId: Int, orderDate: String, orderCustomerId: Int, orderStatus: String)

val orders = sc.textFile("/public/retail_db/orders").
  map(rec => rec.split(",")).
  map(o => Orders(o(0).toInt, o(1), o(2).toInt, o(3))).
  toDF()
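For completeness, there is also a third route that needs neither toDF() nor a case class: defining the schema explicitly and calling createDataFrame on an RDD of Rows. This gives full control over column names, types, and nullability. A sketch under the same assumptions as above (Spark 2.x, a SparkSession named spark, and sc for the SparkContext):

// Explicit-schema sketch (assumption: Spark 2.x, spark and sc already defined).
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("order_id", IntegerType),
  StructField("order_date", StringType),
  StructField("order_customer_id", IntegerType),
  StructField("order_status", StringType)
))

val rowRDD = sc.textFile("/public/retail_db/orders").
  map(_.split(",")).
  map(o => Row(o(0).toInt, o(1), o(2).toInt, o(3)))

val ordersDF = spark.createDataFrame(rowRDD, schema)  // no implicits or case class needed
ordersDF.printSchema()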