Are people using core RDD functionality in real-world projects?

#1

Hi All,
In the video below, Durga gave an example of aggregateByKey.
He used a plain RDD, and I have heard that RDDs have performance issues because an RDD carries no schema. If we work around that with domain objects, it leads to heap memory and GC issues. That is the reason people are moving to Spark SQL.

Question 1:
Are people actually using RDDs for their use cases in real-world projects?

Note
I read in a blog that you should not fall back to RDD functions while working with a DataFrame (when the DataFrame does not provide a built-in function for your use case); it is better to write a Spark UDF.

Thanks
Suresh Selvaraj


#2

Hi @suresh_selvaraj, RDD is the core abstraction, but not many people use it directly nowadays, since DataFrames give you performance improvements. DataFrames use dynamic code generation and other techniques outlined in Project Tungsten that improve performance.

Yes, it is better to write a UDF on a DataFrame, since it generally performs better than bare RDD transformations and actions.
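For anyone unsure what aggregateByKey actually does, here is a rough sketch of its semantics. This is plain Python, not Spark code: it only mimics the three pieces you pass to aggregateByKey (a zero value, a seqOp applied within each partition, and a combOp that merges partition results). The function name `aggregate_by_key` and the sample data are made up for illustration.

```python
def aggregate_by_key(partitions, zero, seq_op, comb_op):
    """Mimic the semantics of RDD.aggregateByKey over a list of partitions.

    Illustrative helper only; this is not part of any Spark API.
    """
    # Step 1: inside each partition, fold every value into a per-key
    # accumulator, starting from the zero value.
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = seq_op(acc.get(key, zero), value)
        partials.append(acc)

    # Step 2: merge the per-partition accumulators key by key.
    merged = {}
    for acc in partials:
        for key, partial in acc.items():
            if key in merged:
                merged[key] = comb_op(merged[key], partial)
            else:
                merged[key] = partial
    return merged


# Hypothetical (key, value) pairs split across two partitions.
partitions = [
    [("a", 10), ("b", 5), ("a", 20)],
    [("b", 15), ("a", 30)],
]

# The accumulator is a (sum, count) tuple, so an average can be derived later.
zero = (0, 0)
seq_op = lambda acc, v: (acc[0] + v, acc[1] + 1)
comb_op = lambda x, y: (x[0] + y[0], x[1] + y[1])

totals = aggregate_by_key(partitions, zero, seq_op, comb_op)
averages = {key: s / c for key, (s, c) in totals.items()}
print(averages)
```

With a DataFrame you would normally express the same per-key average with groupBy and the built-in avg aggregate, which leaves the planning to Spark instead of hand-written accumulator functions.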
