In the video at the URL below, Durga gave an example of aggregateByKey.
He used a plain RDD. I have heard that RDDs have performance issues because an RDD cannot carry a schema, and that if we work around this by modelling records as domain objects, it leads to heap memory pressure and GC issues. That is the reason people are moving to Spark SQL.
Are people actually using RDDs for their use cases in real projects?
I read in a blog that when you are working with a DataFrame and it does not provide a built-in function for your use case, you should not fall back to RDD functions; it is better to write a Spark UDF.
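
For context, here is a minimal sketch of what I understand aggregateByKey to do, written in plain Python so it runs without a cluster. It only models the semantics (a per-partition fold with a sequence function, then a merge of the partial results with a combine function); it is not Spark code and not necessarily what the video shows. The helper name `aggregate_by_key`, the sample (department, salary) data, and the two-partition split are all my own assumptions for illustration.

```python
def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    """Plain-Python model of aggregateByKey semantics (hypothetical helper, not Spark)."""
    # Step 1: inside each partition, fold every value into a per-key accumulator,
    # starting from zero_value the first time a key is seen in that partition.
    partials = []
    for partition in partitions:
        acc = {}
        for key, value in partition:
            acc[key] = seq_op(acc.get(key, zero_value), value)
        partials.append(acc)

    # Step 2: merge the per-partition accumulators key by key.
    merged = {}
    for acc in partials:
        for key, partial in acc.items():
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


# Assumed sample data: (department, salary) pairs spread over two partitions.
partitions = [
    [("sales", 100), ("hr", 50), ("sales", 200)],
    [("hr", 70), ("sales", 300)],
]

# The accumulator is a (sum, count) tuple, so its type differs from the input values.
totals = aggregate_by_key(
    partitions,
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),    # seq_op: add one value to an accumulator
    lambda a, b: (a[0] + b[0], a[1] + b[1]),    # comb_op: merge two accumulators
)
averages = {key: s / c for key, (s, c) in totals.items()}
print(averages)
```

As far as I know, the real `aggregateByKey(zeroValue, seqFunc, combFunc)` on a pair RDD takes the same three pieces in the same order. The accumulator here is an immutable tuple, which keeps this sketch safe when the same zero value is reused for several keys.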