What is ByKey in Spark?

apache-spark
rdd-api
#1

What is ByKey in Spark? What is RDD.ByKey(…)? Is ByKey a general term for a group of transformations/actions?
If so, could you please list some examples?
Or is ByKey one specific transformation/action?

0 Likes

#2

Hi @praveen,

There are many ByKey transformations and actions in Spark, in both the Scala API and pyspark. Your question needs a fairly long explanation, and it looks like you haven't gone through the itversity videos on Spark yet. The page below is a very good reference for learning about the ByKey functions. All the best!

Aggregating data sets using pyspark – by key

0 Likes

#3

@praveen Yes, it is a general term for a group of transformations/actions. Some of them are groupByKey, aggregateByKey, reduceByKey, etc.

Generally, you will see these transformations/actions performed on pair RDDs, i.e. RDDs whose elements are (key, value) pairs. In the Scala API they are defined in the PairRDDFunctions class.
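
To give a feel for what the ByKey functions compute, here is a rough plain-Python sketch of their semantics. It does not use Spark at all: the helper names (`group_by_key`, `reduce_by_key`, `count_by_key`, `aggregate_by_key`) and the sample data are made up for illustration, and the "partitions" are just two slices of a list. The corresponding RDD method is named in each docstring.

```python
from collections import Counter, defaultdict
from functools import reduce
from operator import add

# Hypothetical sample data: (key, value) pairs, like the elements of a pair RDD.
pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]


def group_by_key(pairs):
    """Semantics of rdd.groupByKey(): collect all values of each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_by_key(pairs, func):
    """Semantics of rdd.reduceByKey(func): merge the values of each key with func."""
    return {key: reduce(func, values) for key, values in group_by_key(pairs).items()}


def count_by_key(pairs):
    """Semantics of rdd.countByKey(): an action that counts elements per key."""
    return dict(Counter(key for key, _ in pairs))


def aggregate_by_key(partitions, zero, seq_func, comb_func):
    """Semantics of rdd.aggregateByKey(zero, seq_func, comb_func).

    seq_func folds values into an accumulator inside one partition;
    comb_func merges the accumulators of the same key across partitions.
    """
    merged = {}
    for partition in partitions:
        local = {}
        for key, value in partition:
            local[key] = seq_func(local.get(key, zero), value)
        for key, acc in local.items():
            merged[key] = comb_func(merged[key], acc) if key in merged else acc
    return merged


grouped = group_by_key(pairs)
summed = reduce_by_key(pairs, add)
counted = count_by_key(pairs)

# (sum, count) per key, with the data split into two pretend partitions.
sum_and_count = aggregate_by_key(
    [pairs[:2], pairs[2:]],
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)
```

In real Spark the first three return a new RDD (transformations), while countByKey returns a result to the driver (an action), so the ByKey family contains both kinds.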

You can find the full list of ByKey functions here.

Hope this helps! :slight_smile:

1 Like