Need explanation of the Scala statement below

Can anyone please explain the Scala statement below and what it is doing?

val combinedOutput = namePairRDD.combineByKey(List(_), (x: List[String], y: String) => y :: x, (x: List[String], y: List[String]) => x ::: y)

The above line is part of the code snippet below:

val name = sc.textFile("spark8/data.csv")

val namePairRDD = name.map(x => (x.split(",")(0), x.split(",")(1)))

val swapped =

Where did you get it from? Can you share the data from spark8/data.csv?


@Tarun_Das : I think this is one of the exercises. Based on the problem statement they have given, we need to produce a list of all names for each ID.

For that, we can ignore the swap function mentioned as part of the solution. That is a mistake, I guess.
Regarding combineByKey():

val combinedOutput = namePairRDD.combineByKey(List(_), (x: List[String], y: String) => y :: x, (x: List[String], y: List[String]) => x ::: y)

List(_) -> this is the createCombiner function. It is called the first time a key is encountered in a partition while processing namePairRDD, and it creates a one-element list holding that first value (not an empty list). So for input with the keys 1 and 2, two lists are started: one for 1 and another for 2.

(x: List[String], y: String) => y :: x -> this is the mergeValue function. The accumulator x is a list of strings (x: List[String] is the Scala way of declaring that x is a list of strings) and y is a single String. y :: x (note the two colons) prepends y to the list x, so for every key each further value is added to the front of that key's list.

(x: List[String], y: List[String]) => x ::: y -> this is the mergeCombiners function, a kind of reducer that is applied to the intermediate lists produced on different partitions. Here x is a list of strings, and so is y; both were produced in the previous step. x ::: y (note the three colons) is the Scala way of concatenating two lists. So we combine all the lists produced for a key into a single list.
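To see the three functions working together, here is a small plain-Scala sketch that imitates what combineByKey does, without needing Spark. The sample records, the split into two "partitions", and the helper names combinePartition and mergePartitions are all made up for illustration; real Spark decides the partitioning itself and does not guarantee the order of values inside each list.

```scala
// Plain-Scala imitation of combineByKey; no Spark required.
// Sample data, partition split and helper names are hypothetical.
object CombineByKeySketch {
  // The three functions from the statement being discussed.
  val createCombiner: String => List[String] = List(_)
  val mergeValue: (List[String], String) => List[String] = (x, y) => y :: x
  val mergeCombiners: (List[String], List[String]) => List[String] = (x, y) => x ::: y

  // Build per-key lists for the records of one "partition".
  def combinePartition(records: Seq[(String, String)]): Map[String, List[String]] =
    records.foldLeft(Map.empty[String, List[String]]) { case (acc, (k, v)) =>
      acc.get(k) match {
        case None       => acc + (k -> createCombiner(v))   // first value seen for this key
        case Some(list) => acc + (k -> mergeValue(list, v)) // later values are prepended
      }
    }

  // Merge the per-partition results key by key.
  def mergePartitions(a: Map[String, List[String]],
                      b: Map[String, List[String]]): Map[String, List[String]] =
    b.foldLeft(a) { case (acc, (k, list)) =>
      acc.get(k) match {
        case None           => acc + (k -> list)
        case Some(existing) => acc + (k -> mergeCombiners(existing, list))
      }
    }

  val partition1 = Seq(("1", "alice"), ("2", "bob"), ("1", "carol"))
  val partition2 = Seq(("1", "dave"), ("2", "erin"))

  // key "1" -> List(carol, alice, dave), key "2" -> List(bob, erin)
  val combined: Map[String, List[String]] =
    mergePartitions(combinePartition(partition1), combinePartition(partition2))
}
```

Note how "carol" ends up in front of "alice" because :: prepends, while ::: keeps the left list first and puts the right list after it.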

@itversity : Please correct me if I am wrong.

Ok, the question does not make much sense. I do not think the official exam will have this kind of question, especially on combineByKey, which is not part of the official Spark documentation.

I am not sure where these questions came from, because we are a group of friends and we share questions and challenge each other :slight_smile:

But thanks a lot for the explanation, God bless you.


Below is the exact question:

You have been given a file named spark8/data.csv (type,name).

1. Load this file from HDFS and save it back as (id,(all names of same type)) in the results directory.
However, make sure that while saving, it is written to a single file.

How would I be able to achieve this without combineByKey? Thanks in advance.
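One possible way, sketched here under some assumptions, is to use groupByKey to collect the names per key and coalesce(1) so that the output lands in a single part file. The object name, the groupNames helper, the local[*] master and the relative "results" output path are assumptions for a standalone run; in spark-shell you would use the existing sc instead of creating a new context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical standalone job; names and paths are assumptions, not from the exercise.
object GroupNamesByType {
  // groupByKey gathers all values of a key; mapValues turns each group into a List.
  def groupNames(pairs: RDD[(String, String)]): RDD[(String, List[String])] =
    pairs.groupByKey().mapValues(_.toList)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GroupNamesByType").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val name = sc.textFile("spark8/data.csv")
      // Each line is expected to look like "type,name".
      val namePairRDD = name.map { line =>
        val fields = line.split(",")
        (fields(0), fields(1))
      }
      // coalesce(1) reduces the result to one partition, so one part file is written.
      groupNames(namePairRDD).coalesce(1).saveAsTextFile("results")
    } finally {
      sc.stop()
    }
  }
}
```

saveAsTextFile fails if the output directory already exists, so the directory has to be removed or a fresh path chosen before rerunning.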

