@Tarun_Das : I think this is one of the exercises on hadooppass.com. Based on the problem statement they have given, we need to produce a list of all names for each ID.
For that we can ignore the swap function mentioned as part of the solution. That is a mistake, I guess.
Regarding combineByKey():
val combinedOutput = namePairRDD.combineByKey(List(_), (x: List[String], y: String) => y :: x, (x: List[String], y: List[String]) => x ::: y)
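List(_) -> this is the createCombiner function. It is called the first time a key is encountered (within a partition) while processing the data from namePairRDD, and it creates a one-element list holding that first value, not an empty list. So for the input given above, a list is started for key 1 and another for key 2.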
(x:List[String],y:String) => y::x -> this is the mergeValue function, which adds one more value to the accumulator for a key. The accumulator x is a list of strings (x:List[String] is the Scala way of declaring a list of strings), and y is a single String. y::x (note the two colons) prepends y to the front of the list x. So for every key, each further value is added to that key's list.
(x:List[String],y:List[String]) => x:::y -> this is the mergeCombiners function, a kind of reducer that is applied to the intermediate lists produced in the previous step. Here both x and y are lists of strings. x:::y (note the three colons) is the Scala way of concatenating two lists. So we are combining all the lists produced for a key (for example, one per partition) into a single list.
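To make the three steps concrete, here is a rough plain-Scala sketch (no Spark needed) that imitates what combineByKey does with these functions. The helper names combinePartition and mergePartitions, the object name, and the sample data are all made up for illustration and are not part of the Spark API or the original exercise; the two Seq values just stand in for two partitions.

```scala
object CombineByKeySketch {
  // The three functions passed to combineByKey in the post.
  val createCombiner: String => List[String] = List(_)
  val mergeValue: (List[String], String) => List[String] = (x, y) => y :: x
  val mergeCombiners: (List[String], List[String]) => List[String] = (x, y) => x ::: y

  // Hypothetical helper: what happens inside ONE partition.
  // First value for a key -> createCombiner; later values -> mergeValue.
  def combinePartition(part: Seq[(Int, String)]): Map[Int, List[String]] =
    part.foldLeft(Map.empty[Int, List[String]]) { case (acc, (k, v)) =>
      acc.get(k) match {
        case None       => acc + (k -> createCombiner(v))
        case Some(list) => acc + (k -> mergeValue(list, v))
      }
    }

  // Hypothetical helper: merge the per-partition lists with mergeCombiners.
  def mergePartitions(a: Map[Int, List[String]],
                      b: Map[Int, List[String]]): Map[Int, List[String]] =
    b.foldLeft(a) { case (acc, (k, list)) =>
      acc.get(k) match {
        case None           => acc + (k -> list)
        case Some(existing) => acc + (k -> mergeCombiners(existing, list))
      }
    }

  def main(args: Array[String]): Unit = {
    // Invented sample data standing in for two partitions of namePairRDD.
    val partition1 = Seq((1, "a"), (2, "b"), (1, "c"))
    val partition2 = Seq((1, "d"), (2, "e"))

    val result = mergePartitions(combinePartition(partition1),
                                 combinePartition(partition2))
    // Key 1 ends up with List(c, a, d), key 2 with List(b, e).
    println(result)
  }
}
```

Note that "c" comes before "a" for key 1 because :: adds to the front of the list. In a real Spark job the order of names inside each list depends on how the data is partitioned, so it should not be relied on.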
@itversity : Please correct me if I am wrong.