Dealing with Collection objects on the reduce side

apache-spark
rdd-api
#1

Hi everyone,
How can we deal with collections on the reduce side?
In the word count program we used reduceByKey to get the count of each word. There the reduce function adds two integer values: (acc, value) => acc + value.
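
For reference, the reduce step in our word count looks roughly like this (a minimal sketch; `sc` is the SparkContext and `input.txt` is just a placeholder path):

```scala
// Classic word count: reduceByKey combines two Int values per key.
val counts = sc.textFile("input.txt")
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey((acc, value) => acc + value)
```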

Now suppose we had to add two arrays element-wise instead, adding the elements at the same index rather than appending one array to the other.
Is there any way we can achieve this with reduceByKey?

Example input:
(id1,WrappedArray(0, 1, 0, 0))
(id2,WrappedArray(0, 0, 1, 0))
(id3,WrappedArray(0, 5, 0, 0))
(id2,WrappedArray(0, 0, 2, 0))

Expected output:
(id1,WrappedArray(0, 1, 0, 0))
(id2,WrappedArray(0, 0, 3, 0))
(id3,WrappedArray(0, 5, 0, 0))


#2

Yes, it is possible: reduceByKey only requires a function (V, V) => V that combines two values of the value type. For arrays, that function can zip the two arrays and add the paired elements, which gives exactly the element-wise sum you describe.
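
A minimal sketch with your sample data (assuming `sc` is an existing SparkContext, and that all arrays for a given key have the same length):

```scala
val rdd = sc.parallelize(Seq(
  ("id1", Array(0, 1, 0, 0)),
  ("id2", Array(0, 0, 1, 0)),
  ("id3", Array(0, 5, 0, 0)),
  ("id2", Array(0, 0, 2, 0))
))

// zip pairs the elements at the same index; map adds each pair.
val summed = rdd.reduceByKey((a, b) => a.zip(b).map { case (x, y) => x + y })

// Arrays print by reference, so format them for readability.
summed.mapValues(_.mkString("(", ", ", ")")).collect().foreach(println)
// e.g. (id2,(0, 0, 3, 0))
```

One caveat: zip truncates to the shorter array, so if the arrays for a key can have different lengths you would need to pad the shorter one first.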
