Dense Rank functionality

Why have we used flapMap in order to call the GetTopDenseN function?

roductsMap.
groupByKey().
flatMap(x => getTopDenseN(x, 2)).
collect().
foreach(println)

Because, data has to be returned as individual records.

Output of groupByKey will be key and iterable of values. If we do not use flatMap, we cannot get top N records as individual records. You need to use flatMap to flatten out the array of n size into n individual records.

1 Like

@Varun_Joshi,

You can use, ‘map()’ instead of ‘flatMap()’ but records will be returned in iterable, you can try as below:

python:::
def topDenseN(rec,topN):
x = []
prodPrices = []
topNPrices = []
sortedRecs = []
for i in rec[1]:
prodPrices.append(float(i.split(",")[4]))
import itertools
topNPrices = list(itertools.islice(sorted(set(prodPrices), reverse=True), 0, topN))
sortedRecs = list(sorted(rec[1], key = lambda k: float(k.split(",")[4]), reverse=True))
for j in sortedRecs:
if(float(j.split(",")[4]) in topNPrices):
x.append(j)
return x