RDD and Collection


#1

In the code below, we pass the RDD productsGroupByCategoryId to the function getTopNPricedProductsPerCategoryId as the parameter productsPerCategoryId.

So how are we able to sort this RDD using the index [1]? When I try to run the same code without defining the function getTopNPricedProductsPerCategoryId, I get an error saying that an RDD does not support indexing.


def getTopNPricedProductsPerCategoryId(productsPerCategoryId, topN):
    productsSorted = sorted(productsPerCategoryId[1],
                            key=lambda k: float(k.split(",")[4]),
                            reverse=True
                            )
    productPrices = map(lambda p: float(p.split(",")[4]), productsSorted)
    topNPrices = sorted(set(productPrices), reverse=True)[:topN]
    import itertools as it
    return it.takewhile(lambda p:
                        float(p.split(",")[4]) in topNPrices,
                        productsSorted
                        )

list(getTopNPricedProductsPerCategoryId(t, 3))

topNPricedProducts = productsGroupByCategoryId.flatMap(lambda p: getTopNPricedProductsPerCategoryId(p, 3))
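
One way to look at the flatMap call above: flatMap calls the function once per element of the RDD, and after a groupByKey each element is a (key, values) pair, so the [1] inside the function indexes that pair and not the RDD itself. Below is a rough sketch of this in plain Python, without Spark. The names grouped_elements, prices_for_category and flat_map_local are made up for illustration, and the sample records assume the price is the fifth comma-separated field, as in the code above.

```python
from itertools import chain

# Hypothetical sample data standing in for the elements of the grouped RDD:
# each element is a (categoryId, iterable of product records) pair.
grouped_elements = [
    (1, ["10,1,Ball,,9.99", "11,1,Bat,,24.50"]),
    (2, ["20,2,Tent,,120.00"]),
]

def prices_for_category(element):
    # element is ONE (key, values) tuple, so element[1] is ordinary tuple indexing.
    return [float(record.split(",")[4]) for record in element[1]]

def flat_map_local(func, elements):
    # Local stand-in for flatMap: call func once per element, then flatten the results.
    return list(chain.from_iterable(func(e) for e in elements))

all_prices = flat_map_local(prices_for_category, grouped_elements)
```

Calling prices_for_category on the whole collection would be the equivalent of indexing the RDD directly, which is the case that raises the error.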