Sorting and Ranking - sortByKey and groupByKey

Originally published at:

Sorting is typically done with sortByKey, while more complex sorting and ranking are typically done with groupByKey. To sort the data by composite keys, we need to bring all the fields we sort on into the key. Data can be sorted in ascending or descending order based on all the keys. If we have to sort the…
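As a rough illustration of sorting by a composite key, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. The sample pairs are made up, and the comments note the RDD call each line is meant to mirror.

```python
# Hypothetical ((order_date, product_id), revenue) pairs; the values are made up.
pairs = [
    (("2013-07-26", 3), 150.0),
    (("2013-07-25", 7), 300.0),
    (("2013-07-25", 2), 120.0),
    (("2013-07-26", 1), 90.0),
]

# Every field we sort on lives in the key tuple, so a single sort orders
# by order_date first and product_id second.
# On an RDD of the same pairs this corresponds to rdd.sortByKey().
ascending = sorted(pairs, key=lambda kv: kv[0])

# Descending on all key fields; on an RDD: rdd.sortByKey(ascending=False).
descending = sorted(pairs, key=lambda kv: kv[0], reverse=True)

for key, revenue in ascending:
    print(key, revenue)
```

Because the key is a tuple, the ascending or descending flag applies to all of its fields at once.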

How to get the Top 5 products by revenue for each day

Here is the design:

  • Join orders and order_items
  • Join products with the RDD from the previous step
  • Get order_date, product_id/product_name as key and order_item_subtotal as value
  • Aggregate for each order_date and product
  • Get order_date as key with product name and revenue as value
  • Group by order_date; the value becomes an Iterable of (product, revenue) pairs
  • Sort the Iterable in descending order by revenue and get top 5 products
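
The design above can be sketched in plain Python to make the data flow concrete. This is not the Spark code itself: the sample rows, the column layout, and the function name top_products_per_day are assumptions for illustration, and each comment names the RDD operation (join, reduceByKey, groupByKey) that the step stands in for.

```python
from collections import defaultdict

# Hypothetical sample data; the real tables have more columns.
orders = [(1, "2013-07-25"), (2, "2013-07-25"), (3, "2013-07-26")]  # (order_id, order_date)
order_items = [  # (order_id, product_id, order_item_subtotal)
    (1, 10, 100.0),
    (1, 11, 50.0),
    (2, 10, 25.0),
    (3, 11, 75.0),
]
products = {10: "Widget", 11: "Gadget"}  # product_id -> product_name


def top_products_per_day(orders, order_items, products, n=5):
    order_dates = dict(orders)  # order_id -> order_date

    # Join orders, order_items and products, key by (order_date, product_name),
    # and sum the subtotals per key (the job of reduceByKey in Spark).
    revenue = defaultdict(float)
    for order_id, product_id, subtotal in order_items:
        # Inner-join semantics: skip items with no matching order or product.
        if order_id in order_dates and product_id in products:
            revenue[(order_dates[order_id], products[product_id])] += subtotal

    # Re-key by order_date and collect (product, revenue) pairs per day
    # (the job of groupByKey in Spark).
    by_date = defaultdict(list)
    for (order_date, name), total in revenue.items():
        by_date[order_date].append((name, total))

    # Sort each day's pairs by revenue, descending, and keep the first n.
    return {
        order_date: sorted(items, key=lambda pr: pr[1], reverse=True)[:n]
        for order_date, items in by_date.items()
    }


result = top_products_per_day(orders, order_items, products)
for order_date in sorted(result):
    print(order_date, result[order_date])
```

Aggregating before grouping keeps each day's Iterable down to one entry per product, so the final sort only has to handle a short list per date.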