Difference between RANK, DENSE_RANK and ROW_NUMBER

#1

Let us understand the difference between rank, dense_rank and row_number.

  • We can either of the functions to generate ranks when the rank field does not have duplicates.
  • When rank field have duplicates then row_number should not be used as it generate unique number for each record with in the partition.
  • rank will skip the ranks in between if multiple people get the same rank while dense_rank continue with the next number.

There won’t be any difference when sorted field have unique values.

USE training_retail;

SELECT t.*,
  rank() OVER (
    PARTITION BY order_date ORDER BY revenue DESC) AS rnk,
  dense_rank() row_number() OVER (
    PARTITION BY order_date ORDER BY revenue DESC) AS drnk,
  row_number() OVER (
    PARTITION BY order_date ORDER BY revenue DESC) AS rn
FROM daily_product_revenue t
ORDER BY order_date, revenue DESC
LIMIT 100;

We should not use row_number for ranking when there are redundant values in sorted field.

USE training_hr;

SELECT
  employee_id,
  department_id,
  salary,
  rank() OVER (
    PARTITION BY department_id ORDER BY salary DESC) rnk,
  dense_rank() OVER (
    PARTITION BY department_id ORDER BY salary DESC) drnk,
  row_number() OVER (
    PARTITION BY department_id ORDER BY salary DESC) rn
FROM employees
ORDER BY department_id, salary DESC;

Practice hive on state of the art Big Data cluster - https://labs.itversity.com
You can sign up for our courses on Udemy using $10 coupons - Udemy Coupons - Big Data Courses


0 Likes