Performing Aggregations using Windowing Functions

#1

Let us see how we can perform aggregations with in a key using Windowing/Analytics Functions.

  • For simple aggregations where we have to get grouping key and aggregateed results we can use GROUP BY.
  • If we want to get the raw data along with aggregated results, then using GROUP BY is not possible or overly complicated.
  • Instead we can use aggregate functions with OVER Clause.
  • Let us take an example of getting employee salary percentage when compared to department salary expense.

Let us write the query using GROUP BY approach. This query will not work in Hive, but might work in traditional Databases.

SELECT department_id,
       sum(salary) AS department_salary_expense
FROM employees
GROUP BY department_id;

SELECT e.employee_id, e.department_id, e.salary,
       ae.department_salary_expense
FROM employees e JOIN (
     SELECT department_id, 
            sum(salary) AS department_salary_expense
     FROM employees
     GROUP BY department_id
) ae
ON e.department_id = ae.department_id;

Let us see how we can get it using Analytics/Windowing Functions.

SELECT e.employee_id, e.department_id, e.salary,
       sum(e.salary) 
         OVER (PARTITION BY e.department_id)
         AS department_salary_expense
FROM employees e
ORDER BY e.department_id;

Practice hive on state of the art Big Data cluster - https://labs.itversity.com
You can sign up for our courses on Udemy using $10 coupons - Udemy Coupons - Big Data Courses


0 Likes