Let us see how we can perform aggregations with in a key using Windowing/Analytics Functions.
- For simple aggregations where we have to get grouping key and aggregateed results we can use GROUP BY.
- If we want to get the raw data along with aggregated results, then using GROUP BY is not possible or overly complicated.
- Instead we can use aggregate functions with OVER Clause.
- Let us take an example of getting employee salary percentage when compared to department salary expense.
Let us write the query using GROUP BY approach. This query will not work in Hive, but might work in traditional Databases.
SELECT department_id, sum(salary) AS department_salary_expense FROM employees GROUP BY department_id; SELECT e.employee_id, e.department_id, e.salary, ae.department_salary_expense FROM employees e JOIN ( SELECT department_id, sum(salary) AS department_salary_expense FROM employees GROUP BY department_id ) ae ON e.department_id = ae.department_id;
Let us see how we can get it using Analytics/Windowing Functions.
SELECT e.employee_id, e.department_id, e.salary, sum(e.salary) OVER (PARTITION BY e.department_id) AS department_salary_expense FROM employees e ORDER BY e.department_id;