Database Essentials - Writing Advanced SQL Queries - Cumulative or Moving Aggregations

Let us understand how we can take care of cumulative or moving aggregations using Analytic Functions.

  • When it comes to Windowing or Analytic Functions we can also specify window spec using ROWS BETWEEN clause.
  • Even when we do not specify window spec, the default window spec is used. For most of the functions the default window spec is UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. You also have special clauses such as CURRENT ROW.
  • We can leverage ROWS BETWEEN for cumulative aggregations or moving aggregations.
  • Here are some of the examples with respect to ROWS BETWEEN.
  • Here is an example of cumulative sum for sum_sal_expense based on the employees’ department ID and salary.
SELECT e.employee_id, e.department_id, e.salary,
    sum(e.salary) OVER (
        PARTITION BY e.department_id
        ORDER BY e.salary
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS sum_sal_expense
FROM employees e
ORDER BY e.department_id, e.salary DESC
LIMIT 10
  • Make sure to restart the kernel for accurate results if you are using Jupyter environment.

  • Another example is calculating the cumulative daily revenue based on the order date.

SELECT t.*,
    round(sum(t.revenue) OVER (
        PARTITION BY to_char(order_date, 'yyyy-MM')
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ), 2) AS cumulative_daily_revenue
FROM daily_revenue t
ORDER BY to_char(order_date, 'yyyy-MM'), order_date
LIMIT 10
  • Lastly, examples for 3-day moving sum as well as average using daily_revenue in the retail database.
SELECT t.*,
    round(sum(t.revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS moving_3day_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 20
SELECT t.*,
    round(sum(t.revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
    ), 2) AS moving_3day_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 20
SELECT t.*,
    round(avg(t.revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS moving_3day_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 20

I hope this tutorial on Cumulative or Moving Aggregations using Analytic Functions was helpful and insightful. Remember to practice the concepts mentioned above and engage with the community for any further clarifications!

Watch the video tutorial here