Let us get an overview of Analytics or Windowing Functions as part of SQL.
-
Aggregate Functions (sum, min, max, avg)
-
Window Functions (lead, lag, first_value, last_value)
-
Rank Functions (rank, dense_rank, row_number, etc)
-
For all the functions when used as part of Analytic or Windowing functions we use OVER clause.
-
For aggregate functions we typically use PARTITION BY
-
For global ranking and windowing functions we can use ORDER BY sort_column and for ranking and windowing within a partition or group we can use PARTITION BY partition_column ORDER BY sort_column.
-
Here is how the syntax will look like.
- Aggregate - func() OVER (PARTITION BY partition_column)
- Global Rank - func() OVER (ORDER BY sort_column DESC)
- Rank in a partition - func() OVER (PARTITION BY partition_column ORDER BY sort_column DESC)
-
We can also get cumulative or moving metrics by adding ROWS BETWEEN clause. We will see details later.
Prepare Tables
Let us create a couple of tables which will be used for the demonstrations of Windowing and Ranking functions.
-
We have ORDERS and ORDER_ITEMS tables in our retail database.
-
Let us take care of computing daily revenue as well as daily product revenue.
-
As we will be using the same data several times, let us create the tables to pre-compute the data.
-
daily_revenue will have the order_date and revenue, where data is aggregated using order_date as the partition key.
-
daily_product_revenue will have order_date, order_item_product_id, and revenue. In this case, data is aggregated using order_date and order_item_product_id as partition keys.
Let us create a table using CTAS to save daily revenue.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@pg.itversity.com:5432/itversity_retail_db
%%sql
DROP TABLE IF EXISTS daily_revenue
%%sql
CREATE TABLE daily_revenue
AS
SELECT o.order_date,
round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
FROM orders o JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date
%%sql
SELECT * FROM daily_revenue
ORDER BY order_date
LIMIT 10
Let us create a table using CTAS to save daily product revenue.
%%sql
DROP TABLE IF EXISTS daily_product_revenue
%%sql
CREATE TABLE daily_product_revenue
AS
SELECT o.order_date,
oi.order_item_product_id,
round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
FROM orders o JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.order_item_product_id
%%sql
SELECT * FROM daily_product_revenue
ORDER BY order_date, revenue DESC
LIMIT 10