Database Essentials - Writing Advanced SQL Queries - Overview of Analytic Functions

Let us get an overview of Analytics or Windowing Functions as part of SQL.

  • Aggregate Functions (sum, min, max, avg)

  • Window Functions (lead, lag, first_value, last_value)

  • Rank Functions (rank, dense_rank, row_number, etc)

  • For all the functions when used as part of Analytic or Windowing functions we use OVER clause.

  • For aggregate functions we typically use PARTITION BY

  • For global ranking and windowing functions we can use ORDER BY sort_column and for ranking and windowing within a partition or group we can use PARTITION BY partition_column ORDER BY sort_column.

  • Here is how the syntax will look like.

    • Aggregate - func() OVER (PARTITION BY partition_column)
    • Global Rank - func() OVER (ORDER BY sort_column DESC)
    • Rank in a partition - func() OVER (PARTITION BY partition_column ORDER BY sort_column DESC)
  • We can also get cumulative or moving metrics by adding ROWS BETWEEN clause. We will see details later.

Prepare Tables

Let us create a couple of tables which will be used for the demonstrations of Windowing and Ranking functions.

  • We have ORDERS and ORDER_ITEMS tables in our retail database.

  • Let us take care of computing daily revenue as well as daily product revenue.

  • As we will be using the same data several times, let us create the tables to pre-compute the data.

  • daily_revenue will have the order_date and revenue, where data is aggregated using order_date as the partition key.

  • daily_product_revenue will have order_date, order_item_product_id, and revenue. In this case, data is aggregated using order_date and order_item_product_id as partition keys.

Let us create a table using CTAS to save daily revenue.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@pg.itversity.com:5432/itversity_retail_db
%%sql

DROP TABLE IF EXISTS daily_revenue
%%sql

CREATE TABLE daily_revenue

AS

SELECT o.order_date,

round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue

FROM orders o JOIN order_items oi

ON o.order_id = oi.order_item_order_id

WHERE o.order_status IN ('COMPLETE', 'CLOSED')

GROUP BY o.order_date
%%sql

SELECT * FROM daily_revenue

ORDER BY order_date

LIMIT 10
Let us create a table using CTAS to save daily product revenue.
%%sql

DROP TABLE IF EXISTS daily_product_revenue
%%sql

CREATE TABLE daily_product_revenue

AS

SELECT o.order_date,

oi.order_item_product_id,

round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue

FROM orders o JOIN order_items oi

ON o.order_id = oi.order_item_order_id

WHERE o.order_status IN ('COMPLETE', 'CLOSED')

GROUP BY o.order_date, oi.order_item_product_id
%%sql

SELECT * FROM daily_product_revenue

ORDER BY order_date, revenue DESC

LIMIT 10

Watch the video tutorial here