SQL Mastery for Interviews
Window Functions Deep Dive
4 min read
Window functions are the most tested advanced SQL concept in data science interviews. They let you calculate across rows while keeping individual row detail - something regular GROUP BY cannot do.
The Window Function Syntax
function_name(column) OVER (
PARTITION BY partition_column -- Optional: group rows
ORDER BY order_column -- Optional: order within partition
ROWS/RANGE frame_specification -- Optional: define window frame
)
Ranking Functions
Three ranking functions serve different purposes:
| Function | Ties | Gaps |
|---|---|---|
| ROW_NUMBER() | Arbitrary order | No gaps |
| RANK() | Same rank | Gaps after ties |
| DENSE_RANK() | Same rank | No gaps |
Example: Ranking salespeople by revenue
SELECT
name,
revenue,
ROW_NUMBER() OVER (ORDER BY revenue DESC) as row_num,
RANK() OVER (ORDER BY revenue DESC) as rank,
DENSE_RANK() OVER (ORDER BY revenue DESC) as dense_rank
FROM sales;
-- Results:
-- name | revenue | row_num | rank | dense_rank
-- Alice | 100 | 1 | 1 | 1
-- Bob | 100 | 2 | 1 | 1 (tie with Alice)
-- Charlie | 90 | 3 | 3 | 2 (rank skips 2, dense_rank doesn't)
Classic interview question: "Find the top 3 products by sales in each category"
WITH ranked AS (
SELECT
category,
product_name,
sales,
DENSE_RANK() OVER (
PARTITION BY category
ORDER BY sales DESC
) as rank
FROM products
)
SELECT category, product_name, sales
FROM ranked
WHERE rank <= 3;
LAG and LEAD
Access previous or next row values - essential for time-series analysis.
-- Calculate day-over-day change in revenue
SELECT
date,
revenue,
LAG(revenue, 1) OVER (ORDER BY date) as prev_day_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY date) as daily_change,
ROUND(
100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY date))
/ LAG(revenue, 1) OVER (ORDER BY date),
2
) as pct_change
FROM daily_sales;
Interview pattern: "Identify users whose spending decreased for 3 consecutive months"
WITH monthly_spending AS (
SELECT
user_id,
DATE_TRUNC('month', order_date) as month,
SUM(amount) as spending,
LAG(SUM(amount), 1) OVER (PARTITION BY user_id ORDER BY DATE_TRUNC('month', order_date)) as prev_month,
LAG(SUM(amount), 2) OVER (PARTITION BY user_id ORDER BY DATE_TRUNC('month', order_date)) as prev_2_month
FROM orders
GROUP BY user_id, DATE_TRUNC('month', order_date)
)
SELECT DISTINCT user_id
FROM monthly_spending
WHERE spending < prev_month
AND prev_month < prev_2_month;
Running Totals and Moving Averages
Window frames let you calculate cumulative or moving calculations:
-- Running total of revenue
SELECT
date,
revenue,
SUM(revenue) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total
FROM daily_sales;
-- 7-day moving average
SELECT
date,
revenue,
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7d
FROM daily_sales;
PARTITION BY for Segmented Analysis
PARTITION BY restarts calculations for each group:
-- Running total PER customer
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) as customer_running_total
FROM orders;
Interview Pro Tips
- PARTITION BY vs GROUP BY: PARTITION BY keeps all rows, GROUP BY collapses them
- ORDER BY in OVER: Required for ranking and LAG/LEAD, optional for simple aggregates
- Default frame: If ORDER BY is specified, default is
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW - Multiple windows: You can use different OVER() clauses in the same query
Window functions separate intermediate from senior candidates. Master these patterns and you'll stand out. :::