Window functions let you compute a value for each row while still seeing other rows in the dataset. ROW_NUMBER, running totals, period-over-period deltas, top-N per group: all of these collapse into a single SELECT with OVER(). Without window functions you reach for self-joins or Python. With them, the same logic fits on two lines. Running totals, rank within group, period-over-period deltas, top-N per category — these all collapse into a single SELECT with OVER(...). Without window functions you reach for self-joins, correlated subqueries, or a Python loop. With them, the same logic fits on two lines and the planner can use an index.
In This Article
This is the reference we’ve sent to every analyst we’ve onboarded since 2020, expanded for 2026.
Anatomy of OVER(...)
Every window function follows the same shape:
function_name(args) OVER (
PARTITION BY <columns> -- optional: split rows into independent windows
ORDER BY <columns> -- optional: sequence rows inside each window
<frame_clause> -- optional: which rows count as "the window"
)
The three optional pieces interact:
PARTITION BYsplits the dataset into independent groups. The function reruns inside each group. Without it, the whole table is one group.ORDER BYsorts rows inside each partition. Without it, the order is engine-dependent and any function that cares about order (ROW_NUMBER,LAG, running totals) gives nondeterministic results.- The frame clause picks which rows in the partition contribute to the current row’s value. The default depends on whether you wrote
ORDER BY— and the default is rarely what you want for running calculations. We’ll come back to this.
The three families
There are dozens of window functions but only three categories matter in practice.
1. Ranking functions
Assign each row a position within its partition.
| Function | What it returns |
|---|---|
ROW_NUMBER() | 1, 2, 3 — strictly increasing, no ties |
RANK() | 1, 2, 2, 4 — ties get the same rank, next rank skips |
DENSE_RANK() | 1, 2, 2, 3 — ties get the same rank, next rank does NOT skip |
NTILE(n) | Splits the partition into n equal-sized buckets, returns 1..n |
PERCENT_RANK() | Relative rank in [0, 1] |
Most common use: pick top-N per group.
SELECT *
FROM (
SELECT
customer_id,
order_date,
order_value,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_value DESC) AS rn
FROM orders
) ranked
WHERE rn <= 3;
This returns the three largest orders per customer. Without window functions you’d be writing a correlated subquery and watching the query planner cry.
RANK vs DENSE_RANK vs ROW_NUMBER: use ROW_NUMBER when you need exactly N results (ties are broken arbitrarily but the count is predictable). Use RANK when ties should share a rank and the next rank should skip (1, 2, 2, 4). Use DENSE_RANK when ties share a rank and the next rank should not skip (1, 2, 2, 3). In practice: top-N filtering uses ROW_NUMBER, leaderboards use DENSE_RANK.
2. Aggregate functions over a window
Every familiar aggregate — SUM, AVG, COUNT, MIN, MAX — can also be a window function. Instead of collapsing rows to one, it returns the aggregate for each row’s window.
The cleanest use is the running total:
SELECT
order_date,
revenue,
SUM(revenue) OVER (ORDER BY order_date) AS running_revenue
FROM daily_revenue;
This returns a running cumulative sum, one row per date. Same shape works for averages, counts, anything aggregate.
Two ways this goes wrong:
- Forgetting
ORDER BY.SUM(revenue) OVER ()returns the grand total on every row, not a running total. Useful for “what fraction of revenue does this row represent” but not for time-series accumulation. - Forgetting that the default frame includes all earlier rows when
ORDER BYis present. Specifically, withORDER BYthe default isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW— i.e. running total. WithoutORDER BYit’sROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING— i.e. the entire partition. These give wildly different answers and people regularly mix them up.
3. Offset functions
Reach into the row before or after the current one inside the same partition.
| Function | What it returns |
|---|---|
LAG(col, n) | Value of col from the row n positions earlier (default n=1) |
LEAD(col, n) | Value of col from the row n positions later |
FIRST_VALUE(col) | Value of col from the first row in the window |
LAST_VALUE(col) | Value of col from the last row in the window (gotcha: frame-dependent) |
NTH_VALUE(col, n) | Value from the n-th row |
The bread-and-butter use is period-over-period delta:
SELECT
order_date,
revenue,
revenue - LAG(revenue) OVER (ORDER BY order_date) AS day_over_day_delta
FROM daily_revenue;
Or month-over-month growth, with a PARTITION BY year:
SELECT
year,
month,
revenue,
revenue / NULLIF(LAG(revenue) OVER (PARTITION BY year ORDER BY month), 0) - 1 AS mom_growth
FROM monthly_revenue;
LAST_VALUE is the trap. By default, the frame ends at the current row, so LAST_VALUE(col) OVER (ORDER BY x) returns the current row’s value, not the last row of the partition. To actually get the last row, you have to extend the frame:
LAST_VALUE(col) OVER (
ORDER BY x
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
In practice: use FIRST_VALUE for the first row, MAX(col) OVER (PARTITION BY ...) for the last. Both avoid the LAST_VALUE frame trap and are readable by anyone who hasn’t memorized the frame-clause default.
A worked example: customer cohort revenue
Suppose you have an orders table with customer_id, signup_date, order_date, order_value. You want:
- Each customer’s first order
- Each customer’s running total spend
- The customer’s rank within their signup cohort by lifetime spend
One query:
SELECT
customer_id,
signup_date,
order_date,
order_value,
FIRST_VALUE(order_date) OVER (
PARTITION BY customer_id ORDER BY order_date
) AS first_order_date,
SUM(order_value) OVER (
PARTITION BY customer_id ORDER BY order_date
) AS running_spend,
DENSE_RANK() OVER (
PARTITION BY DATE_TRUNC('month', signup_date)
ORDER BY SUM(order_value) OVER (PARTITION BY customer_id) DESC
) AS cohort_spend_rank
FROM orders;
Three window functions, three different partitions, one pass through the data. The same logic done with joins is 40 lines and 10 times slower.
Common production gotchas
A few we’ve hit in client engagements:
- High-cardinality
PARTITION BY. Partitioning byuser_idon a 500M-row events table can blow up memory if the planner doesn’t choose a hash partitioning strategy. Test on a sample before assuming it’ll fit. - Window functions can’t appear in a
WHEREclause. You need a subquery or CTE to filter on the result (see the top-N pattern above). They also can’t appear inGROUP BYdirectly — they execute afterGROUP BY. ORDER BYties. If your partition has 2,000 rows per date and youORDER BY date, the order of rows within a date is engine-dependent.ROW_NUMBERwill assign different numbers on different runs. Always add a tiebreaker:ORDER BY date, id.- Frame clauses default differently per engine. PostgreSQL, ClickHouse, BigQuery, and Snowflake all behave the same on the standard, but legacy MS SQL Server and older MySQL have quirks. When porting queries between engines, spot-check.
- Window functions on ClickHouse have engine-specific gotchas. The
groupArrayMovingSum+ARRAY JOINpattern is sometimes faster thanSUM() OVER ORDER BYand sometimes slower; benchmark on your data. See Cumulative Sum in SQL for the comparison.
Takeaway
Three things to remember when you reach for window functions:
- The three families are ranking, aggregate-over-window, and offset. Once you can name which family fits the question, the syntax is mechanical.
ORDER BYinside theOVERclause is almost always required and almost always needs an explicit tiebreaker. Without it, results are nondeterministic.- Frame clauses default to “everything up to current row” with
ORDER BYand “the entire partition” without. Knowing which one you’re getting is the difference between a running total and a grand total.
If you find yourself writing a self-join or correlated subquery in 2026, take 5 minutes and try expressing it as a window function first. About 4 times out of 5, it’s cleaner and faster.