Data Engineering

Funnel Reports in SQL: From Naive to Production

· 26 min read

A funnel report is the bluntest answer to “where is the drop-off”. You pick three to seven events that a user is supposed to do in order, you count how many users do each step, you draw the bars. Every BI tool has a funnel widget. Every analyst writes the underlying SQL at least once a year, and the first version is almost always wrong.

In This Article

  1. The original Redash setup, ported
  2. Take 1: the naive funnel
  3. Take 2: the attributed funnel
  4. Take 3: the windowed funnel (the production one)
  5. What the funnel widget actually wants
  6. When this still bites
  7. The two-line takeaway

The wrongness is not about syntax. It is about what the query is actually counting. A funnel written from a goal-completions table looks right, ships to a dashboard, and quietly drifts from reality because users come back, re-enter the funnel, skip steps, and arrive from different sources. By the time someone notices, the dashboard has been “the number” for three months.

This post walks the same funnel three ways: the naive version every analyst writes first, the attributed version that fixes the most common bug, and the windowed version a production data team would actually run. All three are in DuckDB SQL that you can paste straight into your own warehouse with no setup. Postgres and ClickHouse notes follow each block.

Five-step Trader Joe checkout funnel
A five-step Trader Joe’s checkout funnel rendered in v2 dark.

The original Redash setup, ported

The shape of the source article is simple. You hit Google Analytics from a Redash query that returns user counts and three goal completions, then a second SQL query unions those goal numbers into rows that Redash’s funnel widget knows how to draw. Re-rendered in 2026, the equivalent stack is GA4 (or any product analytics tool with an export to your warehouse) feeding a flat events table.

For this post the warehouse is DuckDB. Make a synthetic events table once and the rest of the queries run unchanged.

-- One-time setup. Generates 30 days of synthetic checkout events.
-- Hash-based "random" so the data is the same on every run.
CREATE OR REPLACE TABLE events AS
WITH RECURSIVE u(uid) AS (
  SELECT 1 UNION ALL SELECT uid + 1 FROM u WHERE uid < 4000
),
steps(step_no, step_name, keep_rate) AS (
  VALUES
    (1, 'signup',   1.00),
    (2, 'browse',   0.72),
    (3, 'add_cart', 0.45),
    (4, 'checkout', 0.28),
    (5, 'purchase', 0.19)
),
joined AS (
  SELECT u.uid, s.step_no, s.step_name, s.keep_rate,
         (HASH(u.uid * 1000 + s.step_no) % 10000) / 10000.0  AS r,
         (HASH(u.uid + 11) % 30)                              AS signup_day,
         (HASH(u.uid * 1000 + s.step_no + 99) % 2995 + 5)     AS step_gap_min
  FROM u CROSS JOIN steps s
),
with_offsets AS (
  SELECT *,
         SUM(step_gap_min) OVER (PARTITION BY uid ORDER BY step_no
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_min
  FROM joined
)
SELECT uid AS user_id,
       step_name,
       step_no,
       TIMESTAMP '2026-04-25 00:00:00'
         + INTERVAL (signup_day)     DAY
         + INTERVAL (cumulative_min) MINUTE                                          AS event_ts
FROM with_offsets
WHERE r < keep_rate;

That produces 10,548 rows for 4,000 users across five steps, with the standard “fewer users per step” shape of an e-commerce checkout. Counts on this seed: 4,000 signups, 2,884 browses, 1,803 add-carts, 1,081 checkouts, 780 purchases. Rates were picked to echo the source article’s drop-off pattern. The HASH-based filter is deterministic, so two runs of the same setup produce the same numbers, which matters when the next person to open the notebook expects the dashboard to match.

Take 1: the naive funnel

The naive version is also the version every analyst writes first. Count users at each step. Union the counts. Hand the result to the funnel widget.

-- Naive funnel. One row per step. Almost certainly wrong.
SELECT step_name,
       COUNT(DISTINCT user_id) AS users
FROM events
WHERE step_name IN ('signup', 'browse', 'add_cart', 'checkout', 'purchase')
GROUP BY step_name
ORDER BY CASE step_name
  WHEN 'signup'   THEN 1
  WHEN 'browse'   THEN 2
  WHEN 'add_cart' THEN 3
  WHEN 'checkout' THEN 4
  WHEN 'purchase' THEN 5
END;

Why this is wrong: a user who lands on the purchase event without ever doing signup still counts in the purchase bucket. A user who restarts the funnel three weeks later counts twice in every step they revisit. A user who jumps from browse straight to purchase (bookmarked product page, returning customer) is counted in step 5 but missing from steps 3 and 4, which is structurally impossible in a funnel.

The naive version is fine for a quick sanity check on a single product. It is not fine for a stakeholder dashboard. The number it produces is the most generous possible reading of every step, and stakeholders treat it as ground truth.

Naive vs attributed vs windowed funnel approaches
Naive vs attributed vs windowed funnel approaches compared side by side.

Take 2: the attributed funnel

The fix is to require ordering. A user only counts in step N if they have done step N minus 1 before it. The classic SQL pattern is self-joins, one per step.

-- Attributed funnel. Each step requires the previous step to have happened earlier.
WITH s1 AS (SELECT user_id, MIN(event_ts) AS t FROM events WHERE step_name = 'signup'   GROUP BY user_id),
     s2 AS (SELECT user_id, MIN(event_ts) AS t FROM events WHERE step_name = 'browse'   GROUP BY user_id),
     s3 AS (SELECT user_id, MIN(event_ts) AS t FROM events WHERE step_name = 'add_cart' GROUP BY user_id),
     s4 AS (SELECT user_id, MIN(event_ts) AS t FROM events WHERE step_name = 'checkout' GROUP BY user_id),
     s5 AS (SELECT user_id, MIN(event_ts) AS t FROM events WHERE step_name = 'purchase' GROUP BY user_id)
SELECT 'signup'   AS step_name, COUNT(*) AS users FROM s1
UNION ALL
SELECT 'browse',                COUNT(*)          FROM s1 JOIN s2 USING (user_id) WHERE s2.t > s1.t
UNION ALL
SELECT 'add_cart',              COUNT(*)          FROM s1 JOIN s2 USING (user_id) JOIN s3 USING (user_id) WHERE s2.t > s1.t AND s3.t > s2.t
UNION ALL
SELECT 'checkout',              COUNT(*)          FROM s1 JOIN s2 USING (user_id) JOIN s3 USING (user_id) JOIN s4 USING (user_id) WHERE s2.t > s1.t AND s3.t > s2.t AND s4.t > s3.t
UNION ALL
SELECT 'purchase',              COUNT(*)          FROM s1 JOIN s2 USING (user_id) JOIN s3 USING (user_id) JOIN s4 USING (user_id) JOIN s5 USING (user_id) WHERE s2.t > s1.t AND s3.t > s2.t AND s4.t > s3.t AND s5.t > s4.t;

This is correct in the strict sense and you will see it in many dashboards. Two problems show up at scale. The number of joins grows linearly with the number of steps; a ten-step funnel reads like a tax form. And the time window is implicit; a user who signed up in January and purchased in November still counts in step 5, which is almost never what the stakeholder meant when they asked for “the funnel”.

The fix for the second problem is a session window in the WHERE clause: AND s5.t - s1.t < INTERVAL 30 DAY. The fix for the first problem is take 3.

Take 3: the windowed funnel (the production one)

Window functions collapse the whole pattern into one pass over the events table. The trick is to walk the events in chronological order per user, keep only the events in the expected sequence, and bucket the user by how far they made it.

-- Windowed funnel. One pass over events. Time-bounded. Production-ready.
WITH ordered AS (
  SELECT user_id,
         step_name,
         step_no,
         event_ts,
         LAG(step_no)  OVER (PARTITION BY user_id ORDER BY event_ts) AS prev_step,
         FIRST_VALUE(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_ts
  FROM events
  WHERE step_name IN ('signup', 'browse', 'add_cart', 'checkout', 'purchase')
),
in_sequence AS (
  SELECT user_id,
         step_no,
         step_name,
         event_ts
  FROM ordered
  WHERE (prev_step IS NULL AND step_no = 1)
     OR (step_no = prev_step + 1)
),
windowed AS (
  SELECT user_id, MAX(step_no) AS max_step
  FROM in_sequence
  WHERE event_ts <= (SELECT MIN(event_ts) FROM in_sequence i2
                     WHERE i2.user_id = in_sequence.user_id) + INTERVAL 30 DAY
  GROUP BY user_id
)
SELECT s.step_no,
       s.step_name,
       COUNT(w.user_id) FILTER (WHERE w.max_step >= s.step_no) AS users
FROM (VALUES
        (1, 'signup'),
        (2, 'browse'),
        (3, 'add_cart'),
        (4, 'checkout'),
        (5, 'purchase')
     ) AS s(step_no, step_name)
LEFT JOIN windowed w ON TRUE
GROUP BY s.step_no, s.step_name
ORDER BY s.step_no;

Three things are doing the work. LAG(step_no) lets every row see what its user did right before it; the in_sequence CTE keeps only events that continue the chain, so a user who jumps straight to purchase never gets the purchase row. The 30-day window clause makes the answer match what the stakeholder asked for. And the final cross join against the step list guarantees the result has one row per step even when zero users reached it.

Event sequence timeline showing windowing attribution
Event timeline showing how the windowing SQL classifies raw events into funnel steps.
This pattern ports cleanly:

  • Postgres: works as-is from Postgres 12 onward. The INTERVAL 30 DAY syntax needs to be INTERVAL '30 days'.
  • ClickHouse: replace the windowing block with the native windowFunnel(window)(timestamp, conditions...) function. It does the same thing in one function call and is built for this exact pattern.
  • BigQuery: same SQL, but use TIMESTAMP_ADD(first_ts, INTERVAL 30 DAY) instead of the + INTERVAL shorthand.

What the funnel widget actually wants

Most funnel widgets, including the one in the original Redash article, expect a tall result: one row per step, columns step_name and users. Both the attributed and windowed queries already produce that shape. If your BI tool wants step_name, users, and conversion_rate instead, wrap the final select:

SELECT step_name,
       users,
       ROUND(users * 100.0 / FIRST_VALUE(users) OVER (ORDER BY step_no), 1) AS conversion_pct
FROM (
  -- the windowed query from above
) f
ORDER BY step_no;

The conversion percentage is always relative to step 1. If the stakeholder wants step-to-step conversion ("of users who reached browse, what share made it to add_cart"), swap FIRST_VALUE for LAG(users) OVER (ORDER BY step_no).

When this still bites

Three failure modes survive even the windowed query.

  • Re-entries. A user who completes the funnel in March and starts again in April will count once in the windowed version (good) but their April activity is invisible. If the funnel is a recurring action like a subscription renewal, split the events into sessions first and count sessions, not users. The session split is the same LAG() pattern with a gap threshold, and the cohort analysis playbook walks through it for retention.
  • Step skips. The windowed query requires step_no = prev_step + 1. A user who legitimately skips a step (the "remove from cart" path on a returning customer with saved payment) gets dropped from the funnel. If the product allows step skips, either change the chain rule to step_no > prev_step or model the funnel as a graph and use a path query, not a funnel widget.
  • Late-binding identity. Users browse anonymously and identify after signup. Every event before the identify call belongs to a different user_id than every event after. The fix is upstream: an identity-resolution model that stitches anonymous and identified user IDs before this query runs. Most product analytics tools do this for you; if you write your own, LTV modeling shows the same identity-stitch pattern because it has the same root problem.

The two-line takeaway

The naive funnel ships in five minutes and quietly lies. The attributed funnel is correct but doesn't scale past five steps. The windowed funnel is the one to ship to a stakeholder dashboard, and the version above runs on DuckDB, Postgres, and BigQuery with two-line changes. ClickHouse users should reach for windowFunnel and skip the manual windowing.

The biggest lift here is not the SQL. It is being honest with the stakeholder about what the funnel is actually counting: distinct users, in order, within a window, with no step skips. State those four constraints out loud and the dashboard becomes a tool instead of a story.

Keep reading

Enjoyed this article?

Get weekly data strategy insights delivered to your inbox.

Get in Touch

Let's Discuss Your Project

Book a 30-minute discovery call. We'll assess your data maturity and recommend the right approach — no strings attached.

Book a Discovery Call →
Need help with your data strategy? Book a Discovery Call →