Setting up data infrastructure

Building a Data-Driven Culture at a $25M ARR B2B SaaS

A $25M ARR B2B SaaS company with 200 employees suffered from data silos, no single source of truth, and rising churn. We implemented a modern data stack, self-serve analytics, and a churn prediction model, improving net revenue retention from 95% to 108% in one quarter.

Impact

  • Net revenue retention: 95% → 108%
  • Ad-hoc data requests: down 60%
  • Executive dashboard adoption: 100%

A $25M ARR B2B SaaS company was drowning in data but starving for insights. We built the infrastructure, the models, and the culture to turn them into a genuinely data-driven organization.

Client: B2B SaaS (project management / collaboration vertical)
Revenue: $25M ARR, Series C
Team Size: ~200 employees across 4 offices
Engagement: 12-week Data Strategy & Infrastructure Build

The Challenge: Data Everywhere, Insights Nowhere

On the surface, this company looked data-mature. They had a data team of four analysts, multiple dashboards in Tableau, a Snowflake data warehouse, and executives who talked about being "data-driven" in every all-hands meeting. But beneath the surface, the reality was different:

  • Data silos across every department. Sales had their numbers in Salesforce. Product had theirs in Amplitude. Finance used spreadsheets. Customer Success relied on Gainsight. Each team had their own definition of "customer," "revenue," and "churn." When the CEO asked a simple question — "What's our churn rate?" — she got four different answers.
  • No single source of truth. The Snowflake warehouse existed but was poorly maintained. Only one analyst knew how the ELT pipelines worked. When he went on vacation, three dashboards broke and nobody could fix them.
  • Ad-hoc requests consuming the data team. The four analysts spent 80% of their time answering one-off questions from stakeholders ("Can you pull me the list of accounts that..." / "What was our conversion rate for Q2?"). Strategic analysis — the kind that actually moves the business — got pushed to evenings and weekends.
  • Churn analysis was impossible. The company knew they had a churn problem (net revenue retention was 95%, below the 110%+ benchmark for their stage), but they couldn't diagnose it. Was it product? Support? Pricing? Specific segments? Nobody could tell because the data wasn't connected.
  • Executive dashboards nobody trusted. The leadership team had access to Tableau dashboards, but most executives maintained their own spreadsheets "just to double-check the numbers." Dashboard adoption was under 30%.

The new VP of Data — hired three months prior — recognized that the problem wasn't tools or talent. It was architecture and process. She brought us in to rebuild the data foundation and establish the practices that would turn "we want to be data-driven" into reality.

Our Approach: Stack, Self-Serve, and Signal

We designed a 12-week engagement with three workstreams running in parallel: modernize the data stack, enable self-serve analytics, and build a churn prediction model as the flagship use case.

Workstream 1: Modern Data Stack (Weeks 1–6)

The existing Snowflake warehouse had accumulated 18 months of technical debt. Raw data was loaded but rarely transformed. Business logic lived in Tableau calculated fields — different for every dashboard, often contradictory.

We rebuilt the transformation layer using dbt:

  • Staging models — clean, deduplicated, timezone-normalized source data from Salesforce, Stripe, Amplitude, Gainsight, Intercom, and the application database
  • Intermediate models — business entity resolution (matching "accounts" across systems), SCD Type 2 history tracking for key dimensions, and calculated fields using standardized business logic
  • Mart models — department-specific data marts (Sales, Product, CS, Finance) with pre-aggregated metrics and dimensional models optimized for BI tool performance
  • Metric definitions in code — every business metric (ARR, NRR, churn rate, LTV, CAC) defined once in dbt, versioned in Git, and used consistently across all dashboards
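To illustrate what "defined once" can look like, here is a minimal sketch of a single NRR definition. The engagement defined its metrics in dbt; this Python version is only an illustration using the common NRR formula, and the `RevenueMovement` class, its field names, and the sample figures are assumptions rather than the client's actual model or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevenueMovement:
    """Recurring revenue movement for one customer cohort over one period.

    Hypothetical shape for illustration; not the client's schema.
    """
    starting_arr: float  # ARR from the cohort at the start of the period
    expansion: float     # upsell and cross-sell from those same customers
    contraction: float   # downgrades from those same customers
    churned: float       # ARR lost to cancellations


def net_revenue_retention(m: RevenueMovement) -> float:
    """NRR = (starting + expansion - contraction - churned) / starting."""
    if m.starting_arr <= 0:
        raise ValueError("starting_arr must be positive")
    ending_arr = m.starting_arr + m.expansion - m.contraction - m.churned
    return ending_arr / m.starting_arr


if __name__ == "__main__":
    # Made-up figures, chosen only to exercise the formula.
    sample = RevenueMovement(
        starting_arr=1_000_000,
        expansion=120_000,
        contraction=20_000,
        churned=50_000,
    )
    print(f"NRR: {net_revenue_retention(sample):.1%}")
```

Every dashboard that calls the same function (or, in dbt, selects from the same model) reports the same number, which is the point of keeping the definition in version control.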

We also implemented dbt tests and documentation — 340 data quality tests running on every pipeline execution, and a searchable data catalog so any team member could understand what data was available and how it was defined.
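The source does not list which tests were written. As a rough sketch of the kind of checks involved, the Python below mirrors three of dbt's built-in generic tests (`not_null`, `unique`, `accepted_values`). The function names, the row format, and the sample rows are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def check_not_null(rows: list[dict[str, Any]], column: str) -> list[int]:
    """Return the indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_unique(rows: list[dict[str, Any]], column: str) -> list[Any]:
    """Return the non-null values of `column` that appear more than once."""
    counts = Counter(
        row[column] for row in rows if row.get(column) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


def check_accepted_values(
    rows: list[dict[str, Any]], column: str, allowed: set[Any]
) -> list[Any]:
    """Return the distinct non-null values of `column` outside `allowed`."""
    return sorted(
        {
            row[column]
            for row in rows
            if row.get(column) is not None and row[column] not in allowed
        }
    )


if __name__ == "__main__":
    # Hypothetical staging rows with deliberate defects.
    accounts = [
        {"account_id": "A1", "plan": "pro"},
        {"account_id": "A2", "plan": "enterprise"},
        {"account_id": "A2", "plan": "trial"},
        {"account_id": None, "plan": "pro"},
    ]
    print("null ids at rows:", check_not_null(accounts, "account_id"))
    print("duplicate ids:", check_unique(accounts, "account_id"))
    print(
        "unexpected plans:",
        check_accepted_values(accounts, "plan", {"pro", "enterprise"}),
    )
```

Each check returns the offending rows or values rather than a bare pass/fail, so a failing run shows what to fix.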

Workstream 2: Self-Serve Analytics (Weeks 4–10)

The goal was to eliminate 80% of ad-hoc data requests by giving business users the tools and skills to answer their own questions.

We replaced Tableau with Metabase as the primary BI platform. The decision was deliberate: Metabase's question builder allows non-technical users to explore data without writing SQL, while still supporting SQL for power users. Tableau is powerful, but in this organization it had become a bottleneck — only the data team knew how to use it.

We built a curated set of 24 self-serve dashboards organized by department:

  • Executive Suite — ARR waterfall, NRR trend, pipeline health, burn rate (4 dashboards)
  • Sales — pipeline velocity, conversion rates by segment, rep performance, forecast vs. actual (6 dashboards)
  • Product — feature adoption, user engagement cohorts, time-to-value, support ticket correlation (5 dashboards)
  • Customer Success — health scores, expansion pipeline, at-risk accounts, NPS trends (5 dashboards)
  • Finance — unit economics, cohort LTV, gross margin by segment, cash forecast (4 dashboards)

Critically, we also ran 3 training workshops — teaching team leads in each department to build their own questions in Metabase, understand the data model, and interpret metrics correctly. The training included hands-on exercises using real company data, not hypothetical examples.

Workstream 3: Churn Prediction Model (Weeks 6–12)

With the data foundation in place, we built a predictive churn model as the flagship analytics use case — solving the most pressing business problem while demonstrating the value of the new data stack.

The model combined signals from four domains:

  • Product usage — login frequency, feature adoption depth, usage trend direction
  • Support interactions — ticket volume, sentiment, resolution time, escalation history
  • Commercial signals — contract renewal proximity, pricing tier, expansion/contraction history
  • Engagement health — stakeholder contact frequency, executive sponsor changes, QBR attendance

We trained a gradient boosting model on 24 months of historical data, achieving 82% precision when flagging accounts 30 days ahead of churn. The model scored every account nightly and surfaced the top 20 at-risk accounts to the CS team each morning via a Slack integration.
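Two pieces of that workflow are simple enough to sketch: how precision is computed, and how a nightly score table becomes a short at-risk list. The snippet assumes churn scores already exist as a mapping from account name to probability; the function names, account names, and numbers are invented for illustration and do not come from the engagement.

```python
from __future__ import annotations

import heapq


def precision(flagged: set[str], churned: set[str]) -> float:
    """Share of flagged accounts that actually churned: TP / (TP + FP)."""
    if not flagged:
        raise ValueError("no accounts were flagged")
    true_positives = len(flagged & churned)
    return true_positives / len(flagged)


def top_at_risk(scores: dict[str, float], k: int = 20) -> list[tuple[str, float]]:
    """Return the k accounts with the highest churn score, highest first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical nightly scores.
    nightly_scores = {
        "acme": 0.91,
        "globex": 0.15,
        "initech": 0.67,
        "umbrella": 0.82,
    }
    for account, score in top_at_risk(nightly_scores, k=2):
        print(f"{account}: {score:.0%} churn risk")

    # Hypothetical back-test: five accounts flagged, four of them churned.
    flagged = {"acme", "umbrella", "initech", "hooli", "stark"}
    churned = {"acme", "umbrella", "initech", "hooli", "wayne"}
    print(f"precision: {precision(flagged, churned):.0%}")
```

Precision is the relevant metric for a daily outreach list: it measures how many of the accounts a CSM is asked to contact were genuinely at risk.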

But the model's biggest impact wasn't prediction — it was explanation. For each at-risk account, the system provided the top 3 contributing factors (e.g., "Login frequency dropped 60% in the last 14 days" or "3 open support tickets with negative sentiment"). This gave CSMs specific, actionable context for their outreach — not just "this account might churn" but "here's why, and here's what to do about it."
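The source does not say how the contributing factors were derived. One plausible reading is that each account gets per-feature contribution scores (for example from a SHAP-style attribution), and the system reports the largest positive ones. The sketch below assumes those contributions are already computed; the feature names and values are hypothetical.

```python
from __future__ import annotations


def top_factors(
    contributions: dict[str, float], n: int = 3
) -> list[tuple[str, float]]:
    """Return the n features that push the churn score up the most.

    Positive contributions raise the predicted risk and negative ones lower
    it, so only positive values are candidates for an "at risk because"
    message.
    """
    risk_drivers = [
        (name, value) for name, value in contributions.items() if value > 0
    ]
    return sorted(risk_drivers, key=lambda item: item[1], reverse=True)[:n]


def format_alert(account: str, contributions: dict[str, float]) -> str:
    """Build a short plain-text alert listing the top risk drivers."""
    lines = [f"{account} is at risk. Top factors:"]
    for name, value in top_factors(contributions):
        lines.append(f"  - {name} (+{value:.2f})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical contributions for a single account.
    example = {
        "login_frequency_drop": 0.31,
        "open_negative_tickets": 0.22,
        "renewal_within_60_days": 0.12,
        "qbr_attended": -0.18,
        "seat_growth": -0.05,
    }
    print(format_alert("acme", example))
```

In a real pipeline the raw feature names would likely be mapped to readable sentences like the examples quoted above before being posted to Slack.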

Key Deliverables

  • Modern Data Stack — Snowflake + dbt transformation layer with 120+ models, 340 data quality tests, and automated documentation
  • Unified Metric Definitions — 45 business metrics defined in code, versioned, and used consistently across all reporting
  • Self-Serve Analytics Platform — 24 Metabase dashboards across 5 departments, with role-based access control
  • Team Training Program — 3 department-specific workshops on self-serve analytics and data literacy
  • Churn Prediction Model — ML model with 82% precision, daily scoring, Slack integration, and explainability layer
  • Data Catalog — searchable documentation of all data sources, transformations, and metric definitions
  • Data Team Operating Model — new team structure, SLA framework for ad-hoc requests, and quarterly OKR process

Results

  • Net revenue retention: 95% → 108% — the churn prediction model helped the CS team save $1.2M in ARR at risk during the first quarter of operation
  • 60% reduction in ad-hoc data requests — self-serve dashboards eliminated the majority of one-off analyst pulls, freeing the data team for strategic work
  • 100% executive dashboard adoption — within 6 weeks of launch, every member of the leadership team was using the executive dashboards as their primary data source (verified by Metabase usage logs). Spreadsheet "shadow reporting" dropped to near zero.
  • 4 → 16 hours/week of strategic analysis — with ad-hoc requests reduced, the data team quadrupled their time spent on proactive analysis and model development
  • Pipeline accuracy improved 28% — sales forecasting became significantly more reliable when pipeline dashboards used consistent, trusted data
  • First ML model in production — the churn prediction model became the company's first production ML system, establishing the pattern for future models (expansion prediction, lead scoring)

"We've said 'we're a data-driven company' for three years, but this is the first time it's actually true. The difference isn't the tools — it's that everyone now trusts the same numbers, can access them without filing a ticket, and makes decisions based on evidence instead of intuition. The churn model alone paid for the entire engagement in the first month."

— VP of Data, B2B SaaS Platform

Why This Approach Works

  • Metric definitions in code, not in people's heads. When "churn rate" is defined once in dbt and used everywhere, the "my numbers don't match your numbers" problem disappears permanently.
  • Self-serve reduces demand, not quality. The data team isn't a help desk. Self-serve analytics eliminates low-value requests so analysts can focus on high-value strategic work.
  • Prediction plus explanation. A churn score without context is useless to a CSM. The explainability layer — showing why each account is at risk — is what makes the model actionable.
  • Culture change requires a flagship win. The churn model wasn't just technically sound — it was visible, impactful, and directly tied to revenue. That single win convinced the entire organization to trust the new data infrastructure.

