A practical walkthrough from real client work, not a theoretical primer.
In This Article
- Imagine this
- What MCP is, in one paragraph
- The 500+ questions problem
- What the setup actually looks like
- SQL from plain English (and the card actually gets made)
- Weekly KPI monitoring (the one I keep running)
- The strategic question type
- What MCP can’t do (yet)
- Debugging when things go wrong
- The bigger picture
- What I’d actually recommend
- About the author
Imagine this
You open a terminal. You type one sentence:
Show me which user cohorts from Q3 had the best 90-day retention. Create a Metabase card in the Retention Analysis collection.
Eight seconds later, the card exists. Top cohort identified. SQL written. Card saved. No dashboard switching, no copy-paste, no Jira ticket to your data team.
That’s not a hypothetical. It’s a real workflow I’ve been running on client projects for the past few weeks since Metabase released their official MCP server. The setup takes about 30 minutes. The payoff lasts as long as your data does.
This article walks through what I actually did, what worked, what didn’t, and the full setup if you want to try it yourself.
If you’d rather skip the context and go straight to the step-by-step setup, the free guide is here: valiotti.com/metabase-mcp-guide.
What MCP is, in one paragraph
MCP stands for Model Context Protocol. It’s an open standard from Anthropic that lets AI models such as Claude connect to external tools and read or write data through them. Think of it like a plugin system for AI agents. The model doesn’t simulate using a tool. It actually uses it. You ask Claude to query your Metabase, and Claude makes a real API call, gets real data back, and answers based on what it found.
Metabase shipped an official MCP server a few weeks ago. That server exposes 24 tools to any MCP-compatible client. The most useful ones: list dashboards, get card definitions, run queries, create cards, read activity logs, inspect schemas. Once Claude Code has access to those, it can do the work that would normally require switching between three tabs and a SQL editor.
If you’ve used Claude Code before, this is the same agent you already know. The difference is what it can reach. Without MCP, Claude can write code and edit files. With the Metabase MCP, it can also read and write to your BI tool directly.
The 500+ questions problem
Here’s the situation that turned me into a believer.
A client I work with has been on Metabase for about three years. The instance has 500+ questions, around 60 dashboards, and a few dozen collections. Three different teams created content over the years. People left. Naming got inconsistent. The original goal of every dashboard is now a mystery to whoever inherited it.
The task I was given: figure out which questions and tables actually matter. Which metrics drive decisions. What’s just sitting there. What we can archive without anyone noticing.
The old way I’d have done this:
- Pull a CSV export of all questions
- Spend an afternoon talking to team leads
- Sample dashboards, look at usage
- Cross-reference with the activity log manually
- Build a spreadsheet of what’s used vs unused
- Map each surviving question back to its source tables
- Try to spot patterns
A solid week of work. Maybe more, depending on how long it takes to get calls onto everyone’s calendar.
Here’s what I actually did:
Prompt 1: “Pull the activity log from the last 90 days. Rank all questions by view count. Group by collection. Show me the top 50 and the bottom 100.”
Claude called metabase_get_activity and metabase_list_cards through the MCP. Twelve seconds later I had a ranked list. Top 50 questions accounted for 78% of all views. The bottom 100 hadn’t been opened in over six months.
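As a rough sketch of the arithmetic behind a ranking like this (not what the MCP server does internally): given per-question view counts, sort them and compute what share of all views the top N hold. The question names and counts below are invented for illustration.

```python
from typing import Dict, List, Tuple


def rank_by_views(views: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (question, view_count) pairs sorted from most to least viewed."""
    return sorted(views.items(), key=lambda item: item[1], reverse=True)


def top_n_share(views: Dict[str, int], n: int) -> float:
    """Fraction of all views that go to the n most-viewed questions."""
    total = sum(views.values())
    if total == 0:
        return 0.0
    ranked = rank_by_views(views)
    return sum(count for _, count in ranked[:n]) / total


# Hypothetical 90-day view counts, not real client data.
sample_views = {
    "Revenue by month": 420,
    "Signups daily": 310,
    "Old churn v2": 3,
    "Test card": 0,
}
ranked = rank_by_views(sample_views)
share = top_n_share(sample_views, 2)
print(ranked[0], f"top 2 hold {share:.0%} of views")
```

The same two functions give the bottom of the list by slicing `ranked` from the end.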
Prompt 2: “For the top 50 questions, list which database tables each one queries. Show me the most-used tables and which questions touch them.”
Another 8 seconds. Claude pulled the SQL and metadata for each top question, parsed out the source tables, and built a frequency map. Five tables accounted for 90% of usage. Three of those five had been flagged for deprecation in last quarter’s data engineering review. Nobody had told the BI team.
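For a sense of what "parsing out the source tables" involves, here is a deliberately naive sketch. It assumes each question is plain SQL and simply grabs the identifier after FROM or JOIN; it will misfire on constructs like EXTRACT(day FROM created_at) and on CTE names, so treat it as an illustration rather than a parser. The card names and queries are made up.

```python
import re
from collections import Counter
from typing import Dict, Set

# Naive pattern: the identifier that directly follows FROM or JOIN.
TABLE_PATTERN = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def source_tables(sql: str) -> Set[str]:
    """Lower-cased table names referenced in one query."""
    return {name.lower() for name in TABLE_PATTERN.findall(sql)}


def table_usage(card_sql: Dict[str, str]) -> Counter:
    """Count how many cards touch each table."""
    usage: Counter = Counter()
    for sql in card_sql.values():
        usage.update(source_tables(sql))
    return usage


# Hypothetical card definitions.
cards = {
    "A": "SELECT * FROM users u JOIN events e ON e.user_id = u.id",
    "B": "select count(*) from events",
}
usage = table_usage(cards)
print(usage.most_common())
```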
Prompt 3: “Identify duplicate questions, where two or more cards return effectively the same result with minor variations. Group them so I can pick canonical versions.”
This is the kind of work that’s brutal manually. Claude found 47 question pairs that were either exact duplicates or near-duplicates. It even noted which ones differed only by date range or a single filter. I picked canonicals in 20 minutes.
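One simple way to surface near-duplicates of the "differs only by date range or a single filter value" kind is to normalize each query and group identical results. This is a sketch of that idea under my own assumptions, not a description of how Claude did it; it only catches queries that differ in literals, whitespace, or case.

```python
import re
from collections import defaultdict
from typing import Dict, List


def normalize_sql(sql: str) -> str:
    """Lower-case, mask literals, and collapse whitespace."""
    text = sql.lower()
    text = re.sub(r"'[^']*'", "?", text)            # string literals such as dates
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)  # numeric literals
    text = re.sub(r"\s+", " ", text).strip().rstrip(";")
    return text


def group_near_duplicates(card_sql: Dict[int, str]) -> List[List[int]]:
    """Return groups of card IDs whose normalized SQL is identical."""
    groups = defaultdict(list)
    for card_id, sql in card_sql.items():
        groups[normalize_sql(sql)].append(card_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical cards: 1 and 2 differ only by date literal and formatting.
example_cards = {
    1: "SELECT count(*) FROM orders WHERE created_at >= '2024-01-01'",
    2: "select count(*)\nfrom orders where created_at >= '2024-04-01';",
    3: "SELECT sum(total) FROM orders",
}
duplicates = group_near_duplicates(example_cards)
print(duplicates)
```

Picking the canonical version from each group stays a human call.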
What used to be a week of work was effectively done in an afternoon. Not because Claude is magic, but because it can iterate on the data 100x faster than I can. Each prompt builds on the previous one, the model holds context, and the MCP gives it real numbers to work with.
There’s a subtler payoff I didn’t expect. Once Claude had context on the activity data, the prompts I could ask started getting better. By prompt 5, I was asking questions like “show me which questions are popular but built on tables flagged for deprecation, since those will break first when the migration happens.” That’s a specific operational question that would have taken half a day of careful spreadsheet work in the old workflow. Claude returned the answer in 9 seconds.
I want to be clear about what this is and isn’t. It isn’t replacing the data engineer who’s planning the table migration. The agent doesn’t know about the migration unless I tell it. It also doesn’t know which questions are politically important to which executive. The judgment calls about what to do with the data still sit with humans. But the part where you spend three days collecting the data so you can make those judgment calls? That’s gone.
There’s also a documentation byproduct nobody talks about. After I finished the audit, I asked Claude to write up what we found in a format the team could read. It generated a clean summary: top questions, recommended archives, deprecation risks, duplicates with canonical picks. The team had documentation of the audit that was actually accurate, which is more than I can say for most audit reports.
Want to run this on your own Metabase? The free setup guide walks through every step, including the exact prompts I used.
What the setup actually looks like
If you’re thinking “this sounds great in theory but I bet it’s a nightmare to install,” here’s the actual setup, end to end.
You’ll need three things: a running Metabase instance, Claude Code installed locally, and a Metabase API key.
The Metabase API key takes 30 seconds. Open your Metabase admin panel, go to Settings, then API Keys. Generate a new key, copy it somewhere safe. Done. If you’re on Metabase Cloud, this is built in. If you’re self-hosting, same flow.
The MCP server itself is a single npm package. One command installs it globally. Another command verifies the install. That’s the entire installation step.
Configuring Claude Code to use it is one block of JSON. You add an mcpServers entry in your Claude Code settings, point it at the metabase-mcp binary, and pass two environment variables: your Metabase URL and your API key. Save the file. Restart Claude Code.
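To make the shape of that JSON concrete, here is a small script that prints an mcpServers entry. The environment variable names (METABASE_URL, METABASE_API_KEY), the server label "metabase", and the example URL and key are placeholders I have assumed; check the setup guide or the package README for the exact names your version expects.

```python
import json


def build_mcp_config(metabase_url: str, api_key: str) -> dict:
    """Build an mcpServers entry pointing at the metabase-mcp binary."""
    return {
        "mcpServers": {
            "metabase": {  # label is arbitrary
                "command": "metabase-mcp",
                "env": {
                    "METABASE_URL": metabase_url,  # assumed variable name
                    "METABASE_API_KEY": api_key,   # assumed variable name
                },
            }
        }
    }


# Placeholder values; never commit a real key.
config = build_mcp_config("https://metabase.example.com", "mb_your_key_here")
config_text = json.dumps(config, indent=2)
print(config_text)
```

Generating the file this way has one side benefit: the output is guaranteed to be valid JSON.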
The first time you launch Claude Code with the MCP active, you can verify it worked by asking “list available MCP tools.” You should see Metabase showing up with 24 tools available. If you see that, you’re done with setup.
The whole thing is a 30-minute job for someone who’s never touched MCP before. For someone who has, it’s closer to 10 minutes. The setup is genuinely the easy part. The interesting work is figuring out what to do with it.
I’ve put the entire setup, including troubleshooting for the common gotchas, in the free guide at valiotti.com/metabase-mcp-guide. If something doesn’t work, the guide has the answer. If it’s something the guide doesn’t cover, drop me a line and I’ll help.
SQL from plain English (and the card actually gets made)
Most people who’ve tried “AI for SQL” hit the same wall: the model writes the query, you copy it into your editor, fix the half-hallucinated column names, run it, paste the results back into the chat, ask for a tweak, paste again. After three rounds you wonder if writing it yourself would’ve been faster.
MCP changes the loop because Claude can read your schema before it writes anything.
Here’s a real example from another client. A non-technical product manager wanted to understand which onboarding cohorts retained best. She asked me. I asked Claude:
Show which user cohorts from Q3 had the best 90-day retention. Create a Metabase card in the “Retention Analysis” collection. Use a line chart, weekly cohorts on x, retention % on y.
Claude did this in sequence:
- Called metabase_list_databases and metabase_list_tables to find what existed
- Inspected the users and events tables to find the right joins
- Wrote the cohort retention SQL using signup_week as the cohort identifier
- Called metabase_create_card with the right collection ID and visualization type
- Ran the card to verify it returned sensible data
- Reported back: card #1847 saved, top cohort July week 2, 34.1% retention, 2.4× above average
I didn’t touch SQL once. The PM got her chart in the Metabase collection she already lives in.
The thing that makes this work is the schema introspection. Claude isn’t guessing column names from a half-remembered conversation about your data. It’s reading the actual table definitions before it writes the query. You still want to review the SQL on anything sensitive, but the hit rate on first-pass correctness is high enough that the workflow is genuinely faster than writing it yourself.
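For readers who want to see what a weekly-cohort retention query of this kind can look like, here is a self-contained sketch that runs against an in-memory SQLite database. Everything beyond the users and events table names and the signup_week idea is assumed: the column names, the sample rows, and the definition of "retained" (at least one event 90 or more days after signup). A production warehouse would use its own date functions.

```python
import sqlite3

# Assumed schema: users(id, signup_date), events(user_id, event_date).
RETENTION_SQL = """
SELECT strftime('%Y-%W', u.signup_date) AS signup_week,
       COUNT(DISTINCT u.id) AS cohort_size,
       COUNT(DISTINCT CASE
           WHEN julianday(e.event_date) - julianday(u.signup_date) >= 90
           THEN u.id END) AS retained
FROM users u
LEFT JOIN events e ON e.user_id = u.id
GROUP BY signup_week
ORDER BY signup_week
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, signup_date TEXT);
CREATE TABLE events (user_id INTEGER, event_date TEXT);
INSERT INTO users VALUES (1, '2024-07-01'), (2, '2024-07-02'), (3, '2024-07-09');
INSERT INTO events VALUES (1, '2024-10-05'), (2, '2024-07-20'), (3, '2024-10-20');
""")

rows = conn.execute(RETENTION_SQL).fetchall()
for week, size, retained in rows:
    print(week, size, f"{100 * retained / size:.1f}%")
```

The LEFT JOIN keeps users with no events in the cohort size, which is the detail most often gotten wrong in hand-written versions.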
A more interesting variant of this is cross-table analysis. The kind of question that spans three tables and takes 20 minutes to think through. I asked Claude:
Compare conversion rates between users acquired through paid channels in Q1 vs Q2 of this year. Break it down by source, and flag any source where conversion dropped by more than 10% relative to Q1.
Claude joined users, attribution_events, and subscriptions. Wrote the query. Ran it. Came back with a small table and a flagged row: one specific paid source had dropped from 6.2% to 3.8% conversion between quarters, a relative decline of roughly 39%. That’s a hypothesis you can act on, not just a number. And it took 14 seconds.
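A flagging threshold can be read as percentage points or as relative change, and for small conversion rates the two differ a lot. A quick worked check on the 6.2% to 3.8% move, using a helper function that is purely illustrative:

```python
def conversion_change(before_pct: float, after_pct: float) -> dict:
    """Express a rate change both in percentage points and relative terms."""
    point_change = after_pct - before_pct
    relative_change = point_change / before_pct * 100 if before_pct else 0.0
    return {
        "point_change": round(point_change, 2),
        "relative_change_pct": round(relative_change, 1),
    }


change = conversion_change(6.2, 3.8)
print(change)
```

The move is 2.4 percentage points, or about 39% in relative terms, so it pays to spell out in the prompt which one the threshold means.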
The trick that makes this work for non-technical users is the conversation flow. The PM in my earlier example didn’t write the perfect prompt on her first try. She wrote something vague like “I want to see retention by cohort.” Claude asked clarifying questions: which cohorts, how to define the cohort window, what retention metric, where to save the result. After two rounds of clarification, the request was specific enough to act on, and Claude built it. The model is doing the work that a good analyst does in a 1:1 meeting: turning a fuzzy business question into a specific data request.
For SQL writers who don’t trust the model on anything financial or compliance-related, there’s a middle path. Have Claude write the query but not run it. Review the SQL manually. Run it yourself. Then have Claude create the card with your verified query. You get the speed-up on the writing without giving up the verification step. After a few weeks of doing this, you’ll calibrate where you trust auto-execution and where you don’t.
Weekly KPI monitoring (the one I keep running)
Of everything I’ve built with the Metabase MCP, the one that’s still running on two clients is also the simplest: a weekly KPI digest.
The setup: every Monday morning at 9 AM, a cron job launches Claude Code. It fetches five core metric cards from Metabase via MCP, compares them to the previous week, and posts a summary to Slack if anything moved more than 15%.
Here’s the prompt I used to set it up:
Every Monday at 09:00, run cards #101, #102, #103, #104, #105. Compare results to the same cards from one week ago. If any metric moved more than 15% in either direction, post a summary to Slack channel #data-alerts. Flag what changed and what might be causing it. If nothing moved significantly, post a short “all metrics stable” message.
Claude set up the cron, captured baseline values for next week’s comparison, and confirmed the schedule. That was 20 minutes of work. It’s been running for three weeks now.
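The comparison logic behind a digest like this is small. The sketch below assumes each card boils down to a single number per week; the metric names and values are invented, and the Slack posting step is left out.

```python
from typing import Dict, List

THRESHOLD_PCT = 15.0  # same threshold as in the prompt
# In standard crontab syntax, "0 9 * * 1" means Mondays at 09:00.


def significant_moves(previous: Dict[str, float], current: Dict[str, float],
                      threshold_pct: float = THRESHOLD_PCT) -> List[str]:
    """Describe every metric that moved more than threshold_pct either way."""
    alerts = []
    for name, now in current.items():
        before = previous.get(name)
        if not before:  # missing or zero baseline: no percentage to compute
            continue
        change_pct = (now - before) / abs(before) * 100
        if abs(change_pct) > threshold_pct:
            alerts.append(f"{name}: {change_pct:+.1f}% week over week")
    return alerts


def digest(previous: Dict[str, float], current: Dict[str, float]) -> str:
    """Build the message body: alerts if any, else a short all-clear."""
    alerts = significant_moves(previous, current)
    return "\n".join(alerts) if alerts else "All metrics stable"


# Hypothetical weekly values.
last_week = {"churn_rate": 4.1, "signups": 1200}
this_week = {"churn_rate": 5.0, "signups": 1230}
print(digest(last_week, this_week))
```

Storing last week's values somewhere durable is the part that matters most; without a baseline there is nothing to compare against.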
Two weeks ago it caught something useful. The Monday digest flagged a 22% spike in churn. The team wouldn’t have looked at the churn dashboard until Wednesday’s regular standup. By then they would’ve been three days late on investigating. Instead, by 9:15 AM Monday, the data lead was already pulling the cohort that drove the spike.
That’s not a “Claude saved the company” story. The team would’ve caught it eventually. But three days of head start matters when you’re trying to figure out whether a churn move is a fluke or a trend.
The reason this works is that the MCP gives Claude something concrete to do at a fixed cadence. It’s not a chatbot waiting for input. It’s an agent with a schedule, a job, and the tools to do it. Once you have that mental model, you stop thinking about Claude as “AI assistant” and start thinking about it as a junior analyst who never sleeps and never forgets to run the report.
The strategic question type
The most surprising use case I’ve found isn’t a query or a card or a digest. It’s asking Claude something open-ended and seeing what it does with the access it has.
I tried this with a founder I work with. He asked, half-joking:
What’s the biggest leak in our funnel right now?
I expected a generic answer. Instead, Claude:
- Looked at the activity dashboard to find which funnel was being most actively monitored
- Identified the conversion stages from the existing card definitions
- Pulled the last 90 days of data for each stage
- Computed stage-to-stage conversion rates
- Compared each stage’s conversion to the previous quarter’s average
- Flagged the biggest negative deviation: a 9-point drop at the trial-to-paid stage
The answer wasn’t “here’s a generic funnel framework.” It was “your trial-to-paid conversion has dropped 9 points compared to Q1, the drop accelerated in week 6, and the cohorts most affected are coming from your three biggest paid channels.”
That’s a strategy question with a strategy-level answer, backed by your actual data. The founder went from “I wonder how we’re doing” to “we have a specific problem in a specific cohort” in under a minute.
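The stage-to-stage arithmetic in that list can be sketched in a few lines. The stage names, counts, and baseline rates below are hypothetical and chosen only to show the mechanics of finding the largest negative deviation.

```python
from typing import Dict, List, Tuple


def stage_conversions(counts: List[Tuple[str, int]]) -> Dict[str, float]:
    """Conversion rate (in %) between each pair of consecutive stages."""
    rates = {}
    for (src, src_n), (dst, dst_n) in zip(counts, counts[1:]):
        rates[f"{src}->{dst}"] = round(100 * dst_n / src_n, 1) if src_n else 0.0
    return rates


def biggest_drop(current: Dict[str, float],
                 baseline: Dict[str, float]) -> Tuple[str, float]:
    """Step with the most negative change versus baseline, in points."""
    deltas = {step: round(current[step] - baseline[step], 1)
              for step in current if step in baseline}
    step = min(deltas, key=deltas.get)
    return step, deltas[step]


# Hypothetical last-90-days counts and previous-quarter baseline rates.
funnel = [("visit", 10000), ("signup", 1500), ("trial", 600), ("paid", 120)]
baseline_rates = {"visit->signup": 15.5, "signup->trial": 41.0, "trial->paid": 29.0}

current_rates = stage_conversions(funnel)
worst_step, delta = biggest_drop(current_rates, baseline_rates)
print(worst_step, delta)
```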
This is the moment where MCP starts to feel less like a productivity tool and more like an org-level shift. The latency between business questions and data-backed answers used to be measured in days. Now it’s seconds.
A second strategic example, because this is the use case I get the most questions about. A different client’s CEO asked: “are we still growing efficiently?” That’s a question with a dozen possible interpretations and no obvious starting card to look at. I passed it to Claude with the MCP active.
Claude looked through the company’s existing dashboards, identified which ones tracked growth metrics, and built a snapshot: new customers per month, revenue per customer, payback period, marketing efficiency ratio. It then compared each of these to the previous four quarters and flagged the trend on each one.
The answer it came back with was nuanced. New customer acquisition was up 14% quarter over quarter, which looked good. But payback period had stretched from 11 months to 14, and marketing efficiency was deteriorating. The conclusion: growth is happening, but the unit economics are softening. The CEO needed to know whether that was a deliberate trade for top-of-funnel growth or an unintended drift.
A human analyst could’ve produced the same answer. But it would’ve been a 2-hour meeting and a follow-up deck. Instead it was a single conversation in a terminal that took six minutes. The CEO had a real answer to a real question while the topic was still on his mind.
The pattern with strategic questions is that they require the agent to do three things in sequence: understand what the question is really asking, find the right data sources, and synthesize an answer that’s specific to the company’s actual numbers. None of these are individually impressive. But chaining them together with a fresh dataset every time is the unlock.
What MCP can’t do (yet)
I’m not going to pretend this is a finished product. Here’s what doesn’t work as well as you’d hope.
Long-running queries time out. If your dataset is large enough that a question takes 90+ seconds to run in Metabase, the MCP call will probably time out before you get the result. You can work around this by asking Claude to run a smaller version of the query first, but it’s a real limitation.
The model can’t see visualizations. Claude reads card definitions and query results as data, not images. If your insight depends on a chart pattern that’s hard to describe in numbers, you’ll lose something in translation. This is a fundamental limit of the current MCP, not a Metabase-specific issue.
Some Pro-only features aren’t fully exposed. Activity log access works on open-source Metabase. Usage analytics, subscription management, and some admin operations require Metabase Pro. The MCP server includes tools for those, but they’ll fail silently on OSS instances. Worth knowing before you architect a workflow around them.
Schema permissions matter. If your API key doesn’t have access to a database, Claude can’t see it either. This is good for security but can be confusing if you’re testing on a restricted account. For real work, give Claude an API key with appropriate read access. For write operations (creating cards, archiving questions), I recommend a separate key with a narrower scope, or human approval for anything that writes.
It can hallucinate column names. Even with schema introspection, the model occasionally invents a column that doesn’t exist. Always have Claude run the query before you trust the SQL. The MCP makes this easy because running and verifying is part of the same flow.
None of these are dealbreakers. They’re just the actual texture of working with this tool. If you go in expecting magic, you’ll be disappointed. If you go in expecting “useful junior analyst with API access”, you’ll be very happy.
Debugging when things go wrong
A few patterns I’ve hit and how to fix them, since these will probably bite you too.
Claude can’t find the MCP tools after restart. The most common cause is a typo in the JSON config. Claude Code is strict about JSON. Run your settings file through a JSON validator before restarting. The second most common cause is the API key being wrong or expired. Test it with a curl call against the Metabase API before assuming the MCP is the problem.
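Both checks can be scripted if you prefer that to a web validator and curl. In the sketch below, the x-api-key header is how Metabase API keys are normally passed; the /api/user/current path is my assumption for a cheap authenticated endpoint, so confirm it against the API docs for your Metabase version. The request is only built here, not sent.

```python
import json
import urllib.request
from typing import Optional


def json_error(text: str) -> Optional[str]:
    """Return None if text is valid JSON, else where parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


def build_key_check(base_url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET that a valid key should pass."""
    url = base_url.rstrip("/") + "/api/user/current"  # assumed endpoint
    return urllib.request.Request(url, headers={"x-api-key": api_key})


# A trailing comma is the classic settings-file typo.
print(json_error('{"mcpServers": {},}'))

request = build_key_check("https://metabase.example.com/", "mb_your_key_here")
# To actually test the key: urllib.request.urlopen(request, timeout=10)
```

A 401 or 403 response from the real call points at the key, not at the MCP setup.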
Tool calls succeed but return no data. Usually a permissions issue. The API key you’re using might have access to fewer collections or databases than your user account. Check the API key’s permissions in Metabase admin and make sure it can see what you’re asking Claude about.
Queries that work in Metabase fail through the MCP. Sometimes the MCP wrapper has stricter timeouts or different default parameters than running a query interactively. If a query is slow, it might time out at the MCP layer even though it would complete in the Metabase UI. The fix is to ask Claude to add aggressive filters or sample the data first, then expand if it works.
Claude makes up dashboards or cards that don’t exist. This happens occasionally when the model has a strong prior about what should exist. The fix is explicit: tell Claude to call metabase_list_dashboards first and only reference real items. Once you’ve added that to the prompt habit, it stops happening.
Auto-creation of cards in the wrong collection. If you don’t specify a collection, Claude defaults to the root or to whatever it sees first. Always pass the target collection name in your prompt. Claude will look up the collection ID and use it correctly.
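The name-to-ID lookup is worth making strict, whoever performs it. A minimal sketch, assuming the collection listing comes back as records with id and name fields (the sample records are invented): fail loudly on zero or multiple matches instead of falling back to a default.

```python
from typing import Dict, List


def resolve_collection_id(collections: List[Dict], name: str) -> int:
    """Return the ID of the single collection matching name, else raise."""
    wanted = name.strip().lower()
    matches = [c["id"] for c in collections if c["name"].strip().lower() == wanted]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one collection named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical listing.
listing = [
    {"id": 12, "name": "Retention Analysis"},
    {"id": 15, "name": "Marketing"},
]
collection_id = resolve_collection_id(listing, "retention analysis")
print(collection_id)
```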
These are the failure modes worth knowing about up front. None of them are blockers. They’re the kind of friction you stop noticing after a week of use.
The bigger picture
Here’s the thing I’ve been thinking about since I started using this.
The cycle of work in BI has been the same for a decade. Someone asks a question. Someone else opens a SQL editor. They write a query. They build a dashboard. The original asker looks at it. Asks a follow-up. The cycle repeats.
The bottleneck has never been data or compute. It’s been the human in the middle, translating questions into queries and back into answers. We’ve built whole industries around making this human faster: dbt, modern BI tools, semantic layers, metric stores. They all help. They all still leave the human as the bottleneck.
MCP changes the topology. The human still asks questions. The agent does the translation. The human reviews and acts. The middle tier collapses.
I don’t think this means data analysts disappear. It means the work changes. The valuable analyst stops being the SQL writer and becomes the architect of the system: designing the metric layer, defining what good data looks like, building the prompts and automations that the rest of the team uses. The mechanical work moves to the agent. The judgment work stays with the human.
This is what I’ve started calling the AI-native data stack. The components aren’t dramatically new: clean data warehouse, dbt or similar transformation layer, a BI tool with an MCP, an agent that can reach across all of them. What’s new is that the agent is a first-class citizen of the stack. You design for it from the start, not as an afterthought.
If you’re a Series A founder reading this and thinking “we don’t need an AI agent yet, we need a real data team first,” I get the instinct. But I’d push back. The companies setting up their data stack today have a small advantage that compounds over the next two years. They’re building infrastructure that an agent can use immediately, instead of building infrastructure first and bolting on an agent later. The bolting-on phase is more expensive than getting it right the first time.
If you’re an analyst reading this and worried about your job, my honest take is: nothing here threatens the people who can architect systems and ask hard questions. It threatens the role of “person who writes the third version of the same SQL query for the third stakeholder this week.” Those tasks are now agent work. The rest of analysis becomes more interesting.
There’s a related shift in how data teams should think about documentation. With an MCP-enabled agent, your dashboard descriptions, your collection structure, and your column comments aren’t just human-readable metadata. They’re context for the agent. A well-described dashboard is a dashboard the agent can use. A cryptic column name is a column the agent will guess wrong about. The investment in documentation that data teams have always under-prioritized starts paying off in a new way: it makes your agent smarter.
The same applies to your semantic layer. If you’ve built a metrics layer in dbt or LookML, the agent can use it. The metrics become callable tools instead of just dashboard sources. This is where the AI-native stack starts to feel coherent: clean source data flows into transformed models, those models expose metrics through a semantic layer, the BI tool sits on top of the metrics, and the agent reaches across all of it through MCPs. Every layer is built to be both human-readable and agent-readable.
I’m starting to use a simple heuristic when I evaluate data tooling for clients: can an MCP-enabled agent reach this? If yes, it’s compatible with the future stack. If no, it’s a candidate for replacement. Tools that don’t expose their state to agents are going to feel like islands in the next 18 months. The Metabase team understood this early, which is why their MCP server exists today instead of being on a 2027 roadmap.
What I’d actually recommend
If you have a Metabase instance and you’re curious, do this:
- Spend 30 minutes setting up the MCP server with Claude Code. The free guide walks through every step.
- Pick one question that’s been on your mental backlog. Something like “which dashboards do people actually use” or “which tables are queried most.” Run it through Claude.
- If that works, set up the weekly KPI digest. It’s the highest-leverage automation per minute spent.
- Then start experimenting with strategic questions. The kind you’d usually ask in a meeting and never get answered.
That’s the on-ramp. After about a week of this, you’ll have a feel for what the agent does well and where it falls down, and you can decide whether to invest more.
The free setup guide is here: valiotti.com/metabase-mcp-guide. It includes the exact prompts I used, the JSON config for Claude Code, and a list of starter prompts that work out of the box.
If you’d rather skip the DIY and have us configure it for your team, including custom prompts, automations, and a session to train your data team on how to use it: book a 30-minute call. Most engagements like this take a single working day to ship.
About the author

Nick Valiotti
Founder · Fractional CDO
Founder of Valiotti Analytics, a data agency working with venture-backed startups on data infrastructure, BI strategy, and AI-native data stacks. Works as a Fractional Chief Data Officer for a portfolio of Series A and B companies.


