How to Choose an MCP Server: A Practical Field Guide

A year ago there were maybe twenty MCP servers worth knowing about. Today the public catalogs list well over a thousand, and the registry pages have a habit of growing while you’re reading them. Search “Postgres” on pulseMCP and you get a hundred-plus matches. Search “Slack” on mcpservers.org and the second page still hasn’t run out. The selection problem stopped being “is there an MCP server for X” and became “which of these forty-three MCP servers for X is the one that won’t bite me three weeks in.”

This is a field guide for picking one. The criteria are boring on purpose. The point isn’t to be clever about it; the point is to not regret the choice when the production traffic shows up.

What MCP actually does, in two paragraphs

If you’ve been typing prompts into ChatGPT or Claude in a browser, you already know the ceiling. However careful the prompt, the model sees only what you pasted into the chat. It can’t read your warehouse, it can’t look at the ticket history, it can’t open a file the rest of your team is working in. Anything it produces, you copy back out by hand. For one-off snippets that’s fine. For real work it’s a tax that scales with how much the model is supposed to help.

MCP, the Model Context Protocol that Anthropic published in late 2024, is the cable between the model and your systems. Once an MCP server is wired up, the model can query your Postgres, list your Slack channels, push a branch, read a Dataform compile error. The human stays in the loop with permission gates; the model stops being a chat partner you copy-paste against and starts being a teammate with credentials. Concretely, this is what the workflow looks like once MCP is in place:

  • Connect to a database. The model writes SQL with the actual schema in front of it, runs the query, reads the result, iterates.
  • Power a chatbot’s reasoning over your real product data instead of a generic knowledge cutoff.
  • Automate the routine layer: updating a row in a Google Sheet, opening a Linear ticket, scraping a known page, posting a digest.

Some teams hand the model creative work; others give it the chores nobody on the team wants. Both are fine. The MCP layer is what makes either of them stop being a copy-paste tax.

Three principles that keep the search short

Anyone with a weekend can publish an MCP server. After a year of MCP being public, plenty of people have. The catalogs are dense and the quality varies wildly. Before you start clicking, hold three rules in your head.

Prefer the official server. If Postgres maintains its own MCP server, start there even if a third-party version looks more polished today. Official servers get patched when the protocol shifts, get reviewed by people whose job is to not leak the company’s customer data, and don’t quietly vanish when one maintainer loses interest. The third-party server might be excellent. It might also be a thin wrapper that one engineer wrote on a Friday afternoon and never touched again. Default to official; deviate only when you’ve inspected the alternative carefully.

Pick a deployment mode early. Local versus hosted is the second decision you should make, and it’s a decision about who controls the data, not about convenience. The next section walks through the tradeoff.

Define the job before you pick the tool. The single largest waste in MCP selection is teams reading the feature matrix and picking the server with the most checkmarks. Most of those checkmarks are features you will never use. Write down the actual job, in one sentence, and let that sentence eliminate ninety percent of the catalog.

Figure: Decision matrix for MCP deployment. Rows are four selection criteria (data sensitivity, vendor lock-in tolerance, feature scope, the team’s ops capacity); columns are the three deployment categories; each cell shows fit at a glance.

Deployment: local, vendor-managed, third-party-hosted

The deployment question splits cleanly into three buckets. Each has a default audience.

Local (self-hosted)

The MCP server runs on your infrastructure. Your laptop, your container, your VPC. The connection to the model still leaves your network, so this isn’t air-gapped, but the server, the credentials, the query logs, all of it sits inside your perimeter.

This is the default for anything touching regulated data: HIPAA-bound clinical data, PCI cardholder data, anything covered by an NDA whose terms the big-name cloud vendors can’t satisfy. It’s also the default if your security team needs an audit trail they own. The cost is operational: someone on your team has to keep the server running, watch for protocol updates, rotate credentials, and handle the on-call when something falls over at 2 AM.

Vendor-managed (Microsoft, Google, Amazon, Databricks, and so on)

The hyperscalers and the warehouse vendors ship their own MCP servers as managed services. You point a client at the endpoint, you authenticate, and the server is somebody else’s operational problem. AWS, Azure, GCP, Databricks, Snowflake, all of them now have first-party MCP servers for the parts of their platform people most want to query.

This is the right default if you’re already deeply on one cloud. You’re already paying that vendor, you’ve already passed legal review on their data handling, and the integration is going to be tighter than anything a third party could ship. The catch is exactly what you’d expect: you lock in further to the platform you’re already on, and your security team still has to bless the specific service even if the vendor as a whole is approved. Some compliance reviews will draw the line at “we don’t send query payloads outside the perimeter,” and that’s where a managed MCP server stops being an option.

Third-party-hosted (mcphosting.io and similar)

If you can’t run it locally and you don’t want the lock-in of a managed vendor service, third-party MCP hosting is the middle ground. Services like mcphosting.io will run open-source MCP servers for you on infrastructure tuned for the protocol. You get the ops convenience of a managed service without the platform tie-in.

The audience is teams who are MCP-curious, ops-poor, and not under the kind of compliance regime that requires on-prem. Startups, small consultancies, internal tools at companies whose security team is willing to sign off on a SOC 2 vendor. The audience is not regulated industries, and it’s not Fortune 500 procurement.

Figure: MCP server comparison. A scorecard of six named MCP servers across five dimensions: official source, deployment modes, feature scope, governance fit, ease of setup.

Functionality: narrow vs broad

Every MCP server exposes a set of tools, the named operations the model can call through the protocol. A Postgres MCP server might expose list_tables, describe_table, execute_query. A Slack MCP server exposes list_channels, post_message, search. The set is the surface area, and the surface area determines both what the model can do and what your security review has to cover.
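To make “tools” concrete: MCP messages are JSON-RPC 2.0, and the two methods that matter for surface area are `tools/list` (what can the model call?) and `tools/call` (call it). The sketch below only builds the request payloads; it doesn’t open a connection. The `describe_table` tool and its `table` argument are the hypothetical Postgres example from the paragraph above, not any particular server’s API.

```python
import json


def jsonrpc_request(request_id, method, params=None):
    """Build a JSON-RPC 2.0 request, the envelope MCP messages travel in."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = jsonrpc_request(1, "tools/list")

# Call one tool by name. The tool name and argument are illustrative
# assumptions, not taken from a specific server.
call_tool = jsonrpc_request(
    2,
    "tools/call",
    {"name": "describe_table", "arguments": {"table": "orders"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

Every name a server returns from `tools/list` is something the model can invoke, which is why that list is also the scope of your security review.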

In a single category, the surface area can vary by an order of magnitude. Compare two database-focused servers:

  • MCP Toolbox for Databases is Google’s first-party server. It supports the major SQL engines and several NoSQL stores, translates natural-language questions to SQL with schema awareness, generates DDL, and creates tables and indexes. It’s a Swiss Army knife.
  • MCP Database Server is a read-only client for Postgres, MySQL, and SQLite. It lists tables, reads schemas, runs SELECT statements. That’s the entire surface.

Both are valid choices for “give the model database access.” They’re for different jobs. The Toolbox is for teams who want the model to participate in the data lifecycle: pull data, but also help design schemas and create indexes. The narrower Database Server is for teams who want the model to read the warehouse and nothing else, with the smallest possible blast radius if the model goes off the rails.
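Here is roughly what a three-tool, read-only surface looks like in code. This is a minimal sketch using SQLite from the Python standard library, not the implementation of either server named above; the function names mirror the tool names used earlier and are otherwise my own. The point it illustrates is that the read-only guarantee comes from how the database is opened (`mode=ro`), not from trusting the model to write only SELECT statements.

```python
import os
import sqlite3
import tempfile


def open_read_only(path):
    """Open a SQLite file so that the engine itself rejects writes."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def list_tables(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def describe_table(conn, table):
    # PRAGMA arguments cannot be bound as parameters, so check the name
    # against the catalog before interpolating it.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    # table_info rows are (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f'PRAGMA table_info("{table}")')
    return [(row[1], row[2]) for row in info]


def execute_query(conn, sql):
    return conn.execute(sql).fetchall()


# Demo: seed a throwaway database with a normal connection,
# then reopen it read-only the way a narrow server would.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
seed = sqlite3.connect(db_path)
seed.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
seed.execute("INSERT INTO orders (total) VALUES (19.5), (42.0)")
seed.commit()
seed.close()

conn = open_read_only(db_path)
tables = list_tables(conn)
columns = describe_table(conn, "orders")
rows = execute_query(conn, "SELECT COUNT(*) FROM orders")

try:
    execute_query(conn, "DELETE FROM orders")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite refuses the write because the file was opened with mode=ro.
    write_blocked = True

print(tables, columns, rows, write_blocked)
```

For Postgres or MySQL the equivalent move is a database role with read-only grants. Either way, the blast radius is set by the credentials the server holds, and the tool list just keeps the model from asking for more.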

There’s a third category: aggregator servers like Lasso Security MCP Gateway that proxy several underlying MCP servers behind one endpoint. These are appealing when you’re standing up MCP across many teams and want one place to apply policy, one set of credentials to manage, one upgrade path. They’re a heavier commitment, and they’re typically the wrong place to start.

The temptation is to default to “more is better”: pick the feature-rich server, because you might want those tools later. In my experience this is usually wrong. The richer the server, the harder the install, the more your security review has to cover, the more ways the model can surprise you. If you’re working with Postgres and your job is occasional read-only extracts, the right server is the one that does exactly that, well, and stops there. You can always swap in a richer server later, when you actually need its features.

The reverse is also a mistake. Pulling together five narrow servers for a single workflow because you didn’t want to commit to a multi-tool server is a different kind of overhead. Five servers to install, five to update, five sets of credentials, five logs to read. Match the server’s scope to the work you’re doing now, not to the work you might be doing in a year and not to your fear of commitment.
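One way to keep yourself honest about scope is to write down the tools the job needs and diff that against what each candidate exposes. A minimal sketch follows; the tool names are hypothetical stand-ins for whatever a server’s tool listing actually returns.

```python
def scope_report(exposed, needed):
    """Compare the tools a server exposes with the tools the job needs."""
    exposed, needed = set(exposed), set(needed)
    return {
        # Needed but not offered: this server cannot do the job alone.
        "missing": sorted(needed - exposed),
        # Offered but not needed: extra surface for the security review.
        "unused": sorted(exposed - needed),
        "coverage": len(needed & exposed) / len(needed) if needed else 1.0,
    }


# Hypothetical job: occasional read-only extracts.
needed = {"list_tables", "describe_table", "execute_query"}

# Hypothetical tool lists for a narrow and a broad server.
narrow = {"list_tables", "describe_table", "execute_query"}
broad = narrow | {"create_table", "create_index", "generate_ddl"}

narrow_report = scope_report(narrow, needed)
broad_report = scope_report(broad, needed)

print(narrow_report)
print(broad_report)
```

Both servers cover the job completely here; the difference is the three unused tools on the broad one. If “missing” is non-empty for every narrow candidate, that’s the signal a multi-tool server has earned its overhead.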

Where to actually find them

The discoverability problem is real. Two starting points worth bookmarking:

On GitHub:

Aggregator catalogs:

  • MCP Server Finder: searchable index with categories and reviews.
  • mcp.so: straightforward catalog, easy to scan.
  • pulseMCP: the largest of the catalogs by raw count, well organized.
  • mcpservers.org: leaner, easier to filter than pulseMCP if you know what you want.

I usually start with the official-vendor index when the job involves a major platform (Postgres, AWS, GitHub, Slack), and one of the aggregators when the job involves something more obscure. The aggregators are also where you’ll find the rough edges first: community reviews, last-updated dates, install issues. It’s worth the five minutes of reading before committing.

A checklist for the selection meeting

If you’re about to pick an MCP server for a team, here’s the short list of questions to walk through before you commit:

  • What’s the single job you need the model to do, in one sentence, this quarter?
  • Does a first-party official server exist for the target system? If yes, why are you considering anything else?
  • Is the data the model will touch subject to compliance constraints that require local deployment?
  • If you’re going managed, is the vendor already approved by your security team, or does this kick off a new review cycle?
  • How many MCP tools does the server expose? If the count is much larger than the tools you need this quarter, is the overhead worth it?
  • What’s the protocol-version compatibility story? MCP is still moving fast; servers that haven’t been updated in six months are at risk.
  • Who on your team is on call if the server breaks at 2 AM? If the answer is “nobody,” local is the wrong deployment mode.

Most of these are five-minute answers. The point of writing them down is that any one of them, answered honestly, will usually eliminate enough candidates to make the final decision easy.
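If it helps to see the elimination mechanically, the checklist can be sketched as a filter over a candidate list. Everything here is illustrative: the candidate names, tool counts, and dates are made up, the `Candidate` fields are my own shorthand for the questions above, and the 180-day cutoff is just the “six months” rule of thumb expressed in days.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    name: str
    official: bool
    deployment_modes: frozenset  # subset of {"local", "managed", "hosted"}
    tool_count: int
    last_updated: date


def shortlist(candidates, *, required_mode, today, max_age_days=180):
    """Drop candidates that fail a hard constraint, then rank the rest."""
    keep = [
        c
        for c in candidates
        if required_mode in c.deployment_modes
        and (today - c.last_updated).days <= max_age_days
    ]
    # Official servers first, then the smallest tool surface.
    return sorted(keep, key=lambda c: (not c.official, c.tool_count))


# Hypothetical candidates for a single target system.
candidates = [
    Candidate("community-stale", False, frozenset({"local"}), 25, date(2024, 9, 1)),
    Candidate("hosted-only", False, frozenset({"hosted"}), 8, date(2025, 5, 25)),
    Candidate("community-readonly", False, frozenset({"local"}), 3, date(2025, 4, 20)),
    Candidate("vendor-official", True, frozenset({"local", "managed"}), 12, date(2025, 5, 10)),
]

# Compliance says local; the review date is fixed so the example is repeatable.
result = shortlist(candidates, required_mode="local", today=date(2025, 6, 1))
names = [c.name for c in result]
print(names)
```

Two hard constraints took four candidates down to two, and the ordering puts the official server at the top, which is the same conversation the meeting would have had, just faster.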

In closing

The MCP layer is the thing that turns a chatbot into a coworker. The selection problem is real and gets worse every month as the catalogs grow, but the decisions are not actually hard once you frame them in the right order: pick the official source if there is one, pick the deployment mode that matches your data sensitivity and your ops capacity, pick the smallest functional scope that does the job. Most teams who get burned on MCP got burned on the second or third decision, not the first. The model is only as useful as the connection you give it; spend the hour up front and the connection will keep paying back for the rest of the project.

If you’d like more on how the MCP layer actually changes the rhythm of a data-engineering session, the BigQuery + Claude case study is the companion piece to this one. And the Telegram channel where I write more about this kind of thing is Коля Валиотти • Дата консалтинг.
