Amit Kothari
Amit Kothari CEO of Tallyfy, AI advisor at Blue Sheen

An MCP server is unreviewed code with your file system in scope

In brief

Treat every MCP server as untrusted code that runs with the access your agent has, because that is what it is. Anthropic docs say the directory lists connectors but does not security-audit them. A registry of approved servers with nothing enforcing it is a memo. The control that binds is a managed allowlist matched by URL or command, never by name.

The short version

An MCP server is third-party code that runs with your agent's access to your files and network. The directory it came from is a listing, not an audit. So the governance job is the boring one: decide which servers may run, and enforce that decision somewhere a developer can't edit.

  • The tool description the model reads is attacker-controlled text the user never sees
  • A registry with nothing enforcing it is a policy document with no teeth
  • Enforce by server URL or command, never by name, because a user can rename anything
  • Sandboxing the agent doesn't sandbox the servers it calls

Start with the sentence the rest follows from. An MCP server is code you didn’t write, running with your agent’s access to your files, your shell, and your network. Not a plugin in a walled garden. A program, on your machine or one hop away, doing what programs do, on your behalf.

That isn’t me being dramatic. It’s Anthropic’s own position. Their docs say they review connectors against listing criteria for the directory but “doesn’t security-audit or manage any MCP server.” The directory is a phone book, not a background check.

So the CISO question isn’t “is MCP safe.” It’s “which servers may run here, and what stops anyone running the others.” That has a real answer, and most of this post is it.

The uncomfortable part comes first, though. The feature that makes MCP worth having is the same feature you’re defending against.

Treat every server as untrusted code

The instinct in a lot of shops is to treat MCP like an app store. Someone vetted it, it’s in the list, install away. Drop that instinct straightaway, because an MCP server isn’t sandboxed from you by a platform. It runs with whatever the agent can touch, which is usually whatever you can touch. A local stdio server is a process on your laptop reading your files. A remote one holds a token to act as you.

There’s a fair objection, and it turned up on Hacker News when this debate flared: an MCP server “is running code at user-level, it doesn’t need to trick an AI into reading SSH keys, it can just read the keys.” True.

It’s also an argument for least privilege, not against governance. “It’s just code running as you” is precisely why you don’t let arbitrary versions of it run as you. You don’t hand that power to any npm package or browser extension either. Or you shouldn’t, anyway.

So the posture is simple, if not easy. Assume an unreviewed server is hostile until someone has read it, pinned it, and decided it earns its access. Reading it means the code and the tool definitions. Not the README.

Why is composability the threat?

Here’s the twist that makes MCP different from “just another package.” Its best feature is servers composing, snapping together, one agent wielding many tools at once. That same composability is the attack surface, and the attack lives where nobody looks: the tool description.

When a server registers, it hands the model a description of each tool, and the model reads all of it. You don’t. Invariant Labs put it plainly in the original tool-poisoning writeup: “AI models see the complete tool descriptions, including hidden instructions, while users typically only see simplified versions in their UI.” A tool description is attacker-controlled text loaded straight into the model’s context. The user approving the tool sees a friendly one-liner. The model sees the rest, hidden instructions included.

It gets worse the moment you have more than one server, which is the entire point of MCP. Cross-server shadowing means the malicious server never has to be the one you call. It poisons the shared context so that when you use a trusted server, the agent quietly does the rogue one’s bidding. Invariant showed a rogue server that “inject[s] the agent’s behavior with respect to other servers.” Snap ten servers together for the convenience and you’ve let any one of them whisper to all the others.

Notice where Invariant’s fix lands: pin “the version of the MCP server and its tools… using a hash or checksum to verify the integrity of the tool description.” Real control here is cryptographic. It is not a name on a list, which matters more than it sounds in a minute.

A registry without enforcement is a memo

Most enterprises, told to govern MCP, build a wiki page of approved servers and call it done. That’s the trap.

As one MCP-governance team put it, “a private MCP registry on its own is just a list. Without something enforcing it, developers can still configure their own MCP connections, agents can still call unapproved servers, and your list becomes a policy document with no teeth.” A memo isn’t a control. The developer who adds a server edits a JSON file on their own machine, and your wiki has no idea it happened.

Here’s the part that catches network teams. Your perimeter doesn’t help either. A local stdio MCP server runs on the developer’s laptop and talks to the agent over standard input and output, so it never crosses the firewall you spent a fortune on. You can’t block at the perimeter a thing that never reaches the perimeter.

Governing MCP is an endpoint-policy problem wearing a network-security costume.

What actually binds

Anthropic ships real enforcement, and it’s more capable than the wiki. You deploy a managed file or managed settings the developer can’t override, and Claude Code refuses to load anything outside it. Two patterns do the work. Drop a managed-mcp.json at a system path through your MDM and Claude Code loads only those servers, refusing any addition with a hard enterprise MCP configuration is active error before it even contacts the server. Or set allowedMcpServers plus allowManagedMcpServersOnly: true in managed settings, and only your allowlist is honored. Either way the policy lives where a developer can’t reach it. That’s the whole game: enforcement in a place the user can’t edit.

Now the detail that decides whether any of it is real. Match servers by URL or command, never by name.

Anthropic spells it out: a name entry “is not a security control. The name is the label a user assigns… so a user can call any server github.” Allowlist by serverName and a user points that name at anything they fancy.

Allowlist by serverUrl or serverCommand and the policy actually bites.

One asymmetry is worth committing to memory, because it’s the line between a soft control and a hard one. Denylists merge from every source, so a block always sticks. Allowlists merge too, your users’ own settings included, unless you set allowManagedMcpServersOnly. Forget that flag and your “allowlist” is a suggestion a developer can quietly widen.

Deny by default. Allow a vetted few by URL or command. Lock it to managed settings, and push the file with the same MDM that pushes everything else.

Boring is the goal.

The sandbox you thought you had

One last gap, because it undoes a control teams are proud of. You put Claude in a sandbox, you feel safer, fair enough. But the sandbox holds the agent, not the servers it calls. A remote MCP server runs on its own infrastructure with its own token, so an approved-but-compromised one reaches straight back out past your tidy box. Sandboxing the agent does nothing about what its tools do once it invokes them.

This is the same shape as every other control in the floor, and it’s worth saying out loud. The instinct, build a registry, draw a perimeter, drop the agent in a box, keeps guarding the place the risk isn’t. MCP’s risk is composition. Untrusted code, your access, many tools murmuring to each other. You govern that at the endpoint, with enforced allowlists matched by URL or command, with vetting that reads the code and pins the version, and with the assumption that “it’s in the directory” means nothing.

I’ve watched a team wire up a server with broad file access in about thirty seconds because it had a tidy landing page and a one-click install. Nobody read it. The directory said it existed, and somewhere between the demo and production “exists” got quietly upgraded to “approved.”

That upgrade is the whole vulnerability. Make someone earn it.

Once the floor is laid, the rest of enterprise Claude Code security is a design problem, and this allowlist is one stone in the phase-zero floor the whole rollout stands on. Before you build a server, it’s worth knowing what one actually costs to run. But the servers you install, not the ones you build, are where your file system is in scope. Treat them that way.

About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience, he is the Co-Founder & CEO of Tallyfy® (raised $3.6m, the Workflow Made Easy® platform) and Partner at Blue Sheen, an AI advisory firm for mid-size companies. He helps companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding. Read Amit's full bio →

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.

Related Posts

View All Posts »
Your Claude Code deny rules are not a security boundary

Your Claude Code deny rules are not a security boundary

Before you hand Claude Code to hundreds of people you add deny rules for .env and credentials and feel locked down. You are not. Those rules govern Claude own tools, not a Python one-liner that opens the same file, and the control that actually holds, the OS sandbox, reads your whole machine by default and fails open when it cannot start. The baseline worth setting is real. Its dangerous gaps are the defaults you never changed.

Your locked-down Claude sandbox is a holding pattern, not a destination

Your locked-down Claude sandbox is a holding pattern, not a destination

Giving everyone Claude inside an isolated VM, no sensitive data allowed, feels like the safe way to start. It is a fine way to start. The trouble is what happens when you leave people there: the leak it was built to stop walks out by copy-paste anyway, the friction recruits the shadow AI you were trying to prevent, and the value never compounds because nothing in an ephemeral box survives the session. A sandbox is a scaffold. Scaffolds come down.

Blocking the personal Claude account is an identity problem, not a network one

Blocking the personal Claude account is an identity problem, not a network one

Your CISO trusts the control posture Microsoft gives Copilot. To get Claude to the same bar, do not reach for tenant restrictions: that header only fires on your network, so it is theater the moment a laptop goes off-VPN. The control that holds lives at identity. Enforce SSO, then claim your domain, and know that the claim is a one-way door.

You are at phase zero, and the deck you were sold starts at phase three

You are at phase zero, and the deck you were sold starts at phase three

Every enterprise AI maturity model starts a rung above where most companies stand and skips the one that holds the rest up: getting the tool safely into people hands. Your team already has Claude. If IT cannot produce the tenant policy, the egress allowlist, the tool allowlist, and the audit log, you are at phase zero, whatever the deck says.

Claude is allowed in regulated finance, but it has no EU data residency

Claude is allowed in regulated finance, but it has no EU data residency

Two objections kill most regulated-finance AI conversations before they start. The first, that Anthropic does not permit Claude for regulated work, is false: Claude for Financial Services exists, banks run it, and the usage policy names finance high-risk, not forbidden. The second is real and almost nobody states it plainly: first-party Claude Enterprise has no EU data residency at all. There is no "eu" inference region and workspace storage is US-only. If you are FCA-regulated, that is the fact to design around, and the only EU route runs through a hyperscaler.

Claude Code behind a TLS-inspecting proxy: configure the tool, not the proxy

Claude Code behind a TLS-inspecting proxy: configure the tool, not the proxy

Locked-down shops reach for a proxy exception to make Claude Code connect. Wrong move, and it fails anyway. Claude Code does not pin certificates, so it works through full TLS inspection once you teach it to trust your corporate root CA. The fix is a couple of environment variables and an egress allowlist, not a hole in the proxy.

AI advisory services via Blue Sheen.
Contact me Follow 10k+