The short version
An MCP server is third-party code that runs with your agent's access to your files and network. The directory it came from is a listing, not an audit. So the governance job is the boring one: decide which servers may run, and enforce that decision somewhere a developer can't edit.
- The tool description the model reads is attacker-controlled text the user never sees
- A registry with nothing enforcing it is a policy document with no teeth
- Enforce by server URL or command, never by name, because a user can rename anything
- Sandboxing the agent doesn't sandbox the servers it calls
Start with the sentence the rest follows from. An MCP server is code you didn’t write, running with your agent’s access to your files, your shell, and your network. Not a plugin in a walled garden. A program, on your machine or one hop away, doing what programs do, on your behalf.
That isn’t me being dramatic. It’s Anthropic’s own position. Their docs say they review connectors against listing criteria for the directory but “doesn’t security-audit or manage any MCP server.” The directory is a phone book, not a background check.
So the CISO question isn’t “is MCP safe.” It’s “which servers may run here, and what stops anyone running the others.” That has a real answer, and most of this post is it.
The uncomfortable part comes first, though. The feature that makes MCP worth having is the same feature you’re defending against.
Treat every server as untrusted code
The instinct in a lot of shops is to treat MCP like an app store. Someone vetted it, it’s in the list, install away. Drop that instinct straightaway, because an MCP server isn’t sandboxed from you by a platform. It runs with whatever the agent can touch, which is usually whatever you can touch. A local stdio server is a process on your laptop reading your files. A remote one holds a token to act as you.
There’s a fair objection, and it turned up on Hacker News when this debate flared: an MCP server “is running code at user-level, it doesn’t need to trick an AI into reading SSH keys, it can just read the keys.” True.
It’s also an argument for least privilege, not against governance. “It’s just code running as you” is precisely why you don’t let arbitrary versions of it run as you. You don’t hand that power to any npm package or browser extension either. Or you shouldn’t, anyway.
So the posture is simple, if not easy. Assume an unreviewed server is hostile until someone has read it, pinned it, and decided it earns its access. Reading it means the code and the tool definitions. Not the README.
Why is composability the threat?
Here’s the twist that makes MCP different from “just another package.” Its best feature is servers composing, snapping together, one agent wielding many tools at once. That same composability is the attack surface, and the attack lives where nobody looks: the tool description.
When a server registers, it hands the model a description of each tool, and the model reads all of it. You don’t. Invariant Labs put it plainly in the original tool-poisoning writeup: “AI models see the complete tool descriptions, including hidden instructions, while users typically only see simplified versions in their UI.” A tool description is attacker-controlled text loaded straight into the model’s context. The user approving the tool sees a friendly one-liner. The model sees the rest, hidden instructions included.
It gets worse the moment you have more than one server, which is the entire point of MCP. Cross-server shadowing means the malicious server never has to be the one you call. It poisons the shared context so that when you use a trusted server, the agent quietly does the rogue one’s bidding. Invariant showed a rogue server that “inject[s] the agent’s behavior with respect to other servers.” Snap ten servers together for the convenience and you’ve let any one of them whisper to all the others.
Notice where Invariant’s fix lands: pin “the version of the MCP server and its tools… using a hash or checksum to verify the integrity of the tool description.” Real control here is cryptographic. It is not a name on a list, which matters more than it sounds in a minute.
A registry without enforcement is a memo
Most enterprises, told to govern MCP, build a wiki page of approved servers and call it done. That’s the trap.
As one MCP-governance team put it, “a private MCP registry on its own is just a list. Without something enforcing it, developers can still configure their own MCP connections, agents can still call unapproved servers, and your list becomes a policy document with no teeth.” A memo isn’t a control. The developer who adds a server edits a JSON file on their own machine, and your wiki has no idea it happened.
Here’s the part that catches network teams. Your perimeter doesn’t help either. A local stdio MCP server runs on the developer’s laptop and talks to the agent over standard input and output, so it never crosses the firewall you spent a fortune on. You can’t block at the perimeter a thing that never reaches the perimeter.
Governing MCP is an endpoint-policy problem wearing a network-security costume.
What actually binds
Anthropic ships real enforcement, and it’s more capable than the wiki. You deploy a managed file or managed settings the developer can’t override, and Claude Code refuses to load anything outside it. Two patterns do the work. Drop a managed-mcp.json at a system path through your MDM and Claude Code loads only those servers, refusing any addition with a hard enterprise MCP configuration is active error before it even contacts the server. Or set allowedMcpServers plus allowManagedMcpServersOnly: true in managed settings, and only your allowlist is honored. Either way the policy lives where a developer can’t reach it. That’s the whole game: enforcement in a place the user can’t edit.
Now the detail that decides whether any of it is real. Match servers by URL or command, never by name.
Anthropic spells it out: a name entry “is not a security control. The name is the label a user assigns… so a user can call any server github.” Allowlist by serverName and a user points that name at anything they fancy.
Allowlist by serverUrl or serverCommand and the policy actually bites.
One asymmetry is worth committing to memory, because it’s the line between a soft control and a hard one. Denylists merge from every source, so a block always sticks. Allowlists merge too, your users’ own settings included, unless you set allowManagedMcpServersOnly. Forget that flag and your “allowlist” is a suggestion a developer can quietly widen.
Deny by default. Allow a vetted few by URL or command. Lock it to managed settings, and push the file with the same MDM that pushes everything else.
Boring is the goal.
The sandbox you thought you had
One last gap, because it undoes a control teams are proud of. You put Claude in a sandbox, you feel safer, fair enough. But the sandbox holds the agent, not the servers it calls. A remote MCP server runs on its own infrastructure with its own token, so an approved-but-compromised one reaches straight back out past your tidy box. Sandboxing the agent does nothing about what its tools do once it invokes them.
This is the same shape as every other control in the floor, and it’s worth saying out loud. The instinct, build a registry, draw a perimeter, drop the agent in a box, keeps guarding the place the risk isn’t. MCP’s risk is composition. Untrusted code, your access, many tools murmuring to each other. You govern that at the endpoint, with enforced allowlists matched by URL or command, with vetting that reads the code and pins the version, and with the assumption that “it’s in the directory” means nothing.
I’ve watched a team wire up a server with broad file access in about thirty seconds because it had a tidy landing page and a one-click install. Nobody read it. The directory said it existed, and somewhere between the demo and production “exists” got quietly upgraded to “approved.”
That upgrade is the whole vulnerability. Make someone earn it.
Once the floor is laid, the rest of enterprise Claude Code security is a design problem, and this allowlist is one stone in the phase-zero floor the whole rollout stands on. Before you build a server, it’s worth knowing what one actually costs to run. But the servers you install, not the ones you build, are where your file system is in scope. Treat them that way.





