
Why MCP Is the Shadow IT of AI

April 24, 2026 · 5 min read
ai-security · mcp · agent-security · offensive-security · red-team

The pattern is familiar

In 2012 a finance team plugged a Dropbox account into their workflow because sending spreadsheets over email was painful. Nobody asked security. A year later the pattern repeated with Slack, then with Zoom, then with every SaaS tool that promised a faster Tuesday. IT called it shadow IT and spent a decade building programs to discover, classify, and control it.

The shadow IT of AI is already here. It's called MCP.

What MCP actually is

Model Context Protocol is the specification that lets a language model reach out and call tools. A tool can be anything: a filesystem, a database, an internal API, a GitHub repo, a Stripe account, a SOC dashboard, a production Kubernetes cluster. A server declares the tools it exposes, the agent discovers them, the model decides when to call them, and the results feed back into the model's context. It's clean. It's composable. It's quietly becoming the default integration layer for every serious agent product.
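Here is a minimal sketch of what that exchange looks like on the wire. MCP messages are JSON-RPC 2.0; tools/list and tools/call are the method names from the specification, and a tool declaration carries a name, a description, and an inputSchema. Everything else here is made up for illustration: the read_file tool, its schema, and the two helper functions are hypothetical.

```python
import json

# Hypothetical tool declaration, shaped like one entry in a tools/list result.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its contents.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}


def list_tools_request(request_id: int) -> dict:
    """Build the JSON-RPC request a client sends to discover tools."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def call_tool_request(request_id: int, name: str, arguments: dict) -> dict:
    """Build the JSON-RPC request a client sends when the model picks a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    print(json.dumps(list_tools_request(1)))
    print(json.dumps(
        call_tool_request(2, READ_FILE_TOOL["name"], {"path": "notes.txt"}),
        indent=2,
    ))
```

Note what the model actually sees when it decides: the name, the description, and the schema. Nothing in those three fields proves what the code behind them does.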

It's also, right now, almost entirely ungoverned.

Why the governance gap is big

The specification is young. The reference implementations are moving quickly. The servers are being stood up by product teams. Not by security teams. Not by platform teams. By whoever had a Tuesday afternoon free and a use case that needed tools wired into an agent. The pattern is identical to the 2012 SaaS wave, except the stakes are higher. An unsanctioned SaaS tool, at worst, leaks the data people put into it. An MCP server that an agent can call hands the agent the keys to whatever the server can reach.

Common configurations I've seen in the wild:

  • MCP servers running as root because the shortest path to a working demo was to skip the privilege-drop code.
  • Long-lived personal access tokens baked into server env files and shared on Slack.
  • Tool definitions that don't match what the tool actually does, so the model ends up calling a "read_users" function that can also delete them.
  • Stdio transports exposed over a network because somebody wanted remote access and didn't realize stdio was never designed to be trusted across a wire.
  • Zero logging of which tool the model called, with what arguments, under whose authority.

None of that is theoretical. All of it ships in production today.

The attack surface is bigger than the tool list

An MCP server is not a single vulnerability. It's a chain. The model reads a tool definition and trusts it. A malicious definition can smuggle instructions into the model's context. The model invokes a tool and trusts its output. A malicious output can rewrite the model's plan. The agent executes a chain of tool calls and trusts that each call is bounded by the tool's documented surface. A tool with wider real effects than its description can escalate privileges two calls later.

Four classes of failure keep showing up:

  1. Tool-definition injection. The server's declared description contains prompt content that changes the model's behavior when the agent reads the server's manifest.
  2. Authorization bypass between tool calls. Permissions are checked at call time, but state from a prior call carries authority the current call should not have inherited.
  3. Context pollution through tool output. A tool returns content that contains instructions. The model treats those instructions as intent because there's nothing in the protocol separating data from instruction.
  4. Capability creep. A tool named read_file also writes, deletes, or network-requests. The model is operating on the name. The server is operating on the code.

Each of these is a design pattern, not an implementation bug. That's the part defenders need to sit with.
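Two of the four classes leave traces in the manifest itself, which means you can at least look for them. The sketch below is a rough heuristic scanner, not a detector you should trust on its own. The pattern list, the read-style name prefixes, and the scan_tool function are all assumptions I picked for illustration. A manifest-only check cannot catch a tool whose code does more than its description admits; that takes code review or runtime observation.

```python
import re
from typing import List

# Illustrative phrases that read like instructions to the model rather than
# documentation for it. A real list would be longer and tuned per deployment.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"do not (tell|mention|reveal)",
    r"<\s*(system|important)\s*>",
    r"before (using|calling) (this|any) tool",
]

# Names that imply a read-only tool, and verbs that imply mutation.
READ_ONLY_PREFIXES = ("read_", "get_", "list_")
MUTATING_WORDS = re.compile(
    r"\b(delete|remove|write|update|drop|overwrite)\w*\b", re.IGNORECASE
)


def scan_tool(tool: dict) -> List[str]:
    """Return human-readable findings for one tool declaration."""
    findings = []
    name = tool.get("name", "")
    description = tool.get("description", "")

    # Class 1: instruction-like content inside the declared description.
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, description, re.IGNORECASE):
            findings.append(f"{name}: description matches /{pattern}/")

    # Class 4 (declared form only): read-style name, mutating description.
    if name.startswith(READ_ONLY_PREFIXES) and MUTATING_WORDS.search(description):
        findings.append(f"{name}: read-style name but description mentions mutation")

    return findings
```

Run it over every entry a server returns from tools/list before the manifest ever reaches a model, and treat any finding as a reason for a human to read the description.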

What responsible operators should be doing now

If you run MCP servers in production, or you're about to, the minimum is this. Inventory every server and every tool. Require written threat models before any tool that can read secrets or mutate state ships behind an agent. Add structured logging on every tool invocation, including the model that called it, the arguments it used, and the bytes it got back. Treat every tool manifest as untrusted input; scan it the way you scan user-submitted JSON.
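The logging piece is the cheapest to start on. A minimal sketch, assuming your tools are plain Python callables and you can wrap them: audited_call, the field names, and the mcp.audit logger name are my own choices here, not anything the protocol defines. It records the size and hash of the response rather than the raw bytes, and it logs arguments as-is, so redact secrets before adopting anything like it.

```python
import hashlib
import json
import logging
import time
from typing import Any, Callable, Dict

logger = logging.getLogger("mcp.audit")


def audited_call(
    tool_fn: Callable[..., Any],
    *,
    tool_name: str,
    model_id: str,
    principal: str,
    arguments: Dict[str, Any],
) -> Any:
    """Invoke a tool and emit one JSON log line describing the call."""
    started = time.monotonic()
    record: Dict[str, Any] = {
        "ts": time.time(),
        "tool": tool_name,
        "model": model_id,
        "principal": principal,  # the identity the agent is acting as
        "arguments": arguments,  # assumption: already redacted upstream
    }
    try:
        result = tool_fn(**arguments)
        payload = json.dumps(result, default=str).encode("utf-8")
        record["status"] = "ok"
        record["response_bytes"] = len(payload)
        record["response_sha256"] = hashlib.sha256(payload).hexdigest()
        return result
    except Exception as exc:
        record["status"] = "error"
        record["error"] = type(exc).__name__
        raise
    finally:
        # Runs on success and failure, so every invocation leaves a line.
        record["duration_ms"] = round((time.monotonic() - started) * 1000, 1)
        logger.info(json.dumps(record, default=str))
```

One line per invocation, success or failure, is enough to answer the question most deployments cannot answer today: which tool was called, with what, under whose authority.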

Then the next layer. Separate agent identities from human identities; do not let an agent run as an engineer. Scope each tool to a specific purpose and deny-by-default everything else. Rotate any credential the server can touch, on a schedule, automatically. Decide what an agent is allowed to do without human approval and enforce it at the MCP client, not at the server.
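Deny-by-default with a human-approval tier can be sketched in a few lines on the client side. This is one possible shape under my own assumptions: AgentPolicy, Decision, and the (server, tool) keys are hypothetical names, and a real client would also need to bind the policy to an authenticated agent identity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple

ToolKey = Tuple[str, str]  # (server name, tool name)


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class AgentPolicy:
    """Per-agent policy, evaluated by the client before any tools/call."""

    agent_id: str
    auto_allowed: FrozenSet[ToolKey] = frozenset()
    approval_required: FrozenSet[ToolKey] = frozenset()

    def decide(self, server: str, tool: str) -> Decision:
        key = (server, tool)
        # Approval wins if a tool is listed in both sets by mistake.
        if key in self.approval_required:
            return Decision.NEEDS_APPROVAL
        if key in self.auto_allowed:
            return Decision.ALLOW
        # Anything not explicitly listed is denied.
        return Decision.DENY


if __name__ == "__main__":
    policy = AgentPolicy(
        agent_id="support-agent",
        auto_allowed=frozenset({("files", "read_file")}),
        approval_required=frozenset({("files", "delete_file")}),
    )
    for tool in ("read_file", "delete_file", "run_shell"):
        print(tool, policy.decide("files", tool).value)
```

Keeping the check in the client means a tool that a server adds tomorrow is denied until somebody writes it into the policy.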

None of this is exotic. All of it is absent from most deployments I see.

What Krypteia Sec is building

I'm writing krypteia-mcp-audit, an open-source CLI that takes a running MCP server and runs the attack patterns above against it. The first public release will cover tool-definition injection, authorization bypass between tool calls, context pollution via tool output, credential exfiltration patterns, and capability-creep detection. It ships as defensive tooling. You point it at servers you own or have written authorization to test. You get a report. You fix things before somebody else finds them.

If you want to be first in line to see it, the newsletter is where I'll announce the release.

The takeaway

MCP is the best integration story AI has produced. It's also the fastest way to give a language model effects in the real world with the thinnest security story anyone has shipped in a decade. Shadow IT took ten years to get under control. Shadow AI doesn't get ten years. It gets however long it takes the first public incident to change the conversation, and that clock is already running.

Test your servers. Audit your tools. Log every call. Write it down.

Then come back and read the next post, because the incidents are coming.