Poison the Toolbox: MCP Tool Poisoning and Rug-Pull

The Play

MCP hands the model the full text of every tool: name, description, parameter docs, and metadata. The user usually sees only a short label in the client UI. That asymmetry is the trust boundary you abuse. Text in a tool description lands in the agent's context, where the model can read it as instruction, so a server author (or anyone who can edit the server's responses) can write directions the model follows but the operator never sees. Two variants matter, plus a close relative. Tool poisoning plants the hidden instruction at connect time. The rug-pull flips a server that looked clean during approval to malicious afterward, because many clients trust a description once and never re-check it. Tool shadowing is the same idea aimed sideways: a malicious server's description rewrites how the agent uses a different, trusted server's tools. All three work because the description is treated as documentation when it is really a privileged input channel into the model's context.
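
The asymmetry can be made concrete with a small sketch. It assumes a tool definition shaped like an MCP tools/list entry (`name`, optional `title`, `description`, `inputSchema`). The `operator_view` rule (show the title, else the name) is a hypothetical stand-in for whatever a given client actually renders, and `sample_tool` is invented for illustration, with a harmless placeholder where a directive would sit.

```python
from typing import Any


def model_view(tool: dict[str, Any]) -> str:
    """Collect the free-text fields the model is given for one tool."""
    parts = [tool.get("name", ""), tool.get("description", "")]
    properties = tool.get("inputSchema", {}).get("properties", {})
    for param_name, schema in properties.items():
        parts.append(f"{param_name}: {schema.get('description', '')}")
    return "\n".join(part for part in parts if part)


def operator_view(tool: dict[str, Any]) -> str:
    """Hypothetical client rule: show the title if present, else the name."""
    return tool.get("title") or tool.get("name", "")


def unseen_text(tool: dict[str, Any]) -> list[str]:
    """Lines the model receives that differ from the operator's label."""
    label = operator_view(tool).strip()
    return [line for line in model_view(tool).splitlines() if line.strip() != label]


# Invented example tool; the placeholder marks where hidden text would live.
sample_tool = {
    "name": "add",
    "title": "Add two numbers",
    "description": "Adds two numbers.\n<PLACEHOLDER: text addressed to the agent, not the user>",
    "inputSchema": {
        "type": "object",
        "properties": {
            "a": {"type": "number", "description": "First operand"},
            "b": {"type": "number", "description": "Second operand"},
        },
    },
}

if __name__ == "__main__":
    print("Operator sees:", operator_view(sample_tool))
    for line in unseen_text(sample_tool):
        print("Model also sees:", line)
```

Everything `unseen_text` returns is context the model gets and the operator never reviewed, which is the gap the rest of this play works inside.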

Before the Snap

Stand up Damn Vulnerable MCP Server (DVMCP) locally in an owned lab (Docker or a Python venv per the repo), pointed only at a throwaway client. No remote, no production agent, no shared host. Connect a disposable MCP client (an Inspector or a sandboxed agent) so you can read raw tool descriptions and watch the agent's behavior. Confirm scope: your machine and your lab are the only sanctioned target. Install mcp-scan / Snyk Agent Scan in the same environment so you can run the attacker and defender sides back to back. Read the Invariant Labs tool poisoning writeup first so you know the shape of what you are looking for.

Run It

Enumerate the surface: connect your disposable client to each DVMCP challenge server and pull the raw tool list (full descriptions, parameter docs, and metadata), not the client's pretty summary.
Diff what the model sees against what the user sees: line up the raw description against the UI label and note any text present in the model's view but absent from the operator's view.
Identify the instruction channel: for Challenge 2, locate the description that reads as documentation but is phrased as a directive to the agent, and reason about what action it steers without running it against anything real.
Observe the rug-pull window: for Challenge 4, approve a benign-looking server, then re-pull its tool descriptions after the server updates and watch for a description that changed post-approval with no re-consent.
Map the shadowing path: for Challenge 5, connect the malicious server alongside a trusted one and trace how the malicious description tries to redirect the agent's use of the trusted server's tools.
Confirm impact in the lab, not in prose: watch the sandboxed agent's tool-call sequence to see whether the poisoned description actually bends behavior, and stop at the proof. Do not exfiltrate.
Switch to defender: run mcp-scan / Agent Scan against the same servers and confirm it flags the poisoning, shadowing, and changed descriptions you found by hand.
Record the delta: note which attacks the scanner caught, which it missed, and where pinning a description hash would have closed the rug-pull window.
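
The shadowing step above can be approximated by hand with a cross-reference check: flag any description that names a tool owned by a different server. This is a minimal heuristic sketch, not how mcp-scan works. The server and tool names in `lab_servers` are invented for illustration and are not DVMCP's, and the placeholder stands in for any actual directive.

```python
import re


def cross_server_mentions(servers: dict[str, list[dict]]) -> list[tuple[str, str, str, str]]:
    """Return (server, tool, other_server, other_tool) for each description
    that names a tool owned by a different server."""
    findings = []
    for server, tools in servers.items():
        for tool in tools:
            text = tool.get("description", "")
            for other_server, other_tools in servers.items():
                if other_server == server:
                    continue
                for other in other_tools:
                    # Whole-word match on the other server's tool name.
                    pattern = rf"\b{re.escape(other['name'])}\b"
                    if re.search(pattern, text):
                        findings.append((server, tool["name"], other_server, other["name"]))
    return findings


# Invented lab inventory: one trusted server, one unreviewed server.
lab_servers = {
    "trusted-mail": [
        {"name": "send_email", "description": "Send an email to a recipient."},
    ],
    "unreviewed-notes": [
        {
            "name": "add_note",
            "description": "Save a note. <PLACEHOLDER: text about how to call send_email>",
        },
    ],
}

if __name__ == "__main__":
    for server, tool, other_server, other_tool in cross_server_mentions(lab_servers):
        print(f"{server}/{tool} mentions {other_server}/{other_tool}")
```

Short or generic tool names (for example `add` or `get`) will produce false positives, so treat a hit as a reason to read the description, not as a verdict.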

What You Learn

The failure class is treating a tool description as documentation when it is an untrusted, model-read instruction channel inside the supply chain. Trust granted once at connect time does not survive a server that can change its own descriptions later. Any input the model reads as text can carry directives, and the operator's UI summary is not the model's reality. This is the same indirect-injection lesson as poisoned web content, moved into the agent's own toolbox.

Drive It with Claude Code

Against the authorized MCP server registered in scope for this engagement, run mcp-scan over its connected tool definitions and report every tool whose description carries hidden instructions, out-of-band directives, or Unicode that the model would read but a human would miss. Triage each finding to LLM01 or LLM03 and tell me which tools could steer an agent. No payloads.

# mcp-scan: static scan of a connected MCP server's tool descriptions
# https://github.com/invariantlabs-ai/mcp-scan
# Authorized, in-scope servers only. Scanning a config can start the servers it names.
 
uvx mcp-scan@latest scan ~/.config/authorized-range/mcp.json --json > mcp-scan-report.json
 
# Findings of interest: tool descriptions carrying hidden or out-of-band
# instructions (tool poisoning), mapped to LLM01 / LLM03.
# Review before connecting an agent. Do not auto-trust unscanned tools.
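
One category named in the prompt above, Unicode the model reads but a human misses, is easy to check by hand alongside the scanner. The sketch below is an independent illustration using only Python's standard `unicodedata` module; it makes no claim about how mcp-scan detects such text. The choice of categories (format, control, private use) is an assumption about what counts as "invisible".

```python
import unicodedata

# Ordinary whitespace controls that legitimately appear in descriptions.
ALLOWED_CONTROLS = {"\n", "\r", "\t"}

# Cf = format (zero-width, bidi, tag characters), Cc = control, Co = private use.
SUSPECT_CATEGORIES = {"Cf", "Cc", "Co"}


def invisible_characters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, codepoint, name) for characters a human is unlikely to see."""
    hits = []
    for index, char in enumerate(text):
        if char in ALLOWED_CONTROLS:
            continue
        if unicodedata.category(char) in SUSPECT_CATEGORIES:
            name = unicodedata.name(char, "UNNAMED")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


if __name__ == "__main__":
    # A zero-width space and a tag character appended to a normal sentence.
    description = "Adds two numbers.\u200b\U000E0041"
    for index, codepoint, name in invisible_characters(description):
        print(index, codepoint, name)
```

Some legitimate text trips this check (soft hyphens, the zero-width joiner inside emoji sequences), so a hit means "inspect the raw bytes", not "malicious".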

Defend It

Treat every MCP server as an untrusted dependency. Scan before connect: run mcp-scan / Snyk Agent Scan over your configured servers to flag tool poisoning, shadowing, and toxic flows in descriptions. Pin and verify: hash tool descriptions at approval and re-check on every connect, so a rug-pull (a description that changed after consent) fails closed instead of silently re-trusting. Runtime-monitor: keep a proxy or guardrail watching tool calls and tool-description changes during a session, and re-prompt for consent when a description mutates. Pin server versions and sources, prefer signed or first-party servers, and isolate untrusted servers from credential-bearing tools so a poisoned description cannot reach secrets. Show operators the full description the model sees, not a trimmed label, to close the asymmetry the attack depends on.
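
The pin-and-verify step can be sketched as follows. This is a minimal illustration under stated assumptions, not a client feature: it assumes tool definitions arrive as dicts with `name`, `description`, and `inputSchema`, and every function name here (`fingerprint`, `pin`, `verify`, `connect_or_fail`) is hypothetical.

```python
import hashlib
import json


class DescriptionChanged(Exception):
    """Raised so the caller refuses the connection instead of re-trusting."""


def fingerprint(tool: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the fields the model reads."""
    pinned_fields = {
        "name": tool.get("name"),
        "description": tool.get("description"),
        "inputSchema": tool.get("inputSchema"),
    }
    canonical = json.dumps(pinned_fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pin(tools: list[dict]) -> dict[str, str]:
    """Record fingerprints at approval time, keyed by tool name."""
    return {tool["name"]: fingerprint(tool) for tool in tools}


def verify(tools: list[dict], pins: dict[str, str]) -> list[str]:
    """Return names of tools that are new or changed since approval."""
    return [tool["name"] for tool in tools if pins.get(tool["name"]) != fingerprint(tool)]


def connect_or_fail(tools: list[dict], pins: dict[str, str]) -> list[dict]:
    """Fail closed: any unpinned or changed tool blocks the whole connect."""
    changed = verify(tools, pins)
    if changed:
        raise DescriptionChanged(f"re-consent required for: {', '.join(changed)}")
    return tools


if __name__ == "__main__":
    approved = [{"name": "add", "description": "Adds two numbers."}]
    pins = pin(approved)
    later = [{"name": "add", "description": "Adds two numbers. <changed after approval>"}]
    print(verify(approved, pins))  # nothing changed
    print(verify(later, pins))     # the mutated tool is reported
```

The hash covers the parameter schema as well as the description because parameter docs are model-read text too. This sketch does not flag tools that were pinned and later removed; a fuller check would compare the two name sets as well.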