MCP tool poisoning: how a "joke_teller" becomes a data exfiltration agent
Researchers found a tool in the Model Context Protocol ecosystem with hidden instructions embedded in its description. The tool was called "joke_teller." It did not just tell jokes.
This is the current state of agentic AI security: the trust boundary is the tool description, and nobody is checking it.
How MCP tool poisoning works
The Model Context Protocol lets LLM-based agents connect to external tools: databases, APIs, code executors, file systems. When an agent loads an MCP tool, the tool's name, description, and parameter schema are injected into the model's context.
Tool poisoning attacks embed attacker instructions in that description. The instructions can be invisible to human review (using Unicode characters, whitespace tricks, or simply relying on the fact that nobody reads every tool description carefully). When the model processes the tool list, it reads the hidden instructions and follows them, because from the model's perspective, they are just part of the context it was given.
The attack does not require exploiting a software vulnerability. It requires publishing a malicious MCP tool, which any developer can do.
What the research shows
The MCPTox benchmark tested this attack class against real-world MCP servers and popular agent models. Attack success rates exceeded 60% across prominent models including o1-mini and DeepSeek-R1. Those are not obscure research models; they are the same models organizations are deploying in production agentic workflows today.
The consequences of a successful injection scale with what the agent can do. In a coding assistant with file system and network access, a successful tool poisoning attack could exfiltrate source code, credentials, and API keys. In an enterprise automation workflow, it could manipulate records, send messages, or initiate financial transactions.
This is not a hypothetical
The ClawHavoc campaign (January/February 2026) distributed 1,200 malicious tools to the OpenClaw marketplace. CVE-2026-25253 was the first CVE assigned to an agentic AI system, covering remote code execution in the OpenClaw skill runtime. The jailbreak-as-a-service market documented by Group-IB charges $50-200 per month for frameworks that systematically exploit these weaknesses.
The agentic attack surface is real, active, and commercially exploited.
What defenders can do
The fundamental problem is that agents trust tool metadata without validation. Until the ecosystem develops better controls, the mitigations are at the procurement and deployment level.
Before deploying any MCP tool: read its description in full, including any unusual Unicode characters or whitespace. Prefer tools from vendors with a security track record. Treat MCP tool installation with the same scrutiny as npm package installation (which itself remains a significant supply chain risk).
For agent platform developers: build description sanitization into the tool loading pipeline, flag tools with anomalous character sets, and require human approval before loading new tools into production agents.
Google's approach with Gemini 3.5 Flash is worth noting: they added optional enterprise safeguards that require explicit user confirmation for sensitive or irreversible actions and abort the task on detected indirect prompt injection. That model-level defense does not prevent tool poisoning, but it adds a speed bump that limits blast radius.
Gigia Tsiklauri is a Security Architect and founder of Infosec.ge. Get in touch if you are building or securing agentic AI workflows.