MCP tool poisoning: how a "joke_teller" becomes a data exfiltration agent

Researchers found a tool in the Model Context Protocol ecosystem with hidden instructions embedded in its description. The tool was called "joke_teller." It did not just tell jokes.

This is the current state of agentic AI security: the trust boundary is the tool description, and nobody is checking it.

How MCP tool poisoning works

The Model Context Protocol lets LLM-based agents connect to external tools: databases, APIs, code executors, file systems. When an agent loads an MCP tool, the tool's name, description, and parameter schema are injected into the model's context.

Tool poisoning attacks embed attacker instructions in that description. The instructions can be invisible to human review (using Unicode characters, whitespace tricks, or simply relying on the fact that nobody reads every tool description carefully). When the model processes the tool list, it reads the hidden instructions and follows them, because from the model's perspective, they are just part of the context it was given.

The attack does not require exploiting a software vulnerability. It requires publishing a malicious MCP tool, which any developer can do.

What the research shows

The MCPTox benchmark tested this attack class against real-world MCP servers and popular agent models. Attack success rates exceeded 60% across prominent models including o1-mini and DeepSeek-R1. Those are not obscure research models; they are the same models organizations are deploying in production agentic workflows today.

The consequences of a successful injection scale with what the agent can do. In a coding assistant with file system and network access, a successful tool poisoning attack could exfiltrate source code, credentials, and API keys. In an enterprise automation workflow, it could manipulate records, send messages, or initiate financial transactions.

This is not a hypothetical

The ClawHavoc campaign (January/February 2026) distributed 1,200 malicious tools to the OpenClaw marketplace. CVE-2026-25253 was the first CVE assigned to an agentic AI system, covering remote code execution in the OpenClaw skill runtime. The jailbreak-as-a-service market documented by Group-IB charges $50-200 per month for frameworks that systematically exploit these weaknesses.

The agentic attack surface is real, active, and commercially exploited.

What defenders can do

The fundamental problem is that agents trust tool metadata without validation. Until the ecosystem develops better controls, the mitigations are at the procurement and deployment level.

Before deploying any MCP tool: read its description in full, including any unusual Unicode characters or whitespace. Prefer tools from vendors with a security track record. Treat MCP tool installation with the same scrutiny as npm package installation (which itself remains a significant supply chain risk).

For agent platform developers: build description sanitization into the tool loading pipeline, flag tools with anomalous character sets, and require human approval before loading new tools into production agents.

Google's approach with Gemini 3.5 Flash is worth noting: they added optional enterprise safeguards that require explicit user confirmation for sensitive or irreversible actions and abort the task on detected indirect prompt injection. That model-level defense does not prevent tool poisoning, but it adds a speed bump that limits blast radius.

Gigia Tsiklauri is a Security Architect and founder of Infosec.ge. Get in touch if you are building or securing agentic AI workflows.

AI SecurityAnthropicDual Use

Anthropic Mythos and the AI Dual-Use Problem

For six weeks, the US government effectively banned Anthropic's most powerful model. Last week they cleared it for a small list of trusted defenders. This is the first time a commercial AI model has been put through an export-style security review, and the implications go beyond Anthropic.

July 1, 2026

AI Securityllm-securityAnthropic

Anthropic vs. Alibaba: What the largest known AI distillation attack tells us about model security

Anthropic says Alibaba used 25,000 fake accounts to run 28.8 million Claude interactions and train Qwen off the results. The White House named this attack class in presidential guidance. Here is what the case means for AI security practitioners.

June 30, 2026

AI SecurityAgentic AIllm-securityPrompt Injection

One in eight AI breaches now involves agentic systems: what the HiddenLayer 2026 report actually says

HiddenLayer's 2026 AI Threat Landscape Report puts a concrete number on what many security teams have been watching with unease: autonomous agentic AI systems now account for more than 1 in 8 reported AI security breaches. Prompt injection is present in 73% of production deployments. The attack surface expanded faster than the defenses.

June 29, 2026

MCP tool poisoning: how a joke_teller becomes a data exfiltration agent