"comment and control" is the first prompt-injection finding a vendor's own system card predicted. read your system cards.
aonan guan, with johns hopkins researchers zhengyu liu and gavin zhong, published a prompt-injection disclosure last week called comment and control. the headline finding is that the same family of prompt-injection payloads compromises three flagship AI coding agents simultaneously, through ordinary github surfaces, with no external infrastructure required. the affected products: anthropic's claude code security review (a github action), google's gemini CLI action, and github's copilot agent.
the technical details are interesting. the bounty economics are interesting. but the part of this story that should change how defenders work is buried in anthropic's own system card for claude code, where they wrote, before this disclosure, that the security review action is "not hardened against prompt injection."
the vendor told you. the researchers built the proof of concept. the failure mode landed exactly where the system card said it would.
let me walk through what the disclosure actually shows, then come back to the system-card argument.
the three platforms, three surfaces, one attack family
claude code security review uses a github action that runs claude against pull request content to flag security issues. the attack: a crafted PR title contains a prompt-injection payload. when the action runs, claude reads the PR title as part of the context, follows the injected instructions, and executes attacker-chosen commands. the credentials harvested by those commands are then surfaced as a "security finding" in the action output, or written to the github actions log where the attacker can read them via a public PR.
gemini CLI action runs gemini against issues. the attack: an issue with a prompt-injection title plus crafted body comments. gemini reads the comments, the guardrails fail, and an API key is exfiltrated.
github copilot agent runs against repository contents broadly. the attack: an HTML comment hides the prompt-injection payload (the HTML comment renders as nothing in the github UI but still appears in the raw markdown that the agent reads). copilot processes the hidden comment, scans for secrets, and bypasses the network firewall that copilot's agent context is supposed to enforce.
three platforms. three different injection surfaces. one common attack family. no external infrastructure. a pull request to a public repo is the entire prerequisite.
the bounty economics
anthropic graded the finding as CVSS 9.4 critical. the bounty payment was $100.
google paid $1,337 (a hacker-culture nod, but functionally a token amount for a critical finding in a flagship product).
github paid $500 through the copilot bounty program.
for context: a typical critical bug bounty in a non-AI product at any of these companies pays $5,000 to $50,000. the sum of all three bounties for comment and control is less than the standard floor for a single critical finding in microsoft's office product line.
i don't think these numbers reflect the actual severity assessment internally at the three companies. the CVSS grading and the public response both indicate the security teams understood what they were looking at. the bounty programs themselves are calibrated to a different threat model, and AI security findings are landing in those programs while the calibration hasn't caught up.
the practical defender takeaway: do not assume vendor bounty payouts are signal for severity. they're signal for program calibration. read the technical disclosure and the CVSS rating, not the bounty number.
the system card as a pre-disclosure security signal
now the part that should change defender practice.
anthropic's claude code system card, which is published on anthropic.com and updated as the product evolves, includes a section on limitations. that section explicitly states (i'm paraphrasing the published text, this is not a direct quote) that the claude code security review action is "not hardened against prompt injection." the language is plain. the limitation was acknowledged in writing, in advance, by anthropic.
a researcher then demonstrated the limitation. the demonstration matches the published prediction.
defenders need to treat this as a one-time-only learning moment, because the pattern is durable: every major AI vendor publishes system cards, model cards, or technical reports that include limitation sections. those sections are not marketing copy. they are an enumeration of failure modes the vendor has investigated internally, decided not to fix at the current shipping bar, and disclosed.
the operational implication: for every AI tool you have integrated into production, find the system card. read the limitations section. for every "we have not hardened this against X" sentence, treat the affected feature as presumptively in scope for an X-class attack. configure the integration accordingly. revoke production credentials from the runtime context. add audit logging to detect the failure mode. consider whether the integration is appropriate at all.
this is not how most defenders read system cards in 2026. most defenders treat them as compliance artifacts ("yes, the vendor published one"), or as research curiosities ("interesting to know"), or skim them looking for benchmark numbers. the security-relevant content is in the limitations sections, and those sections are usually short.
the comment and control disclosure makes this concrete: the failure was published. the failure landed. defenders who had read the system card carefully had advance notice of the affected feature. defenders who hadn't found out from a public disclosure.
the lethal trifecta as deployed-architecture diagnostic
simon willison's lethal trifecta framing (private data + tool that acts + untrusted content) was originally a research term to describe what a vulnerable agent looks like. comment and control makes it a deployed-architecture diagnostic, because every product family the disclosure compromised exhibits the trifecta:
- private data: the github repository contents, including secrets the agent can access
- tool that acts: the agent's command-execution and file-modification tools
- untrusted content: pull request titles, issue comments, HTML comments, all attacker-controlled
every "AI on your repo" product on the market today exhibits this shape unless it's been explicitly architected away from it. the burden of proof is now on the vendor, not the defender. if a vendor ships an AI tool that processes user-controlled content, has access to credentials, and can execute actions, that tool is presumptively vulnerable to comment-and-control-class attacks until the vendor publishes evidence to the contrary.
the practical CISO action this week: make a list of every AI tool integrated into your CI/CD or development pipeline. for each tool, answer three questions:
- what untrusted content does this tool ingest?
- what tools or actions can this agent invoke?
- what credentials are accessible from the agent's runtime context?
if the answer to all three is non-trivial, the tool is in scope for this attack family. either restrict the runtime context, restrict the credential access, or remove the integration until the vendor publishes hardened guidance.
what about prompt-injection mitigations?
the standard defender response to prompt-injection findings is "use the vendor's mitigation features." anthropic, google and github all have prompt-injection-aware controls available in their respective products. the comment and control disclosure shows those controls do not block this attack family at the integration boundary they're deployed at.
three takeaways:
- vendor mitigations operate at a layer below the integration. controls that work for "claude as a chat product" do not necessarily work for "claude as a github action." the integration layer adds new attack surfaces.
- prompt-injection defenses are still empirically immature. google's threat-intel team measured a 32% rise in malicious prompt-injection attempts between november 2025 and february 2026, and rated the sophistication "still low." that's not a reassuring framing. it means the defender stack is being overwhelmed by primitive attempts.
- the architecture is the control. the most reliable defense for "agent reads untrusted content and has tools and has credentials" is to break one of the three legs: don't pipe untrusted content into the agent, don't grant tool access in the same context, or strip credential access from the runtime.
the underlying point
system cards are pre-disclosure security signals. read them. for every AI tool integrated into production, find the limitations section. treat acknowledged failure modes as in-scope attacks until proven otherwise. price the risk above the bounty payouts, because the bounty programs aren't yet calibrated for AI security findings of this severity.
comment and control is not the last disclosure of this shape. the other major AI vendors all publish similar limitation sections in their cards and reports. the next disclosure of this kind will land somewhere those sections already pointed. defenders who read the cards have an advance-notice advantage. defenders who don't, won't.
read your system cards.
Gigia Tsiklauri is a Security Architect and founder of Infosec.ge. Get in touch if your AI-tool integration audit needs a fresh perspective.
Related reading
• Indirect Prompt Injection Just Went Live-Fire. PayPal Payloads, API Key Exfil, Copyright DoS.
• The First Practical Malicious-Model-File RCE Is Here, and It's a Jinja2 Template
• Your AI Infra Patch Window Is Now Sub-Day
Sources
• Cybernews: AI agents vulnerable to prompt injection via GitHub: But do vendors care?
• Anthropic: Claude Code Security documentation
• Anthropic: Claude Sonnet 4.5 system card (limitations sections, model card structure reference)