lmdeploy got popped in 12 hours. your ai infra patch window is now sub-day.
if you've been treating ai-inference platforms like a normal piece of internal software, april has been a brutal month for that mental model. four critical RCE-class bugs in two weeks across the open-source llm serving stack. all of them weaponized inside the day. one of them went from github advisory to live honeypot exploit in 12 hours and 31 minutes with no public proof-of-concept.
let's walk through what just happened with lmdeploy, why it matters, and what to actually do about it.
the bug
CVE-2026-33626 is a server-side request forgery in lmdeploy, the open-source toolkit from openmmlab for compressing, deploying and serving large language models. specifically, it's in the load_image function in lmdeploy/vl/utils.py, which is called when a vision-language model receives a request with an image_url field.
the function fetches whatever URL it gets. no validation of scheme, no validation of IP range, no hostname check. you point it at http://169.254.169.254/latest/meta-data/, you get back AWS instance metadata. you point it at http://localhost:6379/, you get back redis. you point it at an out-of-band DNS endpoint, you exfil data. classic SSRF, just hosted in a place where SSRF wasn't on most security teams' threat models.
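to make the missing check concrete, here's a minimal sketch of the kind of URL validation that was absent. this is illustrative only, not the actual lmdeploy 0.12.3 patch; the function name and allowlist are my own:

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def is_safe_image_url(url: str) -> bool:
    """reject URLs that could reach internal services.
    illustrative sketch only -- not the actual lmdeploy patch."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        return False
    try:
        # resolve the hostname and check every address it maps to
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # block loopback (localhost:6379), link-local (169.254.169.254,
        # i.e. the IMDS), and private/reserved ranges (internal services)
        if ip.is_loopback or ip.is_link_local or ip.is_private or ip.is_reserved:
            return False
    return True
```

note the caveat: validate-then-fetch is still racy against DNS rebinding. a robust fix resolves the hostname once, validates the IP, and then connects to that exact IP rather than re-resolving at fetch time.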
CVSS 7.5. affects all versions before 0.12.3. patch is to upgrade.
the timeline
this is where it gets interesting. advisory GHSA-Q4QJ-HJ7M-7JGX was published on github on april 21. sysdig's threat research team already had a honeypot running a vulnerable lmdeploy instance. at 03:35 UTC on april 22, the honeypot caught a live exploit attempt.
12 hours and 31 minutes between advisory publication and live attack. zero public PoC code on github. no exploit-DB writeup. no metasploit module. nothing in the public exploit ecosystem at all.
source IP was 103.116.72.119 in kowloon bay, hong kong. in a single eight-minute session, the operator ran ten requests across three phases:
- phase one: AWS instance-metadata service probe (169.254.169.254)
- phase two: internal port scan against redis (localhost:6379) and mysql (localhost:3306), plus enumeration of a secondary HTTP admin interface
- phase three: out-of-band DNS exfiltration to an attacker-controlled domain to confirm egress
what's telling is that the exploit rotated between vision-language models during the session. internlm-xcomposer2 in some requests, OpenGVLab/InternVL2-8B in others. that suggests the operator wasn't running an lmdeploy-specific tool. they had a generic vision-language SSRF primitive that knew how to feed image-url payloads to any VLM endpoint, and lmdeploy was just the platform of the day.
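the primitive behind this is simple: any endpoint that accepts an OpenAI-style chat payload with an image_url content block will do. here's an illustrative reconstruction of what those honeypot requests likely looked like. the model names and target URL come from the observed session; the exact wire format the operator used is an assumption on my part:

```python
def ssrf_probe(model: str, target_url: str) -> dict:
    """build an OpenAI-compatible chat payload that triggers a server-side
    fetch of target_url. illustrative reconstruction only -- the operator's
    actual tool and wire format are unknown."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "describe this image"},
                # the server-side fetch of this URL is the vulnerability:
                # load_image retrieves it with no scheme or IP validation
                {"type": "image_url", "image_url": {"url": target_url}},
            ],
        }],
    }

# phase one of the observed session: IMDS probe, with the model rotating
# between internlm-xcomposer2 and OpenGVLab/InternVL2-8B across requests
probe = ssrf_probe("OpenGVLab/InternVL2-8B",
                   "http://169.254.169.254/latest/meta-data/")
```

nothing in that payload is lmdeploy-specific, which is exactly the point: the same dict works against any VLM endpoint that speaks the OpenAI chat format.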
the pattern
lmdeploy is not the standalone story. lmdeploy is the fourth critical AI-inference platform bug in two weeks:
- april 11: marimo CVE-2026-39987, pre-auth OS command injection in the python notebook server's WebSocket terminal endpoint. CISA added it to KEV on april 23 with a may 14 FCEB remediation deadline. sysdig caught a live exploit 9 hours and 41 minutes after disclosure.
- april 20: sglang CVE-2026-5760, jinja2 SSTI in the chat_template field of GGUF model files. CVSS 9.8. weaponizes hugging face as a malware distribution channel.
- april 22: cohere AI terrarium sandbox flaw (CERT/CC VU#414811), enables root code execution and container escape from inside the sandboxed code-execution environment.
- april 22: lmdeploy CVE-2026-33626, this one.
four critical bugs. four different vendors. four different attack classes. weaponized inside the day in every case where we have telemetry.
this is not a coincidence. operators are watching the AI-infra advisory feed in real time and shipping. there's a generic vision-language SSRF tool. there's a generic notebook RCE tool. there's a generic GGUF template injection tool. the AI-inference platform attack surface has matured into a real operator specialty in 2026.
why your patch window is broken for this
most enterprise patch SLAs for "critical" vulnerabilities are 30 days. some particularly mature shops run 7-day SLAs. nobody runs an hours-based SLA, because SLAs were designed for software categories stable enough to have a normal patch cadence.
AI-inference platforms broke that assumption. for three reasons:
first, the public advisory is the trigger. operators are not waiting for press coverage or PoCs. they're parsing GHSA feeds, security-advisories.gitlab.com, vendor security pages. by the time bleepingcomputer has a writeup, the exploitation window has been open for hours.
second, the platforms run with rich credential context. an lmdeploy or marimo or sglang instance running internally has cloud credentials, LLM provider API keys, model registry tokens, and frequently database creds for the metadata stores those models pull from. SSRF on one of these isn't just SSRF, it's "rotate every credential this host has ever seen."
third, the deployment pattern is internet-facing more often than the security teams that built it realize. data science teams stand up these inference endpoints because they need shared access for evaluation, demos, internal tooling. "behind the VPN" frequently means "behind a VPN that allows traffic from anywhere on the corp network," which means "addressable by anything that's already gotten any other foothold."
put it together and you have: a fast-evolving software category, with rich credential context, deployed in surprisingly exposed positions, with a patch window that closes in single-digit hours.
what to actually do
the temptation is to write a thousand-word strategic memo. don't. here's the operational shortlist.
immediate (today):
- inventory every lmdeploy, marimo, sglang, cohere terrarium and equivalent open-source inference endpoint in your environment. include data-science stacks, ml-platform team installs, internal tooling, and anywhere a researcher set up an endpoint for "just a quick experiment" that never came down.
- check your version pins. lmdeploy needs 0.12.3+. marimo needs 0.23.0+. sglang needs the patched version per their advisory. apply same-day.
- audit network egress from any of these hosts in the last 30 days. specifically look for traffic to AWS instance metadata service (169.254.169.254), large numbers of outbound DNS lookups to unfamiliar domains, and any traffic to internal redis/mysql/admin endpoints from the inference host that doesn't match documented architecture.
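the egress audit in that last bullet is scriptable. here's a sketch of the triage pass over connection records; the (dst_ip, dst_port) tuple format is an assumption, so adapt the parsing to whatever your flow-log schema actually emits:

```python
import ipaddress

# destinations that should never see traffic from an inference host
# unless your documented architecture says otherwise
IMDS = ipaddress.ip_address("169.254.169.254")
SUSPECT_PORTS = {6379: "redis", 3306: "mysql"}

def flag_egress(records):
    """records: iterable of (dst_ip, dst_port) tuples from flow logs.
    yields (reason, record) pairs worth a human look. sketch only --
    extend SUSPECT_PORTS and the checks to match your environment."""
    for dst_ip, dst_port in records:
        ip = ipaddress.ip_address(dst_ip)
        if ip == IMDS:
            yield ("imds probe", (dst_ip, dst_port))
        elif ip.is_loopback and dst_port in SUSPECT_PORTS:
            yield (f"internal {SUSPECT_PORTS[dst_port]} access",
                   (dst_ip, dst_port))

# toy sample mirroring the honeypot session's phase one and two
sample = [("169.254.169.254", 80), ("127.0.0.1", 6379),
          ("93.184.216.34", 443)]
hits = list(flag_egress(sample))
```

the third record (ordinary HTTPS egress) passes clean; the first two are exactly the phase-one and phase-two traffic from the honeypot session.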
this week:
- rotate every credential that has ever lived on an exposed inference host. cloud keys, LLM provider API keys, registry tokens, DB creds, anything in the env. assume disclosure, not maybe-disclosure.
- get an AI-infrastructure category into your vulnerability management system as a separate severity class. critical CVEs on AI-inference platforms need an hours-based SLA, not a days-based one. document this in your patch policy.
- pin every model download to a hash. not a slug, not a version tag, a hash. anything that resolves a reference at runtime against an upstream is part of your attack surface.
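"pin to a hash" in practice means: verify the artifact's digest before anything loads it, and fail closed. a minimal sketch, with placeholder paths and digests:

```python
import hashlib

def verify_model_file(path: str, pinned_sha256: str) -> None:
    """refuse to load a model artifact whose sha256 doesn't match the pin.
    sketch only -- in production, keep per-file digests in version control
    and run this check before the file ever reaches the inference process."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # stream in 1 MiB chunks: model files are multi-GB
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    actual = h.hexdigest()
    if actual != pinned_sha256:
        raise RuntimeError(
            f"model file {path} digest {actual} != pinned {pinned_sha256}; "
            "refusing to load")
```

the same principle applies upstream: if your download tooling supports fetching by commit hash or content digest instead of a branch or tag name, use it, and verify again locally.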
this month:
- treat hugging face, github model repositories, and equivalent public AI infrastructure as part of your supply chain in your vendor risk framework. this isn't just docker hub anymore. the JFrog writeup on js-logger-pack (rogue npm package using hugging face as a malware CDN) and the marimo NKAbuse story (typosquat hugging face space hosting droppers) make this concrete.
- add LLM-API egress as a discrete category in your network monitoring framework. mandiant's M-Trends 2026 attribution of PROMPTFLUX and PROMPTSTEAL to live operations means LLM API traffic is now an attacker C2 channel, not just a productivity tool.
- review your data science team's deployment patterns. anyone running an internal inference endpoint without (a) a defined patch SLA, (b) network egress controls, (c) credential rotation discipline, and (d) clear documentation of what credentials are on the host should not be running an internal inference endpoint.
the underlying point
twelve hours and thirty-one minutes is the number to remember. that's how long it took an operator, with no public PoC to work from, to weaponize a github security advisory and go live against exposed targets.
the AI-infrastructure category does not have the same temporal margin as the rest of your software estate. operators have already specialized. tooling has been prepositioned. the advisory itself is the starting gun.
if you're treating an internal lmdeploy install the same way you treat your kubernetes cluster, you're probably mispricing the risk by one or two orders of magnitude. patch faster. egress less. rotate credentials more often.
it's april 25. the next CVE in this category is probably already written.
Gigia Tsiklauri is a Security Architect and founder of Infosec.ge. Get in touch if you're sizing the AI-infrastructure attack surface in your environment and want a second pair of eyes.
Related reading
• Marimo on KEV and the AI Supply Chain Has Arrived
• The First Practical Malicious-Model-File RCE Is Here, and It's a Jinja2 Template
• Your AI Stack Has a Supply Chain Problem - and TeamPCP Just Proved It