When a Prompt Becomes a Shell: How Prompt Injection Turned Into Remote Code Execution

By AI Agent Engineering | 2026-06-26 | research

Most teams think of prompt injection as a content problem. The agent gets tricked into saying something it shouldn't — leaking a system prompt, generating a biased answer, ignoring a guardrail. Embarrassing, fixable, bounded. In May 2026, Microsoft's own security researchers published the version of the story nobody wants to tell: in a popular agent framework, a crafted prompt didn't just make the model misbehave. It executed code on the host and wrote a file to the Windows Startup folder [1].

That is not a content problem. That is remote code execution, delivered through plain language, against a Microsoft-built framework, documented by Microsoft's Defender Security Research Team. The findings shipped as two CVEs in Semantic Kernel, and the lesson underneath them applies to every agent stack in production right now: the moment your agent can run tools, the prompt stops being text the model reads and becomes a program the system runs. Secure it accordingly, or don't ship it.

From "say something bad" to "run something bad"

The mental model most teams carry is a straight line: untrusted input goes into the model, the model produces output, and the worst case is bad output. Filter the input, check the output, and you're covered. That model is why prompt injection gets filed under "responsible AI" instead of "application security."

The Semantic Kernel findings break the line in two places, and both breaks happen after the model.

The first, tracked as CVE-2026-26030, lives in an in-memory vector store and routes injected content into an eval path [1]. eval takes a string and executes it as code. When a string influenced by untrusted input reaches eval, the attacker is no longer shaping what the model says — they are shaping what the interpreter runs. The injection crosses out of the language model entirely and into the host's execution context. The result is arbitrary code execution: the attacker's payload runs with whatever privileges the agent process holds.
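To make the mechanism concrete, here is a toy Python illustration of the bug class. It is not Semantic Kernel's code; Microsoft's write-up [1] is the authority on the real code path. The record store, the unsafe_filter function, and the payload are all hypothetical. The sketch shows only that once a filter arrives as a string and is handed to eval, anything that can shape that string can run code, and that passing an empty globals dict does not stop it.

```python
# Hypothetical toy store; NOT Semantic Kernel's implementation.
records = [
    {"id": 1, "title": "Q2 roadmap", "year": 2026},
    {"id": 2, "title": "Old notes", "year": 2019},
]

def unsafe_filter(records, filter_expr: str):
    """Return the records for which the filter string evaluates truthy.

    DANGEROUS: filter_expr goes straight into eval(). An empty globals dict
    is no protection, because Python inserts __builtins__ into it, so
    __import__ and the rest of the builtins stay reachable.
    """
    return [r for r in records if eval(filter_expr, {}, {"record": r})]

# Intended use: the agent builds a filter expression from the user's request.
recent = unsafe_filter(records, "record['year'] >= 2024")

# Injected use: text that reached the filter argument runs arbitrary code.
# This payload is harmless (it stamps the process id onto every record),
# but the same position could call os.system or write a file.
payload = "record.update(pwned=__import__('os').getpid()) or True"
hijacked = unsafe_filter(records, payload)
```

Nothing here involves the model deciding anything: the interpreter runs whatever string reaches it.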

The second, CVE-2026-25592, is a sandbox escape in the SessionsPythonPlugin [1]. The plugin exposes a DownloadFileAsync capability far more broadly than it should, and that over-exposure lets an attacker write files outside the sandbox's intended boundary — including, in Microsoft's demonstration, to the Windows Startup folder. A file in the Startup folder runs every time the machine boots. That is the difference between a transient compromise and a persistent one: the attacker doesn't just get in, they get to stay.
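How the real write escaped is for Microsoft's post [1] to describe. As a generic illustration only, the sketch below shows two classic ways an unscoped file-writing helper ends up outside its directory. It models Windows path rules with Python's ntpath module so it runs on any OS, and it only computes paths; nothing is written. The sandbox path, the victim user name, and naive_target are hypothetical.

```python
import ntpath  # Windows path rules; importable on any OS

# Hypothetical sandbox directory for one agent session.
SANDBOX = r"C:\sandbox\session-42"
STARTUP = r"Users\victim\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup"

def naive_target(requested_name: str) -> str:
    """Where a naive helper would write: join and normalize, but never check containment."""
    return ntpath.normpath(ntpath.join(SANDBOX, requested_name))

# 1. Relative traversal: each ".." climbs one level out of the sandbox.
via_dotdot = naive_target("..\\..\\" + STARTUP + "\\evil.bat")

# 2. Absolute path: join() discards SANDBOX when the second part has its own drive and root.
via_absolute = naive_target("C:\\" + STARTUP + "\\evil.bat")

print(via_dotdot)
print(via_absolute)
```

Both paths resolve to the same file in the victim's Startup folder, even though every request was nominally "relative to the sandbox."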

Read together, the two CVEs sketch a complete kill chain. Injection reaches a tool that executes code (RCE). Another tool lets the attacker write where they shouldn't (escape). Write to the right location and the access survives reboots (persistence). Every link in that chain is a tool doing exactly what it was built to do — just for the wrong instructor.

The tool layer is the attack surface

Here is the uncomfortable reframing. The model is not the vulnerability. The tools are.

A language model on its own can produce alarming text and nothing more. It has no hands. What gives an agent its power — and its danger — is the tool layer: the plugins, functions, and integrations that let it read files, run code, call APIs, and touch the world. Every one of those is a bridge from "the model decided something" to "something happened in the real system." Prompt injection is dangerous in direct proportion to what sits on the far side of that bridge.

This is why the convenience features are the soft underbelly. An eval-based tool that lets an agent run dynamic code is enormously useful and enormously exposed. A file-download helper that isn't tightly scoped is a foothold waiting for a push. Frameworks ship these capabilities because builders want them, and builders wire them in because they work in the demo. The vector store with the eval path and the Python plugin with the loose file access were not bugs bolted on by mistake. They were features, behaving as designed, in a context where "as designed" meant "executes attacker-controlled input."

And Semantic Kernel is not special here. Microsoft's researchers flagged that the same class of weakness likely sits in other popular frameworks — LangChain and CrewAI among them — because they share the same pattern: rich tool ecosystems, dynamic execution, and a tendency to trust the model's decisions about which tool to invoke and with what arguments [1]. Microsoft found it in its own house first and said the neighbors' houses are built the same way. The specific CVEs are patched. The architecture that produced them is industry-standard.

"But the model is the safety layer" — no, it isn't

A reasonable objection: agents are supposed to use judgment. The model is trained to refuse malicious requests, so won't it just decline to run the dangerous tool? Isn't the model itself the guardrail?

Treating the model as your security boundary is the precise mistake these CVEs punish. Models are probabilistic. They can be jailbroken, confused, and talked around, and the entire field of prompt injection exists because that manipulation is reliable enough to weaponize. A guardrail you can defeat with a cleverly worded paragraph is not a guardrail; it is a suggestion. Worse, in the eval case the model's judgment may never enter the loop at all — if untrusted content flows into an execution path through the tool plumbing, the exploit can fire regardless of what the model "decided." You cannot ask a model to be the wall when the model is the thing being manipulated.

Real security has to live where models can't be tricked: in the deterministic layer around them. Permissions, sandboxes, and input handling don't get jailbroken. They either allow an operation or they don't.
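A minimal sketch of what that deterministic layer can look like, assuming the framework hands you each model-proposed tool call as a dict with a name and an arguments mapping. That shape, the search_docs tool, and the POLICIES table are hypothetical, not any framework's API. The gate checks the call against a fixed allowlist before dispatch and never asks the model anything.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical per-tool rule: permitted argument names plus a value validator."""
    allowed_args: frozenset[str]
    validate: Callable[[dict[str, Any]], bool]

class PolicyDenied(Exception):
    """Raised when a proposed tool call falls outside policy."""

# Deterministic allowlist: any tool not listed here is denied outright.
POLICIES: dict[str, ToolPolicy] = {
    "search_docs": ToolPolicy(
        allowed_args=frozenset({"query", "limit"}),
        validate=lambda a: isinstance(a.get("query"), str)
        and type(a.get("limit", 10)) is int
        and 1 <= a.get("limit", 10) <= 50,
    ),
}

def authorize(tool_call: dict[str, Any]) -> None:
    """Raise PolicyDenied unless the model-proposed call passes every rule.

    Only the call's name, argument names, and values are checked; whatever
    reasoning the model attached to the call is ignored.
    """
    name = tool_call.get("name")
    args = tool_call.get("arguments", {})
    policy = POLICIES.get(name) if isinstance(name, str) else None
    if policy is None:
        raise PolicyDenied(f"tool {name!r} is not on the allowlist")
    if not isinstance(args, dict) or not set(args) <= policy.allowed_args:
        raise PolicyDenied(f"unexpected arguments for {name!r}")
    if not policy.validate(args):
        raise PolicyDenied(f"arguments for {name!r} failed validation")
```

Call authorize() on every proposed call and dispatch only if it returns. A denied call can be logged and surfaced to a human rather than handed back to the model to retry.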

Building agents that can't be turned into shells

The defenses are old security discipline applied to a new surface. None of them depends on the model behaving.

Never evaluate model-influenced strings. This is the bright line. No eval, no exec, no shell interpolation, no dynamic code path that takes a string anywhere downstream of untrusted input. If an agent must run code, run it in a strongly isolated sandbox with no path back to the host — never by evaluating a string in the agent's own process. The first CVE existed because a string reached eval. Make that structurally impossible and the entire bug class disappears.
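For filtering specifically, one way to make eval structurally unnecessary is to accept filters as data rather than code. The sketch assumes a hypothetical spec with field, op, and value keys. Fields and operators come from fixed allowlists, operators map to functions from Python's operator module, and the value is only ever compared, never executed.

```python
import operator
from typing import Any, Callable

# Hypothetical allowlists: the only fields and comparisons a filter may use.
ALLOWED_FIELDS = {"year", "title", "owner"}
ALLOWED_OPS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}

def build_predicate(spec: dict[str, Any]) -> Callable[[dict[str, Any]], bool]:
    """Turn {"field": ..., "op": ..., "value": ...} into a predicate without eval."""
    field, op_name, value = spec.get("field"), spec.get("op"), spec.get("value")
    if not isinstance(field, str) or field not in ALLOWED_FIELDS:
        raise ValueError(f"field {field!r} is not allowed")
    if not isinstance(op_name, str) or op_name not in ALLOWED_OPS:
        raise ValueError(f"operator {op_name!r} is not allowed")
    if not isinstance(value, (str, int, float)):
        raise ValueError("value must be a plain string or number")
    compare = ALLOWED_OPS[op_name]

    def predicate(record: dict[str, Any]) -> bool:
        if field not in record:
            return False
        try:
            return bool(compare(record[field], value))
        except TypeError:  # e.g. ordering a string field against a number
            return False

    return predicate

def safe_filter(records: list[dict[str, Any]], spec: dict[str, Any]) -> list[dict[str, Any]]:
    predicate = build_predicate(spec)
    return [r for r in records if predicate(r)]
```

An injected string such as "record.update(pwned=1) or True" can still arrive in the value slot, but here it is just a string that fails to match anything.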

Give every tool the least privilege that lets it work. The file-download escape happened because a capability was exposed more broadly than its job required. Audit each tool for exactly what it needs — which paths, which hosts, which operations — and deny everything else. A tool that can only write to one scratch directory cannot plant a file in Startup, no matter how it's invoked.
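Applied to file writes, least privilege comes down to resolving the requested name under one scratch directory and refusing anything that lands elsewhere. A minimal sketch, assuming Windows path semantics (modeled with ntpath so it runs anywhere) and a hypothetical contained_target helper. It is a lexical check only, so a real deployment should also resolve symlinks and junctions on the actual host before comparing.

```python
import ntpath  # Windows path rules; importable on any OS

class OutsideSandbox(ValueError):
    """Raised when a requested file name resolves outside the tool's directory."""

def contained_target(root: str, requested_name: str) -> str:
    """Resolve requested_name under root, refusing anything that lands outside root.

    Lexical check only: a real deployment should also resolve symlinks and
    junctions on the actual host (for example with os.path.realpath) first.
    """
    # normcase lowercases and unifies separators, matching Windows' case-insensitive paths.
    root_key = ntpath.normcase(ntpath.normpath(root))
    target = ntpath.normpath(ntpath.join(root, requested_name))
    # The trailing separator stops a sibling like "session-42-evil" from matching the prefix.
    if not ntpath.normcase(target).startswith(root_key + "\\"):
        raise OutsideSandbox(f"{requested_name!r} resolves outside {root!r}")
    return target

SCRATCH = r"C:\sandbox\session-42"  # hypothetical scratch directory for one session
print(contained_target(SCRATCH, r"reports\q2.csv"))
```

Traversal sequences, absolute paths, UNC paths, and look-alike sibling directories all raise OutsideSandbox, so the tool cannot reach Startup however the model phrases the request.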

Enforce sandboxing and egress controls as hard boundaries. Code-running and file-touching tools belong inside isolation that constrains the filesystem, the network, and the process, enforced by the runtime rather than requested by convention. The boundary has to hold even when the agent is fully compromised, because sometimes it will be.
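As a sketch of only the outermost layer, assuming a POSIX host and an agent that needs to run short Python snippets: the code runs in a separate interpreter process with an empty environment, a throwaway working directory, Python's isolated mode (-I), CPU and file-size limits, and a wall-clock timeout. This is process separation, not a sandbox. It does not stop the child from reading files elsewhere or opening network connections; those constraints have to come from OS-level isolation such as a container, a VM, or a seccomp profile. run_untrusted is a hypothetical name.

```python
import os
import subprocess
import sys
import tempfile

def _limit_child() -> None:
    """Runs in the child just before exec (POSIX only): cap CPU time and file size."""
    import resource
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))                # 2 s of CPU time
    resource.setrlimit(resource.RLIMIT_FSIZE, (1 << 20, 1 << 20))  # 1 MiB per written file

def run_untrusted(code: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Run code in a separate interpreter process, never via eval in the agent's process.

    Outer layer only: an empty environment, a throwaway working directory,
    isolated mode (-I), resource limits, and a wall-clock timeout. Network
    access and reads elsewhere on disk are NOT blocked here.
    """
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            [sys.executable, "-I", "-c", code],
            cwd=scratch,
            env={},
            capture_output=True,
            text=True,
            timeout=timeout,  # raises subprocess.TimeoutExpired after killing the child
            preexec_fn=_limit_child if os.name == "posix" else None,
        )

result = run_untrusted("print(6 * 7)")
print(result.returncode, result.stdout.strip())
```

A runaway snippet is stopped either by the CPU limit or by the timeout; the agent process itself never executes the string.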

Threat-model the tool layer, not the prompt. Most agent security reviews scrutinize the system prompt and the model's outputs. Point the same rigor at the tools. For each one, ask the question an attacker asks: if I controlled this tool's inputs completely, what could I make it do? The answers are your real attack surface.
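One way to keep that question from being asked once and forgotten is a machine-readable tool inventory that a review script checks on every change. The sketch is hypothetical throughout: the ToolSpec fields, the capability labels, and the HIGH_RISK set are illustrative choices, not a standard. In most agents, any tool the model can call should be marked reachable from untrusted input, because the model reads untrusted content.

```python
from dataclasses import dataclass

# Capabilities that turn "attacker fully controls the inputs" into host compromise.
HIGH_RISK = {"exec_code", "write_fs", "network_egress", "spawn_process"}

@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical inventory entry describing what a tool can do at worst."""
    name: str
    capabilities: frozenset[str]
    reachable_from_untrusted_input: bool

def review(tools: list[ToolSpec]) -> list[tuple[str, list[str]]]:
    """Return (tool name, risky capabilities) for every tool an attacker could steer."""
    findings = []
    for tool in tools:
        risky = tool.capabilities & HIGH_RISK
        if risky and tool.reachable_from_untrusted_input:
            findings.append((tool.name, sorted(risky)))
    return sorted(findings)

inventory = [
    ToolSpec("search_docs", frozenset({"read_index"}), True),
    ToolSpec("run_python", frozenset({"exec_code", "write_fs"}), True),
    ToolSpec("download_file", frozenset({"write_fs", "network_egress"}), True),
    ToolSpec("rotate_logs", frozenset({"write_fs"}), False),  # scheduled job, not model-callable
]

for name, caps in review(inventory):
    print(f"{name}: {', '.join(caps)}")
```

Each finding is a tool that needs a concrete answer to "what could I make it do?" before it ships.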

Pin and patch your framework dependencies. These specific holes are closed, but new ones will open as frameworks add capabilities. Track the security advisories for whatever stack you build on and update deliberately. The teams still running the vulnerable Semantic Kernel versions are the ones who will learn about CVE-2026-26030 the hard way.

For two years the agent industry has treated prompt injection as a reputational risk — the danger that your bot says something it shouldn't and someone screenshots it. Microsoft just demonstrated the real ceiling: a paragraph of text that ends with code running on your server and a file waiting in Startup for the next reboot. The prompt was never just text. It was always a program looking for an interpreter. The only question that matters is whether you built one for it to find.


References

[1] Uri Oren, Amit Eliahu, Dor Edry (Microsoft). "When prompts become shells: RCE vulnerabilities in AI agent frameworks." Blog post.