Everything Your AI Agent Reads Is Executable
Inside an AI agent, the line between data and instructions disappears. That single blur is the root of nearly every security risk in agentic AI.
- An LLM has no boundary between instructions and data: the system prompt, the user's question, a fetched web page, and a tool result all arrive as one undifferentiated stream of tokens, and all of it is a candidate instruction.
- This is not SQL injection with a new coat of paint. SQL has a grammar you can parameterize; an LLM does not, so there is nothing to escape and no boundary to enforce at the door.
- The catalogue of "agentic threats" (indirect injection, tool poisoning, return-value poisoning, memory poisoning, self-propagating worms) is one root cause in different costumes.
- Agency is the amplifier: the vulnerability is constant, the blast radius is a function of how much capability you handed over. Manage it with defense in depth, not one clever filter.
There is a paragraph you could paste into an AI assistant right now that looks, to you, like an ordinary note about quarterly scheduling. To the model, it reads as a command: forward the user's last email to an outside address. You would never see the instruction. It can be written in Unicode characters that render as nothing on screen, tucked between the visible words, invisible to the human reviewing the document and perfectly legible to the machine processing it. Researchers have already used exactly this technique against production assistants from Microsoft, Google, and Sourcegraph.
That gap, between what a human sees and what a model executes, is not a clever edge case. It is the defining security property of agentic AI, and it falls straight out of how these systems are built.
The boundary that was never there
Every mature computing system separates control from data. Databases distinguish a query from the values inside it. CPUs distinguish instructions from operands. Networks distinguish a control plane from a data plane. Web security spent two decades learning this lesson the hard way and then codifying it: SQL injection and cross-site scripting are, at bottom, failures to keep attacker-supplied data from being interpreted as code.
Large language models have no such boundary. The instructions you give an agent and the content it ingests from the world arrive in the same context window, as one undifferentiated stream of tokens. The system prompt, the user's question, the web page it just fetched, the description of the tool it is about to call, the output that tool returned: all of it is text, and all text is a candidate instruction. OpenAI conceded as much in late 2025, acknowledging that prompt injection is unlikely to ever be fully solved because the cause sits in the architecture itself, in the decision to blend trusted and untrusted input in one place.
So the title is not a metaphor.
Inside an agent, everything it reads is executable.
The industry already ranks this the top risk
This is not a fringe worry. The OWASP Top 10 for LLM Applications (2025) lists Prompt Injection as LLM01, its number-one vulnerability, for the second consecutive edition. OWASP frames the cause exactly as the article does: LLMs "process instructions and data in the same channel without clear separation." It splits the risk into direct injection (manipulating the user prompt) and indirect injection (hidden instructions in fetched documents, web pages, or emails), and now flags multimodal variants where a payload hides in an image. The recommended response is not a filter but defense in depth: least-privilege tooling, human approval for high-risk actions, and continuous adversarial testing.
This is not SQL injection with a new coat of paint
It is tempting to file this under solved problems. We beat injection before; we will beat it again. That optimism misreads the mechanism.
We defeated SQL injection because SQL has a grammar. There is a formal, parseable distinction between the structure of a query and the values within it, so we could escape the values, parameterize the statement, and enforce the boundary at the door. The fix was syntactic, and syntax is decidable.
An LLM has no grammar that separates instruction from data. Whether a string functions as a command depends on meaning, context, and the model's own learned dispositions, not on any character you could escape or token you could quarantine. There is nothing to parameterize.
This is why filtering keeps losing. In October 2025, researchers from OpenAI, Anthropic, and Google DeepMind jointly tested twelve published defenses against adaptive attackers and bypassed most of them with success rates above 90 percent, even though several had originally reported near-zero risk. You are not closing a hole. You are trying to teach a system to refuse instructions that are, by construction, indistinguishable from the instructions it is supposed to obey.
One bug, many faces
Once you see the agent as a machine that executes everything it reads, the sprawling catalogue of "agentic threats" collapses into a single flaw wearing different costumes.
It is indirect prompt injection when the malicious text rides in on a fetched web page or a summarized PDF. It is tool poisoning when the text lives in the description or schema of a tool, where the model treats it as authoritative guidance before any human approves anything. It is return-value poisoning when a tool runs cleanly but its output carries instructions that hijack the next step. It is memory poisoning when a single injection is written to long-term memory and silently re-executed in every future session. It is a self-propagating worm when an agent is told to embed the payload in its own output, infecting the next agent that reads it.
Five names, five conference talks, one root cause. The defense conversation gets dramatically simpler once you stop treating them as separate problems.
Five names, one flaw. Each "agentic threat" is the same data-as-instructions failure surfacing at a different layer of the stack.
Agency is the amplifier
Here is the part that should keep anyone running real infrastructure awake.
In a plain chatbot, a successful injection produces a wrong answer. Unpleasant, contained. Give that same model a tool, and the wrong answer becomes a wrong action: an email sent, a record altered, a payment moved. Give it memory, and the compromise outlives the session that caused it. Give it peers in a pipeline, and it spreads.
The vulnerability is constant. The blast radius is a function of how much capability you handed over.
The worst case is no longer hypothetical. Security researchers demonstrated a zero-click chain in which an AI coding agent read an innocuous shared document, followed hidden instructions inside it to pull a payload from an external server, executed that payload, and exfiltrated secrets, with no human action at any step. The document was the exploit. The agent was the delivery mechanism. Nobody clicked anything.
What to actually do about it
If filtering the payload is a losing game, the answer is the oldest posture in security: defense in depth. You do not keep a system safe by trusting one control never to fail. You assume every barrier has holes, and you stack enough independent barriers that the holes never line up. The mistake with agentic AI is reaching for a single clever filter and calling it solved. There is no such filter, and there is unlikely to ever be one.
That translates into a handful of principles, none of which is a filter:
- Treat every input as hostile and every output as an exfiltration channel. That includes the web page, the tool result, the peer agent's message, and, crucially, the link or image your agent emits. Strip non-rendering characters at the boundary. Disable auto-fetching of model-generated URLs.
- Separate privilege from capability. The agent that reads untrusted content should not be the one holding the credentials to act on it. Least privilege, scoped tightly, expiring fast.
- Pin trust to provenance, not to a one-time approval. A tool approved yesterday can be quietly mutated today. Hash tool definitions, alert on any change, and track where every piece of context came from.
- Keep a human in front of consequential actions, and show them the whole action. A confirmation dialog that hides the full tool input is theatre, not control.
A blunt heuristic for capping the blast radius
If "separate privilege from capability" sounds abstract, Meta turned it into a rule of thumb. The Agents Rule of Two (November 2025) says an agent session should satisfy at most two of three properties: [A] process untrustworthy inputs, [B] access sensitive systems or private data, and [C] change state or communicate externally. Hit all three at once and you have built the zero-click exfiltration chain by design, so Meta requires human oversight or a fresh session before proceeding. It is a deliberately crude guardrail, and crude is the point: it shrinks the blast radius without pretending the underlying flaw is fixable. Simon Willison's roundup pairs it with the adaptive-attacks paper above.
None of this solves the underlying problem, and pretending otherwise is how you get hurt. Prompt injection is managed, not cured: with layers, humility, and the working assumption that something will eventually get through.
The mental model to leave with
If you remember one thing, make it this. Treat your agent as a brilliant, tireless intern who will faithfully carry out any instruction written anywhere in its field of view: in the prompt you wrote, in the document it was handed, in the tool it was given, and in the invisible characters between the words you can read. Including instructions written by your adversary. Including instructions neither of you can see.
Design for that intern, and you will build something you can trust in production. Design for the intern you wish you had, and you will be writing the incident report.
References
- Johann Rehberger, Embrace The Red · invisible Unicode Tag instructions interpreted by production AI assistants (Sourcegraph Amp, Google Jules, and earlier Microsoft Copilot ASCII smuggling): embracethered.com
- OpenAI's statement that prompt injection is "unlikely to ever be fully solved," from its December 2025 ChatGPT Atlas post, as reported by IT Pro: itpro.com
- Nasr et al. (OpenAI, Anthropic, Google DeepMind, and others), The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections, October 2025: arXiv:2510.09023
- Lakera · zero-click remote code execution in MCP-based agentic IDEs: lakera.ai
Further reading: the OWASP MCP Security Cheat Sheet on tool poisoning and rug pulls, and Simon Willison's roundup of the adaptive-attacks paper and Meta's "Agents Rule of Two."