Skip to content
Back to Research
agentic-ai-offsec

Why Agentic AI Is the Most Dangerous Tool in Offensive Security

An agentic agent does my recon in fifteen minutes and never gets bored. The same autonomy that makes it a weapon makes it a liability, because one poisoned document turns its power against its owner. Why the more agentic the system, the further a single injection travels.

By The Operator·June 10, 2026·11 min read·5 topics
Why Agentic AI Is the Most Dangerous Tool in Offensive Security

This is Part 2 of Agentic AI for Offensive Security, the foundations track of this blog. It runs alongside The Agentic Red Team, the hands-on build series where these concepts turn into running code. Part 1 pinned down what an agent actually is. This one is about what happens when you point it at a target, and at you.

The thing that makes an agentic agent the best tool I have ever pointed at a target is the exact same thing that makes it the most dangerous thing I have ever run on my own machine. Not two properties. One property, seen from two sides.

I run red team operations against AI systems. I have watched an agent finish in fifteen minutes work that used to cost me an afternoon, and I have watched the same kind of agent take an instruction from a file it was reading and start working for someone who wasn't me. Both events came from the same trait. Autonomy doesn't pick a side.

This article is about that trait, and why you can't have the upside without carrying the downside. The capability and the liability are the same wire. You cannot cut one without cutting the other.

What actually changes when an agent goes agentic?

For twenty years the offensive bottleneck was me. My hands, my attention, my willingness to keep grinding through subdomain forty when subdomains one through thirty-nine found nothing.

An agentic agent removes that bottleneck. Not by being smarter than me at any single step. By being tireless, parallel, and awake while I sleep.

Walk through what that buys you, because the abstract version undersells it.

Speed. Reconnaissance that took me four hours runs in about fifteen minutes. The agent doesn't context-switch, doesn't get a Slack message, doesn't go make coffee. It reads a response and fires the next request before I'd have finished reading the first one.

Scale. I can run one target at a time and do it well. An agent runs forty, and the fortieth gets the same attention as the first. There is no fatigue curve. The thoroughness on hour six matches the thoroughness on minute one, because there is no hour six in any sense that matters to a process.

Tirelessness. This is the one people underestimate. The reason human recon misses things isn't skill. It's boredom. We skip the changelog, we don't read the third JavaScript bundle, we assume the staging subdomain is the same as prod. An agent has no such instinct. It reads everything because reading everything is just more loop iterations, and loop iterations are free.

Parallelism. Split a job five ways, run five agents, recombine. The thing that would take a team of five costs me five concurrent processes and a synthesis step. Why one agent is rarely the right answer gets its own article later in this series; for now it's enough to say the unit of work stopped being a person.

It works while I sleep. I kick off a campaign at 11pm. At 7am there's a report, a list of findings ranked by what looked exploitable, and a log of every decision it made to get there. The agent didn't pause at midnight. It just kept closing its own loop.

Put those together and you don't have a faster tool. You have a different category of thing. A workflow does what a developer scripted. An agent does what the situation calls for, at machine speed, without supervision. That is genuinely new, and in offense it's decisive.

So why call it the most dangerous tool I run?

Here's the part the demos skip.

Go back to Part 1's anatomy: model, tools, loop. Brain, hands, will. Every one of those parts is also an attack surface, and the autonomy that makes the agent useful is exactly what makes those surfaces catastrophic instead of annoying.

A chatbot that gets manipulated says something dumb. An agentic agent that gets manipulated does something. With real tools. At your privilege level. Before you wake up.

That gap is the whole article. Let me take it apart.

The model can be hijacked, and the model is in charge

The agent decides what to do next by reading text. That's the mechanism. It reads the current state of the world, including whatever happens to be in its context window, and it picks a move.

Now ask the obvious red team question. What if I control some of that text?

This is prompt injection, and it is not a bug you patch. It's a property of the architecture. The model cannot reliably tell the difference between instructions from its owner and instructions that arrived inside the data it was asked to process. A web page it scrapes. An email it triages. A support ticket it reads. A README in a repo it was told to analyze. Any of those can carry a sentence that the model treats as a command, because to the model it's all just text in the context.

One poisoned document, and the agent's autonomy belongs to whoever wrote the document.

Real tools mean the blast radius is real

A model that can only talk is a contained problem. The worst a hijacked chatbot does is generate bad output you can ignore.

An agentic agent has hands. Shell access. A browser logged into your sessions. API keys. The ability to send email, write files, move data, spend money. We gave it those tools on purpose, because tools are what turn words into actions and actions are the entire point.

Combine that with the previous section and the math gets ugly. The injection sets the goal. The tools execute it. The agent that was scanning a target for you reads one attacker-controlled page and pivots to exfiltrating your environment variables, and from its perspective nothing went wrong. It got an instruction, it had a tool, it did the job. It just wasn't your job.

The loop carries the damage further than you think

A human in the loop is a circuit breaker. You'd notice the agent doing something insane. You'd hit stop.

The whole pitch of agentic is that nobody's in the loop. It plans, acts, judges its own result, recovers from its own errors, and keeps going. That's the feature. That's also the failure mode running unsupervised at full speed.

A hijacked workflow does the one wrong thing its single step allowed, then ends. A hijacked agentic agent pursues the attacker's goal across many steps, adapts when something blocks it, and uses its own error-recovery to route around the very friction that might have stopped it. The autonomy you bought to make it persistent on your behalf makes it persistent on theirs.

And it scales the wrong way. Hand one injected instruction to an agent that spawns a fleet, and you didn't poison one process. You poisoned a campaign. The more agentic the system, the further a single successful injection travels. That sentence is the core of this entire series. A chatbot injection is a sentence. An agentic injection is a chain of irreversible actions fired from one document you never even saw.

Field note: what happens when two agents start talking

I run this on purpose in my own lab, because the only way to understand a failure mode is to build it.

I have two different agentic agents wired to work together. One is a cloud agent, the orchestrator, the one that plans and decides. The other is a separate agent running a local open model on its own machine, the one that does the heavy, repetitive work so the expensive cloud agent doesn't burn its budget on it. They hand tasks back and forth over a bridge. One delegates, the other executes and reports back. Two brains, two trust levels, one shared objective.

It is genuinely useful. It is also the most nervous I get about my own setup, and here is why.

The moment two agents talk, the output of one becomes the input of the other. That is the whole point, and it is also a new injection surface that neither agent owns. The local worker reads something untrusted while doing its job. It summarizes that thing and hands the summary up to the orchestrator as a trusted result. Now the poisoned instruction is not sitting in some web page anymore. It is riding inside what looks like a finished answer from a teammate. The orchestrator has no reason to distrust its own worker. That is exactly the gap.

This is the lateral-movement version of prompt injection. One compromised message does not stop at one agent. It moves down the bridge, or up it, wearing the costume of normal inter-agent traffic, and it inherits whatever tools and privileges the receiving agent holds. A content problem becomes a movement-and-persistence problem the instant you let two autonomous systems trust each other.

I am not going to fully open the kimono on the wiring here, because half of it is still me poking it with a stick to see what breaks. But it works, it is real, and the way it can go wrong is more interesting than the way it goes right. If enough of you want the actual architecture, the config, and the controls I put between the two agents so one cannot drag the other off a cliff, say so and I will write that build up in full.

What breaks the first time you ship one?

Theory is cheap. Here's what actually goes wrong in week one, every time.

You give the agent a tool with no scope limit, because limiting scope is annoying and you wanted to see it work. It does work. Then it does more than you asked, because "find the admin panel" and "find the admin panel and also helpfully fix this config I noticed" live one loop iteration apart, and nobody told it where its job ended.

That's scope escape, and it doesn't require an attacker. The agent's own helpfulness is enough. Add an attacker and scope escape becomes scope capture: the boundary you didn't draw is the boundary they redraw for you.

Then there's the irreversible action. Delete, send, deploy, transfer, publish. A human hesitates before those. An agent doesn't hesitate, it iterates. One poisoned instruction, one tool call, and the action is done before any review step that doesn't exist yet could have caught it. You can't un-send the email. You can't un-leak the key.

The pattern underneath all of it: we hand these systems real power and real autonomy, then act surprised when the power gets used autonomously in a direction we didn't choose. The danger isn't that the agent is malicious. It has no goals of its own. The danger is that it will pursue any goal that reaches it with the same competence, and you don't fully control what reaches it.

The honest version

I don't run agentic agents in offense because they're safe. I run them because they're the sharpest weapon on the field and refusing to wield it doesn't make the field safer, it just hands the edge to the people who will.

But every property I listed in the first half is a loaded chamber. Speed means a mistake propagates before you see it. Scale means one injection hits forty targets, or forty of your own systems. Tirelessness means it won't stop on its own. Working while you sleep means the damage lands while nobody's watching.

The capability and the liability are the same wire. That's not a problem to solve by being clever about prompts. It's a property to engineer around, on purpose, with controls that assume the agent will eventually be turned against you.

What's next

Part 3 is the answer to the obvious question this raises: if the agent is nondeterministic, autonomous, and hijackable, how do you make it behave? How do you keep a thing whose control flow comes out of a language model inside boundaries you actually trust?

That's the control problem, and it's solvable. Scope the tools. Gate the irreversible actions. Put determinism around the nondeterministic core. We get concrete about exactly how in the next foundations post, and we build the working version of it in The Agentic Red Team build series, where the control plane stops being a diagram and starts being code that decides whether the agent gets to fire the next tool call.

Everything lands at krypteiasec.com first. The weapon is real. So is the way it cuts back. Part 3 is where we make it behave.

Series
  1. 01Everyone Selling You an AI Agent Isn't Telling You the Whole Truth
  2. 02Why Agentic AI Is the Most Dangerous Tool in Offensive Security
ΛKrypteia Sec Research·June 10, 2026