Agent red-teaming in 2026: the OWASP Agentic Top 10 companion, the four disciplines, and the evidence model
The OWASP Agentic Top 10 names what to defend against. It does not say how to test that the defences work. The 2026 enterprise red-team for agentic systems is a distinct discipline from generalised pen-testing, with its own methodology, tooling, and evidence model. Most enterprises run the wrong test and pass.
Holding·reviewed3 May 2026·next+59dThe OWASP Agentic AI Top 10 (claim AM-043) walks the threat surface. The OWASP Top 10 for Agentic Applications was released 9 December 2025 after over a year of peer review and input from over 100 security researchers, industry practitioners, user organisations, and Gen AI technology providers. The published threat list (ASI01 Agent Goal Hijack, ASI02 Tool Misuse, ASI03 Identity and Privilege Abuse, ASI06 Memory and Context Poisoning, ASI07 Insecure Inter-Agent Communication, ASI08 Cascading Failures, ASI09 Human-Agent Trust Exploitation, ASI10 Rogue Agents) names the threats. It does not say how to test that the defences work. That is what the 2026 enterprise red-team is for, and the discipline is not the discipline most enterprises think they have.
The structural failure mode in 2026 is the one the EchoLeak case made concrete: a generalised application pen-test passes, the deployment ships, and the agent gets compromised in week 2 by an attack class the pen-test did not cover. The pen-test was correct against its scope. The scope was the wrong scope. A 2026 enterprise that signs off an agent deployment on the strength of a 2024-vintage pen-test is, in the 2026 threat landscape, signing off against the wrong evidence model.
This piece walks four red-team disciplines specific to agents, the tooling stack that supports each, the MITRE ATLAS mapping that gives the disciplines a structured threat-modelling vocabulary, the evidence model a defensible 2026 red-team report contains, the cadence question (per-release, continuous, threshold-triggered), and the procurement decision (in-house, specialist vendor, hybrid).
Why generalised pen-test is the wrong test for agents
A generalised application pen-test runs against an application’s request-response surface. The tester sends crafted requests, observes responses, identifies misconfigurations, weak authentication, injection vulnerabilities in the classic OWASP Top 10 sense. The methodology is mature, the tooling is commodity (Burp Suite, ZAP, custom scripts), and the evidence model is well-understood (CVE-style writeups, reproduction steps, severity scores).
An agentic AI deployment has the same request-response surface plus three additional surfaces the generalised pen-test does not exercise.
The first additional surface is the LLM. The agent’s reasoning is influenced by every input string it processes, including strings that arrive via tools (a returned email body, a fetched web page, a database row). A generalised pen-test against the request-response API does not see the LLM-prompt surface and cannot exercise the prompt-injection class.
The second additional surface is the tool-call graph. The agent decides which tool to call against what input, and the tool’s response feeds back into the next decision. A generalised pen-test does not model the tool-call graph and cannot exercise the tool-misuse class.
The third additional surface is the multi-turn state. The agent maintains context across turns; an attacker can compromise the agent in turn N by setting up state in turn N-3. A generalised pen-test runs one request at a time and cannot exercise the multi-turn objective-drift class.
The structural rule is that an agentic AI system’s threat model is the application’s threat model PLUS the LLM’s threat model PLUS the tool-call graph’s threat model PLUS the multi-turn state’s threat model. A pen-test that exercises only the first surface produces a clean report against three-quarters of the threat surface. The clean report is not the same as a clean threat surface, and the procurement teams that read the report as the latter are buying false confidence.
The four red-team disciplines
The 2026 agent red-team is a composition of four distinct disciplines, each of which exercises a specific threat surface and produces specific evidence.
Discipline 1: prompt injection (direct and indirect)
Prompt injection targets the LLM-prompt surface. Direct prompt injection arrives in the user’s input; indirect prompt injection arrives in tool-return data (an email body, a web page, a database row). The 2024-vintage understanding of prompt injection treated it as a curiosity. The 2026-vintage understanding, after the EchoLeak class of cross-agent injection attacks, treats it as the load-bearing threat against any agent that processes external content.
The discipline tests for direct injection (does the system prompt hold against adversarial user input), indirect injection (does the agent respect tool-return content with appropriate skepticism), and cross-agent injection (does an attacker compromising one agent propagate to others sharing tools or memory).
The MITRE ATLAS knowledge base documents prompt-injection techniques including direct LLM prompt injection, indirect LLM prompt injection, AI agent tool invocation, and modifying an agentic configuration. The MITRE ATLAS investigation into agentic systems “discovered seven new techniques unique to” the studied deployment, “all techniques were found to be fairly mature in nature, having been either demonstrated or realized elsewhere in the wild.” The threat is not theoretical and the technique catalogue is structured, current, and growing.
Discipline 2: tool misuse
Tool misuse targets the tool-call graph. The agent has access to tools (search, file read, file write, email send, code execute, payment process, etc.); the attack is to manipulate the agent into calling a tool against an input, an output, or a sequence the legitimate use case does not authorise.
The discipline tests for unauthorised tool invocation (can the attacker get the agent to call a tool the user did not request), tool-call argument manipulation (can the attacker get the agent to call a legitimate tool with malicious arguments), and tool-call ordering attacks (can the attacker exploit a sequence of legitimate tool calls to produce an unauthorised effect, the most common pattern being read-then-exfiltrate).
The OWASP ASI02 Tool Misuse threat references the Amazon Q tool-misuse incident as an exemplar. The discipline overlaps materially with the Non-Human Identity procurement decision (claim AM-029): a tool-call graph is only as constrained as the agent’s NHI permissions allow. An agent with an over-broad IAM identity is structurally over-capable for any tool-misuse attack.
Discipline 3: context-window attack (memory and context poisoning)
Context-window attacks target the multi-turn state and the agent’s memory store. The attack surface is the assumption that what the agent read 10 turns ago, or what was stored in memory last session, is trustworthy. The OWASP ASI06 Memory and Context Poisoning threat references the Gemini Memory Attack as an exemplar.
The discipline tests for context-injection attacks (can an attacker poison the agent’s working memory to influence later decisions), persistent-memory attacks (can an attacker write to the agent’s long-term memory in a way that persists across sessions and users), and context-leakage attacks (can an attacker extract context from one user’s session and surface it in another’s).
The discipline is structurally hard to test because it requires multi-turn, multi-session attack trajectories. The tooling for it is less mature than the prompt-injection tooling. PyRIT supports multi-turn attack patterns but the catalogue of context-poisoning techniques is still being built out in 2026.
Discipline 4: multi-turn objective drift
Objective-drift attacks target the agent’s reasoning over long task horizons. The agent’s task is structured; the attacker incrementally alters the task framing across turns until the agent’s behaviour drifts from the intended objective. The OWASP ASI01 Agent Goal Hijack and ASI09 Human-Agent Trust Exploitation threats both manifest in objective-drift form.
The discipline tests for goal-hijack attacks (can an attacker convince the agent its task has changed), trust-exploitation attacks (can an attacker convince the agent the human has authorised something the human did not), and cascading-failure attacks (can a small drift compound across many turns into a large divergence the agent does not detect).
The discipline is the hardest to automate because it requires modelling the agent’s task structure and the attacker’s incremental escalation. The 2026 tooling for it is limited; most red-teams running this discipline are doing it with custom harnesses against the specific deployment.
The tooling stack
Three open-source tools and one structured threat-modelling reference compose the 2026 agent red-team toolkit.
PyRIT. Microsoft’s Python Risk Identification Toolkit for generative AI. v0.13.0 released 17 April 2026, MIT-licensed, 3.8k stars on the active repository (the prior Azure/PyRIT repository was archived on 27 March 2026). PyRIT is described in its documentation as “an open source framework built to empower security professionals and engineers to proactively identify risks in generative AI systems.” The toolkit is the closest thing to a commodity red-team framework for LLM-based systems and is the right starting point for a 2026 in-house red-team.
Garak. The Hugging Face open-source LLM scanner. Garak runs predefined probe categories (encoding bypasses, prompt injection, hate-speech, jailbreaks, tokenisation attacks) against an LLM endpoint and produces a scored report. Garak is structurally the LLM equivalent of an SCA scanner; it is a useful baseline for “is the model resistant to known attack categories” but it does not exercise tool-call or multi-turn surfaces.
Custom harnesses. Both PyRIT and Garak are baselines. Production-grade red-team work in 2026 still requires custom harnesses against the specific deployment, particularly for disciplines 2 (tool-misuse) and 4 (objective drift), where the attack trajectories depend on the agent’s specific tool set and task structure. A 2026 red-team report that consists only of PyRIT and Garak output is incomplete.
MITRE ATLAS. The MITRE Adversarial Threat Landscape for AI Systems is a “globally accessible, living knowledge base of adversary tactics and techniques against AI-enabled systems.” MITRE ATLAS provides the structured vocabulary that a 2026 red-team report should use to describe its findings, mapping each finding to a documented technique. The framework’s 2025 expansion under the Secure AI program added agentic-systems-specific techniques.
The MITRE ATLAS mapping is editorially the most procurement-relevant component of the four. A red-team report that does not map findings to ATLAS techniques produces evidence that is harder to compare against industry baselines, harder to integrate into the customer’s threat-modelling process, and harder to defend in a regulator’s incident inquiry. The mapping is cheap when the red-team team adopts the framework upfront and expensive when it is retrofitted to an existing report.
The evidence model: what a defensible 2026 red-team report contains
A defensible 2026 agent red-team report has structural elements most 2024-vintage pen-test reports lack.
Section 1: scope and threat model. The deployment under test, the agent’s task, the tool set, the data surfaces touched, the user populations, the multi-turn assumptions. Explicit scope statement of what the test exercises and what it does not. The scope statement is what later inquiries (“did your red-team cover X”) map against.
Section 2: methodology. The four disciplines exercised, the tooling used (PyRIT version, Garak version, custom harnesses described), the prompts and trajectories, the success criteria. The methodology section is what allows the test to be reproduced.
Section 3: findings, mapped to MITRE ATLAS techniques and OWASP ASI threats. Each finding documented with the ATLAS technique ID, the OWASP threat ID, the attack trajectory, the observed agent behaviour, the severity, and the recommended mitigation. The dual-mapping (ATLAS plus OWASP) is the structural element most 2026 reports still skip.
Section 4: residual risk. The threats the red-team did not exercise (because of scope, time, or capability), and the documented assumption underlying each non-exercised threat. The section is what allows the procurement team to scope the next round of testing and to underwrite the residual risk explicitly.
Section 5: audit-substrate alignment. The mapping between red-team findings and the customer’s EU AI Act Article 12 audit-substrate. For each finding, what audit-trail entry would have detected it in production, and what the gap (if any) is in the deployed observability stack. This section is what turns a red-team report from a one-off security artefact into a continuous-monitoring driver.
Section 6: post-market monitoring recommendations. For high-risk deployments under the EU AI Act, the red-team report feeds into the Article 16 post-market monitoring procedure. The recommendations section converts findings into runtime detection rules that the observability stack implements as continuous monitoring.
The structural lesson is that a 2026 red-team report is not a one-time security artefact. It is a design input to the continuous-monitoring substrate that the OWASP ASI threats and the EU AI Act Article 16 obligations together require. Most 2026 procurement teams do not yet read red-team reports this way, and most red-team vendors do not yet write them this way.
The cadence question
Three cadences are defensible in 2026.
Per-release red-team. The red-team runs once before each major release. The cadence fits deployments with predictable release cycles and bounded change between releases. The cost is moderate; the coverage is strong against pre-release threats but weak against in-production drift.
Continuous red-team. The red-team runs on a continuous basis, typically as a combination of automated PyRIT/Garak runs at high frequency plus quarterly deep-dive engagements. The cadence fits high-risk deployments where in-production drift is the binding concern. The cost is high; the coverage is the strongest of the three.
Threshold-triggered red-team. The red-team runs when one of three triggers fires: model upgrade (foundation-model version change, fine-tune update), tool-graph change (new tool added, tool semantics changed), or detection-surface change (new threat class published, new ATLAS technique catalogued). The cadence fits deployments with stable infrastructure but model and tooling churn. The cost is unpredictable; the coverage is precise where it fires.
The decision between the three cadences is shaped by the deployment’s risk tier under the EU AI Act, the customer’s tolerance for in-production drift, and the customer’s red-team budget. Most 2026 enterprise procurement teams default to per-release; the high-risk deployments under Article 6 should be running continuous; the threshold-triggered cadence is an operationally sound middle path that procurement teams underuse because it is harder to budget against than a fixed-cadence retainer.
The procurement question: in-house, specialist vendor, hybrid
Three procurement postures emerge in the 2026 market.
In-house red-team. The customer builds a 2-5 person team with PyRIT, Garak, and custom-harness expertise. The cost is high (loaded headcount $250K-$500K per FTE per year in major markets source:“our-estimate”, derived from public security-engineering compensation benchmarks), the institutional knowledge accumulates internally, and the team has full deployment access. The posture fits enterprises with multiple agentic AI deployments and strong information-security culture.
Specialist vendor. The customer engages a specialist red-team vendor (HiddenLayer, Trail of Bits, Bishop Fox, Mandiant, Robust Intelligence, NVIDIA’s NeMo Guardrails team) for periodic engagements. The cost is per-engagement (typical 2026 enterprise red-team engagement runs $75K-$250K source:“our-estimate”, derived from public industry pricing), the institutional knowledge stays at the vendor, and the team brings cross-deployment pattern recognition the customer’s in-house team would not have. The posture fits enterprises with one or two high-risk deployments.
Hybrid. The customer maintains a small in-house team (1-2 FTE) running continuous automated red-team plus per-quarter deep-dive engagements with a specialist vendor. The hybrid posture is the structurally most defensible for enterprises operating under the EU AI Act’s high-risk regime, because it produces the continuous evidence that Article 16 post-market monitoring requires plus the periodic deep-dive that catches threats automated tooling will not.
The cost decision between the three postures is not the procurement question that fails most often. The decision that fails most often is the one upstream: assuming a generalised application pen-test covers the agent threat surface. A buyer that has not understood the four-discipline structure, the tooling stack, and the evidence model is not in a position to make the in-house-vs-vendor-vs-hybrid decision well, because the decision’s load-bearing input is the question of what evidence the regulator and the post-market monitoring process actually require. That input is not a budget question; it is a threat-model question, and the OWASP Agentic Top 10 plus the MITRE ATLAS knowledge base are the procurement-grade reference for it.
Mapping to the OWASP Top 10
For each of the published OWASP ASI threats, the red-team discipline that exercises it most directly:
- ASI01 Agent Goal Hijack. Discipline 4 (multi-turn objective drift) plus Discipline 1 (prompt injection at the goal-setting layer).
- ASI02 Tool Misuse. Discipline 2 (tool misuse). The OWASP-cited Amazon Q exemplar is a tool-misuse case.
- ASI03 Identity and Privilege Abuse. Discipline 2 (tool misuse) at the IAM-permission layer. Map against the Non-Human Identity controls.
- ASI06 Memory and Context Poisoning. Discipline 3 (context-window attack). The OWASP-cited Gemini Memory Attack exemplar is in this class.
- ASI07 Insecure Inter-Agent Communication. Discipline 1 (prompt injection across the inter-agent channel) plus Discipline 2 (tool misuse where agent A’s tool is agent B).
- ASI08 Cascading Failures. Discipline 4 (multi-turn objective drift) plus Discipline 3 (context poisoning that propagates).
- ASI09 Human-Agent Trust Exploitation. Discipline 4 (objective drift via trust manipulation).
- ASI10 Rogue Agents. All four disciplines. The OWASP-cited Replit meltdown is a rogue-agent case.
Three threats (ASI04, ASI05) are not surfaced in the public Top 10 listing reviewed for this piece, presumably reserved or pending publication; the OWASP walkthrough (claim AM-043) and the official OWASP Top 10 for Agentic Applications page are the canonical references.
The eight threats above account for all four red-team disciplines. A red-team programme that exercises only one or two of the four disciplines is structurally incomplete relative to the published threat list. A 2026 enterprise that has signed off an agent deployment on the strength of a one-discipline red-team is signing off against two-thirds of the published threat surface, regardless of how thorough the report’s writeup of that one discipline was.
The structural lesson is that the 2026 agent red-team is not a generalised pen-test extended. It is a distinct discipline with distinct tooling, distinct evidence, and distinct procurement decisions. Most enterprises that run the wrong test pass it, and the passing report is the procurement evidence that produces the false confidence. The OWASP Agentic Top 10 names what to defend against. The disciplines, tooling, and evidence model in this piece name how to test that the defences hold. The observability companion piece names how to monitor the defences after deployment. The three pieces compose into the security posture a 2026 high-risk agentic AI deployment under the EU AI Act actually requires.
Spotted an error? See corrections policy →
Reasoned disagreement is a first-class signal here. Every review cycle weighs documented dissent; material dissent becomes part of the article's change history. This is not a corrections form — use /corrections/ for factual errors.
Agentic AI governance →
Governance frameworks, oversight patterns, and compliance postures for enterprise agentic-AI deployment. 44 other pieces in this pillar.