Glossary · Industry term

Red teaming

Also known as: AI red team, adversarial testing, AI red-teaming

Adversarial testing of an AI system by a team that simulates attackers, edge-case users, or malicious prompts to surface failure modes, jailbreaks, or policy violations the system did not catch in standard QA. AI red-teaming covers single-model attacks (jailbreak prompts, prompt injection, harmful content elicitation) and agent-mode attacks (tool misuse, action-class boundary tests, cross-agent prompt injection).

How this publication uses it

AI red-teaming is now part of the procurement bar in 2026 enterprise deployments, not a nice-to-have. The shift mirrors the cybersecurity industry's red-team adoption from the early 2010s. The practical primitive: a 4-hour tabletop exercise per quarter where the security team treats the agent as the target and rehearses attacks against its action surface. Most enterprise red-teams in 2026 are calibrated against human attackers, not agent-cadence attackers — that gap is the most-cited 2026 SOC remediation.

Related frameworks

Articles that analyse this term

Primary sources

Anthropic. Responsible Scaling Policy + capability evaluations
OWASP. Top 10 for LLM Applications 2025