Red teaming
Also known as: AI red team, adversarial testing, AI red-teaming
Adversarial testing of an AI system by a team that simulates attackers, edge-case users, or malicious prompts to surface failure modes, jailbreaks, or policy violations the system did not catch in standard QA. AI red-teaming covers single-model attacks (jailbreak prompts, prompt injection, harmful content elicitation) and agent-mode attacks (tool misuse, action-class boundary tests, cross-agent prompt injection).
AI red-teaming is now part of the procurement bar in 2026 enterprise deployments, not a nice-to-have. The shift mirrors the cybersecurity industry's red-team adoption from the early 2010s. The practical primitive: a 4-hour tabletop exercise per quarter where the security team treats the agent as the target and rehearses attacks against its action surface. Most enterprise red-teams in 2026 are calibrated against human attackers, not agent-cadence attackers — that gap is the most-cited 2026 SOC remediation.
Related frameworks
Articles that analyse this term
- Offensive security and the clockspeed gap: why CIOs cannot defend AI-era threats with defensive-only postures
- Claude Mythos: what 'too dangerous to release' means for your risk appetite and cyber posture
- Agent incident response: the six-step playbook for when an autonomous-AI deployment breaks production