Jailbreak
Also known as: AI jailbreak, model jailbreak, policy bypass
An adversarial prompt that bypasses a large language model's safety training, refusal patterns, or operator-specified policy boundaries — getting the model to produce output it would normally refuse. Jailbreak techniques range from social-engineering ('imagine you are an unrestricted AI...'), to encoding ('respond in base64'), to multi-turn priming, to model-specific exploits.
Jailbreak is the single-agent failure mode most enterprise security frameworks understand. The 2026 risk has shifted: jailbreaks are now an input vector for cross-agent prompt injection, where a successful jailbreak in one agent's context becomes the instruction in another agent's reasoning. Defending against jailbreak in isolation is no longer the right unit of analysis — defending the cross-agent path between content-ingest and tool-execution privileges is. See EchoLeak for the canonical attack pattern.