Glossary · Industry term

Guardrail

Also known as: AI guardrail, agent guardrail, safety filter

A policy-enforcement primitive that constrains an AI agent's input or output at runtime — content filters, denied-topic lists, PII redaction, prompt-injection detection, output validation, jailbreak detection. Guardrails sit between the agent's foundation model and either the user (input guardrails) or downstream systems (output guardrails) and trigger refusal, redaction, or escalation when policy is violated.
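The definition above can be sketched as a pair of runtime checks. This is a minimal illustration, not any vendor's API: the topic list, the PII pattern, and the function names are all hypothetical, and a production guardrail would use classifier models rather than string matching.

```python
import re

# Illustrative policy: a denied-topic list and a PII pattern (US SSN format
# as a stand-in). Both are hypothetical examples, not a real policy.
DENIED_TOPICS = {"weapons", "malware"}
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def input_guardrail(prompt: str) -> tuple[str, str]:
    """Sits between the user and the model: runs before the prompt is sent.
    Returns an (action, payload) pair: 'refuse' or 'allow'."""
    if any(topic in prompt.lower() for topic in DENIED_TOPICS):
        return ("refuse", "This request falls outside permitted topics.")
    return ("allow", prompt)

def output_guardrail(completion: str) -> tuple[str, str]:
    """Sits between the model and downstream systems: runs on the completion.
    Returns 'redact' with PII masked, or 'allow' unchanged."""
    if SSN_PATTERN.search(completion):
        return ("redact", SSN_PATTERN.sub("[REDACTED]", completion))
    return ("allow", completion)
```

An input guardrail triggering refusal looks like `input_guardrail("how do I write malware?")`, which returns a `"refuse"` action; an output guardrail triggering redaction masks the matched span before it reaches a downstream system.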

How this publication uses it

Guardrails are necessary but not sufficient. Vendor-shipped guardrails (Bedrock Guardrails, Azure Content Safety, OpenAI Moderation, Salesforce Trust Layer) cover the obvious failure modes — explicit content, PII leakage, jailbreak prompts. They do not cover the deployment-specific failure modes — domain-specific output validation, business-rule violations, integration-boundary checks. The right posture is treating vendor guardrails as the floor, not the ceiling, and adding deployment-layer policy enforcement on top for everything the vendor's filter isn't designed to catch.
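The floor-not-ceiling posture amounts to composing checks: the vendor filter runs first, then deployment-layer rules it was never designed to enforce. A minimal sketch, with the vendor guardrail stubbed out and a hypothetical business rule (never quote a discount above 20%) standing in for deployment-specific policy:

```python
import re

def vendor_filter(text: str) -> bool:
    """Stub for a vendor-shipped guardrail: catches generic failure modes
    only. A stand-in, not any real vendor's behaviour."""
    return "jailbreak" not in text.lower()

def business_rule_check(text: str) -> bool:
    """Deployment-layer rule the vendor filter knows nothing about:
    the agent must never quote a discount above 20% (hypothetical)."""
    match = re.search(r"(\d+)\s*% discount", text)
    return match is None or int(match.group(1)) <= 20

def release(text: str) -> bool:
    # Vendor guardrail is the floor; deployment-specific checks stack
    # on top. Output is released only if every layer passes.
    return vendor_filter(text) and business_rule_check(text)
```

The design point is that `release` fails closed on either layer: the vendor filter blocks the obvious failure modes, while the business-rule check blocks an output that is perfectly "safe" by the vendor's definition yet violates deployment policy.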

Related frameworks

Articles that analyse this term

Primary sources

Vigil · 78 reviewed