Holding · last review 26 Apr 2026

Six well-documented public failures of agentic AI deployments from 2024-2025 (the Air Canada bereavement-refund chatbot, the NYC MyCity small-business chatbot, the Replit production-database wipe, the Cursor unauthorised code deletion, the Klarna customer-service reversal, and the DPD chatbot escalation incident) cluster into three structural failure modes: (1) the agent binds the enterprise to commitments without disclosure or approval, (2) the agent operates with permissions the deployment never authorised, (3) the economic case for the agent depends on a service quality the deployment cannot sustain. Each failure mode maps to a specific control from the seven-control surface; all six failures would have been mitigated by controls already specified in the OWASP Agentic AI Top 10 enterprise walkthrough. The pattern is consistent enough that an enterprise can use the cases as a procurement filter: any vendor that cannot state its specific control posture against each of the three failure modes is not procurement-ready.
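The procurement filter described above can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the source: the failure-mode keys, the `VendorResponse` type, and both function names are hypothetical, and "stating a control posture" is simplified to providing any non-empty answer.

```python
from dataclasses import dataclass, field

# The three failure modes from the claim. The dictionary keys are
# hypothetical labels chosen for this sketch.
FAILURE_MODES = {
    "undisclosed_binding": "agent binds the enterprise without disclosure or approval",
    "unauthorised_permissions": "agent operates with permissions the deployment never authorised",
    "unsustainable_economics": "economic case depends on a service quality the deployment cannot sustain",
}


@dataclass
class VendorResponse:
    """A vendor's answers, keyed by failure mode (hypothetical structure)."""
    name: str
    controls: dict = field(default_factory=dict)


def missing_controls(vendor: VendorResponse) -> list:
    """Return the failure modes for which the vendor stated no control posture."""
    return [
        mode
        for mode in FAILURE_MODES
        if not vendor.controls.get(mode, "").strip()
    ]


def is_procurement_ready(vendor: VendorResponse) -> bool:
    """A vendor passes the filter only if every failure mode has a stated control."""
    return not missing_controls(vendor)
```

For example, a vendor that answers only the first failure mode would fail the filter, and `missing_controls` would list the two modes still unanswered. A real review would also assess the quality of each answer, which this sketch does not attempt.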

Six-case agent failure case-study analysis. 90-day review cadence. All cases are publicly documented in primary sources (Civil Resolution Tribunal decision, The Markup investigation, public X/LinkedIn posts by founders and engineers, mainstream UK news coverage). Watches: (1) new high-profile incidents that establish additional failure modes beyond the three documented, (2) updates to the legal record (the Air Canada Civil Resolution Tribunal decision is the highest-leverage precedent for agent-binding doctrine and remains under-litigated in 2026), (3) vendor-side public statements that revise the documented record (e.g., Replit's response to the database-wipe incident has shifted vendor disclosure norms).

Published: 26 Apr 2026
Last reviewed: 26 Apr 2026
Next review: 25 Jul 2026