Agentic AI in 2024-2025 produced four distinct classes of evidence that the 2026 procurement reader should not collapse into a single 'AI is working' narrative: (1) vendor-published wins inside vendor-controlled environments (ServiceNow internal 90% L1 deflection, framed by Nenshad Bardoliwalla as an upper bound conditioned on two decades of structured workflow data the customer does not have), (2) audited customer pilots with active human oversight (BT 35% case-resolution improvement with random checks per Hena Jalil; UK Government Digital Service 26 minutes/day saved across 20,000 staff in Q4 2024; HMRC 28,000-staff M365 Copilot rollout April 2026), (3) public walk-backs (Klarna May 2025 Bloomberg-reported reversal of the 700-agent claim while the original press release stayed live; GitHub Copilot April 2026 token-counting bug; Salesforce Agentforce IT 200-customer reality vs Marc Benioff's launch pitch), and (4) structural failure modes (CRMArena-Pro 35% multi-step agent reliability finding; Carnegie Mellon independent verification at 30-35%; EchoLeak CVE-2025-32711 cross-agent prompt-injection class). Each class produces a different procurement lesson; treating them as one narrative is the most common 2026 enterprise mistake.
URL-equity restoration of /the-agentic-ai-revolution-real-world-success-stories-and-strategic-insights-from-2024-2025/. The URL was previously retired because the original WordPress-era piece used fabricated protagonist case studies in the "Sarah and her AI agents" style. Bing Webmaster AI Performance data for 2026-04-21 → 2026-05-02 nonetheless showed 22 citations on this URL across 12 days (fifth-highest cited URL), and the retraction broke the AI-citation chain for the 'enterprise AI ROI case study' query family. The new editorial-standard piece at the original slug preserves the URL while replacing the fabricated case studies with named primary-source customer evidence (BT, UK GDS, HMRC, ServiceNow internal, Klarna, Datadog 10-Q risk disclosure). Slug warnings (slug-too-long at 90 chars, slug-contains-date) are accepted as the intentional trade-off for preserving AI citations, per Peter's Option A decision of 2026-05-04.
Sister claims: AM-029 (Not holding since 10 Jun 2026; its Stanford 12/88 figure failed primary-source verification), AM-053 (McKinsey 17%), AM-121 (AI in IT operations reality check), AM-045 (EchoLeak class).
Cadence: 60 days.
Trigger conditions: additional named-customer audit publications; further documented walk-backs from major 2024-2025 case studies; new structural failure-mode evidence from research arms (Salesforce AI Research, CMU, MITRE ATLAS).
- AM-002 · Not holding
About this register
The Reporting register tracks claims published from articles addressed to senior enterprise IT leaders — CIOs, IT directors, heads of platform. Claims are reviewed on a 30–90 day cadence; each review either reaffirms the claim, marks one substantive part as Partial, or marks it Not holding once the underlying evidence has been overtaken.
Recent corrections in Reporting
- AM-008 · Partial · 17 Jun 2026
Source-text figure re-review: Google's 2024 Environmental Report gives a 28% year-over-year increase to 8.1 billion gallons, not the 33% increase (from a 6.1 billion gallon 2023 base) asserted at publication. The 8.1B 2024 figure and the Microsoft WUE figures (0.30 L/kWh, a 39% improvement) are unchanged and verified. The article has been corrected to 28% and the unsupported 6.1B base removed; the claim text retains the original figure alongside this correction, per the Holding-up protocol.
- AM-132 · Partial · 10 Jun 2026
One of four legs unanchored on re-review. The claim text attributes '12% of deployments clearing 300%+ ROI with 88% at or below break-even at 12-18 months' to the Stanford DEL 2026 Enterprise AI Playbook. Full-text verification on 10 Jun 2026 found no such figure in that source: the playbook (Pereira, Graylin, Brynjolfsson, Apr 2026) studies 51 successful deployments by design and contains no ROI distribution, no 300%-plus cohort, and no break-even measurement point (full finding at AM-029, correction of 10 Jun 2026). The only verified figure carrying the same 12/88 numerals is IDC research with Lenovo (via CIO.com, Mar 2025): roughly 88% of AI proof-of-concepts never reach production and roughly 12% graduate — a pilot-to-production graduation metric, not an ROI distribution. The Gartner 28%, McKinsey 23%/17%, and MIT NANDA 95% legs verify; they support a small high-performing tail and a large struggling body, but none documents the two-peak bimodal shape the claim asserts. Status Up -> Partial.
- AM-129 · Partial · 10 Jun 2026
One of three read-against anchors unanchored on re-review. The claim text cites 'Stanford Digital Economy Lab Enterprise AI Playbook (12/88 bimodal ROI distribution at 12-18 months)' and frames the realistic ROI band around 'the highest-discipline 12% cohort'. Full-text verification on 10 Jun 2026 found the playbook contains no 12/88 distribution, no bimodal ROI shape, and no 12-18-month ROI measurement point (full finding at AM-029, correction of 10 Jun 2026). The claim's core negative finding — no mid-market enterprise has produced a documented +240% ROI in 90 days under audited conditions — is unaffected; the McKinsey State of AI 2025 and MIT NANDA legs verify and continue to support it. The '12% cohort' framing has no verifiable referent. The only verified figure carrying the 12/88 numerals is IDC's pilot-graduation finding (roughly 88% of AI proof-of-concepts never reach production; via CIO.com, Mar 2025), a different metric. Status Up -> Partial.
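The arithmetic behind the AM-008 correction above can be checked with a short sketch. It uses only the figures quoted in that correction (8.1 billion gallons, the 6.1 billion base, and the 28% increase); the function names are illustrative and not part of any register tooling, and the implied base is a back-calculation, not a number taken from the report itself.

```python
def pct_increase(base: float, new: float) -> float:
    """Percentage increase from base to new."""
    return (new - base) / base * 100.0


def implied_base(new: float, pct: float) -> float:
    """Base value implied by a stated percentage increase to `new`."""
    return new / (1.0 + pct / 100.0)


# Figures as quoted in the AM-008 correction (billions of gallons).
NEW_2024 = 8.1
ASSERTED_BASE = 6.1   # base asserted at publication, later removed as unsupported
VERIFIED_PCT = 28.0   # year-over-year increase per the re-review

# The originally asserted pairing: 6.1B -> 8.1B.
asserted_pct = pct_increase(ASSERTED_BASE, NEW_2024)

# Back-calculated base if 8.1B is a 28% increase (an inference, not a sourced figure).
back_calculated_base = implied_base(NEW_2024, VERIFIED_PCT)

print(f"6.1B -> 8.1B is a {asserted_pct:.1f}% increase")
print(f"A 28% increase to 8.1B implies a base of about {back_calculated_base:.2f}B")
```

The 33% figure follows from the 6.1 billion base, so the two asserted numbers were consistent with each other; the verified 28% instead implies a base of roughly 6.33 billion gallons, which is why the 6.1B base could not stand alongside it.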
Reviews coming up in Reporting
- AM-063 · Holding · next +9d (27 Jun 2026)
AI agents executing financial transactions need a four-control bundle (action-approval gates by blast radius, kill-swit…
- AM-061 · Holding · next +9d (27 Jun 2026)
Production agentic-AI costs at scale routinely run multiples of POC projections, and a layered optimisation programme c…
- AM-003 · Partial · next +9d (27 Jun 2026)
GPT-5 Pro's tiered-subscription model forces enterprises to classify problems by computational difficulty — $200/month…