We only publish what we can defend in a vendor meeting. Every claim carries an ID, a review date, and a verdict you can check.
- Our ledger: 168
- Holding: 156
- Partial: 6
- Not holding: 6
- Industry claims tracked: 26
- Last review: today
Quiet — no verdict transitions in the last 30 days. See the ledger →
Agent Mode AI — claim-tracked agentic AI analysis
Single-agent or multi-agent: what the 2026 deployment record actually says
The 2025–2026 deployment record shows single-agent architectures win on accuracy, cost, and MTTD below roughly 12 tool-domains. Multi-agent only pays back above that threshold, and only when inter-agent state is bounded by a shared structured artifact.
27 years enterprise IT operations. Global organisation. Major incidents. Editorially independent.
- 100 pieces
- 168 tracked claims
- 14 public retractions
The Enterprise Agentic Governance Benchmark. Six dimensions, scored 0–100. Free 5-minute web diagnostic; a 30–45 minute Excel workbook for governance groups.
Recently reviewed
Three claims most recently re-tested against their primary sources. Status changes log to the corrections page; nothing quietly vanishes.
- AM-133 · Holding · Q3 2026 Claim Review Bulletin: which claims moved, which held, and what the EU AI Act enforcement window did to the corpus · Reviewed 30 Jul 2026 · Read article →
- AM-CANON-001 · Holding · The accountability architecture for AI-written publications · Reviewed 14 May 2026 · Read article →
- OPS-066 · Holding · When AI doesn't pencil out: break-even seat math for 5-, 15-, and 40-person firms · Reviewed 12 May 2026 · Read article →
Why this publication has a ledger
Most AI commentary gets paid for being loud about what's new. Almost none gets measured on whether what it said last quarter still holds this one. That is the gap this publication exists to close. Every published argument carries an ID, a review date, and one of three verdicts — Holding, Partial, or Not holding — that updates over time as evidence accumulates. The verdict log is the product.
When a claim stops holding, the page says so. The original sentence stays visible. The correction is dated and appended. Nothing is quietly removed. You do not need to trust the author to trust the verdicts — the receipts are public, on a 30–90 day review rhythm, and the corrections record is permanent.
Editor's picks
One per topic cluster
- Governance · 90 days to EU AI Act enforcement: what the corpus says enterprises still haven't done
- Cost economics · The hidden costs of agentic AI: a CFO's guide to true TCO and ROI modeling
- Security · Claude Mythos: what 'too dangerous to release' means for your risk appetite and cyber posture
- Architecture · Non-human identity for AI agents: the 2026 IAM playbook
- Strategy · Why 88% of agentic AI deployments fail
Latest pieces
Full archive →
The agent fan-out problem: when one prompt becomes 400 LLM calls
Production agentic systems amplify a single user request into dozens or hundreds of internal LLM calls. Most enterprise unit-economics, latency budgets, and observability setups are still priced for 1:1.
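A back-of-envelope sketch of why that mismatch matters. The token counts, per-token price, and the 400-call fan-out below are illustrative assumptions for the arithmetic, not figures from the article:

```python
# Back-of-envelope fan-out arithmetic: all numbers are illustrative assumptions.
# A budget built for one LLM call per user request meets a system that
# actually makes `fan_out` internal calls per request.

def per_request_cost(fan_out: int,
                     avg_tokens_per_call: int = 2_000,
                     price_per_1k_tokens: float = 0.01) -> float:
    """Dollar cost of one user request at a given internal fan-out."""
    return fan_out * (avg_tokens_per_call / 1_000) * price_per_1k_tokens

budgeted = per_request_cost(fan_out=1)    # what a 1:1 unit-economics model assumed
actual = per_request_cost(fan_out=400)    # what a deep agent loop can actually do

print(f"budgeted per request: ${budgeted:.2f}")  # $0.02
print(f"actual per request:   ${actual:.2f}")    # $8.00
print(f"multiplier: {actual / budgeted:.0f}x")   # 400x
```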
The split verdict: GPT-5.5 vs Claude Opus 4.7 and why CIOs need two models, not one
Anthropic shipped Claude Opus 4.7 on 16 Apr 2026; OpenAI shipped GPT-5.5 seven days later. Both vendors claim leadership. Neither model wins everything. The procurement question for 2026 is not which one to standardise on: the evaluation evidence does not support a single-model answer for any enterprise running both agentic-coding and knowledge-work workloads. The two-year procurement decision is whether to plan the routing or accept the tax of pretending the split does not exist.
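A minimal sketch of what "planning the routing" could look like in practice: a declared workload-to-model policy with an explicit fallback, rather than one hard-coded default. The workload tags and model identifiers are placeholders, not a recommendation from the article:

```python
# Hypothetical workload-to-model routing policy. Tags and model names
# are placeholders; the point is that routing is declared and reviewable.

ROUTING_POLICY = {
    "agentic-coding": "claude-opus-4.7",  # placeholder identifier
    "knowledge-work": "gpt-5.5",          # placeholder identifier
}
DEFAULT_MODEL = "gpt-5.5"                 # explicit, reviewable fallback

def pick_model(workload: str) -> str:
    """Return the model for a tagged workload, falling back to the default."""
    return ROUTING_POLICY.get(workload, DEFAULT_MODEL)

assert pick_model("agentic-coding") == "claude-opus-4.7"
assert pick_model("untagged-workload") == DEFAULT_MODEL
```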
Agentic code auditing: what the Firefox Claude Mythos disclosure tells procurement about CI-time defaults
Mozilla's Firefox 150 release (November 2025) shipped fixes for 271 vulnerabilities surfaced by the Claude Mythos Preview pipeline. The headline fact ('AI found 271 bugs') is true but is not the procurement-relevant one. The procurement-relevant change is that the agentic-verification step (the agent builds and runs its own test cases to triage suspected bugs before reporting) cleared the false-positive wall that blocked earlier read-only GPT-4 / Claude Sonnet 3.5 attempts from production CI. CI-time agentic auditing becomes the default expectation for any shipping enterprise software in 2026, with three derived procurement-deck questions and one dual-use risk surfacing alongside the defensive disclosure.
Agentic AI accuracy claims: the three questions every CIO should ask before 'ready-to-run' becomes a procurement decision
Anthropic posted a launch this week positioning the product as 'ready-to-run'. The phrase is procurement-deck noise unless three questions are answered: accuracy rate on which task, against which baseline, measured by what methodology. The 2026 industry baseline for procurement-credible accuracy disclosure is the academic-benchmark pattern (CRMArena-Pro 35% multi-step reliability on a defined CRM task corpus; CMU TheAgentCompany 30–35% reproduction range; WebArena ~36% browser-agent ceiling) and the vendor-disclosure pattern Anthropic itself established earlier (Claude for Chrome 23.6% → 11.2% → 0% with named attack corpus and patch cadence). Vendor 'ready-to-run' positioning that doesn't meet either bar leaves the deploying enterprise inheriting the methodology gap as an audit-defense burden.
What is Agent Mode? Microsoft, Cursor, GitHub Copilot, and OpenAI in 2026
Agent Mode is the same brand name shipping across three different product classes in 2026: Microsoft 365 Copilot productivity-suite agents, Cursor IDE agents, and GitHub Copilot code-platform agents. Procurement teams comparing them feature-by-feature are comparing categories that aren't substitutes.
IBM Watson Health and the change-management variable: what the canonical failure tells procurement
IBM Watson Health launched in 2015 with a $5 billion-plus investment trajectory and was sold to Francisco Partners in 2022 at roughly a fifth of that. The technology was substantively functional; the organisational integration was not. RAND Corporation's 2024 study (n=65 senior data scientists) puts the AI-project failure rate at approximately 80%, dominated by organisational rather than technical causes. The procurement-deck implication is operational: the change-management variable belongs in the discovery phase upstream and in the procurement decision itself, not as a post-deployment afterthought when the named-owner question surfaces at audit.
The CIO's playbook: what the named-success agentic AI deployments actually share
Four named enterprise deployments (JPMorgan, Toshiba, Wipro, Aberdeen City Council) cleared the McKinsey scaling threshold; for the documented cohort that did not, RAND's 2024 study of 65 senior data scientists identified an 80% pilot-to-production failure rate. The five operational characteristics shared by the named-success cases are observational, citable, and distinct from the proprietary acronym frameworks that crowd the procurement deck. CIO-level visibility on per-deployment ROI is the one most often missing in the failed cohort.
The 56% AI-skill wage premium: what the Atlanta Fed data measures, and who actually captures it
The Federal Reserve Bank of Atlanta's May 2025 'By Degrees' analysis (Lightcast job-posting data through 2024) reports a 56% wage premium for AI-skilled workers and AI-skill demand surfacing in 1.62% of all job postings. The headline number is real; the typical mid-career worker reading it should not expect to capture it from a generic AI-literacy course. Boston Consulting Group's October 2024 study (n=11,000+ employees, 50+ countries) reports a gap in AI-upskilling access: 14% of frontline employees versus 44% of leaders. That gap, not the 56% itself, is the operational variable for who captures the premium and who sees credential inflation without the wage signal.
Browse by topic pillar
Five strategic pillars
Coming next
Peter's editorial calendar — honest dates, bumped-with-notes if missed.
- Week 17 · 26 Apr 2026 · Non-human identity — the first procurement question CIOs aren't asking yet
Every enterprise agent deployment passes through a credential. Most teams still hand the agent a human's credential. Naming the NHI gap is the next Q2 procurement conversation.
- Week 18 · 03 May 2026 · Shadow agent sprawl — what telemetry catches and what it misses
The browser-as-agent-runtime pattern creates a detection gap that MDM/CASB don't see. What the first wave of shadow-AI discovery tools actually find, and the three categories they miss.
- Week 19 · 10 May 2026 · The AI agent MSA — four clauses every enterprise contract needs by August
EU AI Act enforcement activates 2 Aug 2026. The clauses that survive legal review in the next quarter will be the ones that don't pretend the agent is conventional SaaS.