Vigil · last review 21h ago · next review cycle 19 May 2026
Every claim this publication has made, and whether it still holds.
The point of writing about enterprise AI is to be right for longer than a news cycle. This page tracks every argument this publication has made, reviewed on a 30–90 day rhythm. If something stops holding, it's marked and the piece is annotated. Nothing is quietly removed. Claims made by others — vendors, analysts, regulators — are tracked separately at /archive/.
57 holding · 3 partial · 0 not holding
Holding
AM-020 · pub 31 Jul 2025 · rev 19 Apr 2026
Based on 2026 CFO-guide data: €368K vs €158K naive estimate, 40-60% TCO underestimate, 73% exceed by 2.4x, 15-20%/year maintenance, supervision tax in thousands/month, 70% failure from change management. Watching for a Big 4 TCO framework or enterprise CFO survey that resolves the cross-departmental framing.
Path B operator case-study piece, stance-heavy because the named-case corpus is thin. Defensible 3-chair salon stack: one booking-and-payments platform with deposit enforcement (Booksy or Square Appointments) + consumer-tier Claude Pro or ChatGPT Plus for marketing copy + Canva (free or Pro) for visual content. Honest acknowledgement that practitioner-published material on Instagram/TikTok is the larger informal corpus that this piece does not citation-mine.
Path B operator case-study piece, real-case-heavy because Pearl and Overjet publish substantial small-practice rosters with named outcomes. Defensible 2-dentist family practice stack: one FDA-cleared radiography AI (Pearl Second Opinion or Overjet Vision AI) + one revenue-cycle AI feature + Enterprise-tier general AI for non-PHI work. Consumer-tier AI explicitly out of scope for clinical workflows. Vendor-published outcomes (Promenade 20 hrs/wk, Quest +19% crown production, Midtown 566% case acceptance) treated as evidence-of-existence, not as benchmarks.
Path B operator case-study piece, vendor-corpus-heavy because small construction firms rarely publish. Defensible 25-employee specialty contractor stack: one PM platform (Procore, Fieldwire, or Buildertrend) + AI-assisted estimating add-on + reality-capture if multi-site or warranty exposure + Enterprise AI assistant for project documents. Vendor-published 30-50% estimating time saving treated as directional, not measured.
Path B operator case-study piece, real-case-heavy because legal AI publishes more case density than other professional-services categories. Defensible 4-person firm stack: Spellbook for contracts + vLex Vincent or Westlaw/LexisNexis with AI for research + Enterprise-tier general assistant for everything outside those workflows. Consumer-tier AI explicitly out of scope for privileged work.
Path B operator case-study piece. Corpus is platform-led because individual small-firm cases are rarely published. Defensible 5-person firm stack: one books platform (Xero OS or QuickBooks+Intuit Assist) + one practice management with AI capture (Canopy) + one MCP-style integration layer (Digits MCP). Cost in low-hundreds per seat per month range. Labour saving editorially framed as 6-12 hrs/week per bookkeeper on the recurring grind.
Editorial framework piece. Each question maps to a specific public artefact (Trust Center, DPA, sub-processor list, security/incident page, termination clause) such that absence of the artefact is itself the answer. Not a substitute for ISO 27001 or SOC 2; not a guarantee. Pairs with OPS-011 (use-case filter) — vendor selection happens after the use case clears OPS-011's filter.
Editorial framework piece. The four questions are derived from the most common failure modes observed across SMB-scale agent deployments through 2025-2026. Designed for a 30-minute meeting with three people; output is a one-page record. Not a substitute for vendor due diligence, risk register, or AI policy.
Cheap-tier API cost comparison. Claude Haiku 4.5 ($1/$5 per MTok) is roughly 6-10x the per-token cost of Gemini Flash-Lite ($0.10/$0.40) or GPT-4o-mini ($0.15/$0.60) — the multiplier is real, the absolute number at SMB volume is not. Workload-shape recommendations: GPT-4o-mini for high-frequency triage, Claude Haiku for long-document review, Gemini Flash for research/synthesis.
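The multiplier-vs-absolute point in the cheap-tier comparison is easy to sanity-check. A minimal sketch: the per-MTok prices are the ones quoted above; the monthly token volume is an illustrative assumption, not a measured SMB workload.

```python
# Per-MTok prices from the comparison above: (input, output), USD.
PRICES = {
    "claude-haiku-4.5": (1.00, 5.00),
    "gemini-flash-lite": (0.10, 0.40),
    "gpt-4o-mini": (0.15, 0.60),
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Blended monthly cost in USD (MTok = million tokens)."""
    in_price, out_price = PRICES[model]
    return input_mtok * in_price + output_mtok * out_price

# Hypothetical SMB-scale volume: 5 MTok in, 1 MTok out per month.
for model in PRICES:
    print(model, round(monthly_cost(model, 5, 1), 2))
```

At that assumed volume the roughly 10x per-token multiplier works out to about $10/month for Haiku against under $1 for Flash-Lite, which is the claim's point: the multiplier is real, the absolute gap at SMB volume is small.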
Solo-founder buying guide. Claude Pro $17/mo annual or $20/mo monthly; ChatGPT Plus $20/mo. Two-yes filter resolves most cases; running both seats remains a defensible split for genuinely mixed workflows.
First tooling-comparison sibling to OPS-001. Like-for-like comparison is Notion Business at $19.50 vs ClickUp Business + Brain at $21. Research Mode is the only AI feature in the comparison that flips the verdict.
First operator-section piece. Cash math at the ~5,000-ops/mo profile favours Make.com Pro on cash-and-time combined; n8n self-hosted wins on cash alone plus data-residency and agentic extensibility. Reviewed against vendor pricing pages on 26 Apr 2026.
The publication's charter argument — asserted as testable, not aspirational. Review checks: ledger movement, correction-log activity, citation density vs comparable pieces from analyst firms / vendor blogs / hidden-AI publications.
AI agent risk register template. 60-day review cadence. Watches: (1) European AI Office Article 9 enforcement guidance (expected Q3 2026) that may codify specific register column requirements, (2) ISO/IEC 42001 implementation guidance that may map onto the register format, (3) major case studies in 2026 enforcement actions that establish precedent for what constitutes an adequate register, (4) tooling vendor releases of agent risk register modules (Microsoft Purview, ServiceNow GRC, Archer, OneTrust have signalled native modules in development for 2026).
AI agent ROI calculation methodology. 90-day review cadence. Watches: (1) major model-pricing changes (Anthropic, OpenAI, Google, Microsoft) that shift input 1 materially, (2) regulatory enforcement that establishes the realistic compliance cost (input 4) for various deployment profiles, (3) emerging case studies with documented ROI realisation that allow the methodology's outputs to be benchmarked against actual enterprise records, (4) finance-function-specific ROI methodology guidance from major consulting firms (McKinsey, Bain, BCG, Deloitte) that may shift the methodology baseline.
Retail and logistics agentic AI patterns. 90-day review cadence. Watches: (1) FTC enforcement actions on algorithmic pricing (the FTC has signalled the area as a priority and the first major settlement could come in 2026), (2) major retail-AI public reversals (the Klarna pattern recurring at other Fortune 500 retailers would establish a stronger precedent), (3) state consumer-protection law amendments specifically addressing AI-mediated retail (California AB 3030 has retail-AI provisions; other states are following), (4) supply-chain disruptions producing high-profile failures of forecasting-agent deployments.
Public-sector agentic AI procurement constraints. 90-day review cadence. Watches: (1) the OMB M-24-10 successor framework (post-Executive-Order-14110 federal AI guidance is actively evolving), (2) FedRAMP framework updates including the AI-specific authorisation provisions in development, (3) state-level AI procurement laws (Colorado, Utah, Texas, California, Washington) that establish state-specific procurement bars, (4) the NIST AI Safety Institute's outputs that increasingly serve as de facto federal procurement criteria, (5) emerging case-law on public-sector AI deployment liability.
HIPAA-compliant healthcare agentic AI playbook. 60-day review cadence given active OCR enforcement environment. Watches: (1) OCR enforcement actions specific to AI-related HIPAA cases (the first major settlement under the AI overlay is expected in 2026), (2) HHS guidance on AI-specific HIPAA implementation (the 2024 NPRM on the HIPAA Security Rule includes AI-relevant language; the final rule is expected in 2026), (3) state-level health-AI laws (California AB 3030 and others) that overlay onto HIPAA, (4) vendor BAA template revisions specifically for agentic AI workflows.
AI agent contract exit clauses. 90-day review cadence. Watches: (1) emerging case-law on AI vendor contract disputes that establishes precedent for specific clause language, (2) major-vendor template updates that shift the negotiation baseline (Microsoft, Anthropic, OpenAI, Google enterprise template revisions are watched closely by procurement counsel), (3) industry-standard template publishers (the IACCM Contract Standards Group, the IAPP, sector-specific procurement consortia) publishing AI-agent-specific exit-clause language, (4) regulatory guidance under EU AI Act Article 26 (deployer obligations) that may codify some of the eight provisions as compliance requirements rather than negotiation choices.
Centralised vs federated AI governance organisational design. 90-day review cadence. Watches: (1) Fortune 500 organisational design announcements that shift the dominant pattern (Chief AI Officer org design at large enterprises is still actively forming; expect 1-2 high-profile public reorganisations per quarter in 2026), (2) regulatory enforcement actions that establish a documentation consistency bar that purely federated models cannot meet, (3) consulting industry reports (McKinsey, Bain, BCG, Deloitte) that publish patterns from their advisory engagements, (4) emerging variant models (e.g., the AI Center of Excellence model that some enterprises are positioning as a fourth option).
A2A protocol piece. 60-day review cadence given active protocol evolution. Watches: (1) A2A specification version updates and reference implementation maturity, (2) inflection in vendor support beyond the announcement-day partner set (e.g., Anthropic and Microsoft have not committed to A2A as of April 2026; their positioning may shift), (3) competing or parallel standards (Microsoft has hinted at alternative inter-agent primitives in their Copilot platform; Anthropic has internal context-isolation primitives that may or may not converge on A2A), (4) regulatory positioning (the EU AI Act's Article 9 risk-management requirements may begin to reference A2A or equivalent in 2026-2027 enforcement guidance).
Multi-agent architecture playbook. 90-day review cadence. Watches: (1) the A2A (agent-to-agent) protocol's adoption trajectory through 2026 (claim AM-050 covers in detail), (2) Anthropic Managed Agents and OpenAI Operator's evolving multi-agent primitives, (3) emerging case-law and regulatory guidance specific to multi-agent failure attribution (currently underdeveloped; expect first major precedent in 2026-2027), (4) MCP (Model Context Protocol) adoption that affects how broker-mediated patterns get implemented.
NIST AI RMF mapping. 90-day review cadence. Watches: (1) NIST AI RMF version updates (NIST has signalled an AI RMF 2.0 framework revision in development for late 2026), (2) Generative AI Profile updates (the July 2024 profile is the current authoritative addendum; further profiles for agentic systems specifically are expected), (3) U.S. federal procurement guidance that elevates NIST AI RMF from voluntary to operational (pending under the post-Executive Order 14110 successor framework), (4) NIST AI Safety Institute outputs that revise the technical risk taxonomy.
Head of AI Governance role specification. 60-day review cadence given active market formation. Watches: (1) Forrester / Gartner / IDC role-tracking data revisions, (2) major-enterprise role announcements that shift compensation benchmarks (the 2026 cohort of Chief AI Officers at Fortune 50 enterprises will set the C-level compensation precedent), (3) emerging variant titles that consolidate or fragment the accountability set (Chief Responsible AI Officer, Chief AI Risk Officer, AI Governance Committee Chair are early variants), (4) regulatory frameworks (EU AI Act Article 14 human oversight, U.S. state AI laws naming-an-accountable-individual provisions) that codify or shift the role's legal exposure.
Article 12 audit-evidence template specification. 60-day review cadence given active regulator guidance development. Watches: (1) European AI Office guidance on Article 12 specifically (the Office's first detailed enforcement guidance is expected in Q3 2026 ahead of the August enforcement window), (2) Member-State-level retention period clarifications (Germany BfDI and France CNIL have already issued sector-specific guidance that extends the retention floor in some contexts), (3) ISO/IEC 42001 update that may formalise a parallel record-keeping standard, (4) vendor-platform native support for the 14-field structure (Microsoft, Anthropic, OpenAI, Google all have partial implementations as of April 2026).
EchoLeak / cross-agent prompt-injection class analysis. 60-day review cadence given the active research front. Watches: (1) new CVEs in the cross-agent prompt-injection class (multiple research groups are actively probing major agent platforms; expect 2-4 additional public CVEs in 2026), (2) vendor-side architectural responses (Microsoft's post-EchoLeak hardening, Anthropic's Managed Agents context-isolation primitives, OpenAI's Operator sandboxing), (3) regulator response under EU AI Act Article 15 (cybersecurity provisions) which is likely to formalise the cross-agent prompt-injection class as a foreseeable risk by Q4 2026.
Six-case agent failure case-study analysis. 90-day review cadence. All cases are publicly documented in primary sources (Civil Resolution Tribunal decision, The Markup investigation, public X/LinkedIn posts by founders and engineers, mainstream UK news coverage). Watches: (1) new high-profile incidents that establish additional failure modes beyond the three documented, (2) updates to the legal record (the Air Canada Civil Resolution Tribunal decision is the highest-leverage precedent for agent-binding doctrine and remains under-litigated in 2026), (3) vendor-side public statements that revise the documented record (e.g., Replit's response to the database-wipe incident has shifted vendor disclosure norms).
OWASP Agentic AI Top 10 enterprise walkthrough. 90-day review cadence. Watches: (1) revisions to the OWASP Agentic Security Initiative threat catalogue (active project, version revisions expected through 2026), (2) new threat classes added to the catalogue (e.g., agent-communication poisoning in multi-agent systems is an emerging T11 candidate), (3) regulatory enforcement actions that establish case-law-equivalent guidance on which threat classes constitute negligence under the EU AI Act.
10-question agentic AI readiness diagnostic. 60-day review cadence. Watches: (1) methodology changes to the Stanford Digital Economy Lab cohort identification or McKinsey AI-high-performer definition that would shift the cohort thresholds, (2) regulatory enforcement that materially changes the bar for any individual question (especially Q5 audit evidence and Q9 multi-jurisdiction posture), (3) major IAM platform releases (Okta, Microsoft Entra) that change the practical answerability of Q1 (non-human identity), (4) governance role market data revisions that change Q10 (named accountable individual).
Procurement playbook claim is scoped to enterprise agentic AI procurement specifically. The six-stage sequence is portable to adjacent procurement categories (data platforms, observability stacks) but is not optimised for them. 60-day review cadence. Watches: (1) major changes to any of the four constituent frameworks (build-vs-buy criteria, the 60-question RFP, GAUGE dimensions, vendor landscape), (2) regulatory enforcement that materially changes the documentation bar at any stage, (3) procurement-platform vendors that ship native integration of any combination of the constituent frameworks (would compress engineering work substantially).
Aggregate state-of-the-year claim drawing from approximately 60 specific source claims tracked elsewhere on the ledger. 60-day review cadence aligned with the EU AI Act enforcement window opening 2 August 2026. Watches: (1) early enforcement actions after 2 August that revise the practical compliance bar, (2) major repricing or model-tier changes at Anthropic, OpenAI, Google, or Microsoft, (3) accelerated convergence between the bimodal cohorts driven by IAM platform releases (Okta, Microsoft Entra, Ping) shipping native agent-NHI primitives, (4) regulatory actions in the United States (state AI laws, OCR enforcement spike) that change the multi-jurisdictional compliance posture.
Claim is scoped to enterprise procurement of agentic AI platforms in 2026. The four credible plays are based on observed market share, enterprise reference customers, and platform completeness. Smaller specialised vendors (Cohere, Mistral, others) compete on specific verticals or use cases but do not currently meet the platform-completeness bar for general enterprise agentic AI procurement. 60-day review cadence. Watches: (1) major repricing or model-tier changes at any of the four vendors, (2) regulatory enforcement actions that materially affect one vendor's enterprise-suitability profile, (3) entry of a credible fifth platform (most plausibly via the Linux Foundation Agentic AI Foundation member firms or via a major systems-integrator-backed neutral platform).
Claim is scoped to enterprise procurement decisions in 2026. The technical specification of MCP itself is stable and not in dispute. The procurement framing of MCP as a binary adoption question is structurally inadequate for environments where developer tools, productivity SaaS, and agent platforms ship MCP support without uniform IT governance review. 60-day review cadence. Watches: (1) Linux Foundation Agentic AI Foundation governance decisions on MCP that change the protocol's enterprise-suitability profile, (2) major vendors that lock down MCP server connections behind enterprise-admin approval (currently most do not), (3) emergence of MCP-server allow-lists or governance directories shipped at the IAM platform layer.
Claim is scoped to enterprise environments running standard IAM stacks (Okta, Microsoft Entra, Ping, ForgeRock, JumpCloud, or comparable). Smaller environments and identity-greenfield deployments may have different optimal paths. 60-day review cadence. Watches: (1) IAM vendor releases that ship native agent-NHI primitives at the platform layer (Okta for AI Agents launched 30 April 2026 is the bellwether; Microsoft Entra and Ping have signalled comparable releases), (2) regulatory enforcement actions where the in-scope finding was an inadequate NHI control on an AI agent, (3) emergence of standards (NIST AI RMF revisions, ISO/IEC, OWASP Agentic AI Top 10) that explicitly define agent NHI obligations.
Claim is scoped to enterprise environments, where the configuration-shift pattern is dominant. Smaller organisations and individual-contributor environments still see substantial unsanctioned-tool shadow AI of the 2024 shape. 60-day review cadence. Watches: (1) major vendors that lock down Custom GPT / Copilot custom agent / MCP configuration behind enterprise-admin approval (currently most do not), (2) regulatory enforcement actions where the in-scope deployment was a configuration shift on an approved tool rather than a new tool, (3) enterprise-IAM platforms that ship native non-human-identity discovery for AI agents.
Claim is scoped to enterprise agentic AI deployments specifically, not to AI systems broadly. The Act's full text covers many provisions outside agentic AI scope; this piece narrows to the operational obligations that bind a typical enterprise agentic deployment in 2026. 60-day review cadence. Watches: (1) Commission delegated acts that further define Annex III categories or add new high-risk categories, (2) the first published EU enforcement actions against agentic AI deployments after 2 Aug 2026, (3) Member-State implementations that diverge on enforcement intensity, (4) any extensions or postponements of the August 2026 deadline (none currently signalled).
Claim is scoped to enterprise procurement decisions in 2026. Vendors are actively blurring the distinction in marketing — the line between 'assistant with tool use' and 'agent with bounded scope' has narrowed technically but is still procedurally distinct because of who signs the approval, what audit evidence is required, and what blast radius is being underwritten. 60-day review cadence. Watches: (1) regulatory frameworks that explicitly define one or both categories with operative legal effect (EU AI Act delegated acts especially), (2) major vendors collapsing the product naming, (3) NIST AI RMF revisions that adopt or reject the distinction.
Claim is scoped to how the figure is interpreted, not to whether the survey itself is sound. Survey methodology is competent for what it measures (self-reported strategic attribution by senior leadership). The leak is in the downstream citation chain. 60-day review cadence. Watches: (1) audited reproductions of the figure under third-party measurement, (2) McKinsey State of AI 2026 successor publication, (3) revisions to the McKinsey methodology that narrow the EBIT-attribution definition.
First piece in planned vertical-industry series. Cluster G anchor. 60-day review cadence. Watches: (1) major ESA (EBA/ESMA/EIOPA) publishing agentic-AI-specific guidance, (2) DORA or EU AI Act enforcement action redefining liability-transfer boundaries, (3) industry-body vendor contract templates closing DORA third-party-risk gap.
Third of three claim-archive signature pieces (after AM-029 Stanford 88% and AM-030 McKinsey 23%). 60-day review cadence. Watches: (1) frontier model crossing 50% on TheAgentCompany without corresponding deployment-pattern change, (2) cross-enterprise analyses showing capability-wait deployments equivalent to governance-discipline deployments, (3) benchmark refresh shifting the easy/medium/hard distribution such that more of the enterprise task space lands in the viable scope envelope.
Based on April 2026 corpus review of published governance-framework deployments + post-cutover analysis of the 88% failure rate (Stanford DEL ACA-2026-003), the 28% I&O pay-off rate (Gartner ANA-2026-002), and the 40% projected cancellation rate (Gartner ANA-2026-001). 60-day review cadence with explicit watches on (a) cross-enterprise studies testing dimensional scoring's predictive power, (b) analyst firms adopting similar instrumented-dimension models, (c) regulatory frameworks evolving to score deployment quality vs only classify risk tier.
Based on Google's 10 Apr 2026 rollout (8 markets, 8 partner platforms), Semrush + ppc.land + WinBuzzer coverage, the OpenTable/Reserve-with-Google integration pattern. Review cadence is 60 days with explicit watch on whether a second vertical agentic-search rollout lands before end-2026.
Based on Stanford DEL 2026 bimodal distribution (12%/88%), Gartner Q1 2026 28% pay-off rate, OneReach 2026 171% average, Futurum 71% operational median vs 40% high-automation. Anthropic AP-processing + Salesforce tier-1 support + Microsoft Copilot-Dynamics as back-office case anchors. 60-day review for counter-evidence watch.
Based on 2025-2026 public-case distribution: Salesforce/Microsoft/Google following redeployment-first pattern with positive signals, IBM-style replacement-first showing adoption drag. Stanford DEL 2026 + Gartner Q1 2026 as analytical anchors. 60-day review cadence because workforce-transition frames can shift quickly with any major public reversal.
60-day cadence because the Gartner Q2 I&O update lands inside the window. Secondary interpretation (that Q1 governance frameworks are shaped by EU AI Act compliance requirements first and threat-model completeness second) is reviewable alongside the primary claim.
Claim created at publish; review in 30 days — pricing-tier claims are highly time-sensitive. Verify $200/month Pro tier availability and Claude Opus comparison pricing monthly.
Claim created at publish; review in 60 days. Re-verify Carnegie Mellon agent-completion benchmark + IDC $3.50 ROI number against next round of publications.
Based on Gravitex 87%/27% split, LuckiWi's 82% of Fortune 100 using Six Sigma, Gartner's 7 Apr 2026 finding that 57% of failed I&O deployments cited 'too much too fast'. Claim reframes the causal arrow: the pre-built measurement environment is what matters, Six Sigma is one path that produces it.
Based on Stanford DEL's 2026 playbook (51 deployments), OneReach 171% average + Futurum 71% median productivity vs 40% high-automation, Gartner's 28%-pay-off finding on the 88% side. Watches for benchmarks that show the distribution tightening around the mean or counter-evidence of IT-led 300%+ deployments.
Based on the 2026 case-study spread (47-facility global manufacturer at 42% downtime reduction, pharma at 30% in six months, industry median 25-30%). Watching for a parallel-log deployment clearing 30% sustained over 12 months.
Each claim links to the piece it came from and the review cadence Peter set when publishing it. How this works →
Affiliate firewall
Vendors we will and won't affiliate-link
The publication earns affiliate commission from a subset of vendors it covers. The rule: never affiliate-link a vendor whose tracked claim has been Partial or Not holding. The rule is enforced in code — the build fails if a blocked vendor link slips through. This panel is the public face of that firewall. See /disclosures/ for the full editorial framework.
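The build-time enforcement described above can be sketched in a few lines. Everything concrete here is an assumption for illustration: the blocked-vendor set would really come from the ledger, and the affiliate-link URL shape (`example-affiliate.net`) and HTML file layout are hypothetical, not the publication's actual implementation.

```python
import re
import sys
from pathlib import Path

# Vendors whose tracked claims have ever gone Partial or Not holding.
# In a real build this set would be derived from the ledger; hardcoded here.
BLOCKED_VENDORS = {"examplevendor"}  # hypothetical entry

# Hypothetical affiliate-link shape: https://<vendor>.example-affiliate.net/...
AFFILIATE_LINK = re.compile(r"https://(\w+)\.example-affiliate\.net/\S*")

def violations(text: str) -> set[str]:
    """Return the blocked vendors whose affiliate links appear in `text`."""
    return {m.group(1) for m in AFFILIATE_LINK.finditer(text)} & BLOCKED_VENDORS

def check_site(root: str) -> None:
    """Fail the build (exit 1) if any page affiliate-links a blocked vendor."""
    bad: set[str] = set()
    for page in Path(root).rglob("*.html"):
        bad |= violations(page.read_text(encoding="utf-8"))
    if bad:
        print(f"affiliate firewall: blocked vendors linked: {sorted(bad)}",
              file=sys.stderr)
        sys.exit(1)
```

Running `check_site` as the last pre-publish step is what makes the rule "enforced in code" rather than editorial policy: a blocked link is a failed build, not a post-hoc correction.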
Vendor · Status · Why
n8n (ai-tooling-subscription · operators) · Eligible (no audited claim) · No tracked claims about "n8n" yet. Affiliate program not yet enrolled — no commission earned today.
Anthropic (ai-tooling-subscription · enterprise / operators) · Eligible · All tracked claims about "Anthropic" are positive. Affiliate program not yet enrolled — no commission earned today.