
We only publish what we can defend in a vendor meeting. Every claim carries an ID, a review date, and a verdict you can check.

Issue 020 · Week 20 · 2026
Ledger
Status moved

Quiet — no verdict transitions in the last 30 days. See the ledger →

Agent Mode AI — claim-tracked agentic AI analysis

Newest · Risk & Governance

The Samsung lesson for shadow AI: detection lag is structural, not procedural

Samsung Electronics restricted ChatGPT and other generative AI on company devices in May 2023, after three separate internal incidents in April where employees pasted confidential source code, meeting transcripts, and yield-test code into the public ChatGPT interface. The detail in the public reporting is the load-bearing point. Samsung found the leaks after the fact, by audit, not by detection at the moment the paste happened. The detection lag was not a Samsung-specific operational failure. It was the predictable output of running enterprise data-loss prevention against a category of egress channel the controls were not built for. Three years on, most enterprise shadow-AI programmes still have the same gap.

Read the piece → · Written by Claude, signed by Peter
Signed by
Peter

27 years enterprise IT operations. Global organisation. Major incidents. Editorially independent.

  • 103 pieces
  • 171 tracked claims
  • 14 public retractions
About the editor
Framework · GAUGE

The Enterprise Agentic Governance Benchmark. Six dimensions, scored 0–100. Free 5-minute web diagnostic; 30–45 minute Excel workbook for governance groups.

Score a deployment →
Holding-up · Ledger
Every claim, tracked.
171 tracked claims
Most recently reviewed: AM-141 · Holding
Read the ledger →
Bulletin · Reviews
Quarterly verdict bulletin.
2 issues published
Latest: Q3 2026 Claim Review Bulletin: which claims moved, which held, and what the EU AI Act enforcement window did to the corpus
Read the latest →
Podcast · Audio companion
Two analysts, one claim per episode.
13 episodes live
Latest: Why IT operations is the highest-exposure agentic-AI workforce population · 12:25
All episodes →

Recently reviewed

Three claims most recently re-tested against their primary sources. Status changes log to the corrections page; nothing quietly vanishes.

See the full ledger →
  1. AM-133 · Holding · Q3 2026 Claim Review Bulletin: which claims moved, which held, and what the EU AI Act enforcement window did to the corpus · Reviewed 30 Jul 2026 · Read article →
  2. AM-156 · Holding · The Samsung lesson for shadow AI: detection lag is structural, not procedural · Reviewed 16 May 2026 · Read article →
  3. AM-155 · Holding · Storm-0558 and the structural risk in AI agent credentials · Reviewed 16 May 2026 · Read article →
Method · Holding-up

Why this publication has a ledger

Most AI commentary gets paid for being loud about what's new. Almost none gets measured on whether what it said last quarter still holds this one. That is the gap this publication exists to close. Every published argument carries an ID, a review date, and one of three verdicts — Holding, Partial, or Not holding — that updates over time as evidence accumulates. The verdict log is the product.

When a claim stops holding, the page says so. The original sentence stays visible. The correction is dated and appended. Nothing is quietly removed. You do not need to trust the author to trust the verdicts — the receipts are public, on a 30–90 day review rhythm, and the corrections record is permanent.

Two registers

Same Holding-up discipline
Enterprise IT · default
For CIO / CISO / head of platform.

Mid-market and large enterprise. Procurement, governance, EU AI Act, multi-vendor agentic stacks. 30–90 day claim review cadence.

103 enterprise articles
Start here →
Operators · sibling
For solo founders to ~50-person teams.

No IT department. Practitioner-advisory voice; faster 30–45 day cadence. Tools, vendor red flags, hours-per-week evaluation budgets.

48 operators articles
Operators →

Topic pillars

Five clusters

Editor's picks

One per topic cluster

Latest pieces

Full archive →
Use Cases

Public-sector agentic AI procurement: what the GSA and EU records show

Federal and EU member-state agentic AI contract records show renewals running materially below the enterprise SaaS benchmark. The driver is not technical performance but audit-evidence completeness under OMB M-24-10 §5 and EU AI Act Article 12. The procurement implication is structural.

13 min
Latest AI Developments

Enterprise agentic AI in Q2 2026: what shipped, what slipped, what held

Of 8 major enterprise agentic AI vendor claims from Q1 2026, a minority are Holding at 90-day review. The pattern that predicts durability is not vendor size. It is whether the ROI evidence came from a customer or from the vendor itself.

12 min
Use Cases

Agentic AI in legal services: what survives the billable-hour decomposition

Three of the six billable-hour sub-tasks capture durable value with agentic AI. Two increase malpractice risk vs a junior-associate equivalent at the same time-to-delivery. One is bounded by conduct rules, not technology. The evidence from AmLaw 100 deployments now allows a clear-eyed breakdown.

12 min
Understanding AI

The agent fan-out problem: when one prompt becomes 400 LLM calls

Production agentic systems amplify a single user request into dozens or hundreds of internal LLM calls. Most enterprise unit economics, latency budgets, and observability setups are still priced for a 1:1 prompt-to-call ratio.

10 min
Business Case & ROI

The split verdict: GPT-5.5 vs Claude Opus 4.7 and why CIOs need two models, not one

Anthropic shipped Claude Opus 4.7 on 16 Apr 2026; OpenAI shipped GPT-5.5 seven days later. Both vendors claim leadership. Neither model wins everything. The procurement question for 2026 is not which one to standardise on, because the evaluation evidence does not support a single-model answer for any enterprise running both agentic-coding workloads and knowledge-work workloads. The two-year procurement decision is whether to plan the routing or accept the tax of pretending the split does not exist.

17 min
Risk and Governance

Agentic code auditing: what the Firefox Claude Mythos disclosure tells procurement about CI-time defaults

Mozilla's Firefox 150 release (November 2025) shipped fixes for 271 vulnerabilities surfaced by the Claude Mythos Preview pipeline. The headline fact ('AI found 271 bugs') is true but is not the procurement-relevant one. The procurement-relevant change is that the agentic-verification step (the agent builds and runs its own test cases to triage suspected bugs before reporting) cleared the false-positive wall that blocked earlier read-only GPT-4 / Claude Sonnet 3.5 attempts from production CI. The piece argues that CI-time agentic auditing becomes the default expectation for shipping enterprise software in 2026, derives three procurement-deck questions, and flags one dual-use risk surfacing alongside the defensive disclosure.

12 min
Business Case & ROI

Agentic AI accuracy claims: the three questions every CIO should ask before 'ready-to-run' becomes a procurement decision

Anthropic posted a launch this week positioning the product as 'ready-to-run'. The phrase is procurement-deck noise unless three questions are answered: accuracy rate on which task, against which baseline, measured by what methodology. The 2026 industry baseline for procurement-credible accuracy disclosure is the academic-benchmark pattern (CRMArena-Pro 35% multi-step reliability on a defined CRM task corpus; CMU TheAgentCompany 30-35% reproduction range; WebArena ~36% browser-agent ceiling) and the vendor-disclosure pattern Anthropic itself established earlier (Claude for Chrome 23.6% → 11.2% → 0% with named attack corpus and patch cadence). Vendor 'ready-to-run' positioning that doesn't meet either bar leaves the deploying enterprise inheriting the methodology gap as an audit-defense burden.

13 min
AI Implementation

What is Agent Mode? Microsoft, Cursor, GitHub Copilot, and OpenAI in 2026

Agent Mode is the same brand name shipping in three different product classes in 2026: Microsoft 365 Copilot productivity-suite agents, Cursor IDE agents, and GitHub Copilot code-platform agents. Procurement teams comparing them feature-by-feature are comparing categories that aren't substitutes.

15 min

Browse by topic pillar

Five strategic pillars

Coming next

Peter's editorial calendar — honest dates, bumped-with-notes if missed.
  1. Week 17
    26 Apr 2026
    Non-human identity — the first procurement question CIOs aren't asking yet

    Every enterprise agent deployment passes through a credential. Most teams still hand the agent a human's credential. Naming the NHI gap is the next Q2 procurement conversation.

  2. Week 18
    03 May 2026
    Shadow agent sprawl — what telemetry catches and what it misses

    The browser-as-agent-runtime pattern creates a detection gap that MDM/CASB don't see. What the first wave of shadow-AI discovery tools actually find, and the three categories they miss.

  3. Week 19
    10 May 2026
    The AI agent MSA — four clauses every enterprise contract needs by August

    EU AI Act enforcement activates 2 Aug 2026. The clauses that survive legal review in the next quarter will be the ones that don't pretend the agent is conventional SaaS.
