Holding · last review 17 May 2026

The widely-cited 95-percent generative-AI-pilot-failure framing (MIT Sloan Management Review and Boston Consulting Group adoption-research streams, 2025-2026) is methodologically defensible for the enterprise cohort the research sampled (large firms with dedicated AI functions, 12-to-18-month evaluation windows, scaled-production-deployment success definition) and materially misrepresents small-firm pilot dynamics. The 1-to-50-person operator cohort has a different failure-mode catalogue (tool-assigned-to-wrong-person, rewrite-cost-exceeds-savings, client-rework-from-AI-deliverable, line-item-stack-compounded-and-cancelled, sporadic-use-no-routine) and a different success definition (90-day payback at actual hourly rate; deliverable quality reaching the client without disproportionate rework; routine fit documented for handover). A three-question Monday-morning small-firm pilot test (payback, deliverable quality, routine fit) checked at 30 days and 60 days is the operator's actual evaluation instrument and replaces the enterprise 12-to-18-month evaluation cycle that the 95-percent number is measured against.
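The three-question test can be sketched as a small check run at the 30-day and 60-day marks. This is a minimal illustration, not the piece's own instrument: the `PilotCheck` fields, the payback formula, and the 20 percent rework ceiling are all assumptions introduced here, because the claim names "disproportionate rework" without quantifying it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PilotCheck:
    """One checkpoint (day 30 or day 60) for a small-firm pilot. Hypothetical structure."""
    hours_saved_per_week: float   # operator's own estimate of time saved
    hourly_rate: float            # actual hourly rate, not a list price
    monthly_tool_cost: float      # recurring subscription cost
    setup_cost: float             # one-off onboarding or configuration cost
    deliverables_sent: int        # AI-assisted deliverables that reached a client
    deliverables_reworked: int    # of those, how many came back for rework
    routine_documented: bool      # is the routine written down for handover?


def payback_days(check: PilotCheck) -> Optional[float]:
    """Days until net daily savings cover the setup cost; None if savings never exceed cost."""
    daily_saving = check.hours_saved_per_week * check.hourly_rate / 7
    daily_cost = check.monthly_tool_cost * 12 / 365
    net = daily_saving - daily_cost
    if net <= 0:
        return None
    return check.setup_cost / net


def three_question_test(check: PilotCheck,
                        max_payback_days: float = 90,      # from the claim's 90-day payback
                        max_rework_share: float = 0.2):    # assumed threshold, not from the source
    """Answer the payback, deliverable-quality, and routine-fit questions as booleans."""
    payback = payback_days(check)
    if check.deliverables_sent:
        rework_share = check.deliverables_reworked / check.deliverables_sent
    else:
        rework_share = 1.0  # nothing reached a client, so quality is unproven
    return {
        "payback": payback is not None and payback <= max_payback_days,
        "deliverable_quality": rework_share <= max_rework_share,
        "routine_fit": check.routine_documented,
    }


# Illustrative numbers only.
day_30 = PilotCheck(hours_saved_per_week=3, hourly_rate=100, monthly_tool_cost=60,
                    setup_cost=400, deliverables_sent=10, deliverables_reworked=1,
                    routine_documented=True)
print(three_question_test(day_30))
```

Returning one boolean per question, instead of a single pass/fail, keeps the failing question visible, which is what an operator needs when deciding whether to reassign the tool, cancel it, or document the routine.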

Operators register pillar piece on the pilot-evaluation framework for 1-to-50-person firms. The 45-day cadence is calibrated so the first review falls within the typical 30-to-60-day pilot decision window.

Trigger conditions for status changes:

(1) MIT Sloan, BCG, or a comparable adoption-research stream publishes small-firm-specific (1-to-50-person) pilot-failure data inside the review window. This would either confirm the structural argument that the enterprise framing misclassifies small-firm pilots, or refine the small-firm failure-mode catalogue with new evidence. The status stays Holding either way unless the evidence directly contradicts the load-bearing claim.

(2) A small-firm operator survey from Stripe Atlas, Brex, Ramp, or an equivalent publisher of SMB spend-and-adoption data produces cohort-level pilot-outcome data that materially diverges from the five failure modes listed. This would refine the catalogue and could move the status toward Partial if a sixth or seventh mode dominates the new data.

(3) A major foundation-model provider publishes operator-cohort case studies with attributable revenue impact at the 1-to-50-person scale. This would harden the success-definition argument by establishing what small-firm pilot success looks like in the public record.

(4) A viral re-citation of the 95-percent number appears with new methodology that does include small firms. This would update the source-document discussion in the piece.

Sibling claim: AM-146 (three questions for CIOs about agentic AI accuracy claims) addresses the analogous misread problem in the enterprise cohort.
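The cadence arithmetic on this card can be checked with a few lines of date math. This is only a sketch: the function names are hypothetical, and the 18 Jun 2026 rendering date is an assumption inferred from the "+13d" countdown, not a date the page states.

```python
from datetime import date, timedelta


def next_review(last_reviewed: date, cadence_days: int = 45) -> date:
    """Next review date given the last review and the cadence in days."""
    return last_reviewed + timedelta(days=cadence_days)


def days_until(target: date, as_of: date) -> int:
    """Whole days from as_of to target (negative if the target has passed)."""
    return (target - as_of).days


last_reviewed = date(2026, 5, 17)   # "Last reviewed" on the card
as_of = date(2026, 6, 18)           # assumed rendering date, inferred from "+13d"

due = next_review(last_reviewed)
print(due.isoformat(), days_until(due, as_of))
```

Under those assumptions the 45-day cadence reproduces the card's next-review date of 1 Jul 2026 and its 13-day countdown.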

Published
17 May 2026
Last reviewed
17 May 2026
Next review
+13d · 1 Jul 2026
Cohort
1-50 person services firm, agency, or solo operator running an AI pilot without a dedicated AI function or enterprise-style procurement cycle
Cadence
45-day (pilot decision window); 60-day at follow-up review
Sibling claim
AM-146 · Agentic AI accuracy claims: three questions for CIOs
Claim ID
OPS-069

About this register

The Operators register tracks claims published from practitioner-advisory pieces addressed to solo founders, micro-SMBs, and small businesses of up to around fifty people. Claims are reviewed on a 30-to-45-day cadence, because tooling and SMB-relevant pricing shift faster than enterprise procurement signals.

Recent corrections in Operators

  • OPS-068 · Partial · 17 Jun 2026

    Source-text re-review: the '$300-$500 (2024) toward $100-$130 (early 2026)' median trajectory is not stated in either cited source — the Godberry Studios teardown reports stack cost by revenue tier (not a year-over-year median) and BetterCloud's SaaS-industry data covers enterprise spend, not solopreneur AI subscriptions. The compression direction is supported by the Godberry tier data and observable foundation-model bundling; the specific year-anchored median figures are reclassified as source:our-estimate in the article. The load-bearing claim (active compression / category-collapse) holds; status moved to Partial pending a primary source carrying a dated solopreneur-median series.

  • OPS-051 · Partial · 10 Jun 2026

    One named member of the generation cluster was already defunct at publication: Tome shut down its presentation/narrative product (Tome Slides) in March 2025 and pivoted to sales tooling, with the brand later sold to AngelList (deckary.com shutdown timeline; signalhub.substack.com post-mortem, both checked 10 Jun 2026). The generation cluster reduces to Pitch + Gamma. The two-cluster thesis itself is unaffected and arguably strengthened — the pure AI-narrative product failed to find a sustainable business while Gamma (70M users, $100M ARR as of Nov 2025) and the assembly cluster (PandaDoc, Better Proposals, Proposify per Luniq 2026 agency comparison) both compound. Status Up → Partial for the factual error in the tool list.

  • OPS-022 · Partial · 10 Jun 2026

    Vendor attribution error in the claim text. The claim names Polley Faith among 'Spellbook with named small-firm customers Westaway, KMSC Law, Polley Faith'. Polley Faith LLP is a Harvey-listed law-firm customer, not a Spellbook customer: the live Spellbook site (now spellbook.com; spellbook.legal 301-redirects) names Westaway, KMSC Law, and McInnes Cooper with no Polley Faith, and the source article's own body correctly places Polley Faith on Harvey's roster. The claim text and the article excerpt bundled it with the wrong vendor at publication. The remaining legs verify against extracted source text on 10 Jun 2026: Anthropic's GC AI customer story carries 'More than 1,500 companies' and '14 hours saved per week on average ... based on a survey of more than 100 active customers' verbatim; Harvey's published roster (Thompson Hine, Fox Rothschild, Lowenstein Sandler, Polley Faith) matches; ABA Formal Opinion 512 remains the governance baseline. The corpus reading (AI ships at 1-to-20 lawyer scale; privileged work stays on Enterprise-tier zero-retention access) is unaffected. Status Up → Partial.

Reviews coming up in Operators

  • OPS-030 · Holding · next +9d (27 Jun 2026)

    The fastest path for an owner-operator to build practical agentic-AI competence in 2026 is the three-week build-by-ship…

  • OPS-029 · Holding · next +9d (27 Jun 2026)

    For solo founders and small teams (under ~50 people) building with AI in 2026, the build-vs-buy decision tree has inver…

  • OPS-005 · Holding · next +9d (27 Jun 2026)

    At sub-1M tokens per month (typical SMB agent volume) in 2026, the absolute dollar gap between Claude Haiku 4.5, GPT-4o…