Holding · last review 10 May 2026

The Firefox 150 / Claude Mythos disclosure (November 2025) marks the operational shift in agentic AI code auditing from 'AI can find bugs' (true since 2023, but blocked from production CI by the false-positive rate that earlier read-only GPT-4 / Sonnet 3.5 attempts produced) to 'agentic verification clears the false-positive wall by building and running its own test cases before reporting'; the procurement-deck consequence is that CI-time agentic auditing becomes the default expectation for any shipping enterprise software in 2026, and three derived questions belong in any software-vendor procurement (does the vendor's CI pipeline include an agentic-auditing step; what is the vendor's disclosure posture when bugs are found in their own product by agentic tools; what is the vendor's posture on the dual-use risk that the same pipeline architecture works in reverse, as the reported Anthropic investigation of unauthorized Mythos use via a third-party vendor environment makes explicit).
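
The "verify before reporting" step named in the claim can be illustrated with a minimal sketch. This is not Mozilla's or Anthropic's pipeline; the `Candidate` type, the `verified_findings` function, the file locations, and the toy test cases are all hypothetical names introduced for illustration. The only idea taken from the claim is that a suspected bug is reported only if a generated test case actually reproduces it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A suspected bug from a read-only analysis pass (hypothetical shape)."""
    location: str
    description: str
    # Stand-in for a generated test case: returns True if the failure reproduces.
    reproduce: Callable[[], bool]


def verified_findings(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep only candidates whose test case builds, runs, and reproduces the failure."""
    confirmed = []
    for candidate in candidates:
        try:
            if candidate.reproduce():
                confirmed.append(candidate)
        except Exception:
            # A test case that cannot be built or run counts as unverified,
            # so it is dropped rather than reported.
            continue
    return confirmed


def _unbuildable() -> bool:
    raise RuntimeError("generated test case failed to build")


# Toy inputs (assumptions, not real findings): one reproduces, one does not,
# and one has a test case that fails to run at all.
candidates = [
    Candidate("example/legend.cpp:120", "possible use-after-free", lambda: True),
    Candidate("example/parser.cpp:48", "possible overflow", lambda: False),
    Candidate("example/wasm.cpp:9", "possible invalid pointer", _unbuildable),
]

reported = verified_findings(candidates)
for finding in reported:
    print(finding.location, "-", finding.description)
```

Only the first candidate is reported. The other two would have been false positives, or at least unverifiable, under a read-only approach that reports every suspicion.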

Claim created at publish; review on 60-day cadence.

Anchor sources:

  • Mozilla Hacks blog post on Firefox 150 release (November 2025) covering the Claude Mythos Preview pipeline integration
  • Schneier on Security coverage of the disclosure
  • The Decoder coverage including the 15-year-old use-after-free in the <legend> element as the canonical combinatorial-reasoning anchor
  • SecurityWeek coverage including the Mozilla CTO calibration quote ('elite-human-quality discovery at machine throughput, not superhuman discovery')
  • CSO Online reporting on Anthropic investigation of unauthorized Mythos use via third-party vendor environment
  • flyingpenguin.com counter-narrative critique flagged in Schneier comments arguing the '271 zero-days' headline overstates the strict-zero-day count

Methodology caveat: the Firefox 150 release notes individually credit only 3 bugs as 'found with Claude' (two use-after-free, one invalid-pointer-in-wasm); the 271 total flows through rollup CVEs (CVE-2026-6784, 6785, 6786 totalling 316 internally-found bugs), so per-bug attribution at the public CVE level is much smaller than the aggregate.

Sister claims:

  • AM-146 (vendor accuracy claims need named task + baseline + methodology; agentic-verification step is the methodology change)
  • AM-007 (vendor-response split for cross-agent class disclosure; the same Cohort A/B framing extends to defensive disclosure of agentic-auditing CI integration)
  • AM-009 (Anthropic Cohort A disclosure pattern for Claude for Chrome; Mozilla's Mythos disclosure follows the same shape on the consuming-vendor side)
  • AM-130 (procurement reader's four evidence classes; Mythos sits in the 'audited customer pilots with active human oversight' class given Mozilla's published methodology)
  • AM-140 (procurement-committee six pre-pilot questions)

Trigger conditions to revisit before next cadence:

  • (a) a major enterprise software vendor (Microsoft, Google, AWS, Salesforce, Adobe, etc.) publishes an analogous CI-time agentic-auditing disclosure with named pipeline and named bug counts; this extends the named-success cohort and changes 'default expectation' framing materially
  • (b) a published reproduction of the Mozilla pipeline by an independent third party (academic team, security-research firm) confirming or qualifying the false-positive-wall-falls finding
  • (c) a public disclosure by Anthropic concluding the unauthorized-Mythos-use investigation, with concrete remediation
  • (d) the flyingpenguin.com strict-zero-day critique gains traction in security-research literature and reframes the disclosure scope
  • (e) regulatory action (EU AI Act post-market monitoring, US FTC, sectoral regulator) imposing mandatory agentic-auditing CI requirements on shipping software
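
The methodology caveat is easier to weigh with the proportions written out. The short calculation below uses only the three counts stated in the caveat (3 individually credited bugs, the 271 headline total, and 316 internally-found bugs across the rollup CVEs). The variable names are illustrative, and comparing the credited count against each total is an assumption about how the figures relate, not something the sources state.

```python
# Counts as given in the methodology caveat.
credited_individually = 3   # bugs the release notes credit as 'found with Claude'
headline_total = 271        # aggregate figure in the headline
rollup_total = 316          # internally-found bugs across the three rollup CVEs

# Share of each aggregate that is individually attributed in the release notes.
share_of_headline = credited_individually / headline_total
share_of_rollup = credited_individually / rollup_total

print(f"Individually credited vs headline total: {share_of_headline:.1%}")  # 1.1%
print(f"Individually credited vs rollup total:   {share_of_rollup:.1%}")    # 0.9%
```

On these figures, roughly one bug in a hundred is individually attributed at the public level, which is why the aggregate and the per-bug record should be read separately.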

Published: 10 May 2026
Last reviewed: 10 May 2026
Next review: +21d · 09 Jul 2026

About this register

The Reporting register tracks claims published from articles addressed to senior enterprise IT leaders — CIOs, IT directors, heads of platform. Claims are reviewed on a 30–90 day cadence; each review either reaffirms the claim, marks one substantive part as Partial, or marks it Not holding once the underlying evidence has been overtaken.

Recent corrections in Reporting

  • AM-008 · Partial · 17 Jun 2026

    Source-text figure re-review: Google's 2024 Environmental Report reports a 28% year-over-year increase to 8.1 billion gallons, not the 33% (from a 6.1 billion 2023 base) asserted at publish. The 8.1B 2024 figure and the Microsoft WUE 0.30 L/kWh / 39%-improvement figure are unchanged and verified. Article corrected to 28% and the unsupported 6.1B base removed; the claim text retains the original figure with this correction per the Holding-up protocol.

  • AM-132 · Partial · 10 Jun 2026

    One of four legs unanchored on re-review. The claim text attributes '12% of deployments clearing 300%+ ROI with 88% at or below break-even at 12-18 months' to the Stanford DEL 2026 Enterprise AI Playbook. Full-text verification on 10 Jun 2026 found no such figure in that source: the playbook (Pereira, Graylin, Brynjolfsson, Apr 2026) studies 51 successful deployments by design and contains no ROI distribution, no 300%-plus cohort, and no break-even measurement point (full finding at AM-029, correction of 10 Jun 2026). The only verified figure carrying the same 12/88 numerals is IDC research with Lenovo (via CIO.com, Mar 2025): roughly 88% of AI proof-of-concepts never reach production and roughly 12% graduate — a pilot-to-production graduation metric, not an ROI distribution. The Gartner 28%, McKinsey 23%/17%, and MIT NANDA 95% legs verify; they support a small high-performing tail and a large struggling body, but none documents the two-peak bimodal shape the claim asserts. Status Up -> Partial.

  • AM-129 · Partial · 10 Jun 2026

    One of three read-against anchors unanchored on re-review. The claim text cites 'Stanford Digital Economy Lab Enterprise AI Playbook (12/88 bimodal ROI distribution at 12-18 months)' and frames the realistic ROI band around 'the highest-discipline 12% cohort'. Full-text verification on 10 Jun 2026 found the playbook contains no 12/88 distribution, no bimodal ROI shape, and no 12-18-month ROI measurement point (full finding at AM-029, correction of 10 Jun 2026). The claim's core negative finding — no mid-market enterprise has produced a documented +240% ROI in 90 days under audited conditions — is unaffected; the McKinsey State of AI 2025 and MIT NANDA legs verify and continue to support it. The '12% cohort' framing has no verifiable referent. The only verified figure carrying the 12/88 numerals is IDC's pilot-graduation finding (roughly 88% of AI proof-of-concepts never reach production; via CIO.com, Mar 2025), a different metric. Status Up -> Partial.

Reviews coming up in Reporting

  • AM-063 · Holding · next +9d (27 Jun 2026)

    AI agents executing financial transactions need a four-control bundle (action-approval gates by blast radius, kill-swit…

  • AM-061 · Holding · next +9d (27 Jun 2026)

    Production agentic-AI costs at scale routinely run multiples of POC projections, and a layered optimisation programme c…

  • AM-003 · Partial · next +9d (27 Jun 2026)

    GPT-5 Pro's tiered-subscription model forces enterprises to classify problems by computational difficulty — $200/month…

Referenced within Agent Mode AI by 2 pieces