The four credible 2026 agent-evaluation platforms (DeepEval, Braintrust, LangSmith, Patronus AI) do not compete on capability rank. Each fits a distinct deployment shape (respectively: engineering-led eval-as-code; SaaS-first eval-as-product; LangChain-stack-native, bundled with observability; research-grade hallucination + simulation), and picking by capability matrix produces the wrong procurement outcome for most enterprises. The structurally load-bearing eval-vs-observability split (companion piece AM-123) compounds this: 'is the agent right' and 'what did the agent do' are different procurement decisions answered by different platforms.
Procurement-first deep-dive on the 2026 agent-evaluation tooling category. Verified primary sources: LangChain State of Agent Engineering 2025 (n=1,340, surveyed 18 Nov-2 Dec 2025), cross-validated by McKinsey State of AI Nov 2025 (n=1,993 across 105 nations); DeepEval v3.9.9 release notes (1 Dec 2025, 15.1k stars); Braintrust public pricing (Starter $0/Pro $249/Enterprise on-prem); LangSmith pricing (Developer $0/Plus $39/Enterprise hybrid+self-host) plus its published HIPAA/SOC 2 Type 2/GDPR posture; Patronus AI homepage (frontier-lab repositioning). Editorial finding: the brief's 64% eval-blocker statistic, attributed to Anaconda and Forrester, does not resolve to a verifiable primary source; the piece substitutes the 32% quality-as-blocker figure from the LangChain survey (n=1,340), cross-validated by McKinsey. Patronus AI repositioned in 2026 from hallucination specialist (the brief's framing) to 'frontier lab developing simulation research and infrastructure'; the piece surfaces this as a tracked vendor pivot rather than smoothing it over. The review cadence is 60 days because vendor pricing and product positioning churn quarterly in this category.
/holding/AM-122/
About this register
The Reporting register tracks claims published from articles addressed to senior enterprise IT leaders — CIOs, IT directors, heads of platform. Claims are reviewed on a 30–90 day cadence; each review either reaffirms the claim, marks one substantive part as Partial, or marks it Not holding once the underlying evidence has been overtaken.
Recent corrections in Reporting
- AM-002 · Not holding · 6 May 2026
URL state changed. The /the-agentic-ai-revolution-real-world-success-stories-and-strategic-insights-from-2024-2025/ slug now serves a deliberately rewritten retrospective (claimId AM-130, "Agentic AI 2024-2025 retrospective", published 4 May 2026), written against audited primary sources. The 28 Apr 2026 redirect to /retractions/ has been lifted to allow that. The AM-002 claim itself remains Not holding: the original $3.50/dollar + 70% failure-rate framing was withdrawn and is not restored. AM-130 is a separate claim with its own evidence chain. Readers arriving at /holding/AM-002 see the withdrawal here; the article link surfaces the new piece at the URL where the original lived, with this entry as the audit trail.
- AM-121 · Holding · 2 May 2026
Klarna walk-back primary-source upgrade — added Siemiatkowski verbatim quotes via Bloomberg-cited-by-Fortune (9 May 2025) and the Uber-style freelance hiring detail via Entrepreneur. Closes the highest-priority evidence gap from the source dossier.
- AM-115 · Holding · 29 Apr 2026
Initial publication 29 Apr 2026 as the first Quarterly Claim Review Bulletin. The claim itself is recursive: it asserts that the bulletin will ship quarterly, and the next review (30 Jul 2026) tests whether the Q3 bulletin actually appeared. Status starts as 'up' because the claim is currently true (the Q2 bulletin shipped). The verdict at end of July 2026 will move to Holding, Partial (bulletin shipped but on a delayed cadence), or Not holding (no bulletin shipped).
Reviews coming up in Reporting
- AM-003 · Holding · next +5d (19 May 2026)
GPT-5 Pro's tiered-subscription model forces enterprises to classify problems by computational difficulty — $200/month…
- AM-136 · Holding · next +21d (4 Jun 2026)
Across the 24-month window May 2024 to April 2026, every major foundation-model provider (Anthropic, OpenAI, Google, AW…
- AM-020 · Holding · next +35d (18 Jun 2026)
The 40-60% TCO underestimate on enterprise agentic-AI deployments is not a cost-visibility failure — it is a cross-depa…