The four credible 2026 agent-evaluation platforms (DeepEval, Braintrust, LangSmith, Patronus AI) do not compete on a capability ranking; each fits a distinct deployment shape: engineering-led eval-as-code, SaaS-first eval-as-product, LangChain-stack-native bundled with observability, and research-grade hallucination detection plus simulation. Picking by capability matrix therefore produces the wrong procurement outcome for most enterprises. The structurally load-bearing eval-vs-observability split (companion piece AM-123) compounds this: 'is the agent right' and 'what did the agent do' are different procurement questions answered by different platforms.
Procurement-first deep dive on the 2026 agent-evaluation tooling category. Verified primary sources: LangChain State of Agent Engineering 2025 (n=1,340, surveyed 18 Nov-2 Dec 2025), cross-validated by McKinsey State of AI, Nov 2025 (n=1,993 across 105 countries); DeepEval v3.9.9 release notes (1 Dec 2025, 15.1k stars); Braintrust public pricing (Starter $0 / Pro $249 / Enterprise on-prem); LangSmith pricing (Developer $0 / Plus $39 / Enterprise hybrid + self-host) plus its published HIPAA, SOC 2 Type 2, and GDPR posture; Patronus AI homepage (frontier-lab repositioning). Editorial finding: the brief's 64% Anaconda + Forrester eval-blocker stat does not resolve to a verifiable primary source; the piece substitutes LangChain's 32% quality-as-blocker figure (n=1,340), cross-validated by McKinsey. Patronus AI repositioned in 2026 from hallucination specialist (the brief's framing) to 'frontier lab developing simulation research and infrastructure'; the piece surfaces this as a tracked vendor pivot rather than smoothing it over. A 60-day review cadence applies because vendor pricing and product positioning churn quarterly in this category.
/holding/AM-122/ (embed options: iframe + oEmbed)
The card auto-updates when the claim's status, last-reviewed date, or correction log changes. Embedders never need to refresh — the card is rendered live from the canonical record.
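The live-card behavior described above can be sketched as a minimal oEmbed response. This is an illustrative assumption, not the platform's actual API: the host `example.org`, the `/card` suffix, and the `claim_card_oembed` helper are all hypothetical; only the field names (`type`, `version`, `html`, `width`, `height`) follow the oEmbed 1.0 specification. The key design point is that the embedder stores only the iframe markup, while the iframe's `src` is rendered from the canonical record on every page load, so status, last-reviewed date, and correction-log changes propagate with no action by the embedder.

```python
import json

# Assumed host for the canonical claim record (hypothetical).
CANONICAL_BASE = "https://example.org"

def claim_card_oembed(claim_id: str, width: int = 480, height: int = 200) -> str:
    """Build an oEmbed 'rich' JSON payload whose iframe points at the
    live-rendered claim card, so the embed never needs refreshing."""
    # The card endpoint re-renders from the canonical record on each load.
    src = f"{CANONICAL_BASE}/holding/{claim_id}/card"
    payload = {
        "type": "rich",       # oEmbed rich-content type
        "version": "1.0",     # oEmbed spec version
        "html": (
            f'<iframe src="{src}" width="{width}" height="{height}" '
            f'loading="lazy" title="Claim card {claim_id}"></iframe>'
        ),
        "width": width,
        "height": height,
    }
    return json.dumps(payload)

print(claim_card_oembed("AM-122"))
```

Because the payload carries only a pointer (the iframe `src`), nothing in the embedder's stored markup goes stale; the canonical record remains the single source of truth.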