ACA-2026-004 · Academic · Pending review

Carnegie Mellon's 2026 update to TheAgentCompany benchmark shows Gemini 2.5 Pro as the best-performing enterprise agent, at 30.3% task completion. That is up from the 24% baseline set by Claude 3.5 Sonnet in 2024, but still far below production-readiness thresholds.

Source: Carnegie Mellon University · archived
Academic source published: 2026-03-15
Logged: 2026-04-19
Cadence: 90 days
Next review: 2026-07-18

Why this was logged

TheAgentCompany is the most-cited academic benchmark for enterprise-agent task completion. Its 30.3% best-in-class completion rate serves as the empirical anchor for "agents aren't ready" analyses.

Review history

No reviews yet. First review scheduled for 2026-07-18.

This record tracks what the source stated, with evidence for the current verdict. Verdicts describe what the evidence shows, not vendor intent. See the methodology for the full counter-evidence and review discipline.

Vigil · 09 reviewed