Carnegie Mellon's TheAgentCompany 2026 update shows Gemini 2.5 Pro as the best-performing enterprise agent at 30.3% task completion, up from the 24% Claude 3.5 Sonnet baseline in 2024 but still far below production-readiness thresholds.
- Source: Carnegie Mellon University · archived
- Academic published
- Logged
- Cadence: 90 days
- Next review: 2026-07-18
Why this was logged
TheAgentCompany is the most-cited academic benchmark for enterprise-agent task completion. The 30.3% best-in-class rate is the empirical anchor for every "agents aren't ready" analysis.
Review history
No reviews yet. First review scheduled for 2026-07-18.
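As a rough illustration of how the 90-day cadence plays out, the sketch below computes the review date that would follow the first one. It assumes the first review happens on its scheduled date and that the cadence stays at 90 days; the function name `next_review` is hypothetical and not part of any tooling described in this record.

```python
from datetime import date, timedelta

CADENCE_DAYS = 90  # review cadence stated in this record


def next_review(previous: date, cadence_days: int = CADENCE_DAYS) -> date:
    """Return the review date that follows `previous` by one cadence interval."""
    return previous + timedelta(days=cadence_days)


first_review = date(2026, 7, 18)  # first scheduled review, from this record

# Assumption: the first review runs on schedule and the cadence is unchanged.
second_review = next_review(first_review)
print(second_review.isoformat())
```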
This record tracks what the source stated, with evidence for the current verdict. Verdicts describe what the evidence shows, not vendor intent. See methodology for the full counter-evidence and review discipline.
The distance between the 30.3% best-case completion rate and the production threshold (brief §6.7 implies ~95%+) is the structural capability gap that dominates enterprise deployment economics. Review at 90 days against frontier-model releases.
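The arithmetic behind that gap can be sketched as follows. This is a minimal illustration using only the figures quoted in this record. The threshold is treated as exactly 95%, which is an assumption, since the brief only implies "~95%+". The helper names are hypothetical.

```python
# Figures quoted in this record; the 95% threshold is only implied by the brief.
BASELINE_2024 = 24.0      # Claude 3.5 Sonnet, % task completion
BEST_2026 = 30.3          # Gemini 2.5 Pro, % task completion
ASSUMED_THRESHOLD = 95.0  # assumption: "~95%+" read as exactly 95%


def point_gap(low: float, high: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(high - low, 1)


def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return round((new - old) / old * 100, 2)


gain_points = point_gap(BASELINE_2024, BEST_2026)
gain_relative = relative_change(BASELINE_2024, BEST_2026)
remaining_points = point_gap(BEST_2026, ASSUMED_THRESHOLD)

print(f"Gain since 2024: {gain_points} points ({gain_relative}% relative)")
print(f"Remaining gap to assumed threshold: {remaining_points} points")
```

Under these assumptions the best-in-class rate gained 6.3 percentage points over the 2024 baseline, while 64.7 points remain to the assumed threshold, roughly ten times the improvement observed so far.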