Method: every claim tracked, reviewed every 30–90 days, marked Holding, Partial, or Not holding. Drafted by Claude; signed off by Peter. How this works →
AM-039 · published 26 Apr 2026 · revised 26 Apr 2026 · 12 min read · in Business Case & ROI

Anthropic vs OpenAI vs Google vs Microsoft for enterprise agents in 2026

The four credible enterprise agentic AI platform plays in 2026 are Anthropic, OpenAI, Google, and Microsoft. The procurement decision between them is no longer primarily about model capability. It is about pricing model, governance and BAA posture, and ecosystem distribution. Treating it as a model-quality bake-off is the most common 2026 procurement mistake.

Holding · reviewed 26 Apr 2026 · next review +60d

By Q1 2026, the enterprise agentic AI platform market converged to four credible plays. Anthropic with Claude Managed Agents in public beta from April. OpenAI with the Agents SDK plus ChatGPT workspace agents. Google with the Gemini Enterprise Agent Platform. Microsoft with Foundry Agent Service plus Copilot custom agents. Each has platform completeness, enterprise reference customers, BAA posture, and integration depth that enterprise procurement requires (Computerworld, The agentic AI frenzy increases as more vendors stake their claims; The New Stack, Anthropic, OpenAI, Google, and Microsoft agree that the harness is the product).

Smaller specialised vendors (Cohere, Mistral, vertical-specific players) compete on specific use cases but do not currently meet the platform-completeness bar for general enterprise agentic AI procurement. The four-way comparison is the relevant procurement frame for most enterprise IT teams running this evaluation in 2026.

This piece is the operational translation of that comparison for enterprise IT. Trade-press coverage of the vendor differences is competent on capability and pricing-surface comparison; what is missing in the enterprise-IT register is the procurement framing that distinguishes which axes actually move the decision in 2026 from which axes most procurement teams still spend disproportionate effort on.

Two propositions structure the piece:

  • Model capability is no longer the primary procurement axis. The model layer across the four major vendors converged to rough parity for most enterprise use cases by Q1 2026. The 30% task-completion ceiling per the Carnegie Mellon TheAgentCompany 2026 benchmark is not vendor-specific. The variance between top models on enterprise-relevant tasks is now smaller than the variance between deployment disciplines, which is the structural finding behind the 88/12 bimodal ROI distribution (claim AM-029). Procurement that treats vendor selection as a model bake-off optimises a variable that does not move outcomes much.
  • The variables that move outcomes are pricing model, governance posture, and ecosystem distribution. Each vendor competes on a different shape of these three. Anthropic Managed Agents at 8 cents per session-hour plus tokens with three-cloud BAA position and platform-neutral integration. OpenAI Agents SDK at no first-party runtime fee with platform-neutral SaaS-distribution reach. Microsoft Foundry plus Copilot custom agents at vertically-integrated bundle pricing with the deepest enterprise ecosystem footprint. Google Gemini Enterprise Agent Platform at vertically-integrated platform pricing with Workspace and Cloud distribution. The three-axis comparison is the procurement decision; the model bake-off is a tie-breaker.

The remainder of the piece walks through the pricing model split, the governance and BAA posture distinctions, the ecosystem distribution analysis, and the six-week procurement track that operationalises the decision.

The pricing model split

The four vendors have publicly converged on three distinct pricing shapes for enterprise agent platforms in 2026. The differences are not subtle, and the optimal choice differs by deployment pattern.

Anthropic Managed Agents: per-session-hour plus tokens. Anthropic launched Managed Agents in public beta in April 2026 with a pricing structure of 8 cents per session-hour for the agent runtime plus token costs for the underlying model usage (The New Stack pricing analysis). The session-hour is the primary unit, which makes long-running orchestration the cost-driver. For an enterprise running 1,000 agent sessions averaging 4 hours each per month, the runtime cost alone is 1,000 × 4 × 0.08 = 320 dollars before token costs. The pricing favours short, bounded sessions over long-running orchestration patterns. Enterprises whose workloads are short and bounded scale predictably; enterprises with always-on agentic workflows pay more than they would on a self-hosted runtime.

OpenAI Agents SDK: no first-party runtime fee. OpenAI’s enterprise agent offering through the Agents SDK has no runtime fee. Teams running their own infrastructure pay only for token costs against OpenAI’s models. The pricing shape favours teams with existing infrastructure budgets and deployment expertise; the runtime work is the enterprise’s, not OpenAI’s. For comparable scale (1,000 agent sessions, 4 hours each), the OpenAI SDK runtime cost is whatever the enterprise’s infrastructure costs amount to, which can be lower than the Anthropic Managed equivalent for high-volume deployments and higher for sporadic use because of fixed infrastructure overhead.

Microsoft Foundry and Google Gemini Enterprise: vertically-integrated bundle pricing. Microsoft and Google each price agents inside vertically-integrated platforms. The agent fee is bundled with broader platform charges (Microsoft 365 enterprise plus Azure compute for Foundry; Google Workspace plus Google Cloud Platform for Gemini Enterprise). The pricing is harder to compare directly because the agent fee is not separable from the platform fee, and the procurement decision is usually framed as “expand existing Microsoft or Google contract” rather than “buy AI agent platform.” The implication: enterprises already heavily standardised on Microsoft or Google have a procurement bias toward those vendors that is structurally rational; enterprises that standardise on neither pay full fee for the platform plus the agent layer.

The single most common 2026 procurement mistake on this surface is comparing per-token costs alone across the four vendors. Token costs are roughly comparable; the runtime, infrastructure, and platform layers above them produce most of the cost variance. Per-session-hour modelling, infrastructure-cost modelling, and platform-fee modelling are the three required forecasts; comparing on any one alone misleads.
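The three forecasts can be sketched as a toy cost model. Everything below is illustrative: only the $0.08 per-session-hour rate and the 1,000-sessions-at-4-hours worked example come from the article; the function names, the self-hosted infrastructure figures, and the per-seat bundle uplift are placeholder assumptions an evaluation team would replace with its own numbers.

```python
# Three pricing shapes, three forecasts. Placeholder numbers throughout;
# only the 8c/session-hour rate is taken from the article's cited analysis.

def managed_runtime_cost(sessions, hours_per_session, rate_per_hour=0.08):
    """Per-session-hour runtime shape (Anthropic Managed Agents)."""
    return sessions * hours_per_session * rate_per_hour

def self_hosted_cost(fixed_infra_monthly, sessions, variable_per_session=0.0):
    """No first-party runtime fee (OpenAI Agents SDK shape): fixed
    infrastructure overhead dominates at low volume, amortises at high."""
    return fixed_infra_monthly + sessions * variable_per_session

def bundle_cost(seats, per_seat_uplift):
    """Vertically-integrated bundle shape (Microsoft/Google): the agent
    fee is an uplift on an existing platform contract, not a separable
    line item, so the comparable unit is the per-seat delta."""
    return seats * per_seat_uplift

# The article's worked example: 1,000 sessions x 4 h x $0.08/h.
print(round(managed_runtime_cost(1_000, 4), 2))  # 320.0
```

The point of modelling all three shapes side by side is that each vendor looks cheapest under the forecast its own pricing shape favours; comparing any single output across vendors reproduces the per-token mistake one layer up.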

The BAA and regulatory governance distinction

The four vendors have meaningfully different positions on Business Associate Agreements (BAAs) and regulatory compliance posture. The differences matter most for healthcare, financial services, public sector, and other regulated environments.

Anthropic: three-cloud BAA position. Anthropic operates under BAAs with Amazon Web Services, Google Cloud, and Microsoft Azure simultaneously. As of Q1 2026 it is the only major AI model provider with this three-cloud BAA position (Ampcome, Enterprise AI Agents 2026 Mid-Year Report). For healthcare organisations, the practical effect is deployment flexibility: a healthcare enterprise can run Claude on whichever cloud their existing infrastructure standardises on while maintaining HIPAA compliance. The three-cloud position is a structural advantage for multi-cloud regulated environments.

OpenAI: BAA capability via Microsoft Azure. OpenAI’s enterprise BAA capability runs primarily through the Microsoft Azure relationship. Healthcare and regulated-industry deployments typically run OpenAI models through Azure OpenAI Service rather than directly. The functional outcome is similar to Anthropic on Azure but does not extend to AWS or GCP. Multi-cloud regulated enterprises hit constraints with OpenAI that they do not hit with Anthropic.

Google: BAA capability via Google Cloud. Google’s BAA covers GCP-resident AI workloads. Enterprises already on GCP have a clean BAA path through to Gemini Enterprise. Cross-cloud deployment requires architectural workarounds, similar to OpenAI’s position on Azure.

Microsoft: BAA capability via Azure. Microsoft’s enterprise customers running Foundry plus Copilot custom agents on Azure have a single-cloud BAA path. The position mirrors Google’s on GCP, though Azure is the largest single ecosystem in regulated industries by enterprise count.

The procurement implication: regulated multi-cloud environments rule out single-cloud BAA vendors at the regulatory step before the rest of the comparison runs. The earlier this rule-out happens in the procurement process, the less wasted evaluation effort downstream.
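The Week-1 rule-out reduces to a set-coverage check. The cloud coverage in the table restates the BAA positions described above; the function name and data shape are assumptions for the sketch.

```python
# Regulatory rule-out sketch: keep vendors whose BAA coverage spans every
# cloud the enterprise runs regulated workloads on. Coverage restates the
# article's BAA positions as of Q1 2026.

BAA_CLOUDS = {
    "Anthropic": {"aws", "gcp", "azure"},  # three-cloud BAA position
    "OpenAI":    {"azure"},                # via Azure OpenAI Service
    "Google":    {"gcp"},                  # GCP-resident workloads
    "Microsoft": {"azure"},                # Foundry + Copilot on Azure
}

def regulatory_shortlist(enterprise_clouds, regulated=True):
    """Vendors whose BAA coverage is a superset of the enterprise's
    regulated-workload clouds; unregulated enterprises skip the filter."""
    if not regulated:
        return sorted(BAA_CLOUDS)
    required = set(enterprise_clouds)
    return sorted(v for v, clouds in BAA_CLOUDS.items() if required <= clouds)

# A regulated multi-cloud enterprise on AWS + Azure:
print(regulatory_shortlist({"aws", "azure"}))  # ['Anthropic']
```

Running this check first is cheap precisely because it is a hard constraint: a vendor the filter removes cannot be argued back in by pricing or ecosystem fit.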

The ecosystem distribution dimension

The four vendors have substantially different distribution shapes for enterprise reach.

Microsoft: deepest enterprise integration. Microsoft’s Office plus Azure footprint has no near peer. For enterprises where the daily-driver productivity surface is Microsoft 365, Copilot custom agents reach end users with no per-user onboarding cost (Josh Bersin, Could Microsoft Win The War For Enterprise AI?). The integration extends to Teams, SharePoint, Outlook, and the Azure platform. The footprint creates a procurement bias toward Microsoft for enterprises already standardised there that is rational rather than lazy: switching costs from Microsoft to a non-Microsoft enterprise agent platform are substantial.

Google: vertically integrated on Workspace and Cloud. Google’s distribution runs through Google Workspace (the second-largest enterprise productivity suite) and Google Cloud Platform. The integration is similar in shape to Microsoft’s but smaller in absolute installed base. For enterprises on Workspace, Gemini Enterprise integration is similarly low-friction; for enterprises not on Workspace, the distribution does not amplify the procurement decision.

OpenAI: SaaS distribution plus ChatGPT Enterprise. OpenAI’s enterprise reach runs through ChatGPT Enterprise (per-seat licensing) and through API integration embedded in third-party SaaS products. For comparison, Anthropic’s enterprise penetration sits at 44% as of late 2025, up 25 percentage points since May 2025, with 80% of revenue from enterprise customers (The New Stack, harness pricing analysis); OpenAI maintains comparable enterprise positioning through different distribution. Neither has Microsoft’s daily-driver-productivity-surface advantage, and neither tries to compete on that axis.

Anthropic: direct enterprise plus three-cloud distribution. Anthropic’s enterprise distribution is direct contract plus three-cloud BAA-backed deployment. The distribution shape is platform-neutral by design, which produces the regulated-industry advantage at the cost of not having Microsoft’s daily-driver-surface footprint.

The procurement implication: Microsoft and Google compete on platform stickiness; OpenAI and Anthropic compete on capability flexibility and pricing. An enterprise heavily standardised on Microsoft 365 has a procurement bias toward Microsoft Copilot agents that is rational. An enterprise on Google Workspace has the parallel bias toward Google. An enterprise without strong existing ecosystem standardisation has a more open evaluation, and the platform-neutral vendors (Anthropic and OpenAI) compete more directly with each other in that environment.
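The ecosystem classification above is mechanical enough to state as a lookup. The mapping restates the argument in this section; the function name and suite labels are assumptions for the sketch.

```python
# Week-2 classification sketch: existing productivity-suite standardisation
# determines the structurally rational procurement bias, per the article.

def ecosystem_bias(productivity_suite):
    """Returns the platform-integrated vendor the existing suite biases
    toward, or None when no strong standardisation exists and the
    platform-neutral vendors (Anthropic, OpenAI) compete directly."""
    if productivity_suite == "microsoft_365":
        return "Microsoft"
    if productivity_suite == "google_workspace":
        return "Google"
    return None  # open evaluation
```

Note the return value is a bias, not a verdict: the platform-integrated vendor still has to pass the regulatory rule-out and the later scoring steps.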

What the model layer actually decides

The model layer in 2026 decides one thing reliably: tie-breakers between two near-equivalent platform-and-governance choices. Most enterprise procurement teams over-weight this axis early in the evaluation, when the real procurement decisions are happening at the pricing, governance, and ecosystem layers.

Concrete pattern of where model capability still matters: an enterprise with two finalists after the regulatory rule-out, one platform-integrated and one platform-neutral, similar pricing models, similar governance posture, similar ecosystem fit. The model bake-off then breaks the tie. The bake-off should focus on the specific use case the deployment will actually serve, not generic benchmarks. Carnegie Mellon’s TheAgentCompany (claim ACA-2026-004) shows the 30% task-completion ceiling is not vendor-specific; the variance is in the long tail of edge cases where one model handles a specific scenario better than another, and that variance is highly use-case-dependent.

Concrete pattern of where it does not matter: most procurement decisions before the regulatory rule-out, before the ecosystem-fit classification, and before the GAUGE governance scoring on finalists. Treating the model bake-off as the primary axis at the start of the evaluation is the structural mistake. The bake-off, if it happens at all, is the last step, not the first.

Mapping the comparison to existing frameworks

The vendor selection plugs into the four governance frameworks the publication has covered:

Build vs buy vs partner (/build-vs-buy-vs-partner-for-enterprise-agentic-ai-2026/, claim AM-028) sits one layer above. Most vendor evaluations in this batch are buy decisions, but partnership engagement may be the right answer for use cases requiring proprietary data and sustained vendor co-development. Anthropic, OpenAI, Google, and Microsoft each have services arms or partner ecosystems that support partnership engagement; the engagement shape is part of the vendor decision, not separate from it.

The 60-question agentic AI RFP (/the-enterprise-agentic-ai-rfp-60-questions/, claim AM-026) is the procurement instrument for the buy path. The 60 questions cover identity, action-approval, data flows, audit, exit, and accountability. Used against finalist vendors after the regulatory rule-out and ecosystem classification, the RFP surfaces the operational gaps that vendor marketing materials do not.

GAUGE governance scoring (/gauge/) scores each finalist on six dimensions: governance maturity, threat model, ROI evidence, change management, vendor lock-in, and compliance posture. The scoring is per-vendor, with the surface owners present (procurement, security, legal, architecture, the team that will run the deployment). Disagreements across functions are signal.
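The "disagreements are signal" rule can be operationalised as a spread check across the six GAUGE dimensions. The dimension names come from the article; the 1-5 scale and the spread threshold are assumptions for the sketch, not part of GAUGE as described here.

```python
# GAUGE disagreement-as-signal sketch: each function scores a finalist on
# the six dimensions; large per-dimension spread across functions flags
# where the procurement work is. Scale and threshold are assumed.

GAUGE_DIMENSIONS = (
    "governance maturity", "threat model", "roi evidence",
    "change management", "vendor lock-in", "compliance posture",
)

def disagreement_signals(scores_by_function, threshold=2):
    """scores_by_function: {function: {dimension: score on 1-5}}.
    Returns {dimension: max-min spread} where spread >= threshold."""
    flagged = {}
    for dim in GAUGE_DIMENSIONS:
        vals = [scores[dim] for scores in scores_by_function.values()]
        spread = max(vals) - min(vals)
        if spread >= threshold:
            flagged[dim] = spread
    return flagged
```

A vendor whose scores agree everywhere may simply not have been scrutinised; the flagged dimensions are where procurement, security, legal, and architecture should reconcile before the decision meeting.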

The CFO business case (/the-cfos-agentic-ai-business-case-tco-and-roi/, claim AM-027) is where the pricing comparison lives in operational form. The five hidden cost categories (per-action token, observability, oversight, vendor lock-in, change management) need to be modelled per vendor. The pricing-model split discussed above produces different hidden-cost profiles per vendor, and the CFO model captures the differences explicitly.
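One way to keep the five hidden-cost categories explicit per vendor is a roll-up that fails loudly when a category goes unmodelled. The category names come from the article; the function shape is an assumption, and any dollar figures fed in would be the evaluation team's own forecasts.

```python
# Per-vendor hidden-cost roll-up for the CFO model. The five category
# names are from the article; refusing to sum an incomplete estimate set
# prevents a hidden cost from being silently dropped from the comparison.

HIDDEN_COST_CATEGORIES = (
    "per-action token", "observability", "oversight",
    "vendor lock-in", "change management",
)

def annual_hidden_cost(estimates):
    """estimates: {category: annual cost}. Raises if any category is
    missing rather than returning a misleadingly low total."""
    missing = set(HIDDEN_COST_CATEGORIES) - set(estimates)
    if missing:
        raise ValueError(f"unmodelled categories: {sorted(missing)}")
    return sum(estimates[c] for c in HIDDEN_COST_CATEGORIES)
```

Because the pricing-model split gives each vendor a different hidden-cost profile, the useful output is not one number but the same five-line breakdown computed per vendor.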

The four frameworks together produce a defensible vendor decision; using any one alone produces a partial decision the other three would later reveal as inadequate.

What to do Monday

The realistic enterprise vendor evaluation track for an enterprise that has not yet started:

Week 1. Map the regulatory environment. Cross-reference each vendor’s BAA and compliance posture. Rule out vendors that do not match the regulatory profile. Most enterprises eliminate one of the four at this step.

Week 2. Classify by platform integration. Choose between platform-integrated (Microsoft, Google) and platform-neutral (Anthropic, OpenAI) based on existing ecosystem standardisation. The choice is determined by what the enterprise already runs, not by anything the vendor controls.

Week 3. Score remaining candidates on GAUGE for a representative deployment. Surface the disagreements between procurement, security, and architecture. Capture the deltas; they are the procurement work.

Week 4. Run the 60-question RFP against the finalists. Use the questions to provoke live conversations with vendor engineering. Vendors who answer all 60 in a weekend with thoughtful prose are usually a procurement red flag.

Week 5. Apply the build-vs-buy-vs-partner framework. Most decisions are buy; some are partner; build is rarely the right answer for general enterprise agentic AI in 2026.

Week 6. Decide and document. The vendor decision plus contract terms plus GAUGE scores plus RFP responses constitute the audit-defensible record. Set quarterly GAUGE review of the vendor relationship.

The 6-week timeline is tight but achievable. Enterprises running this in parallel with the broader EU AI Act preparation track (/eu-ai-act-agentic-ai-compliance/, claim AM-035) typically have one vendor selected and onboarded by mid-June 2026, leaving six weeks before the August 2026 enforcement window for the deployment-discipline work the vendor decision unlocks.

The Holding-up note

The primary claim of this piece (that 2026 enterprise agentic AI vendor procurement is no longer primarily a model bake-off and that pricing model, governance posture, and ecosystem distribution are the actual decision axes) is logged at AM-039 on the Holding-up ledger on a 60-day review cadence. Three kinds of evidence would move the verdict:

  1. Major repricing or model-tier changes at any of the four vendors. The pricing-model split in this analysis assumes Anthropic Managed Agents at 8 cents per session-hour, OpenAI Agents SDK at no runtime fee, and vertically-integrated bundle pricing at Microsoft and Google. Any of the four can change pricing in ways that materially affect the procurement comparison.
  2. Regulatory enforcement actions that materially affect one vendor’s enterprise-suitability profile. The August 2026 EU AI Act enforcement window opens within the next review cycle. Early enforcement actions could differentially affect vendors depending on which deployment patterns the enforcement focuses on.
  3. Entry of a credible fifth platform. Most plausibly via the Linux Foundation Agentic AI Foundation member firms or via a major systems-integrator-backed neutral platform. None visible as of late April 2026, but the platform-completeness bar moves over time.

The next review of this claim is scheduled 25 June 2026. The August 2026 EU AI Act enforcement window opens within five weeks of the next review.



Disagree with this piece?

Reasoned disagreement is a first-class signal here. Every review cycle weighs documented dissent; material dissent becomes part of the article's change history. This is not a corrections form — use /corrections/ for factual errors.

Part of the pillar

AI agent procurement

The contracts, SLAs, and evaluation criteria that distinguish agentic-AI procurement from SaaS procurement. 2 other pieces in this pillar.
