Topic pillar · 38 tracked pieces

Topic · AI agent procurement

The contracts, SLAs, and evaluation criteria that distinguish agentic-AI procurement from SaaS procurement.

RFPs, SLAs, contract clauses, and the vendor-evaluation rubrics that survive procurement review.

Agent procurement is procurement with three new variables most contracts don't yet handle: non-deterministic outputs, long-running autonomous workloads, and the question of what counts as a vendor-side defect when the system that fails is a stochastic model.

Standard SaaS procurement clauses don't cover agents. Standard MSAs assume deterministic services delivered to spec. Standard SLAs measure uptime, not output validity. The procurement teams that ship agentic AI without burning a year on contract negotiation are the ones using rubrics built for the new failure modes — and there are very few of those rubrics in public.

This pillar publishes AI agent SLA templates with measurable thresholds for output validity, time-to-detect failure, and time-to-recovery, with examples from named deployments that work. The pillar also publishes MSA and DPA clauses specific to agentic AI (non-determinism allowances, reproducibility carve-outs, training-data segregation, agent-credential rotation policies) and RFP question libraries with 60+ procurement questions mapped to the GAUGE rubric, with evidence prompts per question.

The pillar also covers vendor-evaluation rubrics (Anthropic versus OpenAI versus Google versus Microsoft for enterprise agent workloads, with the comparator dated and the methodology declared) and build-versus-buy-versus-partner analysis: the three-way decision most enterprises now face on every agent workload, with named-company case studies of each path.

Pieces here cite real procurement contracts where parties have given permission, anonymised case studies where they haven't, and named vendor documentation in every comparison.
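As an illustration of what "measurable thresholds" for output validity, time-to-detect failure, and time-to-recovery could look like once they leave the contract and reach a reporting script, here is a minimal sketch. The class names, field names, and the example numbers are all hypothetical placeholders, not values taken from any template or deployment in this pillar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSlaThresholds:
    """Hypothetical contract thresholds; the numbers used below are placeholders."""
    min_output_validity: float     # minimum share of sampled outputs judged valid (0-1)
    max_minutes_to_detect: float   # ceiling on time-to-detect failure
    max_minutes_to_recover: float  # ceiling on time-to-recovery


@dataclass(frozen=True)
class MeasuredPeriod:
    """What was observed over one reporting period (assumed to come from audit logs)."""
    valid_outputs: int
    sampled_outputs: int
    minutes_to_detect: float
    minutes_to_recover: float


def sla_breaches(sla: AgentSlaThresholds, period: MeasuredPeriod) -> list[str]:
    """Return one line per breached threshold; an empty list means no breach found."""
    breaches = []
    if period.sampled_outputs <= 0:
        # Without a sample there is no evidence either way, so flag it explicitly.
        breaches.append("output validity: no sampled outputs, validity cannot be evidenced")
    else:
        validity = period.valid_outputs / period.sampled_outputs
        if validity < sla.min_output_validity:
            breaches.append(
                f"output validity: {validity:.1%} is below {sla.min_output_validity:.1%}"
            )
    if period.minutes_to_detect > sla.max_minutes_to_detect:
        breaches.append(
            f"time-to-detect: {period.minutes_to_detect} min exceeds "
            f"{sla.max_minutes_to_detect} min"
        )
    if period.minutes_to_recover > sla.max_minutes_to_recover:
        breaches.append(
            f"time-to-recovery: {period.minutes_to_recover} min exceeds "
            f"{sla.max_minutes_to_recover} min"
        )
    return breaches


# Placeholder thresholds and measurements, for illustration only.
example_sla = AgentSlaThresholds(0.95, 15, 60)
example_period = MeasuredPeriod(900, 1000, 20, 45)
for line in sla_breaches(example_sla, example_period):
    print(line)
```

The design choice worth noting is that a missing validity sample is reported as a breach rather than a pass: a threshold is only measurable if the contract also says who samples outputs and how often.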

Pillar last refreshed 2026-05-01

What survives review

What has broken

Spoke articles

  • ISO 42001 is becoming the enterprise AI procurement checkpoint

    ISO/IEC 42001 is the first certifiable AI management system standard, and through 2025-2026 it has started appearing in regulated-sector and EU AI vendor RFPs as a stated or preferred requirement. The procurement question is no longer whether to ask about it, but how to ask: a certificate on its own proves little, and the buying-committee discipline is to require evidence of the operating management system behind it.

  • Vendor strategic-narrative proof points: the agentic AI procurement diligence checklist

    Every agentic AI vendor pitches a strategic narrative; few are tested against the proof points that distinguish 'this is the future' rhetoric from 'this is what we built and what it does'. The 2026 buying-committee diligence checklist walks seven proof points (named-customer references plus revenue contribution, model-vendor relationships disclosed in the MSA, the engineering team's tenure and turnover rate, the post-revenue-recognition product-roadmap evidence, the regulatory disclosure cadence, the executive incentive structure, and the public technical-content cadence) and produces the structural read on whether the narrative is the product or the cover.

  • Salesforce platform AI vs Microsoft platform AI: the 2026 full-stack comparison for the buying committee

    The product-level comparison of Agentforce against Microsoft Copilot is the conversation the existing /compare/ page already covers. The buying-committee question one tier up is the platform comparison: the Salesforce stack (Einstein + Agentforce + Data Cloud + MuleSoft + Tableau) against the Microsoft stack (Copilot + Azure AI Foundry + Microsoft 365 + Fabric + Power Platform). The two stacks compete on different axes and answer different buying-committee questions; the procurement that treats them as substitutes is the procurement that mis-prices the migration cost in year two.

  • Enterprise AI infrastructure vendors: the 2026 SLA and uptime comparison matrix

    The agentic AI architecture piece on SLA design is the customer-side specification; the SLAs the major infrastructure vendors actually post are the supply-side reality. The 2026 buying-committee SLA comparison resolves on five dimensions (uptime commitment, latency commitment, support response tier, credit calculation, and exclusions list) and reveals the structural gap most agentic AI buying committees discover at year-two renewal: the headline 99.9% uptime is calculated against a denominator and an exclusions list that materially shifts the customer's effective availability.

  • Digital transformation RFP: the AI UX assessment question set the existing 60-question playbook does not cover

    The 60-question agentic AI RFP playbook covers governance, technical depth, procurement, and audit. The UX assessment is the dimension the existing playbook treats only at the workflow-design level; the digital-transformation RFP that includes agentic AI surfaces the user-interaction question more directly because the agent is the new UI primitive in the customer's environment. The 15 UX-assessment questions below extend the existing playbook into the design and interaction surface that the 2026 procurement evaluates the vendor against.

  • AWS vs Microsoft vs Google vs OpenAI vs Anthropic: the enterprise agentic AI framework matrix for 2026

    The buying-committee comparison of AWS Bedrock AgentCore, Microsoft Azure AI Foundry + Copilot Studio, Google Vertex AI Agent Builder, OpenAI Assistants + Agent Builder + Swarm, and Anthropic Claude Agent SDK is not the comparison the existing /compare/ pairs cover. The five-vendor framework matrix prices the choice as an orchestration-layer commitment rather than a model-tier commitment, with five comparison axes (orchestration primitive, tool-use protocol, deployment topology, observability tier, and exit cost) that resolve differently from the pairwise comparisons the publication already runs.

  • The agent protocol tax: MCP, A2A, and Llama Stack are not converging. Your tool inventory is the locked asset

    Anthropic's Model Context Protocol reached broad client and server adoption through 2025. Google's Agent2Agent protocol moved to the Linux Foundation later the same year. Meta's Llama Stack consolidated its agent-runtime spec on a separate track. Microsoft's Copilot Agent platform and Salesforce's Agentforce maintain proprietary surfaces. The three open protocols are not converging on a single standard, and the four major proprietary surfaces are not adopting any of them as default. The cost of being wrong on the model choice is low. The cost of being wrong on the protocol choice is high, because the locked asset is not the agent code, it is the tool inventory the agents call.

  • What SAP's 50 Joule agents at Sapphire 2026 mean for CIOs making ERP renewal decisions

    SAP's Sapphire 2026 keynote introduced the Autonomous Enterprise vision: 50-plus domain-specific Joule AI Assistants embedded across finance, supply chain, procurement, HR, and CX, orchestrating more than 200 specialised agents. Anthropic's Claude powers the finance, procurement, and supply chain Joule agents. RISE with SAP customers receive a contractual commitment to activate three Joule Assistants in year one. SAP GROW customers get 20-plus from day one. The ERP renewal calculus has changed. The AI agent layer is no longer an add-on evaluation; it is inside the contract.

  • Public-sector agentic AI procurement: what the GSA and EU records show

    Federal and EU member-state agentic AI contract records show renewals running materially below the enterprise SaaS benchmark. The driver is not technical performance but audit-evidence completeness under OMB M-24-10 §5 and EU AI Act Article 12. The procurement implication is structural.

  • The split verdict: GPT-5.5 vs Claude Opus 4.7 and why CIOs need two models, not one

    Anthropic shipped Claude Opus 4.7 on 16 Apr 2026; OpenAI shipped GPT-5.5 seven days later. Both vendors claim leadership. Neither model wins everything. The procurement question for 2026 is not which one to standardise on, because the evaluation evidence does not support a single-model answer for any enterprise running both agentic-coding workloads and knowledge-work workloads. The two-year procurement decision is whether to plan the routing or accept the tax of pretending it does not exist.

  • Agentic code auditing: what the Firefox Claude Mythos disclosure tells procurement about CI-time defaults

    Mozilla's Firefox 150 release (November 2025) shipped fixes for 271 vulnerabilities surfaced by the Claude Mythos Preview pipeline. The headline fact ('AI found 271 bugs') is true but is not the procurement-relevant one. The procurement-relevant change is that the agentic-verification step (the agent builds and runs its own test cases to triage suspected bugs before reporting) cleared the false-positive wall that blocked earlier read-only GPT-4 / Claude Sonnet 3.5 attempts from production CI. CI-time agentic auditing becomes the default expectation for any shipping enterprise software in 2026, with three derived procurement-deck questions and one dual-use risk surfacing alongside the defensive disclosure.

  • Agentic AI accuracy claims: the three questions every CIO should ask before 'ready-to-run' becomes a procurement decision

    Anthropic posted a launch this week positioning the product as 'ready-to-run'. The phrase is procurement-deck noise unless three questions are answered: accuracy rate on which task, against which baseline, measured by what methodology. The 2026 industry baseline for procurement-credible accuracy disclosure is the academic-benchmark pattern (CRMArena-Pro 35% multi-step reliability on a defined CRM task corpus; CMU TheAgentCompany 30-35% reproduction range; WebArena ~36% browser-agent ceiling) and the vendor-disclosure pattern Anthropic itself established earlier (Claude for Chrome 23.6% → 11.2% → 0% with named attack corpus and patch cadence). Vendor 'ready-to-run' positioning that doesn't meet either bar leaves the deploying enterprise inheriting the methodology gap as an audit-defense burden.

  • What is Agent Mode? Microsoft, Cursor, GitHub Copilot, and OpenAI in 2026

    Agent Mode is the same brand-name shipping in three different product classes in 2026: Microsoft 365 Copilot productivity-suite agents, Cursor IDE agents, and GitHub Copilot code-platform agents. Procurement teams comparing them feature-by-feature are comparing categories that aren't substitutes.

  • Agentic AI discovery: what the phase upstream of procurement actually has to test

    McKinsey reports a $2.7 trillion paradox: 80% of companies use generative AI but report no bottom-line impact. Gartner projects more than 40% of agentic AI projects will be cancelled by end of 2027. Gartner's January 2025 poll of 3,412 executives (19% significant investment, 42% conservative, 31% wait-and-see, 8% none) describes the phase distribution. The discovery phase upstream of procurement is not a vendor-evaluation sprint; it is an organisational-readiness test. Four upstream tests determine whether the deploying enterprise should proceed at all, and the right answer for a meaningful share of organisations remains 'not yet'.

  • Microsoft 365 Copilot Agent Mode for enterprise: 2026 procurement read

  • Claude for Chrome: what Anthropic's 23.6% to 11.2% prompt-injection numbers tell procurement

    Anthropic shipped Claude for Chrome on 26 Aug 2025 to 1,000 Max-plan subscribers at $100-200 per month, alongside a published security disclosure: 23.6% prompt-injection success rate pre-mitigation, 11.2% post-mitigation, 0% on URL-injection variants after subsequent patches. The rates describe the structural exposure level the deploying enterprise inherits at the browser-resident agent class, not at Anthropic specifically. The procurement-relevant signal is the published-disclosure posture itself, which places Anthropic in Cohort A under the vendor-response-split framework and gives procurement a verifiable baseline that competitors will be measured against as they ship parallel products.

  • AI vendor exit clauses: the 2026 procurement red-flag checklist

    Switching AI vendors in 2026 is a contracts problem before it is a tech problem. Seven exit-clause patterns most enterprise MSAs miss, and how to redline each before signature.

  • AI infrastructure water consumption: what the Google 8.1B disclosure and EU 2023/1791 tell procurement

    Google reported 8.1 billion gallons of data-centre water consumption in 2024 (up 33% year-over-year from 6.1B in 2023). Microsoft reported 6.4 million cubic metres in 2022 at a Water Usage Effectiveness of 0.30 L/kWh, a 39% improvement from 0.49 the prior year. The EU Energy Efficiency Directive 2023/1791 made WUE and water-consumption reporting mandatory for data centres above 500 kilowatts of IT power demand starting 15 September 2024. AI infrastructure water consumption is no longer a sustainability footnote; it is a procurement-deck variable codified in regulation, with vendor disclosure postures already differentiating Cohort A and Cohort B in the same shape the security-disclosure analysis (AM-007) frames.

  • AI Bill of Materials (AI BOM): what enterprise should disclose and track

    An AI Bill of Materials in 2026 is the audit-ready inventory of every model, dataset, evaluation, and deployment dependency in a production AI system. Most enterprises do not yet ship one. EU AI Act Article 26 deployer-documentation obligations make it mandatory in scope by 2 August 2026.

  • AI assistant vs AI agent: when the distinction is procurement-relevant

    OpenAI's own agents documentation defines an agent as a system that uses 'multicomponent autonomy to independently reason, decide and problem-solve by using external data sets and tools'. The definition distinguishes agents structurally from the reactive, request-driven AI assistants whose deployment patterns are documented at named-customer scale. McKinsey's Lilli platform reaches 72% employee adoption and processes 500,000+ prompts monthly with roughly 30% time savings on knowledge work. Gartner projects 40%+ of agentic AI projects will be cancelled by end of 2027. Assistants and agents are different procurement decisions, not points on a continuum, and the procurement-deck reading turns on whether the deploying enterprise is buying a reactive request-driven system whose ROI is well-documented or an autonomous-action system whose deployment patterns are still emerging.

  • AI agent vs AI assistant vs LLM: the 2026 enterprise distinction

    AI agent, AI assistant, and LLM are three structurally different categories in 2026. Procurement that conflates them buys the wrong governance shape, the wrong cost structure, and the wrong identity model.

  • The agentic AI pilot-to-production gap: what vendor 'successful pilot' references do not tell procurement

    Vendor 'successful pilot' references are the most common evidence presented to enterprise procurement committees evaluating agentic AI. McKinsey State of AI 2025 (Nov 2025, n=1,491) reports 23% of enterprises scaling and 39% still experimenting; the documented 2024-2025 walk-backs (Klarna 700-agent reversal, Salesforce Agentforce 200-customer reality, GitHub Copilot April 2026 token-counting bug) describe what those references typically obscure. The gap between vendor-reference pilot success and procuring-enterprise scaled production is operational, and it is the procurement committee's job to make the regime-translation question explicit before the contract closes.

  • Vendor MSA renewal in the post-EU-AI-Act-enforcement window: what changes in the AI MSA red-team checklist after 2 August 2026

    The 38-item AI MSA red-team checklist (RES-005) covered the seven clause families where 2025-2026 enterprise AI MSAs cluster their failure modes. The 2 August 2026 EU AI Act deployer-obligations enforcement window adds three new procurement-defensible asks that were not load-bearing in pre-enforcement contracts: Article 11 technical-file pass-through, Article 16 post-market-monitoring support, and Article 26 deployer-documentation supply. The piece also carries a 600-word insert on the asymmetric-instrument observation: procurement teams at enterprise and operator scale face the same vendor-citation-chain manipulation pattern with different audit instruments.

  • How vendor case studies travel between enterprise and operator AI buyers — and what each cohort gets wrong from the other's evidence

    Enterprise AI buyers and operator AI buyers consume vendor case studies aimed at the other cohort and produce mirror-image misreads. The Fortune-500-bank case lands in operator decks as 'this works at SMB scale too' (it usually does not, in the way the case study describes). The IndieHacker testimonial lands in enterprise decks as 'even small teams ship it' (the small team's operational substrate is structurally different from the enterprise's). The mechanism is the same — vendor citation chains travel cohort-to-cohort with applicability mismatches the readers do not catch — and the procurement cost is paid in both registers. This is the bridge piece between AM-* and OPS-* registers that the four expert reviewers said earned its slot.

  • Foundation-model uptime in 2026: the 24-month outage record across Anthropic, OpenAI, Google, AWS Bedrock, and Azure OpenAI

    Foundation-model providers publish status pages that report on the model API as if it were one service. The 24-month operational record across Anthropic, OpenAI, Google, AWS Bedrock, and Azure OpenAI does not support that framing. The procurement-defensible posture in 2026 is multi-provider routing with documented failover, and the SLA gap between what vendors publish and what enterprise contracts actually need is now wide enough to be the primary procurement signal in foundation-model selection.

  • Agent evaluation in production: eval-set design, drift detection, and regression budgets for the deployed agent

    The four 2026 agent-evaluation platforms (DeepEval, Braintrust, LangSmith, Patronus) covered at AM-122 are the procurement decision. The evaluation discipline that decides whether the chosen platform produces useful signal is the eval-set design, the drift-detection cadence, and the regression-budget framework — the three operational disciplines most enterprises buy a platform for and then under-invest in. This piece walks the in-production cut that sits between the eval-tooling decision and the MTTD-for-Agents observability framework.

  • Agentic AI 2024-2025 retrospective: what actually shipped, what walked back, and what 2026 procurement should learn from each

    Read against audited primary sources rather than vendor decks, agentic AI 2024-2025 produced four classes of evidence the 2026 procurement reader should distinguish: vendor-published wins inside vendor-controlled environments, audited customer pilots with active human oversight, the public walk-backs (Klarna, GitHub Copilot rate-limit, EchoLeak), and the structural failure modes (multi-step reliability, prompt-injection class). Each class produces a different procurement lesson; treating them as one 'AI is working' narrative is the most common 2026 enterprise mistake.

  • Agent observability in 2026: Langfuse, Arize, Helicone, and LangSmith — and the procurement decision that is not the eval decision

    Evaluation tells you whether the agent is right. Observability tells you what the agent did. Production deployments need both, the procurement decisions are different, and conflating them produces SLA architecture that fails its first incident. The four credible 2026 observability platforms (Langfuse, Arize, Helicone, LangSmith) split cleanly on one structural axis: open-source-first vs SaaS-first. Helicone has just gone into maintenance mode.

  • Agent evaluation frameworks in 2026: DeepEval, Braintrust, LangSmith, and Patronus map to four deployment shapes

    The four credible agent-evaluation platforms in 2026 don't compete on capability rank. They fit four distinct deployment shapes. DeepEval is the open-source pytest-native option. Braintrust is the SaaS eval primitive. LangSmith is the LangChain-stack observability and eval bundle. Patronus has pivoted from hallucination specialist to digital-world-model frontier lab. Picking on a generic feature matrix produces the wrong answer for most enterprises.

  • Reinsurance and the catastrophic AI tail: why your cyber renewal is tightening

    Primary cyber-insurance carriers are not the source of 2026 cyber-renewal tightening; the reinsurance market behind them is. Lloyd's of London, Munich Re, and Swiss Re have been recalibrating their assumptions about cascading agent-failure scenarios, and the rate signal travels downstream to the policy your General Counsel is renewing this quarter.

  • AI Bill of Materials in 2026: when AI-BOM becomes a procurement requirement

    AI-BOM is moving from optional security artefact to enforceable procurement requirement, driven by EU AI Act Article 11 documentation and the CycloneDX ML-BOM specification. Enterprises tracking SBOM compliance are blindsided when AI procurement requires a different inventory shape.

  • Agentic-AI vendor contracts: the six gotchas in 2026 enterprise MSAs that procurement teams routinely miss

    2026 agentic-AI MSAs hide six contract patterns that transfer risk from vendor to enterprise. CIOs signing without redlines on all six are absorbing exposure their boards have not approved.

  • Anthropic vs OpenAI vs Google vs Microsoft for enterprise agents in 2026

    The four credible enterprise agentic AI platform plays in 2026 are Anthropic, OpenAI, Google, and Microsoft. The procurement decision between them is no longer primarily about model capability. It is about pricing model, governance and BAA posture, and ecosystem distribution. Treating it as a model-quality bake-off is the most common 2026 procurement mistake.

  • The 2026 Enterprise Agentic AI Procurement Playbook

    A six-stage procurement track integrating build-vs-buy-vs-partner, the 60-question RFP, GAUGE governance scoring, four-vendor comparison, and EU AI Act compliance into one operational sequence. Ships in 8 to 10 weeks for standard enterprise environments. Produces an audit-defensible procurement artifact that satisfies EU AI Act Article 9 by construction.

  • AI agent ROI calculator: the 2026 enterprise framework

    Eight-input ROI calculation framework for enterprise AI agent deployments. Covers what standard SaaS calculators miss: per-session-hour cost, HITL labour, instrumentation, compliance, productivity uplift, avoided incidents, revenue net of regression risk, strategic-option value.

  • AI agent contract exit clauses: 8 provisions for 2026

    Eight contract exit-clause provisions that standard SaaS templates do not cover but enterprise agentic AI procurement requires: audit-log export, trained-state extraction, prompt portability, connector reconfiguration, named handoff, regulatory-evidence preservation, data-residency continuity, liability-tail.

  • AI assistant vs AI agent: the procurement distinction

    AI assistants and AI agents are not the same product class. One suggests; the other acts. The procurement, governance, audit, and TCO models differ categorically. Conflating them is the most common 2026 enterprise procurement mistake.

  • The enterprise agentic AI RFP: 60 vendor questions

    Generic SaaS RFPs miss six dimensions that decide whether an agentic deployment survives 18 months. Here's the GAUGE-mapped 60-question version.
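The SLA comparison-matrix entry above observes that a headline 99.9% uptime figure depends on the denominator and the exclusions list behind it. A small worked calculation can make that concrete. The formula below is one assumed way an SLA might treat exclusions (excluded minutes removed from both the downtime count and the measurement window); real vendor SLAs define this differently, and the minute counts are invented for illustration rather than drawn from any vendor's record.

```python
def vendor_reported_uptime(total_minutes: float,
                           downtime_minutes: float,
                           excluded_minutes: float) -> float:
    """Uptime under an ASSUMED exclusions rule.

    Excluded minutes (e.g. scheduled maintenance, upstream-provider faults)
    are subtracted from the downtime count and from the denominator.
    """
    if not 0 <= excluded_minutes <= downtime_minutes <= total_minutes:
        raise ValueError("expected 0 <= excluded <= downtime <= total")
    measured_window = total_minutes - excluded_minutes
    if measured_window == 0:
        raise ValueError("the whole period is excluded; uptime is undefined")
    counted_downtime = downtime_minutes - excluded_minutes
    return 1 - counted_downtime / measured_window


def effective_availability(total_minutes: float, downtime_minutes: float) -> float:
    """Availability as the customer experiences it: all downtime, full window."""
    if not 0 <= downtime_minutes <= total_minutes or total_minutes == 0:
        raise ValueError("expected 0 <= downtime <= total and total > 0")
    return 1 - downtime_minutes / total_minutes


# Hypothetical 30-day month: 400 minutes unavailable, 360 of them excluded.
MONTH_MINUTES = 30 * 24 * 60
reported = vendor_reported_uptime(MONTH_MINUTES, 400, 360)
effective = effective_availability(MONTH_MINUTES, 400)
print(f"vendor-reported uptime: {reported:.3%}")
print(f"effective availability: {effective:.3%}")
```

With these invented inputs the vendor-side figure stays above 99.9% while the customer-side figure falls to roughly 99.1%, which is the kind of gap the exclusions list can open. When redlining, the useful questions are which of the two functions the contract's credit calculation uses and who decides what counts as excluded.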

What we're watching next

  • Major frontier-vendor agent-class SLAs going public with action-correctness commitments. As of Q2 2026, no vendor publishes a contractual SLA on output validity for agent-mode workloads. The pillar's contract-gotchas piece predicts this gap closes through 2026; the first vendor to ship one resets the procurement bar.
  • Lloyd's of London or Munich Re publishing a stand-alone agent E&O wording. Existing cyber and tech-E&O policies cover agent risk under stretched interpretations of legacy policy language. A purpose-built agent E&O wording, when it lands, changes the insurance-and-underwriting cluster's verdicts and the procurement contract's risk-allocation clauses.
  • DORA enforcement actions against critical-third-party AI providers. DORA's critical-ICT-provider designation is now active in the EU. The first AI-specific enforcement action will surface what the 'audit rights' and 'sub-outsourcing disclosure' clauses actually require in practice.
  • Industry-standard agent procurement RFP language emerging from analyst firms or industry consortia. The 60-question RFP this pillar publishes is a working draft. Convergence with Gartner, Forrester, or industry-consortium drafts would consolidate procurement vocabulary; divergence would force a clearer position on what the gaps are.

Primary sources we trust for this topic

A curated list of primary research, regulator guidance, and vendor documentation for AI agent procurement. Populated on the quarterly refresh: not a link dump, not competitors.


This pillar page is refreshed quarterly. Last refresh: 19 Apr 2026. Next refresh: 18 Jul 2026.

Vigil · 44 reviewed