This piece was written by Claude (Anthropic). Peter set the brief, reviewed the sources, and signed off on publication before it went out. Why we work this way →
AM-026 · pub 24 Apr 2026 · rev 24 Apr 2026 · read 11 min
AI Implementation

The enterprise agentic AI RFP: 60 vendor questions

Generic SaaS RFPs miss six dimensions that decide whether an agentic deployment survives 18 months. Here's the GAUGE-mapped 60-question version.

Holding · reviewed 24 Apr 2026 · next +59d
RFP cover. A procurement-form grid with 6 numbered rows, one per GAUGE dimension, each carrying a 10-question subtotal. Right column shows a scoring band 0–100. Footer: 60 questions. 6 dimensions. 1 weighted score.

A generic SaaS RFP does not hold up for agentic AI procurement in 2026. Six dimensions don’t get asked — and it’s those six that determine whether the deployment survives its first 18 months. The usual questions cover uptime, seat pricing, SSO, SOC 2, data residency. Fine, necessary, universal. None of them tell a procurement committee whether the vendor has thought about cross-agent delegation abuse, whether the vendor’s own threat model covers EchoLeak-class zero-click exploits, whether the vendor’s “171% ROI” customer case studies have documented baselines, or whether the vendor can be exited without rebuilding six months of workflow.

This piece is the 60-question version of that RFP. Ten questions per dimension, mapped to the six dimensions of the GAUGE framework. Each question demands evidence, not a sentence. Vendors who refuse to engage, or who respond with marketing copy, are telling the procurement committee something specific about themselves.

Two propositions structure the approach:

  • An RFP is a disqualifier, not a scoring exercise. Its purpose is to identify vendors who cannot survive 18 months of enterprise governance pressure. The 12% of agentic AI deployments that clear 300%+ ROI, per Stanford Digital Economy Lab’s 2026 data, are almost entirely the ones whose vendor relationships survived scrutiny on all six dimensions, not just the first three.
  • Vendor response quality is itself a signal. How specifically a vendor answers these questions correlates with whether they’ll survive scaling. Vague answers on governance, threat model, or vendor lock-in predict the failures that show up 12–18 months later — well documented in Gartner’s projection that 40%+ of agentic AI projects will be cancelled by end-2027.

The downloadable Excel contains the full 60 questions with scoring columns, a GAUGE-aligned weighted total, a red-flag checklist, and a multi-vendor comparison sheet. This article walks the framework and shows representative questions per dimension.

Why the generic RFP fails

Typical enterprise SaaS RFPs ask 40–80 questions across categories such as product capabilities, integrations, security (SOC 2, ISO 27001, SAML SSO), data handling (GDPR DPA, data residency), pricing and contract, support and SLA, and roadmap. That framework is mature and usually adequate. For agentic AI, three gaps appear:

The “product capabilities” section asks what the product does, not how it behaves. For an agentic system, behaviour at scale is the product. A vendor can honestly list “summarisation, classification, tool use” as capabilities and still have no answer to “what does the agent do when presented with an adversarial document?” Generic RFPs don’t surface behavioural questions.

The “security” section doesn’t cover the agent-specific attack surface. SOC 2 and ISO 27001 are table stakes; they do not evaluate prompt-injection resistance, cross-agent delegation controls, or the incident-response SLAs aligned to MTTD-for-Agents. A vendor can be SOC 2 Type II certified and have no documented threat model for agentic attack classes.

The “contract and pricing” section underweights exit conditions. Enterprise SaaS contracts typically cover breach-of-contract exits but not commercial-degradation exits (rate increases, acquisition, service-quality drift). Agentic AI vendor economics are still in flux — aggressive consolidation is ongoing, pricing models shift annually — and exit provisions written to a 2022 template are materially under-protective.

The 60-question RFP that follows is not a replacement for the generic SaaS RFP. It’s the additional layer that evaluates the six agent-specific governance dimensions.

The six dimensions, 10 questions each

The full RFP is in the downloadable Excel. Below, representative questions per dimension with notes on what good answers look like.

Governance maturity — 10 questions

Covering registry transparency, approval workflows, deprecation policies, and customer control over platform defaults. Representative questions:

  • Provide a complete list of model versions, training data sources, and fine-tuning methods used by your platform, with change history for the last 24 months. Strong answer: a versioned document, publicly available or under NDA, with dated entries. Weak answer: “We use industry-standard models.”
  • What deprecation policy applies to retired models and agents — notification period, migration path, data retention? Strong answer: specific intervals (≥90 days notice, ≥12 months data retention), documented migration tooling, named customer examples. Weak answer: case-by-case basis.
  • Can an enterprise customer impose their own approval workflow on top of your platform’s defaults? Strong answer: yes, with configuration documentation and named examples of customers who have. Weak answer: “contact your account manager.”

Threat model — 10 questions

Covering prompt injection, cross-agent delegation, data exfiltration, data poisoning, disclosure SLAs, and customer-visible refusal monitoring. The NIST AI Risk Management Framework and its Generative AI Profile (AI 600-1) are the reference frames; the RFP questions operationalise them into vendor-specific evidence requests. Representative questions:

  • Describe your defences against data exfiltration through tool calls — specifically against EchoLeak-class zero-click prompt-injection exploits. Strong answer: named mitigations, test results, mention of the specific Q1 2026 exploit classes. Weak answer: “our security team monitors for anomalies.”
  • Show evidence of tabletop exercises run against your agent platform within the last 12 months, with scope + findings summary. Strong answer: two or more dated exercises, scope covering agent-specific attacks, remediation status per finding. Weak answer: referring to generic red-team engagements.
  • Do you publish a vulnerability disclosure policy with specific SLAs for fix timelines? Strong answer: public policy page with SLAs tiered by severity (e.g., critical < 7 days, high < 30). Weak answer: CVD via security@vendor.com with no timelines.

ROI evidence — 10 questions

Covering case-study integrity, baseline measurement, independent validation, customer measurement autonomy, and retracted-claim transparency. Representative questions:

  • Provide three named enterprise customer case studies with documented ROI, baseline measurement methods used, and independent validation of results. Strong answer: three case studies with primary-source links (customer executive attribution, methodology paragraph, third-party auditor named). Weak answer: one case study written by the vendor’s marketing team.
  • Can an enterprise customer run their own measurement layer independent of your telemetry, and what hooks do you provide? Strong answer: documented APIs, customer-owned observability SDK, example implementations. Weak answer: “our dashboards show everything you need.”
  • Have any published customer ROI claims ever been formally retracted or revised? If so, document which, when, and why. Strong answer: yes, with the revision log publicly visible. Weak answer: “we have never needed to retract a customer claim” (which, in a market honest about the 171% ROI narrative problem, is a yellow flag on its own).

Change management — 10 questions

Covering rollout playbooks, training materials, adoption-metric transparency, scope-change controls, rollback, and configuration freeze for compliance. Representative questions:

  • What training materials do you provide for end users affected by your platform, and how are they delivered across user cohorts? Strong answer: role-segmented materials, delivery in customer’s LMS, completion tracking handed to the customer. Weak answer: generic documentation on a help site.
  • Can an enterprise customer freeze an agent’s configuration for compliance purposes — no updates from you until explicitly approved? Strong answer: yes, documented as “configuration freeze” or similar, with named customer examples (financial services, regulated healthcare). Weak answer: no.
  • What version control and rollback do you support at the agent configuration layer? Strong answer: immutable version history, one-click rollback, audit trail. Weak answer: “contact support.”

Vendor lock-in — 10 questions

Covering data export completeness, cross-platform portability, measured switching cost, contract exit triggers, EOL notice, and rate-change notice. Representative questions:

  • Document a complete customer data export path — volume, format, completeness guarantees, time to completion for a typical enterprise customer. Strong answer: documented format (JSONL, CSV, Parquet), typical export time with specific numbers, coverage guarantees. Weak answer: “we provide data export on request.”
  • Has a paying customer successfully migrated off your platform in the last 24 months? Provide at least one reference. Strong answer: yes, with a named reference (even if the customer doesn’t want to be publicly disclosed). Weak answer: “no customer has ever left us” — which is itself a red flag for vendors older than 3 years.
  • What are your contract exit provisions beyond catastrophic-failure triggers — e.g., rate-change triggers, service-degradation triggers, acquisition triggers? Strong answer: named triggers with threshold definitions (e.g., “rate increase > 25% in any 12-month period triggers customer right-to-exit without penalty”). Weak answer: “standard contract exit terms apply.”

Compliance posture — 10 questions

Covering certifications, EU AI Act alignment, GDPR posture, NIS2/sector-specific compliance, audit-trail retention, NIST AI RMF implementation, indemnification, and transparency reporting. The relevant primary sources — ISO/IEC 42001:2023, EU AI Act, GDPR Article 33 (breach notification), NIS2 Article 23 (early warning and incident reporting) — all have specific requirements that the RFP questions operationalise. Representative questions:

  • How do you comply with EU AI Act requirements for high-risk AI systems (Annex III), and which of your platform components fall into those categories? Strong answer: a documented compliance map per component, with risk-tier classification and Annex III category references. Weak answer: “we are working toward EU AI Act compliance.”
  • What audit-trail retention policy applies to agent actions — format, duration, customer export? Strong answer: named format (e.g., immutable JSONL, write-once storage), specific duration (≥7 years for regulated sectors), customer-exportable on demand. Weak answer: “audit trails available on request.”
  • Do you publish an annual transparency report (government data requests, takedowns, incident disclosures)? Strong answer: yes, with the last 3 years of reports publicly linked. Weak answer: no (which does not disqualify but is a differentiator).

Scoring — converting vendor responses to a GAUGE-aligned number

Each of the 60 questions is scored 0, 3, or 5 per the answer evidence:

  • 0 — absent, vague, or marketing copy
  • 3 — partial evidence, documented but incomplete
  • 5 — strong evidence, named customers, measurable commitments

Dimension subtotals (10 questions × 5 max) map to the dimension’s GAUGE weight. The Excel handles the weighted-sum computation — the output is a GAUGE-style 0–100 score that is directly comparable across vendors. The full methodology explains the weighting; the RFP Excel applies it.
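For committees that want to sanity-check the spreadsheet, here is a minimal sketch of that weighted-sum computation. The dimension weights shown are illustrative placeholders, not the published GAUGE weights; those are defined in the full methodology and applied in the Excel.

```python
# Minimal sketch of the GAUGE-aligned weighted score the Excel computes.
# The dimension weights below are illustrative placeholders, not the
# published GAUGE weights; substitute the values from the methodology.

SCORE_VALUES = {0, 3, 5}  # per-question rubric: absent / partial / strong

ILLUSTRATIVE_WEIGHTS = {
    "governance": 0.20,
    "threat_model": 0.20,
    "roi_evidence": 0.15,
    "change_management": 0.15,
    "vendor_lock_in": 0.15,
    "compliance": 0.15,
}  # assumed to sum to 1.0


def gauge_score(responses: dict[str, list[int]],
                weights: dict[str, float] = ILLUSTRATIVE_WEIGHTS) -> float:
    """Convert per-question scores (ten per dimension, each 0/3/5)
    into a 0-100 weighted total comparable across vendors."""
    total = 0.0
    for dimension, scores in responses.items():
        if len(scores) != 10 or any(s not in SCORE_VALUES for s in scores):
            raise ValueError(f"{dimension}: expected ten scores from 0/3/5")
        subtotal = sum(scores)        # 0-50 per dimension
        total += weights[dimension] * (subtotal / 50.0)
    return round(total * 100, 1)      # GAUGE-style 0-100


# A vendor answering every question with "partial evidence" lands at 60.0.
all_partial = {dim: [3] * 10 for dim in ILLUSTRATIVE_WEIGHTS}
print(gauge_score(all_partial))  # 60.0
```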

Scored honestly, most vendors will land at 35–55 on a first-pass RFP. Above 70 is rare for the current generation of agentic AI vendors. That is information, not a disqualifier: a vendor scoring 45 may still be the best available in some categories, and knowing the gaps before contract signature lets the customer negotiate mitigating commitments.

The red-flag checklist

Some answers disqualify a vendor from consideration regardless of the weighted score. The Excel includes an explicit red-flag sheet; here are the top five:

  1. “No customer has ever migrated off our platform” — for a vendor older than three years, this is a statistical impossibility and a signal that the vendor doesn’t track or disclose churn.
  2. Refusal to disclose training data sources for foundational models the platform depends on — not the customer’s own fine-tuning data, which is understandably restricted, but the base-model provenance.
  3. No published vulnerability disclosure SLA — indicates no mature disclosure programme exists.
  4. “Our ROI case studies are proprietary and cannot be externally validated” — the vendor is asserting productivity claims they are unwilling to let the buyer verify.
  5. Rate-change notice of less than 30 days — indicates the vendor reserves the right to materially reprice mid-contract, which is incompatible with multi-year enterprise agreements.

Hitting any one of these should trigger a documented exception, not a reflexive rejection, but the default is out-of-consideration.
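A small sketch of how that gate can sit on top of the weighted score; the flag identifier and the assessment structure below are hypothetical, and the Excel’s red-flag sheet remains the canonical list of disqualifying answers.

```python
# Sketch of the red-flag gate layered over the weighted score. The flag
# identifier and the VendorAssessment shape are hypothetical; the Excel's
# red-flag sheet is the canonical list of disqualifying answers.
from dataclasses import dataclass, field


@dataclass
class VendorAssessment:
    name: str
    gauge_score: float                       # output of the weighted scoring
    red_flags: set[str] = field(default_factory=set)
    documented_exceptions: set[str] = field(default_factory=set)

    def in_consideration(self) -> bool:
        # Any red flag without a documented exception takes the vendor out
        # of consideration, regardless of the weighted score.
        return not (self.red_flags - self.documented_exceptions)


vendor = VendorAssessment("ExampleVendor", gauge_score=72.0,
                          red_flags={"no_disclosure_sla"})
print(vendor.in_consideration())  # False until an exception is documented
```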

Using the RFP in a procurement committee

The 60-question Excel is the artifact. The procurement committee process around it:

Phase 1 — send to shortlisted vendors with a fixed response window (typically 3–4 weeks). Shorter windows favour vendors with pre-written answers, which itself signals operational maturity.

Phase 2 — score each response internally before any vendor call. The call introduces bias; score cold first. Two reviewers per dimension, discrepancies resolved in a review session.
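A short sketch of that discrepancy check, assuming a hypothetical divergence threshold of 10 subtotal points; the actual threshold is a committee decision.

```python
# Sketch of the Phase 2 discrepancy check: two reviewers score each
# dimension cold, and divergent subtotals get flagged for the review
# session. The 10-point divergence threshold is an assumption.

def flag_discrepancies(reviewer_a: dict[str, int],
                       reviewer_b: dict[str, int],
                       threshold: int = 10) -> list[str]:
    """Return dimensions whose subtotals (0-50) differ by more than
    `threshold` points between the two reviewers."""
    return [dim for dim in reviewer_a
            if abs(reviewer_a[dim] - reviewer_b[dim]) > threshold]


a = {"governance": 35, "threat_model": 20}
b = {"governance": 38, "threat_model": 41}
print(flag_discrepancies(a, b))  # ['threat_model'] needs a review session
```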

Phase 3 — technical/security review on the top 2–3 scorers. At this stage, the evidence in the written RFP is cross-checked against customer references, security questionnaires, and (for GAUGE-70+ candidates) targeted penetration-testing or third-party assessment.

Phase 4 — negotiate commitments to close gaps identified in scoring. For every dimension scoring below a 3 average, request a contractual commitment — SLA, roadmap item, exit provision — that closes the gap before signature. Vendors willing to negotiate here are usually the ones who will survive the 18-month governance cycle.

Note — this RFP does not replace the SaaS-general RFP (uptime, seat pricing, SSO, support). It augments it with the six agent-specific governance dimensions. Run them in parallel; score separately; make the procurement decision against both.

Download · the 60-question RFP Excel

Holding-up note

The primary claim of this piece — that generic SaaS RFPs systematically underweight six agent-specific governance dimensions, and that a GAUGE-aligned 60-question RFP layer materially changes vendor selection outcomes — is on a 60-day review cadence. Three kinds of evidence would move the verdict:

  • Published procurement-committee case studies (anonymised, from analyst firms or consultancies) showing that generic RFPs produced outcomes indistinguishable from GAUGE-augmented ones. Would weaken.
  • Major vendors adopting a GAUGE-style self-disclosure template (responding to the questions publicly, without customer-specific RFP prompts). Would strengthen and partially obviate the need for the RFP artifact itself.
  • Regulatory procurement frameworks (e.g., EU public-sector procurement under Article 68 of the AI Act) converging on similar dimensions. Would absorb some of this piece’s novelty into regulated defaults.

If any land, the Holding-up record for AM-026 captures what changed, dated. Original claim stays visible. Nothing is quietly removed.


Spotted an error? See corrections policy →
