Agentic AI discovery: what the phase upstream of procurement actually has to test
McKinsey reports a $2.7 trillion paradox: 80% of companies use generative AI but report no bottom-line impact. Gartner projects 40% of agentic AI projects will be cancelled by end of 2027. Gartner's January 2025 poll of 3,412 executives (19% significant investment, 42% conservative, 31% wait-and-see, 8% none) shows how organisations are distributed across investment postures. The discovery phase upstream of procurement is not a vendor-evaluation sprint; it is an organisational-readiness test. Four upstream tests determine whether the deploying enterprise should proceed at all, and the right answer for a meaningful share of organisations remains 'not yet'.
Holding · reviewed 07 May 2026 · next +48d
Bottom line. McKinsey’s ‘Seizing the agentic AI advantage’ research describes a $2.7 trillion paradox: 80% of companies use generative AI but report no bottom-line impact (McKinsey). Gartner projects 40%+ of agentic AI projects will be cancelled by end of 2027 (Gartner, 25 June 2025). Gartner’s January 2025 poll (n=3,412 executives) places organisations in four postures: 19% significant investment, 42% conservative, 31% wait-and-see, 8% none. The discovery phase upstream of procurement is an organisational-readiness test, not a vendor sprint. Four upstream tests determine whether the procuring enterprise should proceed at all, and ‘not yet’ is a defensible outcome for a meaningful share of organisations.
McKinsey’s research on agentic AI value capture surfaces the central paradox: 80% of companies report using generative AI, but the same population reports no measurable bottom-line impact at the firm level (McKinsey, “Seizing the agentic AI advantage”). The paradox sits against the $2.7 trillion of potential value that McKinsey identifies elsewhere in the same research thread. Gartner’s June 2025 forecast projects that 40%+ of agentic AI projects will be cancelled by end of 2027 (Gartner, 25 June 2025). The two numbers describe the same operational shape from opposite directions: many organisations engaged with agentic AI are not capturing value, and many that are mid-deployment will not finish.
This piece reads the discovery phase upstream of those outcomes. Discovery is the period in which the procuring enterprise decides whether to engage at all, evaluates its own readiness rather than the vendor’s product, and either proceeds to a specific procurement decision or returns to fix upstream gaps. The most common 2026 enterprise discovery error is treating discovery as a vendor-evaluation sprint to a go-decision; doing so collapses two distinct decisions (organisational readiness, vendor selection) into one and makes both worse.
What discovery actually has to test
IBM’s Maryam Ashoori frames the definitional anchor: an AI agent is an intelligent entity with reasoning and planning capabilities that can autonomously take action (IBM, 2025 expectations vs reality). Vanderbilt’s Jules White, teaching the Coursera “Agentic AI for Leaders” course, gives the operational version: agents take action (updating CRMs, scheduling meetings, executing trades); they do not just generate, they do.
The definitional anchor matters because it bounds what discovery is testing. The procuring enterprise is not testing whether large language models work; it is testing whether the organisation is ready to operate an autonomous-action system in production. The questions are different, the failure modes are different, and the procurement decisions that flow from each are different.
Four upstream tests determine readiness. They are organisational rather than technological, and they are answerable in writing before any vendor conversation begins.
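One way to make the "answerable in writing" requirement concrete is to record each test as a cleared/not-cleared entry with a pointer to its written evidence. The sketch below is illustrative only: the `ReadinessTest` structure, the `discovery_outcome` function, and the evidence strings are hypothetical names introduced here, not part of any cited framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadinessTest:
    name: str
    cleared: bool
    evidence: str  # pointer to the written artefact; empty if none exists


def discovery_outcome(tests: list[ReadinessTest]) -> tuple[str, list[str]]:
    """Return ('proceed' or 'not yet', names of the tests still open).

    A test counts as cleared only if it is marked cleared AND has
    written evidence attached (an assumption of this sketch).
    """
    open_tests = [t.name for t in tests if not (t.cleared and t.evidence.strip())]
    return ("proceed" if not open_tests else "not yet", open_tests)


# Hypothetical example: two tests are documented, two are not.
tests = [
    ReadinessTest("definitional clarity", True, "signed one-page definition"),
    ReadinessTest("named candidate workflow", True, "baseline report, named owner"),
    ReadinessTest("threat-model literacy", False, ""),
    ReadinessTest("workforce readiness", True, ""),  # claimed, but nothing in writing
]
outcome, gaps = discovery_outcome(tests)
print(outcome, gaps)
```

In this sketch a test that is claimed but undocumented is treated the same as a test that has not been attempted, which mirrors the point that the tests are cleared in writing rather than by assertion.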
Test 1: Definitional clarity across the senior team
The first test is whether the executive layer shares an operational definition of what an agent is, distinct from chatbot, RPA, and generative AI tooling. Discovery sessions where the CFO is operating from one definition and the CTO from another produce procurement decisions that look aligned in the deck but diverge in operating reality.
The IBM and Vanderbilt anchors are the working definitions: autonomous action, reasoning, planning, end-to-end workflow ownership. Gartner’s June 2025 release named the parallel risk explicitly under the label “agent washing”: vendors rebranding RPA or chatbot products as “agentic” without the autonomous-action capability that defines an agent in the analytical sense (Gartner, 25 June 2025). Discovery without definitional clarity at the senior-team level is, by default, vulnerable to agent-washed product positioning at procurement.
The remediation is internal and short. A two-hour senior-team workshop using the IBM and Vanderbilt anchors, with a written one-page operational definition signed by the executive layer, is the deliverable. It is not visible to vendors, it does not require external consultants, and it is the cheapest insurance against the McKinsey-paradox cohort that procures agentic AI but does not capture value because the senior team was not aligned on what was being procured.
Test 2: A named operational candidate workflow with measured baseline and named owner
The second test is whether there is one specific workflow, with a documented failure mode and an accountable owner, that the deployment will own end-to-end. Without it, the question the CFO raises at scale-up review (what is the net benefit of scaling this from 50 users to 5,000?) is unanswerable. Both the AM-140 procurement-committee question 1 on baseline and the AM-010 first operational characteristic on CFO-defensible measurement depend on this discovery-phase work.
The named-workflow test fails when the discovery output is “we want to use agentic AI” rather than “we want this specific workflow, currently owned by this named team, currently failing in this specific way, instrumented with this baseline metric, to be operated by an agentic deployment with this named owner.” The first is a procurement intent; the second is a procurement question that vendors can actually answer.
The remediation is also internal. Pick the workflow before the first vendor demo, instrument the baseline for 4–6 weeks, and name the owner, with a reporting line on the org chart. The procuring enterprise that arrives at the first vendor demo with this work done evaluates the vendors against its own measurement; the procuring enterprise that arrives without it evaluates vendors against vendor-supplied numbers and inherits all the methodology weaknesses the CFO’s business case piece catalogues.
Test 3: Threat-model literacy on the agentic AI failure classes
The third test is whether the security team understands the agentic AI failure classes before the vendor demo. Two classes specifically: the cross-agent prompt-injection class (AgentFlayer, EchoLeak / CVE-2025-32711, covered in AM-007) and the browser-resident agent class (Anthropic’s Claude for Chrome disclosure with the 23.6% / 11.2% / 0% rates, covered in AM-009).
A security team without literacy on these classes evaluates vendors against an outdated threat model. The deploying enterprise inherits the residual cross-agent and browser-resident exposure regardless of vendor classification; under EU AI Act Article 9, NIS2 Article 21, and DORA where applicable, the operator owns the residual risk. Discovery without threat-model literacy means the procurement decision will be made by a team that cannot articulate the compensating-control burden the deployment will require.
The remediation is one internal session per class, walking through the published research (Zenity Labs on AgentFlayer, NVD on CVE-2025-32711, Anthropic’s published security disclosure on Claude for Chrome, Brave Software’s research on Comet). The five questions in AM-007 and the five questions in AM-009 are the procurement-deck artefacts the security team produces from those sessions.
Test 4: Workforce readiness against the BCG access gap
The fourth test is whether the workforce population that will operate the deployment has the AI-upskilling access the deployment requires. Boston Consulting Group’s October 2024 study reports a gap in AI upskilling access: 14% of frontline workers versus 44% of leaders (BCG, 24 October 2024). The same study finds 74% of companies struggle to achieve and scale AI value at the firm level, a separate signal pointing at the same operational reality.
The Atlanta Fed wage-premium analysis reads the BCG number from the labour-market angle. The deploying enterprise reads it from the deployment-success angle: agentic deployments land on a workforce, and the workforce’s capability to operate alongside the agent is the difference between adoption and rejection regardless of vendor capability.
Dialpad’s CSuite report frames the parallel data-readiness question: 91% of companies lack sufficient data quality for agentic AI deployments at scale, while only 6% have begun meaningful workforce preparation (Dialpad CSuite Report). The 91% data-quality gap is a separate procurement-deck issue; the 6% workforce-preparation figure is the discovery-phase test for this fourth condition.
The remediation requires a budget commitment and a timeline, not a single session. Identify the workforce population the deployment will reach, score it against the BCG 14% baseline, and either commit to closing the gap before deployment scope expands or scope the deployment to the population whose access is already in place.
What the Gartner January 2025 distribution actually tells discovery
Gartner’s January 2025 poll of 3,412 executives places organisations in four postures: 19% have made significant agentic AI investments, 42% have made conservative investments, 31% remain in ‘wait-and-see’ mode, and 8% have made no investments (Gartner).
The procurement-deck misreading of this distribution is to treat the 31% + 8% = 39% non-engaged cohort as a failure of discovery. The opposite is closer to the truth. A meaningful share of that 39% is correctly identifying that the four upstream tests are not yet cleared and that proceeding to procurement would land the organisation in the McKinsey 39% experimenting cohort or the 38% deployed-and-stopped cohort that the McKinsey 23% piece describes.
“Not yet” is a defensible discovery-phase outcome. The 31% wait-and-see population that pairs the wait with active work on the four upstream tests will move to engagement on a stronger footing than the 19% who proceeded to significant investment without that work. Gartner’s 40%+ cancellation projection by end of 2027 is the downstream consequence of mixing those two paths inside the engaged 61%.
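The cohort arithmetic used in this section can be checked directly from the four published shares. This is a minimal sketch; the dictionary keys and variable names are labels chosen here for illustration.

```python
# Shares from Gartner's January 2025 poll (n=3,412), in percent.
postures = {
    "significant investment": 19,
    "conservative investment": 42,
    "wait-and-see": 31,
    "no investment": 8,
}

# "Engaged" = any investment made; "non-engaged" = wait-and-see or none.
engaged = postures["significant investment"] + postures["conservative investment"]
non_engaged = postures["wait-and-see"] + postures["no investment"]

# The four postures are exhaustive, so the shares should sum to 100.
assert sum(postures.values()) == 100
print(f"engaged: {engaged}%, non-engaged: {non_engaged}%")
# prints "engaged: 61%, non-engaged: 39%"
```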
What “agent washing” actually looks like and why discovery has to filter for it
Gartner’s 25 June 2025 release named four agent-washing patterns: products with no live autonomous-action demonstration; products whose autonomy claims are vague and not exercised in the demo; products requiring constant human oversight at every step (which is RPA, not agentic); and products that are rebranded chatbots or RPA tooling.
Discovery-phase filtering against these patterns is operational. Require the live demonstration of the autonomous-action capability against the procuring enterprise’s named candidate workflow (test 2), with the procuring enterprise’s threat-model literacy (test 3) shaping which capabilities the security team allows in scope. The agent-washed product fails the live-demo test by definition; the legitimately agentic product passes the live-demo test and then has to face the cross-agent and browser-resident class questions.
The remediation here is procedural rather than analytical. Build the agent-washing filter into the discovery-phase vendor-screening protocol so the procurement evaluation never reaches the formal stage on a non-agentic product. The cost is one round of additional rigor in the early vendor screen; the benefit is preventing the procurement decision from being made on a product that does not meet the analytical definition.
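A screening protocol of this kind can be written down as a short checklist. The sketch below encodes the four patterns plus the requirement that the demo runs on the named candidate workflow; the `VendorDemo` fields, the `agent_washing_flags` function, and the example vendor are hypothetical, and a real protocol would record evidence rather than bare true/false answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorDemo:
    vendor: str
    live_autonomous_action_shown: bool     # autonomous action demonstrated live
    autonomy_claims_exercised: bool        # claims exercised, not just asserted
    needs_human_approval_every_step: bool  # constant oversight at every step
    rebranded_chatbot_or_rpa: bool         # existing chatbot/RPA product relabelled
    run_on_candidate_workflow: bool        # demo used the workflow from test 2


def agent_washing_flags(demo: VendorDemo) -> list[str]:
    """Return the screening flags raised by a demo; an empty list means none."""
    flags = []
    if not demo.live_autonomous_action_shown:
        flags.append("no live autonomous-action demonstration")
    if not demo.autonomy_claims_exercised:
        flags.append("autonomy claims not exercised in the demo")
    if demo.needs_human_approval_every_step:
        flags.append("constant human oversight at every step")
    if demo.rebranded_chatbot_or_rpa:
        flags.append("rebranded chatbot or RPA tooling")
    if not demo.run_on_candidate_workflow:
        flags.append("demo not run on the named candidate workflow")
    return flags


# Hypothetical vendor: live demo shown, but autonomy is asserted rather than exercised.
demo = VendorDemo("ExampleVendor", True, False, True, False, True)
flags = agent_washing_flags(demo)
advance_to_formal_evaluation = not flags
print(advance_to_formal_evaluation, flags)
```

Treating any single flag as disqualifying is a design choice of this sketch; it matches the aim of keeping non-agentic products out of the formal evaluation stage altogether.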
What the Asia-Pacific discovery work tells us
Deloitte’s Asia Pacific Agentic AI Centre, spanning practitioners across India, Malaysia, and Singapore, frames the regional discovery work as an ecosystem activity rather than a single-firm sprint (Business Today, 11 June 2025). The Centre’s frame matters for the discovery question because it surfaces a pattern: shared discovery work across organisations within an industry or jurisdiction can close the readiness gaps faster than every organisation rebuilding the four upstream tests independently.
For the procurement-deck reader: regional or sector-level discovery work that the procuring enterprise can plug into is procurement-relevant signal. It does not replace the four internal tests, but it can shorten the time-to-clear by giving the senior team, the security team, and the workforce-readiness owner a shared external reference to anchor against.
Holding-up note
The primary claim of this piece (that the agentic AI discovery phase upstream of procurement is an organisational-readiness test, not a vendor-evaluation sprint, and that four upstream tests determine whether the procuring enterprise should proceed at all) is on a 60-day review cadence. Three kinds of evidence would move the verdict.
A subsequent Gartner or analogous executive-poll wave compressing the 19/42/31/8 distribution materially would directly update the central numbers without changing the framing. A new analyst framework or academic publication explicitly proposing a discovery-phase methodology that supersedes the four upstream tests would strengthen the field-level discipline and require the framing to be re-read against the alternative. Regulatory action (EU AI Act post-market monitoring, sectoral regulator) imposing a discovery-phase due-diligence requirement on agentic AI deployments would substantively reshape the variable set; the four tests would remain organisationally valid but would acquire a regulatory layer the current framing does not address.
If any land, the Holding-up record for AM-004 captures what changed, dated. Original claim stays visible. Nothing is quietly removed.