The shadow-AI discovery playbook: finding the agents your org already has
The 2024 framing of shadow AI assumed unsanctioned tool adoption. The 2026 reality is agentic capability silently activating inside already-approved tools. A 12-question discovery playbook for enterprise IT, oriented to capability state rather than vendor identity, with the EU AI Act August 2026 deadline as the forcing function.
The most-cited statistic in 2026 enterprise agentic AI governance is that 97% of enterprises run AI agents while only 12% have centralised control over them (TechHQ, Agentic AI Governance Is the CIO’s Most Urgent Blind Spot). That 85-percentage-point gap is by some distance the largest in enterprise IT governance in 2026: wider than the equivalent gap at the same stage of cloud adoption, wider than at the same stage of SaaS adoption, and, unusually for a gap of this scale, not the result of under-funded or under-staffed IT functions. Those functions are running a 2024 playbook against a 2026 problem.
The 2024 playbook treated shadow AI as unsanctioned tool adoption. The remediation was clear: block consumer ChatGPT at the proxy, train workers on approved alternatives, run a quarterly survey to flag policy violations. Most enterprises executed that playbook competently. The 2026 problem is that the playbook addressed the wrong threat surface. The agentic capability that produces the 97/12 gap is mostly inside tools enterprise IT already approved, activated by configuration changes the original procurement did not anticipate. Blocking consumer ChatGPT at the proxy does not affect the Custom GPT a business analyst built last quarter that calls three internal APIs.
This piece is the discovery playbook for enterprise IT teams that need to close the 97/12 gap before the EU AI Act enforcement window opens on 2 August 2026. The playbook is structured as a four-week discovery exercise producing a deployment registry, a triage tree, and an updated procurement-approval record. It is deliberately mechanical rather than strategic. The strategy questions can wait until the inventory is on paper.
Two propositions structure the piece:
- Discovery has to look at capability state, not vendor identity. Most enterprise shadow-AI programmes are built around vendor onboarding (which tools are approved) and user behaviour (who is using which tools). The 2026 exposure surface sits in the seam neither approach catches: capability changes inside already-approved tools, often introduced by configuration not procurement.
- Discovery is the first compliance artifact, not a pre-compliance exercise. Article 9 of the EU AI Act requires a documented inventory of in-scope deployments from 2 August 2026. An enterprise that cannot produce that inventory in the week of a regulator request is in non-conformity, regardless of how well any individual deployment is governed. The discovery exercise produces the inventory, which makes it both the lowest-risk compliance artifact and the foundation for the higher-risk artifacts that follow.
The remainder of the piece walks through the operational definition of shadow agentic AI in 2026, the three categories most enterprises miss, the 12-question discovery template, and the triage outcomes that close the loop with EU AI Act preparation and the GAUGE governance scoring.
What “shadow AI” means in 2026
The 2024 definition assumed a worker, a personal browser session, and a consumer-grade AI tool not on the enterprise approved list. That pattern still exists. It is not the dominant pattern any longer.
The 2026 definition has to cover three deployment archetypes the 2024 framing did not include:
Archetype 1: configuration-shifted approved tools. A SaaS tool was approved by IT in 2023 or 2024 as a productivity application. The vendor has since shipped agentic capability behind a feature flag, a pricing tier, or a tool-integration step. The flag has been enabled in the enterprise tenant, often by a workspace administrator who is not part of the original procurement decision-making set. The same licence covers an agent that can act on downstream systems where the original product only suggested outputs to a human reviewer. Microsoft 365 Copilot custom agents, Salesforce Agentforce within an existing Salesforce instance, ChatGPT Enterprise Custom GPTs with action integrations, and Claude with computer-use enabled are all examples of this archetype. The procurement approval did not change. The governance class did.
Archetype 2: integration-stitched custom agents. A user inside the enterprise (typically a developer, often a business analyst or operations lead) has built a custom agent by stitching together approved components. The model is from an approved provider. The orchestration platform is approved (n8n, Make, Zapier, an internal LangChain harness). The data sources are approved. The deployed combination has not been registered with IT because no individual component required new approval and the combination was assembled inside existing tooling. These agents commonly have downstream write capability and the broadest blast radius of the three archetypes.
Archetype 3: MCP-connected developer environments. Engineering teams using Cursor, Claude Code, GitHub Copilot, or comparable IDE-resident agentic tools have configured MCP server connections that the engineering team’s local approval covers but the central governance team has no visibility into. The MCP servers can grant access scopes the developer would not have been granted directly through enterprise SSO. The 10,000+ active public MCP servers documented as of late 2025 (Linux Foundation, Agentic AI Foundation announcement) include enough enterprise-relevant integrations that this archetype is now standard rather than fringe.
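For the MCP archetype, discovery can start with the configuration files themselves. The sketch below is a minimal example, assuming developer checkouts can be scanned and that the tooling in use keeps its MCP configuration in JSON files with an mcpServers map; the file names and the search root are assumptions to verify against the IDEs and agent tools actually in the fleet.

```python
"""Sketch: enumerate MCP server connections configured in developer checkouts.
The config file names and the mcpServers key are assumptions to verify against
the IDE and agent tooling actually in use."""
import json
from pathlib import Path

# File names commonly used for MCP configuration; an assumption, not an exhaustive list.
MCP_CONFIG_NAMES = {"mcp.json", ".mcp.json", "claude_desktop_config.json"}

def find_mcp_servers(root: Path) -> list[dict]:
    """Walk a directory tree and collect every configured MCP server it can parse."""
    findings = []
    for path in root.rglob("*"):
        if not (path.is_file() and path.name in MCP_CONFIG_NAMES):
            continue
        try:
            config = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or not JSON; a real run should log this as a finding too
        for name, server in config.get("mcpServers", {}).items():
            if not isinstance(server, dict):
                continue
            findings.append({
                "config_file": str(path),
                "server_name": name,
                "command": server.get("command"),  # stdio servers declare a command
                "url": server.get("url"),          # remote servers declare a URL
            })
    return findings

if __name__ == "__main__":
    for finding in find_mcp_servers(Path.home() / "src"):  # search root is illustrative
        print(finding)
```

A fleet-wide run aggregates the per-machine findings into the deployment registry; a config file that cannot be parsed is itself worth recording.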
All three archetypes share a common property: the deployment exists, has agentic capability, can change state in downstream systems, and is invisible to the central governance function operating on a 2024 model.
The 97/12 gap is structural, not negligent
Most enterprise IT governance functions are not under-resourced relative to their original mandate. They are operating against a mandate that has expanded faster than the governance review cycle can adapt to. The structural problem is in the review trigger:
- The 2024 review trigger fires on vendor change: new tool, new contract, new approval gate.
- The 2026 review trigger needs to fire on capability change: any moment a deployment gains the ability to act on downstream systems, regardless of whether the underlying vendor or contract changed.
Most enterprise procurement workflows have no equivalent of the capability-change trigger. The agentic feature flag flips, the Custom GPT is created, the MCP server is connected, and no procurement event registers in the IT governance log. The deployment exists in a different governance class without crossing any approval gate that would have surfaced the change.
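What a capability-change trigger could look like in practice is a periodic comparison between the capability currently enabled in each tenant and the capability scope recorded at procurement. The sketch below is illustrative only: both inputs are stand-ins for a tenant-admin export and the procurement approval register, and the capability labels are invented for the example.

```python
"""Sketch: fire a review when a tenant's enabled agentic capability exceeds
the capability scope recorded at procurement. Both inputs are illustrative
stand-ins for a tenant-admin export and the procurement approval register."""

# What procurement approved, per tool (illustrative records).
approved_scope = {
    "crm_suite": {"suggestions"},                       # human-reviewed output only
    "office_copilot": {"suggestions", "summarisation"},
}

# What is actually enabled in each tenant today (illustrative export).
enabled_capabilities = {
    "crm_suite": {"suggestions", "autonomous_actions"},  # agent flag flipped by a workspace admin
    "office_copilot": {"suggestions", "summarisation"},
}

def capability_drift(approved: dict[str, set[str]],
                     enabled: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per tool, the capabilities enabled beyond the approved scope."""
    drift = {}
    for tool, live in enabled.items():
        unapproved = live - approved.get(tool, set())
        if unapproved:
            drift[tool] = unapproved
    return drift

for tool, extras in capability_drift(approved_scope, enabled_capabilities).items():
    print(f"review trigger: {tool} has unapproved capability {sorted(extras)}")
```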
Closing the 97/12 gap requires both the discovery exercise (which catches the deployments already in the gap) and a procurement-workflow update (which prevents new deployments from entering the gap unobserved). This piece covers the first; the procurement-workflow update is the operational sequel and sits in the agentic AI procurement framework at AM-028.
The 12-question discovery template
A complete shadow-AI discovery exercise is twelve questions, applied per surface and per deployment. The template below is the operational core; the GAUGE diagnostic Excel at /gauge/ carries the same questions in spreadsheet form for working-group use.
Surface-level questions (apply to each approved tool):
- Has the vendor shipped any agentic capability into this product in the last 12 months? Vendor changelogs, release notes, and product blogs are the source. The answer is rarely no.
- Is the agentic capability enabled in the enterprise tenant? Workspace-administrator settings, feature flags, and tier-based product gates are where this lives. Most enterprises have several yes answers they did not consciously authorise.
- Which user roles can configure the agentic capability further? End users? Tenant admins? A specific approved group? Track the answer for each surface.
Deployment-level questions (apply to each specific instance found):
- Who created this specific deployment, and when? Custom GPT creation logs, Copilot custom agent registries, n8n workflow ownership, equivalent for each platform. The default answer should be retrievable; if it is not, that is itself a finding.
- What downstream tools or APIs does this deployment call? Internal CRM, HRIS, ticketing, code repository, payment system, customer database, regulated-data store. The list determines the blast radius.
- What data scopes does this deployment operate against? Sensitivity classification of the data accessible to the deployment, mapped against the enterprise data-classification policy.
- Does the deployment have downstream write capability? Yes/no with the specific systems named. This is the single most important variable for governance class.
- What is the user-facing decision flow? Does a human review and approve each agent action? Does the agent act first and a human review periodically? Does the agent act with no human review? Each is a different governance class.
Governance-fit questions (apply per deployment after the inventory is complete):
- Does the original procurement approval cover the current capability scope? The answer is rarely yes for configuration-shifted approved tools.
- Does this deployment fall within EU AI Act Annex III scope? The materiality test is whether its output influences a decision in one of the eight high-risk categories. The full treatment is at /eu-ai-act-agentic-ai-compliance/.
- What is the deployment’s GAUGE score across the six governance dimensions? The free Excel runs the scoring in 30 to 45 minutes per deployment.
- Where does this deployment sit on the triage tree: approve-in-current-state with updated documentation, gap-fix track (4 to 8 weeks engineering), or pause/redesign?
The twelve questions answered for every discovered deployment produce the deployment registry. The registry is the artifact. The governance work that follows the registry is staged across multiple quarters; the registry itself is the prerequisite.
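One workable shape for a registry entry is a structured record that mirrors the twelve questions. The sketch below is an assumed schema, not a prescribed one; the field names, enumerations, and types are illustrative and would be adapted to the enterprise's own data-classification and review vocabulary.

```python
"""Sketch: one registry record per discovered deployment, mirroring the twelve
discovery questions. Field names and enumerations are illustrative."""
from dataclasses import dataclass, field
from datetime import date
from enum import Enum

class Oversight(Enum):
    HUMAN_APPROVES_EACH_ACTION = "human_approves_each_action"
    HUMAN_REVIEWS_PERIODICALLY = "human_reviews_periodically"
    NO_HUMAN_REVIEW = "no_human_review"

@dataclass
class DeploymentRecord:
    deployment_id: str
    surface: str                     # the approved tool the deployment lives inside
    archetype: str                   # configuration-shifted / integration-stitched / mcp-connected
    owner: str
    created: date
    downstream_systems: list[str]    # CRM, HRIS, ticketing, code repository, payment system...
    data_classification: str         # per the enterprise data-classification policy
    write_capability: bool           # the single most important variable for governance class
    oversight: Oversight
    procurement_scope_covered: bool
    annex_iii_in_scope: bool | None = None                       # None until the materiality test is run
    gauge_scores: dict[str, int] = field(default_factory=dict)   # six dimensions, scored in week 3
    triage_track: str | None = None                              # approve / gap-fix / pause-redesign
    next_review: date | None = None
```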
Mapping discovery to EU AI Act, GAUGE, and IAM
The discovery exercise feeds three downstream governance workstreams that each have their own deadlines and obligations:
EU AI Act Article 9 risk-management system. The Article 9 inventory requires the deployment registry produced by discovery. An Article 9 risk-management system without a deployment registry is incomplete, and the deficiency is one a regulator can name in a non-conformity finding. The 2 August 2026 enforcement window applies. The full Article-by-Article walkthrough is at /eu-ai-act-agentic-ai-compliance/ (claim AM-035).
GAUGE governance scoring. The six-dimension GAUGE framework scores each deployment against governance maturity, threat model, ROI evidence, change management, vendor lock-in, and compliance posture. Discovery produces the deployment registry; GAUGE produces the per-deployment score. The two outputs together drive the EU AI Act preparation track. The Excel diagnostic at /gauge/ is the working-document version.
Non-human identity (NHI) management. The deployment registry surfaces the non-human identities operating against enterprise systems on behalf of agents. Most enterprise IAM platforms in 2026 do not natively cover dynamic, ephemeral agent identities (The Hacker News, AI Agents and Identity Dark Matter). The discovery output identifies the NHI scope; the IAM remediation work follows. Reportedly, 92% of enterprises do not believe their existing IAM systems are adequate for the agentic-AI identity load (Gravitee 2026 State of AI Agent Security). Discovery is the input that makes the IAM problem tractable rather than abstract.
The three workstreams share the discovery output. Running them as parallel governance projects without the shared discovery artifact produces three differently-scoped inventories, three sets of remediation work, and a structural inconsistency the next regulator request reveals.
What to do with what you find
The triage outcomes from the discovery exercise distribute roughly as follows in 2026 enterprise environments:
- 50 to 70% approve-in-current-state with updated procurement records. The deployment is well-scoped, has an identifiable owner, operates against appropriate data, and the original procurement approval can be amended to cover the actual capability scope. The remediation is documentation, not engineering. Two to four hours of governance work per deployment closes the gap.
- 20 to 40% gap-fix track. The deployment is genuinely in scope but missing one or two governance dimensions: typically Article 14 human oversight (the reviewer authority is not documented to Article-14 specificity) or Article 12 logging (the logging posture is operational debug, not regulator-readable). Four to eight weeks of engineering closes the gap. The cost is real but bounded.
- 5 to 15% pause or redesign. The deployment is configured in a way that no feasible governance overlay can make compliant. The most common pattern is an integration-stitched custom agent calling tools that operate outside enterprise data-classification scope, with no realistic path to bringing it inside scope without rebuilding it as a sanctioned deployment. Catching the 5 to 15% before the EU AI Act enforcement window opens is the highest-leverage outcome of the discovery exercise.
The triage tree itself is the deliverable to the executive sponsor. The CIO or equivalent does not need to see every deployment. They need the one-page summary of how many deployments fell into each track, where the highest-impact gap-fix work concentrates, and what the pause/redesign cohort looks like.
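Behind the one-page summary sits a per-deployment classification, and for that classification to be reproducible across reviewers it helps to write the rules down explicitly. The sketch below reuses the DeploymentRecord and Oversight types from the registry sketch above; the rules and the labels are illustrative, not a substitute for case-by-case judgement.

```python
"""Sketch: map a registry record to one of the three triage tracks.
Reuses DeploymentRecord and Oversight from the registry sketch; the rules
are illustrative, not a substitute for per-deployment review."""

def triage(record: DeploymentRecord) -> str:
    # Pause/redesign: operating on data outside the enterprise classification
    # scope, with no realistic path to bringing it inside scope.
    if record.data_classification == "outside_classification_scope":
        return "pause_or_redesign"
    # Gap-fix: in Annex III scope but missing a bounded governance dimension,
    # typically Article 14 oversight documentation or Article 12 logging.
    if record.annex_iii_in_scope and record.oversight is not Oversight.HUMAN_APPROVES_EACH_ACTION:
        return "gap_fix"
    # Approve in current state: remediation is documentation (for example, amending
    # the procurement record to cover the actual capability scope), not engineering.
    return "approve_with_updated_records"
```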
What to do Monday
Fourteen weeks remain before 2 August 2026. The discovery exercise can run in parallel with the EU AI Act preparation track described at /eu-ai-act-agentic-ai-compliance/. The realistic timing:
Week 1. Stand up the joint team: one governance lead, one IAM/security lead, one architecture or platform engineering lead, with executive sponsorship from the CIO. The team’s first deliverable is the surface inventory of every approved SaaS tool, IDE, and productivity platform with current or recent agentic capability. Most enterprises produce a 25 to 60-item surface inventory.
Week 2. Per surface, run the deployment-level discovery. Custom GPTs, Copilot custom agents, n8n workflows, MCP server connections, equivalent for each platform. Capture creation date, owner, downstream tools, data scopes, write capability. Most enterprises find 200 to 800 deployments at this step. The number is alarming the first time it appears; it is the realistic baseline.
Week 3. Score each discovered deployment with GAUGE on the six dimensions and against the EU AI Act Article 12 logging-posture question. Use the GAUGE Excel diagnostic at /gauge/. Allocate scoring sessions: the governance lead runs each session and the relevant surface owner attends. The disagreements across functions are the signal.
Week 4. Build the triage tree. Three tracks per deployment, with a one-page executive summary the CIO can act on. Update the central agentic-AI registry with the per-deployment classification, scoring, and next-review date. Hand the registry to the EU AI Act preparation track and the IAM remediation team simultaneously.
The discovery exercise repeats quarterly thereafter. New deployments enter the registry as they are created. The inventory itself becomes the central operational artifact for agentic AI governance, with the GAUGE scoring producing the per-deployment governance quality signal across review cycles.
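Keeping the registry current across quarters is, mechanically, a comparison of snapshots. A minimal sketch, assuming each quarterly snapshot is exported as a mapping from deployment ID to its registry record:

```python
"""Sketch: quarter-over-quarter registry comparison. Each snapshot is assumed
to be a dict keyed by deployment_id, loaded from the registry export."""

def registry_diff(previous: dict, current: dict) -> dict[str, list[str]]:
    """New deployments enter triage; removed ones need a decommission note."""
    return {
        "new_this_quarter": sorted(current.keys() - previous.keys()),
        "removed_this_quarter": sorted(previous.keys() - current.keys()),
        "carried_forward": sorted(current.keys() & previous.keys()),
    }
```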
The Holding-up note
The primary claim of this piece (that enterprise shadow AI in 2026 is structurally different from shadow AI in 2024, and that capability-state discovery rather than vendor-identity policing is the operational fix) is logged at AM-036 on the Holding-up ledger on a 60-day review cadence. Three kinds of evidence would move the verdict:
- Vendor-side capability-change controls. Major vendors that lock down Custom GPT actions, Copilot custom agents, MCP server connections, and equivalent constructs behind enterprise-admin approval would partly resolve the 2026 problem at the source. Currently most do not, but Microsoft, Google, and Anthropic have each shipped tenant-admin governance improvements in the last 12 months. The trajectory matters; the destination is not yet visible.
- Regulatory enforcement actions on configuration-shifted deployments. EU AI Act enforcement actions where the in-scope deployment was a capability change on an approved tool, not a new tool, would either confirm the broader-scope reading from claim AM-035 or narrow it. The first batch of actions after 2 August 2026 will be informative.
- Enterprise-IAM platforms shipping native agent-NHI discovery. Platforms like Okta, Microsoft Entra, Ping, and adjacent vendors have signalled investment in agent-identity discovery. A native agent-NHI inventory shipped at the IAM layer would compress the discovery exercise from four weeks of joint-team work to a few days of report review. The platforms are not there yet; the trajectory is clear.
The next review of this claim is scheduled for 24 June 2026. The discovery exercise this piece describes runs in four weeks; roughly fourteen remain before the August 2026 EU AI Act window opens. Enterprises that begin the discovery work this week are well-placed against the deadline; enterprises that begin in June are not.