The agent kill-switch: turning 'you can't stop it' into a containment architecture
Kiteworks' 2026 Data Security and Compliance Risk Forecast found 60% of organisations cannot quickly terminate a misbehaving AI agent and 63% cannot enforce purpose limitations on what agents are authorised to do. The structural reading is that most enterprises have written kill criteria into the risk register and have not built kill architecture into the runtime. The four-primitive containment architecture (purpose binding, kill switch, network isolation, credential revocation) is the instrument for closing the gap, and the tabletop test is the only proof it works.
Holding · reviewed 26 May 2026 · next +30d
Kiteworks’ 2026 Data Security and Compliance Risk Forecast measured the gap directly. 40% of organisations can rapidly shut down a misbehaving AI agent. 37% can enforce purpose limitations on what agents are authorised to do. 45% can isolate AI systems from sensitive networks. The reverse readings are the operational ones: 60% cannot terminate, 63% cannot bind purpose, 55% cannot isolate. The same report found 33% lack audit trails entirely and 61% have fragmented logs across systems. Government-sector figures are worse: 76% of government organisations lack kill-switch capability, 90% lack purpose binding.
The structural reading is that most enterprises have written kill criteria into the agent risk register and have not built kill architecture into the runtime. The criteria specify the conditions under which an agent should be terminated, paused, or rolled back. The architecture is the technical control plane that effects the action. The first is a paper artefact; the second is an engineering investment. Most 2026 enterprises have the first and not the second.
This piece is about the second. It is the sequel to the AI agent risk register template, which covers the kill-criteria layer. The kill-architecture layer is the operational answer to the criteria the register names.
The containment statistics and what they actually measure
The Kiteworks figures above are the cleanest aggregate measure of the 2026 gap. Two adjacent measurements set the surrounding context.
Orchid Security’s Identity Gap: 2026 Snapshot, published 19 May 2026 with methodology covering enterprise application telemetry across North America and Europe from April 2025 through March 2026, found 67% of non-human accounts are created directly within the application, unseen and unmanaged by IAM programmes. The same study described “invisible identity” as outweighing visible identity at enterprise scale, 57% to 43%. The identity-side reading of the containment problem is that most of the non-human identities an enterprise operates in 2026 are not in the IAM system the security team would use to revoke them.
Kiteworks’ AI agent data-governance analysis names the structural shape: a 15-20 point gap between governance controls organisations have invested in (monitoring, oversight, policy) and the containment controls they actually need to stop misbehaving systems. The investment pattern follows the audit visibility pattern; controls that look good on the audit work-paper get funded faster than controls that work in an incident.
A 2026 enterprise running production agents at moderate scale therefore faces three measurements that compound one another. The agents are difficult to terminate. The credentials the agents are acting under are largely invisible to the identity programme. The audit trail used to reconstruct what happened is fragmented across systems.
Kill criteria versus kill architecture
The conceptual move that closes the gap is the distinction between kill criteria (the conditions for stopping an agent) and kill architecture (the technical control plane that effects the stop). Most 2026 enterprises have a clean version of the first and an absent or unrehearsed version of the second.
A kill-criteria document specifies, in declarative language, the conditions under which an agent should be terminated, paused, or rolled back. Examples: data exfiltration attempt detected; prompt injection succeeded against the agent; transaction agent issued or approved a transaction the policy engine would have rejected; behavioural-drift metric crossed a threshold; externally-reported anomaly from a downstream system or a customer ticket. The document is the artefact the risk register and the audit response require.
A kill-architecture specification is the runtime control plane. For each criterion it answers four questions: which API call terminates the agent process; which identity-provider call invalidates the credential the agent is acting under; which network-policy change severs the agent’s egress to internal systems; and which audit-log entry records the action and which downstream consumers of the agent’s outputs receive the disclosure. The specification also names the role authorised to invoke each control, the maximum time-to-effect, the verification step that confirms the control worked, and the rollback path if the control was invoked in error.
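One way to make the specification concrete is to hold each control as a structured record rather than as prose. The sketch below is a minimal illustration only: the class name `ControlSpec`, its field names, and the example values are assumptions made for this example, not taken from any platform or standard.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ControlSpec:
    """One row of a kill-architecture specification (hypothetical schema)."""
    criterion: str                  # kill criterion taken from the risk register
    primitive: str                  # e.g. "kill_switch", "credential_revocation"
    authorised_role: str            # role allowed to invoke the control
    max_time_to_effect: timedelta   # longest acceptable delay before the control bites
    verification_step: str          # how the team confirms the control worked
    rollback_path: str              # what to do if the control was invoked in error


def missing_fields(spec: ControlSpec) -> list[str]:
    """Return the names of fields left blank, so an incomplete spec is visible."""
    return [name for name, value in vars(spec).items() if value in ("", None)]


# Illustrative values only.
example = ControlSpec(
    criterion="data exfiltration attempt detected",
    primitive="credential_revocation",
    authorised_role="soc-on-call",
    max_time_to_effect=timedelta(minutes=15),
    verification_step="downstream API rejects the agent's token",
    rollback_path="",
)
print(missing_fields(example))
```

A record like this makes the difference between criteria and architecture checkable: a criterion with no complete `ControlSpec` behind it is a paper control.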
The criteria specification is cheap. The architecture specification is expensive. The expense is concentrated in the integration work: every agent platform exposes a different runtime API for termination, every identity provider exposes a different revocation API, every network-policy plane exposes a different isolation primitive. The control plane is a custom integration in most 2026 enterprises because the platforms have not yet standardised the interface.
The four containment primitives
The architecture reduces to four primitives. Each is independently testable. Most enterprises have tested none of them under incident-response conditions.
Purpose binding. The agent’s authorisation surface is defined at issuance time: which tools the agent can call, which data classes it can read or write, which principals it can act on behalf of, which actions it can take without further human confirmation. The runtime enforces the binding such that the agent cannot exceed it even under prompt-injection, jailbreak, or model-update conditions. Purpose binding is the structural answer to the “the agent did something we did not authorise” failure mode. The Kiteworks measurement found 37% of organisations have this primitive. The other 63% are running agents whose authorisation surface is defined by the prompt and the tool wiring, not by an enforced binding.
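The enforcement idea can be sketched as a check that runs outside the model, between the agent and its tools, so that nothing in the prompt can widen the surface. This is a minimal sketch under stated assumptions: `PurposeBinding`, `BindingViolation`, and `enforce` are hypothetical names, and a real runtime would also cover the "actions without human confirmation" dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurposeBinding:
    """Authorisation surface fixed at issuance time (hypothetical schema)."""
    allowed_tools: frozenset[str]
    allowed_data_classes: frozenset[str]
    allowed_principals: frozenset[str]


class BindingViolation(Exception):
    """Raised when a requested action falls outside the binding."""


def enforce(binding: PurposeBinding, tool: str, data_class: str, principal: str) -> None:
    """Refuse any call outside the binding, regardless of what the prompt says."""
    if tool not in binding.allowed_tools:
        raise BindingViolation(f"tool {tool!r} is not in the binding")
    if data_class not in binding.allowed_data_classes:
        raise BindingViolation(f"data class {data_class!r} is not in the binding")
    if principal not in binding.allowed_principals:
        raise BindingViolation(f"principal {principal!r} is not in the binding")


# Illustrative binding for an invoice-processing agent.
binding = PurposeBinding(
    allowed_tools=frozenset({"read_invoice"}),
    allowed_data_classes=frozenset({"finance"}),
    allowed_principals=frozenset({"ap-team"}),
)
enforce(binding, "read_invoice", "finance", "ap-team")  # inside the binding: no error
```

The design point is that the binding is immutable data evaluated by the runtime, not an instruction the model is asked to follow.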
Kill switch. A single action, executable by a defined role within a defined window, terminates the running agent process and prevents the agent from being re-invoked until the suspension is lifted. Reasonable defaults: under 5 minutes for production agents, under 1 minute for agents with transaction authority. The kill action is logged with the actor, the time, the affected agent class, and the reason. The kill-switch primitive is the regulatory anchor under EU AI Act Article 14(4)(e), the operative clause requiring that natural persons assigned to oversight be enabled “to intervene in the operation of the high-risk AI system or interrupt the system through a ‘stop’ button or a similar procedure that allows the system to come to a halt in a safe state.” The 40% of organisations Kiteworks measured as having the capability are operating an Article-14-defensible kill primitive; the 60% are not. Note that “high-risk AI system” is a defined term under the AI Act, classified under Article 6 by reference to the Annex I product legislation and the Annex III use-case categories; not every enterprise agent qualifies, but the architecture pattern is the right starting point even for non-high-risk deployments.
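The three properties named above (role-gated invocation, blocked re-invocation, and a logged action) can be sketched in a few lines. This is an illustration only: `KillSwitch` and its methods are hypothetical, and the call that actually terminates the process is platform-specific and is represented here by a comment.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class KillSwitch:
    """Minimal kill-switch state: who may invoke it, what is suspended, what was logged."""
    authorised_roles: set[str]
    suspended: set[str] = field(default_factory=set)
    audit_log: list[dict] = field(default_factory=list)

    def kill(self, agent_class: str, actor: str, actor_role: str, reason: str) -> None:
        if actor_role not in self.authorised_roles:
            raise PermissionError(f"role {actor_role!r} may not invoke the kill switch")
        # A real control plane would call the platform's termination API here.
        self.suspended.add(agent_class)
        self.audit_log.append({
            "actor": actor,
            "time": datetime.now(timezone.utc).isoformat(),
            "agent_class": agent_class,
            "reason": reason,
        })

    def may_invoke(self, agent_class: str) -> bool:
        """Re-invocation stays blocked until the suspension is explicitly lifted."""
        return agent_class not in self.suspended

    def lift(self, agent_class: str, actor_role: str) -> None:
        if actor_role not in self.authorised_roles:
            raise PermissionError(f"role {actor_role!r} may not lift a suspension")
        self.suspended.discard(agent_class)
```

The scheduler or gateway that starts agents would consult `may_invoke` before every run, which is what turns a one-off termination into a suspension.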
Network isolation. The agent’s egress can be severed unilaterally, including connections to internal systems, and the isolation can be applied per-agent rather than per-platform. The primitive matters in two scenarios: when the agent is suspected of being used as a lateral-movement vector after a vendor-side or customer-side compromise, and when the agent is suspected of having a tool-call pattern that needs to be contained before the diagnostic completes. The 45% measurement is the operationally-capable share; the other 55% are isolating at coarser granularity (whole platform, whole tenant) or not at all.
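The per-agent granularity is the part worth illustrating. In the sketch below, which assumes a hypothetical `EgressPolicy` class standing in for whatever network-policy plane the enterprise runs, isolating one agent leaves its neighbours on the same platform untouched.

```python
from dataclasses import dataclass, field


@dataclass
class EgressPolicy:
    """Per-agent egress rules plus an isolation set that overrides them (hypothetical)."""
    allowed: dict[str, set[str]]                      # agent id -> permitted destinations
    isolated: set[str] = field(default_factory=set)   # agents whose egress is severed

    def isolate(self, agent_id: str) -> None:
        self.isolated.add(agent_id)

    def permits(self, agent_id: str, destination: str) -> bool:
        # Isolation wins over any allow rule, including rules for internal systems.
        if agent_id in self.isolated:
            return False
        return destination in self.allowed.get(agent_id, set())


policy = EgressPolicy(allowed={
    "agent-a": {"crm.internal", "ledger.internal"},
    "agent-b": {"crm.internal"},
})
policy.isolate("agent-a")
```

After `policy.isolate("agent-a")`, every destination is refused for `agent-a` while `agent-b` keeps its existing access, which is the distinction between per-agent and per-platform isolation.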
Credential revocation. The non-human identity the agent is acting under can be invalidated in the customer’s identity provider, propagating to every downstream system the credential is trusted by, within a defined window. Reasonable defaults: under 1 hour for production credentials, under 15 minutes for credentials with transaction authority. The primitive is where the NHI procurement clause gap analysis and the agent identity IAM architecture analysis intersect with the runtime control plane; the procurement clause guarantees the customer the right to revoke, the IAM architecture provides the technical pathway, and this primitive is the operational test that the pathway works in time.
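The "operational test that the pathway works in time" can be sketched as a poll: after the revocation is issued in the identity provider, probe each downstream system until it rejects the credential or the window runs out. The function name `verify_revocation` and the probe callables are assumptions for illustration; in practice each probe would wrap a system-specific check.

```python
import time
from typing import Callable, Optional


def verify_revocation(
    still_accepts: dict[str, Callable[[], bool]],
    window_s: float,
    poll_s: float = 1.0,
) -> dict[str, Optional[float]]:
    """Poll each downstream system until it rejects the credential or the window closes.

    Each probe returns True while its system still accepts the credential.
    The result maps system name to seconds until first observed rejection,
    or None if the system was still accepting when the window closed.
    """
    start = time.monotonic()
    rejected_at: dict[str, Optional[float]] = {name: None for name in still_accepts}
    while True:
        elapsed = time.monotonic() - start
        for name, probe in still_accepts.items():
            if rejected_at[name] is None and not probe():
                rejected_at[name] = elapsed
        if all(v is not None for v in rejected_at.values()) or elapsed >= window_s:
            return rejected_at
        time.sleep(poll_s)
```

Any `None` in the result is a downstream system where the revocation did not propagate inside the stated window, which is the finding the runbook needs.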
Pause is not the same as revoke
A common pattern in 2026 incident-response runbooks is to treat “pause the agent” and “stop the agent” as synonymous. They are not.
A pause stops the agent’s current execution and prevents new invocations through the platform’s runtime. The agent’s credential is still valid. The agent’s process can be re-invoked the moment the pause is lifted. If the platform is compromised, if the pause action is bypassed, or if the credential is being used elsewhere (legitimately by another tenant of the same platform, or illegitimately by a threat actor with the credential material), the pause does not contain the credential’s action surface.
A revoke invalidates the credential. The agent cannot be re-invoked under the same identity. Downstream systems that trusted the credential reject subsequent requests bearing it. The action surface is closed at the identity layer.
Both are needed. A revoke without a pause leaves a window where the agent’s in-flight transactions can still complete before the revocation propagates downstream. A pause without a revoke leaves the credential live. The 2026 incident-response runbook should specify the order, the windows, and the verification step for each, per agent class.
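The distinction can be reduced to two independent flags. The sketch below is a toy model, with `AgentState` and its method names chosen for illustration; it shows why a paused agent with a live credential is not contained.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    """Toy model: pause lives in the platform runtime, validity lives in the identity layer."""
    paused: bool = False
    credential_valid: bool = True

    def platform_can_invoke(self) -> bool:
        # The platform honours both the pause flag and the credential.
        return not self.paused and self.credential_valid

    def credential_accepted_downstream(self) -> bool:
        # Downstream systems see only the credential, never the platform's pause flag.
        return self.credential_valid

    def contained(self) -> bool:
        return self.paused and not self.credential_valid


paused_only = AgentState(paused=True)
revoked_only = AgentState(credential_valid=False)
both = AgentState(paused=True, credential_valid=False)
```

In this model `paused_only` blocks invocation through the platform yet still reports the credential as accepted downstream; only `both` is contained. The model deliberately ignores in-flight work, which is why the runbook also needs the order and the windows.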
Microsoft Agent 365 as a named control-plane example
Microsoft Agent 365 reached general availability on 1 May 2026, with Microsoft naming context-mapping capabilities, policy-based controls, and runtime blocking and alerts through Intune and Defender in public preview from June 2026. The preview is the first major-platform consolidation of the four containment primitives in a customer-administered control plane. Microsoft Defender’s near-real-time agent protection uses webhooks to evaluate actions an AI agent attempts and to block malicious or risky activities before they are executed; the Intune side surfaces and can block unmanaged local agents on Windows endpoints. Microsoft’s Defense in depth for autonomous AI agents post, 14 May 2026, names the threat classes formally (agent hijacking, intent breaking, sensitive data leakage, supply-chain compromise, inappropriate reliance) and the design patterns (agents as microservices, least permissions, progressive permissioning) the four-primitive architecture is meant to enforce. The Microsoft Agent 365 registry-sync preview extends governance to AWS Bedrock and Google Cloud agents, with start, stop, and delete actions promised for the cross-cloud control surface.
The reading is not that Microsoft has solved the problem. The reading is that the customer-administered control plane is now a procurement question rather than an engineering question, at least for the Microsoft-centric enterprise. The other hyperscalers and the major non-Microsoft agent platforms are moving in the same direction with different timelines and different integration patterns. The CIO question for the next procurement cycle is which of the four primitives the customer can invoke unilaterally on each platform, with what SLA, and what the evidence of an invocation looks like.
The tabletop test is the only proof
The four primitives are testable. The proof is the tabletop drill, executed under realistic conditions, with the evidence captured.
Pick a production agent. Choose a containment scenario from the kill-criteria document. Attempt the four primitive actions through the actual runtime, with the time measured and the evidence captured: invoke the purpose-binding test by issuing the agent a request outside its bound authorisation surface and confirm the runtime refuses; invoke the kill switch through the actual control plane and time how long until the agent process is terminated and re-invocation is prevented; invoke the network isolation and confirm the agent’s egress is severed at the policy plane within the stated window; invoke the credential revocation in the identity provider and confirm propagation to the downstream systems the credential is trusted by, in the stated window.
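The drill described above amounts to the same measurement repeated four times: invoke the control, verify the effect, and compare the elapsed time with the window the runbook states. A minimal harness might look like the sketch below; `run_drill` and `DrillResult` are hypothetical names, and the `invoke` and `verify` callables stand in for the real control-plane and verification calls.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    primitive: str
    elapsed_s: float   # measured time from invocation to completed verification
    sla_s: float       # window the runbook claims
    verified: bool     # did the verification step confirm the effect?

    @property
    def passed(self) -> bool:
        return self.verified and self.elapsed_s <= self.sla_s


def run_drill(
    primitive: str,
    sla_s: float,
    invoke: Callable[[], None],
    verify: Callable[[], bool],
) -> DrillResult:
    """Invoke one primitive through the real control plane, then verify and time it."""
    start = time.monotonic()
    invoke()
    verified = verify()
    return DrillResult(primitive, time.monotonic() - start, sla_s, verified)
```

Running it once per primitive yields four `DrillResult` records; any record where `passed` is false is a gap between the kill-criteria document and the runtime.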
The gap between the kill-criteria document and the tabletop result is the finding. The finding is the operational version of the Kiteworks statistic for that enterprise specifically. Most enterprises running the drill for the first time find at least two of the four primitives are slower than the runbook specifies, or do not work at all.
The tabletop is also the only artefact that survives an EU AI Act Article 14 audit, a SOC 2 incident-response review, or a NIST AI RMF Manage-function assessment. The relevant NIST AI RMF subcategory is Manage 2.4, the requirement that mechanisms be in place and applied, with responsibilities assigned and understood, to supersede, disengage, or deactivate AI systems that demonstrate performance or outcomes inconsistent with intended use. Manage 4.1 carries the corresponding post-deployment monitoring obligation including decommissioning and incident response. The comparison between NIST AI RMF and ISO 42001 covers the structural overlap between the two governance standards; both name the containment capability and neither accepts a paper specification as evidence. The tabletop is the evidence.
The 2025 prior-year warning shot is the Replit agent that wiped a production database during an explicit code-and-action freeze, reported in Fortune 23 July 2025, with the Replit CEO subsequently confirming the failure and pushing planning-only mode and dev/prod separation as remediation. The incident is the cleanest 2025 example of an agent that did not honour an explicit stop instruction, and it predates the 2026 measurement that 60% of enterprises cannot terminate quickly. The structural reading is that the 2025 incident was a leading indicator and the 2026 Kiteworks figures are the prevalence measurement.
What this means for the CISO agenda in Q3 2026
Three actions are operationally tractable in the next quarter.
The first is the inventory pass against the four primitives. For every production agent under the security team’s responsibility, document whether each of the four primitives is implemented, the role authorised to invoke it, the SLA, the verification step, and the last-tested date. The artefact is a spreadsheet rather than a tool purchase; the cost is security-team time. The output is the gap inventory the next runbook update will close.
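Although the artefact is a spreadsheet, the gap check over it is mechanical and can be sketched in a few lines. The record fields mirror the columns named above; the names `PrimitiveRecord` and `gaps`, and the 90-day staleness threshold, are assumptions made for this illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PrimitiveRecord:
    """One spreadsheet row: one primitive for one production agent."""
    agent: str
    primitive: str             # purpose_binding, kill_switch, network_isolation, credential_revocation
    implemented: bool
    authorised_role: str
    sla: str
    verification_step: str
    last_tested: Optional[date]


def gaps(records: list[PrimitiveRecord], today: date, max_age_days: int = 90) -> list[tuple[str, str, str]]:
    """Return (agent, primitive, finding) for every row that is missing, untested, or stale."""
    findings = []
    for r in records:
        if not r.implemented:
            findings.append((r.agent, r.primitive, "not implemented"))
        elif r.last_tested is None:
            findings.append((r.agent, r.primitive, "never tested"))
        elif (today - r.last_tested).days > max_age_days:
            findings.append((r.agent, r.primitive, "test stale"))
    return findings
```

The output of `gaps` is the gap inventory: a flat list the runbook update can work through row by row.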
The second is the tabletop calendar. One agent class per month, one scenario per tabletop, the four primitives invoked through the actual control plane with time and evidence captured. The cadence is quarterly per agent class at minimum; monthly for the highest-risk classes. The artefact is a dated tabletop report with the timings and the gaps named.
The third is the procurement-side ask. Every new agent platform evaluation in the next quarter includes the four-primitive question as a contractual line: what is the SLA for each primitive, what is the evidence the vendor produces of an invocation, what is the customer-administered control versus the vendor-administered control. The publication’s non-human identity procurement clause analysis covers the identity-side procurement clauses; the four-primitive set is the runtime-side equivalent.
The supporting reads are the agentic AI SLA architecture for the broader SLA framing and the agent red-teaming companion analysis for the test-design layer of the same problem.
The CISO question to leave the team with is short. For each production agent your organisation is running, can you terminate the agent, revoke its credential, isolate its network, and prove its purpose binding is enforced, within the windows your incident-response runbook claims? If the answer to any of the four is “we have not tested it under realistic conditions” or “we are not sure”, the tabletop is the next-quarter investment that closes the gap before the audit, the regulator, or the incident closes it for you.