
MTTD for agents — detection-time as the governance metric enterprise IT actually needs

Mean Time To Detect, adapted from SRE to enterprise agentic AI. A leading indicator for safety. Measurable per agent, monitorable per deployment. Targets published. Methodology open.


What MTTD-for-Agents measures

MTTD-for-Agents is the time, in hours, from an anomalous agent behaviour occurring to the deploying organisation detecting it. It is adapted from the SRE metric Mean Time To Detect, narrowed specifically to cover agentic systems: cross-agent delegation, tool-use anomalies, output-distribution drift, and emergent delegation patterns that don’t surface in traditional application monitoring.

The metric exists because enterprise AI governance conversations in 2026 are dominated by after-the-fact incident reports. The 2026 Enterprise Agentic Governance Benchmark scores governance posture as a snapshot — MTTD-for-Agents scores the velocity of that posture. A deployment can score 75 on GAUGE and still have a 3-day MTTD, which means incidents that happen at agent speed land entirely as post-mortems.

Detection-time is the right primary safety metric because agents execute at machine speed. Humans monitoring chat logs after the fact is not a detection mechanism — it is a reporting mechanism for breaches that already completed.

How MTTD-for-Agents differs from SRE MTTD

Traditional SRE MTTD measures the time from an availability or performance incident starting to an on-call engineer being alerted. It’s well-defined because site failures produce immediate observable signals: error rates spike, latency bands move, customer reports arrive. The SRE incident-response playbook is mature and widely practiced — Google, Atlassian, PagerDuty, and others publish their own variants against the same detect-triage-contain-resolve scaffold.

Agentic incidents don’t look like site failures. Four specific differences:

  • The fail-mode is semantic, not statistical. An agent quietly starts emailing customer data to an attacker-controlled address. Error rates look normal. Latency looks normal. The log line shows a successful tool call. SRE MTTD’s instrumentation stack has no reason to fire.
  • Execution is adversarial in ways applications aren’t. EchoLeak — the Q1 2026 zero-click prompt-injection on Microsoft 365 Copilot — exfiltrates data in seconds to minutes. A 4-hour SRE-grade detection window is an outcome metric, not a detection window, against that profile.
  • Agents delegate to other agents. A high-risk action executed via a delegation chain is invisible at each individual agent’s boundary. Detection requires cross-agent telemetry, not per-service monitoring.
  • The baseline drifts. LLMs get updated. Prompt libraries change. What “normal” looks like for an agent today is different from what it was 30 days ago. SRE MTTD uses fixed SLOs; MTTD-for-Agents requires rolling baselines.

The discipline inherits the SRE detection-chain structure — tripwire → alert → triage → contain → postmortem — but the specific tripwires, baselines, and containment actions are agent-specific. The NIST Computer Security Incident Handling Guide (SP 800-61r2) frames the equivalent lifecycle for traditional IT incidents; the MTTD-for-Agents variant substitutes the detection layer with agent-surface telemetry while keeping the response workflow largely intact.

The three components

Every MTTD-for-Agents program has three components. They depend on each other in sequence — skipping any one means the chain doesn’t close.

1. Baseline behaviour profile · measure normal

For the first 14 days of production (or the first 14 days after any model or prompt change), log every agent interaction across four dimensions:

  • Tool-use frequency per hour
  • Output length distribution (mean and standard deviation, used for Z-scoring)
  • Cross-agent delegation rate
  • Refusal rate per 1,000 interactions

The profile is rolling-window — it updates as the deployment accumulates more evidence. It is the only reason tripwires mean anything: without a baseline, “tool-use frequency spike” is an undefined event.
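As a sketch of what the rolling-window profile can look like in practice (the class and field names here are illustrative, not part of the published framework), a per-agent baseline over hourly buckets:

```python
from collections import deque
from statistics import mean, pstdev

class RollingBaseline:
    """Rolling-window baseline for one agent across the four
    dimensions above. The window is measured in hourly buckets;
    14 days of hourly buckets = 336 samples."""

    def __init__(self, window_hours: int = 14 * 24):
        self.dims = ("tool_calls_per_hour", "output_length",
                     "delegation_rate", "refusal_rate")
        self.windows = {d: deque(maxlen=window_hours) for d in self.dims}

    def record_hour(self, **sample: float) -> None:
        # One call per hourly bucket, e.g.
        # record_hour(tool_calls_per_hour=12, output_length=480, ...)
        for dim in self.dims:
            self.windows[dim].append(sample[dim])

    def profile(self, dim: str) -> tuple[float, float]:
        # (mean, stddev) for one dimension; tripwires compare
        # live values against these.
        values = self.windows[dim]
        return mean(values), pstdev(values)
```

Because the deque drops its oldest bucket once full, the profile updates continuously as the deployment accumulates evidence, which is the rolling behaviour the text describes.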

2. Tripwire metrics · detect deviation

Four per-agent tripwires fire when the current window deviates from the baseline profile beyond a threshold. The canonical thresholds in v1:

  • Tool-use frequency anomaly — current hourly rate exceeds baseline_mean + 3σ
  • Output length Z-score — current window mean deviates from baseline by |Z| > 2.5
  • Cross-agent delegation rate — current delegation rate exceeds baseline_mean + 2σ OR any delegation to an agent not in the approved delegation map
  • Refusal rate — current refusal rate drops below baseline_mean - 2σ (an agent suddenly refusing less often is as suspicious as refusing more often)

Tripwires are monitored at the agent-surface, not the application-surface. Instrumentation has to be attached to the agent’s tool-call boundary, not the upstream HTTP layer — because the HTTP request looks the same whether the agent is calling a safe tool or a compromised one.
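The four canonical tripwires reduce to a small pure function over the current window and the baseline profile. A minimal sketch (the dictionary keys and the `approved_delegates` set are assumed shapes for illustration, not a published API):

```python
def check_tripwires(current: dict, baseline: dict,
                    approved_delegates: set) -> list[str]:
    """Evaluate the four v1 tripwires for one agent.
    `current` holds the live window's values; `baseline` maps each
    dimension to (mean, stddev). Returns the tripwires that fired."""
    fired = []

    # Tool-use frequency: current hourly rate > baseline_mean + 3 sigma
    mu, sigma = baseline["tool_calls_per_hour"]
    if current["tool_calls_per_hour"] > mu + 3 * sigma:
        fired.append("tool_use_frequency")

    # Output length: |Z| > 2.5 against the baseline distribution
    mu, sigma = baseline["output_length"]
    if sigma > 0 and abs((current["output_length"] - mu) / sigma) > 2.5:
        fired.append("output_length_zscore")

    # Delegation: rate > baseline_mean + 2 sigma, OR any delegation
    # to an agent outside the approved delegation map
    mu, sigma = baseline["delegation_rate"]
    off_map = bool(current["delegation_targets"] - approved_delegates)
    if current["delegation_rate"] > mu + 2 * sigma or off_map:
        fired.append("delegation")

    # Refusal rate: a drop below baseline_mean - 2 sigma fires too
    mu, sigma = baseline["refusal_rate"]
    if current["refusal_rate"] < mu - 2 * sigma:
        fired.append("refusal_rate_drop")

    return fired
```

Note the delegation check is an OR: a statistically normal delegation rate still fires if any target is off the approved map.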

3. Detection chain · convert signal to action

Once a tripwire fires, the chain runs:

  1. Alert — pages the on-call security engineer within 5 minutes of tripwire firing. Alert includes: which tripwire, which agent, current vs baseline values, links to recent tool-call logs.
  2. Triage — engineer determines scope within 30 minutes. Is this an incident, a model update, or a baseline-drift false positive? Three possible outcomes: dismiss with logging, investigate further, escalate to containment.
  3. Contain — if investigating further, containment actions within 1 hour. Options from lowest to highest impact: rate-limit the agent’s tool access, disable the specific tool, disable the agent, revoke the agent’s credentials.
  4. Postmortem — within 5 business days, documented in a format that feeds back into the baseline profile. What was the tripwire’s predictive quality? Should the threshold move?

The chain is deliberately five steps, counting the tripwire itself, rather than the SRE-standard four (detect/triage/contain/resolve), because the postmortem has to feed the baseline. Without that loop, MTTD stays flat instead of improving.
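The SLA table above can be encoded directly into runbook tooling so that breaches are computed, not eyeballed. A sketch, assuming per-incident phase timestamps are available (the 5-business-day postmortem deadline is modelled naively as 7 calendar days here):

```python
from datetime import datetime, timedelta

# Chain-phase SLAs from the runbook above. Phase names are
# illustrative; real paging and routing is assumed to live behind
# whatever alerting stack the deployment already runs.
CHAIN_SLAS = {
    "alert": timedelta(minutes=5),
    "triage": timedelta(minutes=30),
    "contain": timedelta(hours=1),
    "postmortem": timedelta(days=7),  # crude stand-in for 5 business days
}

def sla_breaches(fired_at: datetime, completed_at: dict) -> list[str]:
    """Given when the tripwire fired and when each phase completed,
    return the phases that missed their SLA."""
    return [phase for phase, deadline in CHAIN_SLAS.items()
            if phase in completed_at
            and completed_at[phase] - fired_at > deadline]
```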

Target values — with rationale for the numbers

Large enterprise, high-risk agents: MTTD < 4 hours.

The 4-hour number is deliberate rather than conservative: it assumes the attacker cannot complete the objective within 4 hours of the anomalous behaviour starting. For EchoLeak-class zero-click exfiltration, 4 hours is an outcome metric, not a detection window; the data is already gone. The 4-hour target therefore applies to incidents where the adversarial action requires more than a single tool call to complete: cross-agent privilege escalation, multi-step data aggregation, lateral movement through delegation chains. That covers most real enterprise threats, but not all of them.

Mid-market, standard-risk agents: MTTD < 24 hours.

Mid-market operations rarely have 24/7 SOC coverage. The 24-hour target assumes detection by next-business-day review, with containment actions fired manually. Beyond 24 hours, the detection ceases to be a leading indicator and becomes an audit.

Both targets are published here and will be reviewed at 12-month cadence. Submit feedback if your incident data suggests different numbers.

Worked example — how MTTD-for-Agents frames the EchoLeak exploit

EchoLeak’s zero-click prompt-injection on Microsoft 365 Copilot can exfiltrate enterprise data through the agent surface in seconds to minutes. Against that attack profile, no detection chain meets the 4-hour target — the data is already gone before the first SRE-standard alert would fire.

MTTD-for-Agents still applies, but the tripwire semantics shift. What fires is not “Copilot was exploited” — the exploitation is invisible to traditional monitoring. What fires is unusual outbound data volume to domains the user did not initiate, measured at the tool-call boundary. That signal is upstream of the exfiltration target, so it produces a detectable tripwire even when the downstream action completes in seconds.
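A sketch of that egress-volume tripwire at the tool-call boundary (the function shape, the per-domain baseline, and all names are assumptions for illustration):

```python
def egress_anomalies(tool_calls, user_initiated_domains,
                     baseline_bytes, sigma_mult=3.0):
    """Flag destination domains whose outbound volume at the
    tool-call boundary exceeds the per-domain baseline AND that the
    user did not initiate. `tool_calls` is an iterable of
    (domain, bytes_out); `baseline_bytes` maps domain -> (mean,
    stddev) of hourly outbound bytes."""
    totals: dict[str, int] = {}
    for domain, n in tool_calls:
        totals[domain] = totals.get(domain, 0) + n

    flagged = []
    for domain, total in totals.items():
        # A domain with no baseline history defaults to (0, 0),
        # so any outbound volume to a never-seen domain fires.
        mu, sigma = baseline_bytes.get(domain, (0.0, 0.0))
        if domain not in user_initiated_domains and total > mu + sigma_mult * sigma:
            flagged.append(domain)
    return flagged
```

The point the text makes holds in the sketch: the signal is the destination and volume, not the exploit itself, so it fires even when the downstream exfiltration completes in seconds.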

The operational implication for 2026: instrumenting MTTD-for-Agents at the agent surface produces detection value against EchoLeak-class incidents even though the exploit completes faster than any human response window. The detection doesn’t stop the first exfiltration. It stops the second one. And it makes the forensic timeline dramatically tighter, which matters for incident reporting under GDPR, NIS2, and the EU AI Act.

This is what makes MTTD-for-Agents a governance metric, not just a security metric. It’s the layer that feeds evidence back into the GAUGE threat-model dimension — the dimension asking vendors to prove they’ve thought about the agent-specific attack surface, not the general app-security one. It’s also what regulatory incident-reporting regimes like GDPR Article 33 (72-hour breach notification), NIS2 Article 23 (early warning within 24 hours, incident notification within 72), and EU AI Act Article 73 (serious-incident reporting within 15 days) implicitly require — you can’t report what you didn’t detect.

How to implement MTTD-for-Agents

Order matters. Running the chain without a baseline produces noise. Instrumenting tripwires without detection routing produces nothing.

  1. Instrument the agent-surface telemetry first. Tool-call boundary logs with enough metadata to compute the four tripwire metrics. This is almost always where the delay is — not because the instrumentation is hard, but because nobody owns “log every tool call with output-length and delegation metadata” as a line item. Assign it.
  2. Run 14 days of baselining in production. No tripwires during baseline — just accumulate the four-dimensional distribution per agent.
  3. Set v1 tripwire thresholds at the canonical values (+3σ tool-use, |Z| > 2.5 output length, +2σ or off-map delegation, −2σ refusal). Expect false-positive rates of 2–5% in the first week. Tighten thresholds as you learn which signals are real.
  4. Define the detection chain in the runbook. Alert routing, triage SLA, containment playbook, postmortem template. The containment options have to be pre-approved — a tripwire firing at 3 AM is not the time to negotiate with legal about rate-limiting a customer-facing agent.
  5. Measure MTTD weekly, not per-incident. Average detection time across tripwires-that-produced-real-incidents. Track the rolling 4-week average. Target values are moving targets — the point is that the number is falling quarter-over-quarter, not that any single incident hit the 4-hour line.
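The weekly measurement in step 5 can be computed from incident records alone. A sketch, treating the rolling window as the last four weeks that contain incidents (an assumption for brevity; a calendar-week window works equally well):

```python
from statistics import mean

def weekly_mttd(incidents):
    """incidents: list of (week_index, detection_hours) pairs, one per
    tripwire that turned out to be a real incident. Returns
    {week: rolling 4-week average MTTD in hours}."""
    by_week: dict[int, list[float]] = {}
    for week, hours in incidents:
        by_week.setdefault(week, []).append(hours)

    weeks = sorted(by_week)
    rolling = {}
    for i, week in enumerate(weeks):
        # Pool detection times across this week and the three
        # preceding weeks that had incidents.
        window = [h for w in weeks[max(0, i - 3):i + 1] for h in by_week[w]]
        rolling[week] = mean(window)
    return rolling
```

Trending this number quarter-over-quarter, rather than judging single incidents against the 4-hour line, is the measurement discipline the step describes.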

Download: MTTD-for-Agents tripwire starter kit

The Excel download is a working-document version of the framework. It includes:

  • Instructions — what MTTD-for-Agents is, how the four tripwires work, what the targets mean
  • Current-state calculator — input your last 12 months of detected agent incidents, compute your existing MTTD and gap-to-target
  • Tripwire config template — per-agent configuration spreadsheet with the four canonical thresholds, exportable as a starter YAML for monitoring integration
  • Detection-chain playbook — 5-phase runbook template with SLA columns and containment-action pre-approval checkboxes
  • Target-values reference — the 4-hour and 24-hour numbers with the rationale and re-review cadence

The download is free. Signing up for the monthly newsletter is the delivery mechanism — the newsletter sends the file within minutes and then sends one email a month with newly-archived claims, verdict changes on existing claims, GAUGE Index updates, and MTTD research notes.

Download the MTTD-for-Agents tripwire starter kit →

Corrections

MTTD-for-Agents is on a 12-month review cadence. The thresholds, the target values, and the detection-chain SLAs are all expected to move as the enterprise-agent threat surface evolves — the Q1 2026 exploit data already argues for tighter thresholds on cross-agent delegation than the v1 level.

Methodology changes are dated and logged in the public record. If a threshold feels wrong, or a target is unreasonable for a named deployment context, submit feedback. Corrections that land change the public methodology; corrections that don’t land get a public response explaining why — the exchange follows the same Claim Archive methodology the rest of this publication runs on.

The intent, narrow and explicit: enterprise security teams need a metric they can actually measure, defend, and improve. Academic precision that nobody can operationalise is not the goal.

Vigil · reviewed