EU AI Act Article 12 audit-evidence template for agentic AI
A 14-field audit-evidence template that operationalises EU AI Act Article 12 record-keeping requirements for agentic AI deployments. It captures every agent decision in regulator-queryable form and is designed for evidence assembly in under four business hours.
Reviewed 26 Apr 2026 · next review +60d

EU AI Act Article 12 is short. The operational consequence is large.
The article requires high-risk AI systems to “have technical capabilities allowing the automatic recording of events (‘logs’) over the lifetime of the system.” The recording must be sufficient to ensure traceability of the system’s functioning, support post-market monitoring obligations under Article 72, support investigations of incidents under Article 73, and enable the operation of the risk-management system under Article 9. Article 19 sets the retention floor at 6 months, with longer periods where Member State or sector-specific law applies.
What the article does not specify is what the logs should contain. That specification is the deploying enterprise’s responsibility. What follows is a template that operationalises Article 12 for agentic AI deployments specifically.
The 14 fields
Each field corresponds to an evidence element a regulator can reasonably ask for in an Article 73 incident inquiry or an Article 9 risk-management review. The 14 fields are the minimum for completeness; an enterprise running deployments in regulated sectors will add fields for the relevant overlays.
1. Deployment ID
A stable, opaque identifier for the agent deployment. Deployments may share an underlying model, share a vendor platform, share a tenancy, or share a user population; the deployment ID distinguishes the regulator-relevant unit. The deployment ID is the primary key for filtering Article 12 logs in any inquiry.
2. Agent identity
The non-human IAM identity executing the action (Q1 of the readiness diagnostic). Distinguishes agent actions from human actions in the audit log. Without this field, agent actions appear under the human owner’s identity, which produces irreconcilable evidence in regulator inquiries.
3. Session ID
The agent’s working session boundary. A session typically corresponds to a user-initiated interaction or a scheduled task execution. Within a session, the agent may take many actions; the session ID groups them for traceability of the decision chain.
4. ISO timestamp
UTC timestamp in ISO 8601 format with millisecond precision. Time-zone normalisation matters in cross-jurisdiction inquiries; enterprise audit systems that log local timestamps without time-zone metadata produce ambiguity that surfaces during the inquiry.
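A minimal sketch of a conforming timestamp generator. The function name `audit_timestamp` is illustrative; the point is that `datetime.now(timezone.utc)` plus `isoformat(timespec="milliseconds")` yields a UTC-normalised ISO 8601 string with exactly three fractional digits and an explicit offset, avoiding the local-time ambiguity described above.

```python
from datetime import datetime, timezone

def audit_timestamp() -> str:
    """UTC ISO 8601 timestamp with millisecond precision and an
    explicit offset, e.g. '2026-04-26T09:30:15.123+00:00'."""
    now = datetime.now(timezone.utc)
    # timespec="milliseconds" fixes the fractional part at 3 digits
    return now.isoformat(timespec="milliseconds")
```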
5. User prompt
The user’s input or scheduled task instruction that initiated the agent’s reasoning. Captured verbatim. PII redaction in the storage layer is acceptable if the redaction is reversible by an authorised principal; irreversible redaction prevents Article 73 inquiries that depend on understanding what the user actually asked.
6. Retrieved context with provenance
Each piece of content the agent retrieved or referenced during its reasoning, tagged with: source identifier (which document, which email, which tool response), source trust level (high-trust internal, medium-trust SaaS, low-trust external), and retrieval timestamp. The provenance field is the most commonly incomplete in vendor-native logging and the most regulator-critical for cross-agent prompt-injection (EchoLeak-class) inquiries.
7. Model output
The model’s complete output, including any chain-of-thought or planning artifacts the platform exposes. Output captured before any tool execution. Where the model output is structured (JSON tool-call format, function-calling output), the structured form is captured as-is rather than rendered.
8. Planned action
The action the agent intended to take, as derived from the model output. Distinguished from the executed action because the gap between planning and execution is itself audit-relevant: an action was planned but blocked by a guardrail, an action was modified by an approval workflow, an action failed to execute and the agent retried.
9. Action class
The classification of the planned action against the deployment’s published action-class taxonomy. Standard classes: read, write, financial, production-data, communication-external, delegation-to-other-agent. Action class determines the approval-gate behaviour (Q4 of the readiness diagnostic) and the retention requirement (high-impact classes typically have longer retention).
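The taxonomy and its gate behaviour can be expressed as a small enum plus a class-to-gate mapping. The six classes are the standard ones named above; the `APPROVAL_REQUIRED` set is a hypothetical policy choice shown for illustration, not a normative mapping — each deployment publishes its own.

```python
from enum import Enum

class ActionClass(str, Enum):
    """The six standard action classes from the template."""
    READ = "read"
    WRITE = "write"
    FINANCIAL = "financial"
    PRODUCTION_DATA = "production-data"
    COMMUNICATION_EXTERNAL = "communication-external"
    DELEGATION = "delegation-to-other-agent"

# Hypothetical approval-gate policy (Q4 of the readiness diagnostic):
# which classes require a named human approver before execution.
APPROVAL_REQUIRED = {
    ActionClass.FINANCIAL,
    ActionClass.PRODUCTION_DATA,
    ActionClass.COMMUNICATION_EXTERNAL,
}

def requires_approval(action_class: ActionClass) -> bool:
    """Gate decision derived from the action class alone."""
    return action_class in APPROVAL_REQUIRED
```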
10. Approval reference
If the action class required human approval, the reference to the approval event: approver identity (named human, not a service account), approval timestamp, approval scope (this action only, this session, or a broader policy). For actions in approval-required classes that proceed without an approval reference, the absence is itself a finding.
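The "absence is itself a finding" rule is mechanically checkable. A sketch, assuming log records are dictionaries with `action_class` and `approval_reference` keys (field names are illustrative):

```python
def missing_approval_findings(records, approval_required_classes):
    """Records whose action class required human approval but which
    carry no approval reference -- the absence is itself a finding."""
    return [
        r for r in records
        if r["action_class"] in approval_required_classes
        and not r.get("approval_reference")
    ]

findings = missing_approval_findings(
    [{"action_class": "financial", "approval_reference": None},
     {"action_class": "read"}],
    approval_required_classes={"financial", "production-data"},
)
# the unapproved financial action surfaces; the read action does not
```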
11. Executed action
What actually happened. The action as executed, including any modifications applied by the deployment layer (rate limiting, parameter sanitisation, scope reduction). Distinguished from the planned action so the inquiry can answer “did the agent’s plan match what happened” without log archaeology.
12. Tool-call audit chain
The full chain of tool invocations the agent made, with: tool identifier, parameters, response, response timestamp, and any cascading tool calls triggered by the response. For multi-step reasoning, the chain shows the full decision tree the agent traversed. The tool-call audit chain is the primary evidence base for unauthorised-action investigations.
13. Output disclosure surface
Where the action’s output went: response back to the user, write to a connected system, communication to an external counterparty, persistence in the agent’s memory. The disclosure surface field is the primary evidence base for exfiltration investigations: an EchoLeak-class incident is identified by the disclosure-surface field showing an external destination not in the user’s expected scope.
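The EchoLeak-class detection described above reduces to a filter over the disclosure-surface field. A sketch, assuming records carry `disclosure_surface` and `destination` keys and that external destinations are tagged `"communication-external"` (both naming choices are assumptions):

```python
def external_disclosures_out_of_scope(records, expected_destinations):
    """Flag actions whose output reached an external destination outside
    the user's expected scope -- the EchoLeak-class signal."""
    return [
        r for r in records
        if r["disclosure_surface"] == "communication-external"
        and r["destination"] not in expected_destinations
    ]

flagged = external_disclosures_out_of_scope(
    [{"disclosure_surface": "communication-external",
      "destination": "attacker.example"},
     {"disclosure_surface": "user-response", "destination": "user"}],
    expected_destinations={"partner.example"},
)
# only the out-of-scope external destination is flagged
```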
14. Policy version
The version reference of the deployment’s operating policy at the time of the action. Policies change; the version stamp ensures the inquiry can determine which policy was in effect when the action was taken. Without the version stamp, an action that was compliant at the time can appear non-compliant against a later policy version.
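Taken together, the 14 fields can be sketched as a single record type. Field names and container shapes below are illustrative, not a normative schema; the numbering in the comments maps each attribute back to the field list above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AuditRecord:
    """One Article 12 log record; names and types are illustrative."""
    deployment_id: str        # 1. stable, opaque deployment identifier
    agent_identity: str       # 2. non-human IAM identity
    session_id: str           # 3. working-session boundary
    timestamp: str            # 4. UTC ISO 8601, millisecond precision
    user_prompt: str          # 5. verbatim initiating input
    retrieved_context: list   # 6. [{source_id, trust_level, retrieved_at}, ...]
    model_output: str         # 7. complete output, captured pre-execution
    planned_action: dict      # 8. intended action derived from the output
    action_class: str         # 9. class from the published taxonomy
    approval_reference: Optional[dict]  # 10. {approver, at, scope} or None
    executed_action: dict     # 11. action as executed, with modifications
    tool_call_chain: list     # 12. ordered tool invocations with responses
    disclosure_surface: list  # 13. destinations the output reached
    policy_version: str       # 14. operating-policy version stamp
```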
Retention
The Article 19 floor is 6 months. Sector overlays typically extend this materially. A deployment touching multiple regulated sectors needs to plan for the maximum of the applicable retention periods.
| Regulation | Retention floor | Notes |
|---|---|---|
| EU AI Act Article 19 | 6 months | Provider obligation; Member State law may extend |
| HIPAA | 6 years from creation | Healthcare; covers the agent decision and the underlying data |
| SOX | 7 years | Financial deployments material to internal controls over reporting |
| GDPR Article 30 | Duration of processing + statute of limitations | Personal data processing records |
| MiFID II | 5-7 years | Investment services |
| FERPA | Variable, typically 5+ years | Education-sector deployments |
| ESG reporting (e.g., CSRD) | 10 years | Decision evidence underlying public disclosures |
The operating retention floor for an enterprise running deployments across these sectors is 7 years; specific contexts may push to 10. Storage cost has dropped enough that retention is rarely the cost driver; the cost driver is keeping the logs queryable across the retention period, which is an indexing and tooling problem rather than a storage problem.
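The "maximum of the applicable periods" rule is a one-line computation. The mapping below restates the table in months, using the upper end of the MiFID II range; the keys are hypothetical identifiers.

```python
# Retention floors in months, restating the table above (assumed keys).
RETENTION_MONTHS = {
    "eu-ai-act-art19": 6,
    "hipaa": 6 * 12,
    "sox": 7 * 12,
    "mifid2": 7 * 12,   # upper end of the 5-7 year range
    "csrd": 10 * 12,
}

def operating_retention_months(applicable: set) -> int:
    """Operating floor = maximum of the applicable retention periods."""
    return max(RETENTION_MONTHS[k] for k in applicable)

# A deployment touching SOX and HIPAA operates at 84 months (7 years).
```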
Queryability
The 4-business-hour evidence-assembly target is the practical bar for Article 12 readiness. The target derives from Article 73’s incident-reporting timelines (immediate to 15 days depending on category) and from the operational reality that legal review of the regulator response consumes most of the elapsed time once the evidence is assembled.
The queries the audit substrate needs to answer in the 4-hour window:
- “Show me every action deployment X took during a 24-hour window.”
- “Show me the full decision chain for action Y, including the input that produced it and the approval reference.”
- “Show me every action that involved retrieved content from source Z.”
- “Show me the policy version that was in effect when action W was taken.”
- “Show me the tool-call audit chain for session V.”
- “Show me every action whose disclosure surface routed to destination U.”
An audit substrate that cannot answer these queries within the 4-hour window has a structural gap. The gap is typically not in storage; it is in indexing or in the join between deployment-layer logs and platform-layer logs. Closing it requires building the integration that lets the queries run against a single, deployment-level view.
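The indexing point can be made concrete with a minimal sketch: the first inquiry query above becomes a bounded index lookup once the log table is indexed on `(deployment_id, timestamp)`. The schema and column names are illustrative, using SQLite purely for demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        deployment_id TEXT, session_id TEXT, ts TEXT,
        action_class TEXT, destination TEXT, source_id TEXT
    )
""")
# Indexes that turn the inquiry queries into bounded lookups
conn.execute("CREATE INDEX idx_deploy_ts ON audit_log (deployment_id, ts)")
conn.execute("CREATE INDEX idx_dest ON audit_log (destination)")

conn.execute(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?)",
    ("dep-x", "s1", "2026-04-25T10:00:00.000+00:00", "read", "user", "doc-1"),
)

# "Show me every action deployment X took during a 24-hour window."
# ISO 8601 UTC strings sort lexicographically, so BETWEEN works directly.
rows = conn.execute(
    "SELECT * FROM audit_log WHERE deployment_id = ? AND ts BETWEEN ? AND ?",
    ("dep-x", "2026-04-25T00:00:00.000+00:00",
     "2026-04-26T00:00:00.000+00:00"),
).fetchall()
```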
Vendor coverage as of April 2026
Vendor-native logging coverage of the 14 fields is uneven. The following table summarises the field-by-field state for the major enterprise agent platforms based on publicly documented platform capabilities:
| Field | Microsoft 365 Copilot | Anthropic Managed Agents | OpenAI Enterprise | Google Gemini |
|---|---|---|---|---|
| Deployment ID | native | native | native | native |
| Agent identity | native (post-Entra integration) | native | native | native |
| Session ID | native | native | native | native |
| ISO timestamp | native | native | native | native |
| User prompt | native | native | native | native |
| Retrieved context with provenance | partial (post-EchoLeak hardening) | partial | partial | partial |
| Model output | native | native | native | native |
| Planned action | partial | native | partial | partial |
| Action class | deployment-layer | deployment-layer | deployment-layer | deployment-layer |
| Approval reference | partial | partial | partial | partial |
| Executed action | native | native | native | native |
| Tool-call audit chain | native | native | native | native |
| Output disclosure surface | partial | partial | partial | partial |
| Policy version | deployment-layer | deployment-layer | deployment-layer | deployment-layer |
The pattern: vendors cover roughly 8 to 10 of the 14 fields natively, with the gaps concentrated in provenance tracking, planned-vs-executed distinction, output-disclosure-surface explicitness, and policy versioning. The deployment-layer instrumentation work to close the gaps is moderate (typically 3 to 6 weeks for an enterprise with a mature observability platform) but not optional. An enterprise relying solely on vendor-native logging will discover gaps during the first regulator inquiry.
What this template does NOT cover
The template addresses Article 12 record-keeping for agent decisions and actions. It does not address:
- Article 14 (human oversight): the meta-level evidence that human oversight is operationally meaningful, not just the per-action approval references. Covered by separate documentation requirements.
- Article 15 (accuracy, robustness, cybersecurity): the system-level cybersecurity posture including the cross-agent prompt-injection class. The Article 15 documentation references the audit substrate but is structurally separate.
- Article 17 (quality-management system): the organisational-process documentation that the audit substrate is part of but does not constitute.
- Article 73 (incident reporting): the inquiry workflow that consumes the audit substrate but adds its own documentation requirements (incident classification, timeline, mitigation steps, communication with the competent authority).
The Article 12 template is the foundation. The other articles build on top of it. An enterprise with a strong Article 12 posture has the substrate that makes the other articles tractable; an enterprise with a weak Article 12 posture finds every other article harder.
The full EU AI Act preparation framework that integrates Article 12 with the other articles is at /eu-ai-act-agentic-ai-compliance/. The seven-control surface this template is part of is covered in the OWASP Agentic AI Top 10 enterprise walkthrough. The procurement playbook that operationalises Article 12 evidence assembly during procurement is at /enterprise-agentic-ai-procurement-playbook/.
The 14 fields are the minimum. The 4-hour assembly target is the practical bar. The 14-week runway to 2 August 2026 is the time available. An enterprise that specifies, instruments, and drill-tests the audit substrate in that window is operationally ready for Article 12. An enterprise that does not is exposed in the most-likely-asked dimension of the EU AI Act enforcement programme.