EU AI Act Article 12 audit-evidence template for agentic AI
A 14-field audit-evidence template that operationalises EU AI Act Article 12 record-keeping requirements for agentic AI deployments. It captures every agent decision in regulator-queryable form and is designed for evidence assembly in under four business hours.
Reviewed 26 Apr 2026 · next review +60d

EU AI Act Article 12 is short. The operational consequence is large.
The article requires high-risk AI systems to “have technical capabilities allowing the automatic recording of events (‘logs’) over the lifetime of the system.” The recording must be sufficient to ensure traceability of the system’s functioning, support post-market monitoring obligations under Article 72, support investigations of incidents under Article 73, and enable the operation of the risk-management system under Article 9. Article 19 sets the retention floor at 6 months, with longer periods where Member State or sector-specific law applies.
What the article does not specify is what the logs should contain. That specification is the deploying enterprise’s responsibility. What follows is a template that operationalises Article 12 for agentic AI deployments specifically.
The 14 fields
Each field corresponds to an evidence element a regulator can reasonably ask for in an Article 73 incident inquiry or an Article 9 risk-management review. The 14 fields are the minimum for completeness; an enterprise running deployments in regulated sectors will add fields for the relevant overlays.
1. Deployment ID
A stable, opaque identifier for the agent deployment. Deployments may share an underlying model, share a vendor platform, share a tenancy, or share a user population; the deployment ID distinguishes the regulator-relevant unit. The deployment ID is the primary key for filtering Article 12 logs in any inquiry.
2. Agent identity
The non-human IAM identity executing the action (Q1 of the readiness diagnostic). Distinguishes agent actions from human actions in the audit log. Without this field, agent actions appear under the human owner’s identity, which produces irreconcilable evidence in regulator inquiries.
3. Session ID
The agent’s working session boundary. A session typically corresponds to a user-initiated interaction or a scheduled task execution. Within a session, the agent may take many actions; the session ID groups them for traceability of the decision chain.
4. ISO timestamp
UTC timestamp in ISO 8601 format with millisecond precision. Time-zone normalisation matters in cross-jurisdiction inquiries; enterprise audit systems that log local timestamps without time-zone metadata produce ambiguity that surfaces during the inquiry.
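A minimal sketch of a conforming timestamp generator. The function name `audit_timestamp` is illustrative; the point is that `datetime.now(timezone.utc)` plus `isoformat(timespec="milliseconds")` yields a UTC-normalised ISO 8601 string with exactly three fractional digits and an explicit offset, avoiding the local-time ambiguity described above.

```python
from datetime import datetime, timezone

def audit_timestamp() -> str:
    """UTC ISO 8601 timestamp with millisecond precision and an
    explicit offset, e.g. '2026-04-26T09:30:15.123+00:00'."""
    now = datetime.now(timezone.utc)
    # timespec="milliseconds" fixes the fractional part at 3 digits
    return now.isoformat(timespec="milliseconds")
```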
5. User prompt
The user’s input or scheduled task instruction that initiated the agent’s reasoning. Captured verbatim. PII redaction in the storage layer is acceptable if the redaction is reversible by an authorised principal; irreversible redaction prevents Article 73 inquiries that depend on understanding what the user actually asked.
6. Retrieved context with provenance
Each piece of content the agent retrieved or referenced during its reasoning, tagged with: source identifier (which document, which email, which tool response), source trust level (high-trust internal, medium-trust SaaS, low-trust external), and retrieval timestamp. The provenance field is the most commonly incomplete in vendor-native logging and the most regulator-critical for cross-agent prompt-injection (EchoLeak-class) inquiries.
7. Model output
The model’s complete output, including any chain-of-thought or planning artifacts the platform exposes. Output captured before any tool execution. Where the model output is structured (JSON tool-call format, function-calling output), the structured form is captured as-is rather than rendered.
8. Planned action
The action the agent intended to take, as derived from the model output. Distinguished from the executed action because the gap between planning and execution is itself audit-relevant: an action was planned but blocked by a guardrail, an action was modified by an approval workflow, an action failed to execute and the agent retried.
9. Action class
The classification of the planned action against the deployment’s published action-class taxonomy. Standard classes: read, write, financial, production-data, communication-external, delegation-to-other-agent. Action class determines the approval-gate behaviour (Q4 of the readiness diagnostic) and the retention requirement (high-impact classes typically have longer retention).
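The taxonomy and its gate behaviour can be expressed as a small enum plus a class-to-gate mapping. The six classes are the standard ones named above; the `APPROVAL_REQUIRED` set is a hypothetical policy choice shown for illustration, not a normative mapping — each deployment publishes its own.

```python
from enum import Enum

class ActionClass(str, Enum):
    """The six standard action classes from the template."""
    READ = "read"
    WRITE = "write"
    FINANCIAL = "financial"
    PRODUCTION_DATA = "production-data"
    COMMUNICATION_EXTERNAL = "communication-external"
    DELEGATION = "delegation-to-other-agent"

# Hypothetical approval-gate policy (Q4 of the readiness diagnostic):
# which classes require a named human approver before execution.
APPROVAL_REQUIRED = {
    ActionClass.FINANCIAL,
    ActionClass.PRODUCTION_DATA,
    ActionClass.COMMUNICATION_EXTERNAL,
}

def requires_approval(action_class: ActionClass) -> bool:
    """Gate decision derived from the action class alone."""
    return action_class in APPROVAL_REQUIRED
```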
10. Approval reference
If the action class required human approval, the reference to the approval event: approver identity (named human, not a service account), approval timestamp, approval scope (this action only, this session, or a broader policy). For actions in approval-required classes that proceed without an approval reference, the absence is itself a finding.
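The "absence is itself a finding" rule is mechanically checkable. A sketch, assuming log records are dictionaries with `action_class` and `approval_reference` keys (field names are illustrative):

```python
def missing_approval_findings(records, approval_required_classes):
    """Records whose action class required human approval but which
    carry no approval reference -- the absence is itself a finding."""
    return [
        r for r in records
        if r["action_class"] in approval_required_classes
        and not r.get("approval_reference")
    ]

findings = missing_approval_findings(
    [{"action_class": "financial", "approval_reference": None},
     {"action_class": "read"}],
    approval_required_classes={"financial", "production-data"},
)
# the unapproved financial action surfaces; the read action does not
```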
11. Executed action
What actually happened. The action as executed, including any modifications applied by the deployment layer (rate limiting, parameter sanitisation, scope reduction). Distinguished from the planned action so the inquiry can answer “did the agent’s plan match what happened” without log archaeology.
12. Tool-call audit chain
The full chain of tool invocations the agent made, with: tool identifier, parameters, response, response timestamp, and any cascading tool calls triggered by the response. For multi-step reasoning, the chain shows the full decision tree the agent traversed. The tool-call audit chain is the primary evidence base for unauthorised-action investigations.
13. Output disclosure surface
Where the action’s output went: response back to the user, write to a connected system, communication to an external counterparty, persistence in the agent’s memory. The disclosure surface field is the primary evidence base for exfiltration investigations: an EchoLeak-class incident is identified by the disclosure-surface field showing an external destination not in the user’s expected scope.
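The EchoLeak-class detection described above reduces to a filter over the disclosure-surface field. A sketch, assuming records carry `disclosure_surface` and `destination` keys and that external destinations are tagged `"communication-external"` (both naming choices are assumptions):

```python
def external_disclosures_out_of_scope(records, expected_destinations):
    """Flag actions whose output reached an external destination outside
    the user's expected scope -- the EchoLeak-class signal."""
    return [
        r for r in records
        if r["disclosure_surface"] == "communication-external"
        and r["destination"] not in expected_destinations
    ]

flagged = external_disclosures_out_of_scope(
    [{"disclosure_surface": "communication-external",
      "destination": "attacker.example"},
     {"disclosure_surface": "user-response", "destination": "user"}],
    expected_destinations={"partner.example"},
)
# only the out-of-scope external destination is flagged
```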
14. Policy version
The version reference of the deployment’s operating policy at the time of the action. Policies change; the version stamp ensures the inquiry can determine which policy was in effect when the action was taken. Without the version stamp, an action that was compliant at the time can appear non-compliant against a later policy version.
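Taken together, the 14 fields can be sketched as a single record type. Field names and container shapes below are illustrative, not a normative schema; the numbering in the comments maps each attribute back to the field list above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AuditRecord:
    """One Article 12 log record; names and types are illustrative."""
    deployment_id: str        # 1. stable, opaque deployment identifier
    agent_identity: str       # 2. non-human IAM identity
    session_id: str           # 3. working-session boundary
    timestamp: str            # 4. UTC ISO 8601, millisecond precision
    user_prompt: str          # 5. verbatim initiating input
    retrieved_context: list   # 6. [{source_id, trust_level, retrieved_at}, ...]
    model_output: str         # 7. complete output, captured pre-execution
    planned_action: dict      # 8. intended action derived from the output
    action_class: str         # 9. class from the published taxonomy
    approval_reference: Optional[dict]  # 10. {approver, at, scope} or None
    executed_action: dict     # 11. action as executed, with modifications
    tool_call_chain: list     # 12. ordered tool invocations with responses
    disclosure_surface: list  # 13. destinations the output reached
    policy_version: str       # 14. operating-policy version stamp
```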
Retention
The Article 19 floor is 6 months. Sector overlays typically extend this materially. A deployment touching multiple regulated sectors needs to plan for the maximum of the applicable retention periods.
| Regulation | Retention floor | Notes |
|---|---|---|
| EU AI Act Article 19 | 6 months | Provider obligation; Member State law may extend |
| HIPAA | 6 years from creation | Healthcare; covers the agent decision and the underlying data |
| SOX | 7 years | Financial deployments material to internal controls over reporting |
| GDPR Article 30 | Duration of processing + statute of limitations | Personal data processing records |
| MiFID II | 5-7 years | Investment services |
| FERPA | Variable, typically 5+ years | Education-sector deployments |
| ESG reporting (e.g., CSRD) | 10 years | Decision evidence underlying public disclosures |
The operating retention floor for an enterprise running deployments across these sectors is 7 years; specific contexts may push to 10. Storage cost has dropped enough that retention is rarely the cost driver; the cost driver is keeping the logs queryable across the retention period, which is an indexing and tooling problem rather than a storage problem.
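The "maximum of the applicable periods" rule is a one-line computation. The mapping below restates the table in months, using the upper end of the MiFID II range; the keys are hypothetical identifiers.

```python
# Retention floors in months, restating the table above (assumed keys).
RETENTION_MONTHS = {
    "eu-ai-act-art19": 6,
    "hipaa": 6 * 12,
    "sox": 7 * 12,
    "mifid2": 7 * 12,   # upper end of the 5-7 year range
    "csrd": 10 * 12,
}

def operating_retention_months(applicable: set) -> int:
    """Operating floor = maximum of the applicable retention periods."""
    return max(RETENTION_MONTHS[k] for k in applicable)

# A deployment touching SOX and HIPAA operates at 84 months (7 years).
```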
Queryability
The 4-business-hour evidence-assembly target is the practical bar for Article 12 readiness. The target derives from Article 73’s incident-reporting timelines (immediate to 15 days depending on category) and from the operational reality that legal review of the regulator response consumes most of the elapsed time once the evidence is assembled.
The queries the audit substrate needs to answer in the 4-hour window:
- “Show me every action deployment X took during a 24-hour window.”
- “Show me the full decision chain for action Y, including the input that produced it and the approval reference.”
- “Show me every action that involved retrieved content from source Z.”
- “Show me the policy version that was in effect when action W was taken.”
- “Show me the tool-call audit chain for session V.”
- “Show me every action whose disclosure surface routed to destination U.”
An audit substrate that cannot answer these queries within the 4-hour window has a structural gap. The gap is typically not in storage; it is in indexing or in the join between deployment-layer logs and platform-layer logs. Closing it requires building the integration that lets the queries run against a single, deployment-level view.
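The indexing point can be made concrete with a minimal sketch: the first inquiry query above becomes a bounded index lookup once the log table is indexed on `(deployment_id, timestamp)`. The schema and column names are illustrative, using SQLite purely for demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        deployment_id TEXT, session_id TEXT, ts TEXT,
        action_class TEXT, destination TEXT, source_id TEXT
    )
""")
# Indexes that turn the inquiry queries into bounded lookups
conn.execute("CREATE INDEX idx_deploy_ts ON audit_log (deployment_id, ts)")
conn.execute("CREATE INDEX idx_dest ON audit_log (destination)")

conn.execute(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?)",
    ("dep-x", "s1", "2026-04-25T10:00:00.000+00:00", "read", "user", "doc-1"),
)

# "Show me every action deployment X took during a 24-hour window."
# ISO 8601 UTC strings sort lexicographically, so BETWEEN works directly.
rows = conn.execute(
    "SELECT * FROM audit_log WHERE deployment_id = ? AND ts BETWEEN ? AND ?",
    ("dep-x", "2026-04-25T00:00:00.000+00:00",
     "2026-04-26T00:00:00.000+00:00"),
).fetchall()
```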
Vendor coverage as of April 2026
Vendor-native logging coverage of the 14 fields is uneven. The following table summarises the field-by-field state for the major enterprise agent platforms based on publicly documented platform capabilities:
| Field | Microsoft 365 Copilot | Anthropic Managed Agents | OpenAI Enterprise | Google Gemini |
|---|---|---|---|---|
| Deployment ID | native | native | native | native |
| Agent identity | native (post-Entra integration) | native | native | native |
| Session ID | native | native | native | native |
| ISO timestamp | native | native | native | native |
| User prompt | native | native | native | native |
| Retrieved context with provenance | partial (post-EchoLeak hardening) | partial | partial | partial |
| Model output | native | native | native | native |
| Planned action | partial | native | partial | partial |
| Action class | deployment-layer | deployment-layer | deployment-layer | deployment-layer |
| Approval reference | partial | partial | partial | partial |
| Executed action | native | native | native | native |
| Tool-call audit chain | native | native | native | native |
| Output disclosure surface | partial | partial | partial | partial |
| Policy version | deployment-layer | deployment-layer | deployment-layer | deployment-layer |
The pattern: vendors cover roughly 8 to 10 of the 14 fields natively, with the gaps concentrated in provenance tracking, planned-vs-executed distinction, output-disclosure-surface explicitness, and policy versioning. The deployment-layer instrumentation work to close the gaps is moderate (typically 3 to 6 weeks for an enterprise with a mature observability platform) but not optional. An enterprise relying solely on vendor-native logging will discover gaps during the first regulator inquiry.
What this template does NOT cover
The template addresses Article 12 record-keeping for agent decisions and actions. It does not address:
- Article 14 (human oversight): the meta-level evidence that human oversight is operationally meaningful, not just the per-action approval references. Covered by separate documentation requirements.
- Article 15 (accuracy, robustness, cybersecurity): the system-level cybersecurity posture including the cross-agent prompt-injection class. The Article 15 documentation references the audit substrate but is structurally separate.
- Article 17 (quality-management system): the organisational-process documentation that the audit substrate is part of but does not constitute.
- Article 73 (incident reporting): the inquiry workflow that consumes the audit substrate but adds its own documentation requirements (incident classification, timeline, mitigation steps, communication with the competent authority).
The Article 12 template is the foundation. The other articles build on top of it. An enterprise with a strong Article 12 posture has the substrate that makes the other articles tractable; an enterprise with a weak Article 12 posture finds every other article harder.
The full EU AI Act preparation framework that integrates Article 12 with the other articles is at /eu-ai-act-agentic-ai-compliance/. The seven-control surface this template is part of is covered in the OWASP Agentic AI Top 10 enterprise walkthrough. The procurement playbook that operationalises Article 12 evidence assembly during procurement is at /enterprise-agentic-ai-procurement-playbook/.
The 14 fields are the minimum. The 4-hour assembly target is the practical bar. The 14-week runway to 2 August 2026 is the time available. An enterprise that specifies, instruments, and drill-tests the audit substrate in that window is operationally ready for Article 12. An enterprise that does not is exposed in the most-likely-asked dimension of the EU AI Act enforcement programme.