This piece was written by Claude (Anthropic). Peter set the brief, reviewed the sources, and signed off on publication before it went out. Why we work this way →
AM-032 · published 24 Apr 2026 · revised 24 Apr 2026 · 11 min read
Risk & Governance

Agentic AI in financial services: five frameworks

Financial services sit at the intersection of DORA, NIS2, MiFID II, EU AI Act, and GDPR. Agentic AI inherits every obligation. The sector playbook.

Holding · reviewed 24 Apr 2026 · next review +59d
Cover for financial-services agentic AI piece. Five stacked regulatory frameworks (DORA, NIS2, MiFID II, EU AI Act, GDPR) on the left, connected by lines to the six GAUGE dimensions on the right. Footer reads: Five frameworks. One deployment. One audit trail.

Financial services is the enterprise sector where agentic AI deployment economics collide with the most compound regulatory surface in the European Union. The general-purpose frameworks (EU AI Act, GDPR) apply. The cross-sector critical-infrastructure frameworks (NIS2) apply. The financial-services-specific frameworks (DORA on ICT operational resilience, MiFID II on conduct of business for investment services) apply on top. Five frameworks, one deployment, one audit trail. The governance map is not “AI governance” in the generic sense; it is a sector-specific overlay where compliance-posture becomes the load-bearing GAUGE dimension.

This piece maps which frameworks bite where, names the three specific dimensions where financial-services deployments score materially lower on the GAUGE framework than cross-industry averages, and walks the pilot-to-production transition for regulated contexts. It is the first in a planned vertical series; comparable pieces for healthcare (HIPAA + HITECH + EU AI Act high-risk provisions), public sector (FedRAMP + EU AI Act Annex III), and retail/logistics follow in the next quarters.

Two propositions structure the piece:

  • The compounded framework surface is the dominant operational constraint. Not model capability (CMU 30.3% data), not ROI availability (Stanford DEL 12% bucket), but the number of concurrent obligations that have to be satisfied by the same artefact set. Financial-services deployments that succeed are organised around that constraint, not despite it.
  • Liability does not transfer to the vendor, regardless of contract. MiFID II conduct rules, EU AI Act deployer obligations, and DORA third-party-risk provisions explicitly place the customer-facing and regulator-facing liability on the deploying financial institution. This is the single most-mistaken assumption in 2026 financial-services procurement, and the one that produces the most expensive governance gaps.

The five frameworks, specifically

Each of the five frameworks places obligations that the general-purpose NIST AI RMF or ISO/IEC 42001 governance scaffold does not cover by default.

DORA (Regulation 2022/2554) — operational resilience. In force since January 2025 for EU financial entities. Requires ICT third-party risk management where an agentic AI vendor can be classified as a critical ICT third-party service provider, triggering concentration-risk analysis, enhanced contractual provisions, and supervisory oversight. Requires threat-led penetration testing covering agent attack surfaces. Requires incident notification within specific windows (initial within 4 hours of classification, intermediate within 72 hours, final within one month). None of these are in general AI governance frameworks.

NIS2 Directive — critical-infrastructure incident reporting. Applies to financial entities classified as essential or important under the directive. Early-warning notification within 24 hours of significant incident awareness; incident notification within 72 hours; final report within one month. Overlaps with DORA for financial entities; the higher bar of the two applies. The detail sits in NIS2 Article 23.

MiFID II — conduct of business. Applies to investment-services activities specifically. Imposes suitability and appropriateness assessments on advice and order execution, conflict-of-interest management, and record-keeping obligations on every interaction that could inform an investment decision. When an agent participates in any step of the value chain (pre-trade analysis, client communication, order routing, post-trade reporting), MiFID II obligations attach to that step.

EU AI Act — general-purpose and high-risk AI. Agentic AI systems used for credit scoring, creditworthiness assessment, insurance-underwriting, or employment-related decisions in financial services are classified as high-risk under Annex III. High-risk classification triggers conformity assessment, risk-management system, data-governance, record-keeping, transparency, human-oversight, and accuracy-robustness-cybersecurity obligations. Article 26 places specific obligations on deployers of high-risk systems that do not transfer to the system provider via contract.

GDPR — personal-data processing. Applies on top of the others whenever the agent touches customer personal data. Article 22 specifically limits decisions based solely on automated processing; in financial-services contexts with retail clients, the automated-processing threshold gets reached quickly. Article 33 adds a 72-hour breach notification on top of DORA and NIS2 notification obligations.

The five frameworks have overlapping but non-identical incident-notification timelines, which is itself a governance-infrastructure problem. DORA 4-hour initial + 72-hour intermediate + 1-month final. NIS2 24-hour early warning + 72-hour incident notification + 1-month final. GDPR 72-hour breach notification. MiFID II conduct-breach reporting follows national competent authority timelines. A single material incident can trigger all five notifications at once, with different thresholds, different recipients, different evidence requirements.
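The overlapping windows above can be sketched as a deadline map keyed on each framework's trigger event. A minimal sketch, assuming the windows as described in this piece; the trigger labels and framework keys are illustrative, and none of this is legal advice on when a clock actually starts:

```python
from datetime import datetime, timedelta

# Each notification obligation: (trigger event it runs from, window length).
# Windows per the piece; MiFID II omitted because its conduct-breach
# timelines follow national competent authorities, not a fixed EU window.
WINDOWS = {
    "DORA initial":       ("classification", timedelta(hours=4)),
    "DORA intermediate":  ("classification", timedelta(hours=72)),
    "DORA final":         ("classification", timedelta(days=30)),
    "NIS2 early warning": ("awareness",      timedelta(hours=24)),
    "NIS2 notification":  ("awareness",      timedelta(hours=72)),
    "NIS2 final":         ("awareness",      timedelta(days=30)),
    "GDPR breach":        ("awareness",      timedelta(hours=72)),
}

def notification_deadlines(events: dict[str, datetime]) -> dict[str, datetime]:
    """Map each applicable notification to its hard deadline,
    given the timestamps of the trigger events that have occurred."""
    return {
        name: events[trigger] + window
        for name, (trigger, window) in WINDOWS.items()
        if trigger in events
    }
```

The point of the exercise is visible immediately: one incident with both an awareness and a classification timestamp produces seven concurrent deadlines on two different clocks.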

Where financial-services deployments score lower on GAUGE

Running the GAUGE diagnostic against 2025-2026 financial-services agentic AI deployments, three dimensions score materially lower than cross-industry averages:

Compliance posture. Cross-industry average lands in the 55-65 band (functional). Financial-services deployments average 40-50 on the first-pass score, because the compounded framework surface means the evidence map must cover more obligations, and most deployments address DORA, NIS2, MiFID II, EU AI Act, and GDPR partially rather than fully. The compliance-posture weight (15% of GAUGE total) does not capture how much heavier this dimension is in regulated sectors; the dimension weights are fixed for v1 of the methodology and re-reviewed in Q2 2027.

Vendor lock-in. Cross-industry average in the 50-60 band. Financial-services average 35-45, because DORA’s ICT-third-party-risk framework requires exit-strategy documentation and tested data portability that most vendors do not support out of the box. Financial institutions typically carry a larger vendor-lock-in reserve (per the CFO business case template) than cross-industry peers, reflecting the real cost of DORA-compliant vendor switching.

Governance maturity. Cross-industry average in the 55-65 band. Financial-services average 50-60, slightly lower because the approval workflow has to integrate with existing three-lines-of-defence structures (business, risk, internal audit) that pre-date agentic AI. The integration work is tractable but often deferred, which produces partial governance maturity on the specific axis of agent-deployment approvals.

Three dimensions lower, three approximately even (threat model, ROI evidence, change management). The net effect is GAUGE totals typically 15-25 points below cross-industry averages on first pass, before any remediation. That gap is closable but concentrates most of the first-year remediation work in compliance-posture and vendor-lock-in.
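The arithmetic behind a GAUGE total is a plain weighted sum. A minimal sketch: only the 15% compliance-posture weight is stated in this piece; the remaining weights below are placeholders that sum to 1.0, not the published GAUGE v1 weights:

```python
# Placeholder weights: compliance_posture (0.15) is from the piece;
# the other five are illustrative stand-ins, not the v1 methodology.
WEIGHTS = {
    "compliance_posture":  0.15,
    "vendor_lock_in":      0.15,  # placeholder
    "governance_maturity": 0.20,  # placeholder
    "threat_model":        0.20,  # placeholder
    "roi_evidence":        0.15,  # placeholder
    "change_management":   0.15,  # placeholder
}

def gauge_total(scores: dict[str, float]) -> float:
    """Weighted GAUGE total on a 0-100 scale from per-dimension scores."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
```

The mechanics matter for remediation planning: a 15-point deficit on a 15%-weighted dimension moves the total by only 2.25 points, so a 15-25 point total gap implies deficits spread across several dimensions, exactly the pattern described above.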

The liability question, specifically

MiFID II Article 24 places conduct-of-business obligations on the investment firm. EU AI Act Article 26 places monitoring, logging, and human-oversight obligations on the deployer of high-risk AI. DORA Article 28 places ICT third-party risk-management obligations on the financial entity. None of these obligations are contractually transferable to the agentic AI platform vendor.

In practice, vendor contracts in 2026 typically include two kinds of provisions that look like liability transfer but do not actually transfer regulatory liability:

Service-level indemnification for platform failures. If the vendor’s platform has an outage or a documented bug that causes an incident, the vendor indemnifies the customer for specific costs. This is normal SaaS contract language and does not transfer regulatory liability; it allocates internal financial responsibility.

Professional-liability insurance covering agent decisions. Some vendors carry E&O insurance that covers specific decision-quality claims. This is a commercial-risk instrument, not a regulatory-liability transfer; the regulator still enforces obligations against the deployer.

The practical implication for any financial-services enterprise deploying agentic AI: the business case should assume the full regulatory liability stays with the institution, and the vendor contract is about allocating internal financial responsibility between institution and vendor for specific failure modes. Two separate questions, often conflated in procurement.

This is also why the enterprise agentic AI RFP includes specific questions about what the vendor commits to contractually on DORA and MiFID II evidence production, and why the compliance-posture score of the vendor’s platform matters as much as the deploying institution’s own score. A vendor that cannot produce the evidence the regulator requires forces the deploying institution to build that evidence layer externally, which is a significant cost driver that does not show up in vendor-supplied TCO.

The pilot-to-production transition in regulated contexts

For financial-services enterprises specifically, the McKinsey 23% scaling-gap analysis has three additional failure modes beyond the cross-industry preconditions:

Failure mode 1. ICT third-party criticality threshold. A pilot below a specific vendor-dependency threshold does not trigger DORA’s critical third-party provisions. Scaling the same agent to cover more business units or customer cohorts crosses the threshold, triggering concentration-risk analysis, enhanced contractual terms, and potentially direct supervisory oversight of the vendor. Many pilots scale into this threshold without noticing until a supervisor notices for them.

Failure mode 2. Audit-trail volume. Pilot-scale audit trails can often be handled by ad-hoc logging infrastructure. Scaled-production trails (millions of agent actions per day across customer-facing and internal workflows) exceed what ad-hoc tooling handles in terms of query performance, retention, and regulator-ready evidence extraction. Most regulated pilots scale with inadequate audit-trail infrastructure and discover the gap at first material-incident review, which is the worst possible time.

Failure mode 3. Incident-reporting-pathway first-test. The notification windows (4 hours DORA initial, 24 hours NIS2 early warning, 72 hours multiple frameworks) are tight enough that the first-incident-at-scale often breaches one or more of them. The pilot never exercised the pathway end-to-end because no material incident occurred at pilot scale. Tabletop exercises run during pilot are the cheap intervention; most financial-services pilots skip them on the argument that “there’s nothing to exercise yet,” which is circular.

Each of the three failure modes is specifically tied to the transition from pilot (below-threshold) to scaled (above-threshold). None is about model capability. Each is tractable with 90-day preparation before the scale-up decision.
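Failure mode 1 is cheap to catch before it catches you. A hypothetical sketch of a pilot-to-scale check that flags when projected scale crosses an internally defined vendor-dependency threshold; the criteria names and limits are placeholders for an institution's own mapping of DORA's criticality criteria, not the legal test itself:

```python
# Illustrative internal thresholds, not DORA's actual criticality criteria.
CRITICALITY_LIMITS = {
    "business_units_covered":     3,     # agent spans more than 3 units
    "customer_facing_workflows":  1,     # more than 1 customer-facing flow
    "share_of_critical_functions": 0.25, # >25% of critical functions touched
}

def crosses_criticality(projected: dict[str, float]) -> list[str]:
    """Return the criteria a scaled deployment would exceed.
    An empty list means the projection stays below every internal limit."""
    return [
        criterion for criterion, limit in CRITICALITY_LIMITS.items()
        if projected.get(criterion, 0) > limit
    ]
```

Run it against the scale-up projection, not the pilot footprint; the whole failure mode is that the pilot numbers pass and the scaled numbers do not.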

The practical sequence for a financial-services deployment

Ordered for enterprises in the 39% experimenting cohort considering scaling a financial-services agentic AI pilot:

  1. Run the GAUGE diagnostic with per-framework evidence map. Compliance-posture scoring for financial services requires evidence lines per framework (DORA, NIS2, MiFID II, EU AI Act, GDPR). A functional-band score here is a 60+; below 40 indicates material exposure and the scale-up is not ready.
  2. Test the incident-reporting pathway end-to-end at pilot scale via tabletop exercise. Trigger a synthetic incident; walk through notification to each relevant authority within its window; document the exercise outcome. If the tabletop fails, fix the pathway before scaling.
  3. Classify the scaled deployment against DORA critical ICT third-party criteria. If the vendor crosses criticality at scale, execute the enhanced-contract provisions before the scale-up, not as a remediation after.
  4. Instrument audit-trail infrastructure for scaled volume. Query performance for regulator-ready evidence extraction at projected scaled volume must be tested, not assumed. Same for retention duration; DORA and GDPR both specify minimums.
  5. Instrument MTTD-for-Agents with financial-services tripwires. In addition to the standard tripwires, add sector-specific ones: unusual transaction-size distribution, client-category-distribution drift, compliance-flag rate anomalies. The detection layer in regulated sectors is a precondition for meeting the 4-hour DORA initial-notification window on genuine incidents.
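One of the sector tripwires in step 5 can be sketched as a rolling-baseline rate monitor. A minimal, hypothetical sketch for the compliance-flag-rate anomaly: window counts, baseline length, and the 3x multiplier are all illustrative tuning choices, not part of any framework:

```python
from collections import deque

class RateTripwire:
    """Fire when the compliance-flag rate in the latest window exceeds
    the rolling baseline by a multiplier. Parameters are illustrative."""

    def __init__(self, baseline_windows: int = 24, multiplier: float = 3.0):
        self.history = deque(maxlen=baseline_windows)  # recent per-window rates
        self.multiplier = multiplier

    def observe(self, flags: int, actions: int) -> bool:
        """Feed one window's counts; return True if the tripwire fires."""
        rate = flags / actions if actions else 0.0
        baseline = sum(self.history) / len(self.history) if self.history else None
        self.history.append(rate)
        return baseline is not None and baseline > 0 and rate > self.multiplier * baseline
```

Feeding it hourly windows gives a detection latency of at most one window, which is the kind of bound you need to argue that a 4-hour DORA initial-notification clock is meetable.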

The sequence compresses to 90-120 days for a well-governed pilot; 6-9 months for a pilot that needs substantial remediation first. Both timelines are shorter than the alternative, which is scaling into material incidents and remediating under regulatory scrutiny.

What the regulator looks at, specifically

Supervisory review of agentic AI deployments in EU financial services in 2026 has converged on a small set of evidence artefacts:

  • A complete agent registry covering every deployed agent, with owner, scope, risk classification, and approval-workflow record.
  • A per-deployment risk-management system aligned to EU AI Act Article 9 for high-risk classifications.
  • Audit-trail logs retained per DORA and GDPR minimums, with regulator-ready extraction tested.
  • Incident-reporting-pathway exercised at least quarterly, documented.
  • Third-party-risk management file per DORA Article 28, including the agentic AI vendor’s classification and concentration analysis.
  • Human-oversight policy and evidence per EU AI Act Article 14 for high-risk classifications.
  • Record of compliance-posture self-assessment, updated annually, covering all applicable frameworks.

The artefact list is defensive but specific. The GAUGE framework produces or maps to every item on the list. Deploying an agentic AI system in regulated financial services without these artefacts produced in advance of supervisory review is the recurring failure pattern behind the Gartner 40% cancellation projection; regulated-sector deployments cancelled under supervisory pressure are a measurable share of that projection.

Holding-up note

The primary claim of this piece (that EU financial-services agentic AI deployments operate under a compounded five-framework obligation surface that sits on top of general AI governance, that liability does not transfer to the vendor contractually, and that the GAUGE-framework compliance-posture and vendor-lock-in dimensions are the dominant score drivers for the sector) is on a 60-day review cadence. Three kinds of evidence would move the verdict:

  • A major European Supervisory Authority (EBA, ESMA, EIOPA) publishing specific agentic AI guidance that materially changes how the five frameworks apply or their relative weight. Would force a recalibration of the dimension-emphasis framing.
  • A DORA or EU AI Act enforcement action against a financial institution that defines liability-transfer boundaries differently than currently understood. Would update the liability-question analysis.
  • Published vendor contract terms (via industry-body templates or major-customer disclosures) that meaningfully close the DORA third-party-risk gap without requiring substantial customer-side evidence work. Would reduce the vendor-lock-in dimension’s drag on financial-services GAUGE scores.

If any land, the Holding-up record for AM-032 captures what changed, dated. Original claim stays visible. Nothing is quietly removed.



Disagree with this piece?

Reasoned disagreement is a first-class signal here. Every review cycle weighs documented dissent; material dissent becomes part of the article's change history. This is not a corrections form — use /corrections/ for factual errors.

