DORA Developer Documentation
Learn how to classify common infrastructure failures under the EU Digital Operational Resilience Act (DORA). Look up a production error to see its typical risk profile, root causes, and remediation steps.
PostgreSQL 503 Connection Timeout
Application tier cannot acquire connections from the PostgreSQL pool, returning HTTP 503 to upstream callers.
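A minimal sketch of how pool exhaustion surfaces as a 503, assuming a pool that caps concurrent connections and gives up after a fixed acquire timeout. `BoundedPool`, `PoolExhausted` and `handle_request` are hypothetical names, and a plain object stands in for a real PostgreSQL connection.

```python
import threading
from collections.abc import Iterator
from contextlib import contextmanager


class PoolExhausted(Exception):
    """Raised when no connection becomes free within the acquire timeout."""


class BoundedPool:
    """Hypothetical pool that caps concurrent connections with a semaphore."""

    def __init__(self, size: int, acquire_timeout: float) -> None:
        self._slots = threading.BoundedSemaphore(size)
        self._timeout = acquire_timeout

    @contextmanager
    def connection(self) -> Iterator[object]:
        # Wait at most acquire_timeout seconds for a free slot.
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolExhausted("no connection available within timeout")
        try:
            yield object()  # stand-in for a real PostgreSQL connection
        finally:
            self._slots.release()


def handle_request(pool: BoundedPool) -> int:
    """Return the HTTP status code the application tier would send upstream."""
    try:
        with pool.connection():
            return 200
    except PoolExhausted:
        return 503
```

The acquire timeout decides how long callers wait before receiving the 503: set it too long and request threads pile up behind the pool; set it too short and brief load spikes turn into errors.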
AWS S3 403 Access Denied
Service receives 403 Forbidden from S3 when reading or writing objects required for transaction processing.
Kubernetes CrashLoopBackOff
Containers in a pod repeatedly crash on startup; the kubelet restarts them after an exponentially increasing back-off delay, leaving the workload short of capacity.
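According to the Kubernetes documentation, the default restart delay starts at 10 seconds, doubles after each crash, and is capped at five minutes. It resets once a container has run for 10 minutes without problems. The sketch below only reproduces that default schedule to show how quickly a crash-looping pod spends minutes at a time offline. `crashloop_delays` is a hypothetical helper, and newer Kubernetes releases may offer feature gates that change these defaults.

```python
def crashloop_delays(restarts: int, initial: float = 10.0,
                     cap: float = 300.0) -> list[float]:
    """Delays (seconds) before each of the next `restarts` restarts,
    assuming exponential back-off that doubles from `initial` up to `cap`."""
    delays: list[float] = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)  # double, but never exceed the cap
    return delays
```

For seven consecutive crashes, the pod spends 10 + 20 + 40 + 80 + 160 + 300 + 300 = 910 seconds waiting between restarts.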
Redis OOM Command Not Allowed
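Redis rejects write commands with the error "OOM command not allowed when used memory > 'maxmemory'" once used memory exceeds the configured maxmemory limit and the eviction policy cannot free space (for example, under noeviction).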
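To gauge how close an instance is to this state, compare the `used_memory`, `maxmemory` and `maxmemory_policy` fields reported by `INFO memory`. The sketch below is an approximation. It flags only the `noeviction` case, and it ignores the fact that Redis excludes some buffers (such as replica output buffers) from the figure it compares against `maxmemory`. `parse_info` and `writes_at_risk` are hypothetical helpers.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse "key:value" lines from INFO output, skipping "# Section" headers."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            key, value = line.split(":", 1)
            fields[key] = value
    return fields


def writes_at_risk(info: dict[str, str]) -> bool:
    """Approximate check for the state in which Redis rejects writes with OOM.

    Only the noeviction policy is considered, and buffers that Redis leaves
    out of its own maxmemory comparison are not subtracted here.
    """
    maxmemory = int(info.get("maxmemory", "0"))
    if maxmemory == 0:  # 0 means no memory limit is configured
        return False
    used = int(info["used_memory"])
    return used > maxmemory and info.get("maxmemory_policy") == "noeviction"
```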
SWIFT Gateway Connection Timeout
The financial entity's SWIFT Alliance gateway cannot establish or maintain a session with the SWIFT network, halting cross-border payment message transmission.
TLS Certificate Expiry — Service Unavailable
An expired TLS certificate causes browsers and API clients to reject connections with a certificate validation error, rendering the affected endpoint completely unreachable.
OAuth Provider 503 — Authentication Unavailable
The external or internal OAuth 2.0 / OIDC provider returns 503 Service Unavailable, preventing all token issuance: users cannot sign in, and authenticated endpoints reject requests once existing tokens expire or can no longer be validated.
Database Replication Lag — Read Replica Stale
The read replica falls significantly behind the primary, causing stale reads for balance enquiries, transaction history, and risk calculations that depend on near-real-time data.
Third-Party Payment Gateway Outage
External payment processor (e.g. Stripe, Adyen, Worldline) returns 5xx or is completely unreachable, blocking all card-present and card-not-present transactions for the financial entity's customers.
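A circuit breaker is one widely used pattern (not one prescribed by DORA) for containing a processor outage. After repeated failures, the caller stops sending requests for a cooling-off period and fails fast instead. This keeps threads free and gives a clear signal to switch to a secondary processor, if one is contracted. The class below is a minimal single-threaded sketch; the thresholds and the injectable clock are assumptions for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the gateway while the circuit is open."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures, fails fast while open,
    then lets one trial call through after `reset_after` seconds."""

    def __init__(self, threshold: int, reset_after: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            raise CircuitOpenError("payment gateway circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Open the circuit, or re-open it after a failed trial call.
            if self.failures >= self.threshold or self.opened_at is not None:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```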
Cloud Provider Availability Zone (AZ) Failure
A single cloud provider Availability Zone (e.g. AWS eu-central-1a, Azure West Europe zone 1) becomes unavailable, taking down all single-AZ services, databases, and stateful workloads within the affected zone.
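Whether a service survives the loss of one zone depends on where its instances run. The sketch below takes an inventory of (service, zone) pairs, which in practice might come from the cloud provider's API or an asset register, and lists the services confined to a single AZ. The function and the sample service names are hypothetical.

```python
from collections import defaultdict


def single_az_services(instances: list[tuple[str, str]]) -> list[str]:
    """Given (service, availability_zone) pairs, return services whose
    instances all sit in one AZ and would go down with that zone."""
    zones: defaultdict[str, set[str]] = defaultdict(set)
    for service, az in instances:
        zones[service].add(az)
    return sorted(s for s, azs in zones.items() if len(azs) == 1)
```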
Ransomware / Active Directory Compromise
Ransomware has encrypted critical systems or threat actors have gained control of the Active Directory / Identity Provider, causing widespread authentication failure and data inaccessibility across the organisation.
Accidental Database Drop / Failed Migration Script
A bad migration script or manual error drops a production table, schema, or entire database, causing immediate and total loss of data availability for the affected service.
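A lightweight pre-deployment guard can flag destructive statements in a migration script before it runs against production. This sketch is deliberately naive. It splits on semicolons, so it will misparse function bodies or string literals that contain `;`. It also matches only `DROP TABLE`, `DROP SCHEMA`, `DROP DATABASE` and `TRUNCATE`. The pattern and function name are assumptions, not part of any migration tool.

```python
import re

# Statements this sketch treats as destructive (an assumption; extend as
# needed, e.g. with ALTER TABLE ... DROP COLUMN).
DESTRUCTIVE = re.compile(r"\b(?:DROP\s+(?:TABLE|SCHEMA|DATABASE)|TRUNCATE)\b", re.IGNORECASE)


def destructive_statements(sql: str) -> list[str]:
    """Return statements that drop or truncate data.

    Naive: splits on ";", so semicolons inside strings or function bodies break it.
    """
    return [stmt.strip() for stmt in sql.split(";") if DESTRUCTIVE.search(stmt)]
```

A CI step could fail the pipeline whenever this list is non-empty and no reviewer has signed off. Such a guard complements, rather than replaces, tested backups and point-in-time recovery.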
DDoS Attack on Public API Gateway
A volumetric or application-layer DDoS attack saturates the public API gateway, causing HTTP 503/504 responses for legitimate clients and blocking access to all customer-facing services.
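Against the application-layer part of an attack, per-client rate limiting at the gateway is a common first line of defence. Volumetric floods usually have to be absorbed upstream, by a scrubbing service or CDN, before they reach the gateway. Below is a minimal token-bucket sketch: each client gets a burst of `capacity` requests and a refill of `rate` tokens per second. The class, its parameters and the HTTP 429 response are illustrative assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Per-client token bucket: refills `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the gateway would answer HTTP 429 Too Many Requests
```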
End-of-Day (EOD) Batch Processing Failure
The nightly end-of-day batch run fails to complete, leaving account balances, transaction postings, and regulatory position reports in an inconsistent or stale state at the start of the next business day.
Kafka Consumer Lag / Event Backlog
A Kafka consumer group falls critically behind the producer rate, causing stale data in downstream read models, delayed notifications, or stalled asynchronous transaction processing.
Network Partition / Split-Brain
A network link between data centres or availability zones fails, splitting clustered databases or other stateful systems into isolated partitions that may each elect their own primary node.
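Consensus-based clusters typically avoid electing two primaries by letting a partition elect a leader only while it can reach a strict majority of the voting members. The sketch shows that rule and checks that two sides of a partition can never both hold a majority. `has_quorum` is a hypothetical helper, not any particular database's API.

```python
def has_quorum(reachable_voters: int, cluster_size: int) -> bool:
    """A partition may elect a primary only with a strict majority of voters."""
    return reachable_voters > cluster_size // 2
```

The same rule explains why an even-sized cluster split down the middle (2 of 4 on each side) leaves neither side able to elect a primary.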
BGP Route Leak / Traffic Hijack
An autonomous system (AS) wrongly originates or propagates routes for your IP prefixes, causing internet traffic bound for your infrastructure to be misrouted or dropped before it reaches your edge.
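Route origin validation with RPKI (RFC 6811) helps against the hijack case. A route is valid if a covering ROA authorises its origin AS at that prefix length. It is invalid if it is covered but no ROA matches, and not-found if no ROA covers it. The sketch implements that classification with the standard `ipaddress` module. The `VRP` type and `validate_origin` name are assumptions, and the test data uses documentation prefixes and documentation AS numbers.

```python
import ipaddress
from typing import NamedTuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class VRP(NamedTuple):
    """Validated ROA Payload: authorised prefix, maximum length and origin AS."""
    prefix: Network
    max_length: int
    asn: int


def validate_origin(route_prefix: str, origin_asn: int, vrps: list[VRP]) -> str:
    """Classify a BGP route as "valid", "invalid" or "not-found" (RFC 6811)."""
    route = ipaddress.ip_network(route_prefix)
    # A VRP covers the route if the route is equal to or more specific than it.
    covering = [v for v in vrps
                if v.prefix.version == route.version and route.subnet_of(v.prefix)]
    if not covering:
        return "not-found"
    for v in covering:
        if v.asn == origin_asn and route.prefixlen <= v.max_length:
            return "valid"
    return "invalid"
```

Origin validation does not catch route leaks that preserve the legitimate origin AS. Those call for complementary controls such as the BGP roles defined in RFC 9234.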