
DORA Developer Documentation

Learn how to classify common infrastructure failures under the Digital Operational Resilience Act. Look up production errors to see their typical risk profiles, root causes, and remediation steps.

PG-503 (Database)

PostgreSQL 503 Connection Timeout

The application tier cannot acquire connections from the PostgreSQL connection pool and returns HTTP 503 to upstream callers.

Typical: MAJOR
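The failure mode described for PG-503 can be sketched with a bounded pool: when every connection is checked out and none is returned within the acquire timeout, the handler gives up and answers 503. This is a minimal illustration using only the Python standard library. `ConnectionPool`, `handle_request`, and the timeout values are hypothetical names chosen for the example, not part of PostgreSQL or any driver.

```python
import queue


class ConnectionPool:
    """Toy fixed-size pool; real pools in drivers or proxies behave differently."""

    def __init__(self, size: int) -> None:
        self._idle = queue.Queue(maxsize=size)
        for conn_id in range(size):
            self._idle.put(conn_id)  # integers stand in for real connections

    def acquire(self, timeout: float) -> int:
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn_id: int) -> None:
        self._idle.put(conn_id)


def handle_request(pool: ConnectionPool, acquire_timeout: float = 0.05) -> int:
    """Return the HTTP status an upstream caller would see."""
    try:
        conn = pool.acquire(acquire_timeout)
    except queue.Empty:
        return 503  # pool exhausted: no connection became available in time
    try:
        return 200  # a real handler would run its query here
    finally:
        pool.release(conn)  # always hand the connection back
```

In this sketch a connection that is never released keeps the pool short by one slot, which is one way a leak in application code can end up as 503 responses for unrelated requests.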

S3-403 (Cloud Storage)

AWS S3 403 Access Denied

A service receives 403 Forbidden from S3 when reading or writing objects required for transaction processing.

Typical: Depends

K8S-CLBO (Orchestration)

Kubernetes CrashLoopBackOff

Pods repeatedly crash on startup and Kubernetes delays each restart with an increasing back-off, reducing the available capacity for the workload.

Typical: MINOR
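To give a feel for how quickly the back-off in K8S-CLBO grows, the sketch below computes the delay before each restart. It assumes the schedule described in the Kubernetes documentation (starting at 10 seconds, doubling each time, capped at 300 seconds); actual behaviour can differ by version and configuration. The function name `restart_delays` is hypothetical.

```python
def restart_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Delay in seconds before each of the first `restarts` restarts.

    Assumed schedule: exponential back-off from `base`, capped at `cap`.
    """
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(restart_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

Under these assumptions a pod that keeps failing settles at one restart attempt every five minutes, which is why a crash loop can hold capacity down long after the first failure.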

REDIS-OOM (Cache)

Redis OOM Command Not Allowed

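Redis rejects write commands with the error "OOM command not allowed when used memory > 'maxmemory'" once memory use reaches the maxmemory limit and the eviction policy cannot free enough space.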

Typical: Depends

SWIFT-GW (Payments)

SWIFT Gateway Connection Timeout

The financial entity's SWIFT Alliance gateway cannot establish or maintain a session with the SWIFT network, halting cross-border payment message transmission.

Typical: MAJOR

TLS-EXP (PKI / Security)

TLS Certificate Expiry — Service Unavailable

An expired TLS certificate causes browsers and API clients to reject connections with a certificate validation error, rendering the affected endpoint completely unreachable.

Typical: MAJOR
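Expiry of the kind described for TLS-EXP is predictable, so a simple check on the certificate's `notAfter` field can flag it in advance. The sketch below uses `ssl.cert_time_to_seconds` from the Python standard library to parse the date format returned by `ssl.SSLSocket.getpeercert()`. The function name `days_until_expiry` and any alerting threshold built on it are assumptions for illustration.

```python
import ssl

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: float) -> float:
    """Days remaining before a certificate's notAfter time (negative if expired).

    `not_after` uses the getpeercert() format, e.g. "Jan  5 09:34:43 2018 GMT";
    `now` is the current time in seconds since the epoch.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / SECONDS_PER_DAY
```

A monitoring job could call this with `time.time()` as `now` and raise an alert when the result drops below a chosen number of days.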

OAUTH-503 (Identity)

OAuth Provider 503 — Authentication Unavailable

The external or internal OAuth 2.0 / OIDC provider returns 503 Service Unavailable, preventing all token issuance and causing every authenticated endpoint to reject users.

Typical: MAJOR

DB-REP (Database)

Database Replication Lag — Read Replica Stale

The read replica falls significantly behind the primary, causing stale reads for balance enquiries, transaction history, and risk calculations that depend on near-real-time data.

Typical: Depends

PAY-GW (Payments)

Third-Party Payment Gateway Outage

An external payment processor (e.g. Stripe, Adyen, Worldline) returns 5xx errors or is completely unreachable, blocking all card-present and card-not-present transactions for the financial entity's customers.

Typical: MAJOR

CLOUD-AZ (Cloud Infrastructure)

Cloud Provider Availability Zone (AZ) Failure

A single cloud provider Availability Zone (e.g. AWS eu-central-1a, Azure West Europe zone 1) becomes unavailable, taking down all single-AZ services, databases, and stateful workloads within the affected zone.

Typical: MAJOR

RANS-AD (Cybersecurity)

Ransomware / Active Directory Compromise

Ransomware has encrypted critical systems or threat actors have gained control of the Active Directory / Identity Provider, causing widespread authentication failure and data inaccessibility across the organisation.

Typical: MAJOR

DB-DROP (Database)

Accidental Database Drop / Failed Migration Script

A bad migration script or manual error drops a production table, schema, or entire database, causing immediate and total loss of data availability for the affected service.

Typical: MAJOR

DDOS-API (Cybersecurity)

DDoS Attack on Public API Gateway

A volumetric or application-layer DDoS attack saturates the public API gateway, causing HTTP 503/504 responses for legitimate clients and blocking access to all customer-facing services.

Typical: MAJOR

BATCH-EOD (Core Banking)

End-of-Day (EOD) Batch Processing Failure

The nightly end-of-day batch run fails to complete, leaving account balances, transaction postings, and regulatory position reports in an inconsistent or stale state as of the next business morning.

Typical: MAJOR

KAFKA-LAG (Messaging)

Kafka Consumer Lag / Event Backlog

A Kafka consumer group falls critically behind the producer rate, causing stale data in downstream read models, delayed notifications, or stalled asynchronous transaction processing.

Typical: Depends
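Consumer lag as described for KAFKA-LAG is usually measured per partition as the log end offset minus the consumer group's committed offset. The sketch below shows only that arithmetic on plain dictionaries keyed by partition number. The function name `consumer_lag` is hypothetical, and treating a partition with no committed offset as unread from offset 0 is a simplifying assumption.

```python
def consumer_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: log end offset minus the group's committed offset."""
    # Assumption: a partition without a committed offset counts from offset 0.
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


lag = consumer_lag({0: 100, 1: 50}, {0: 90, 1: 50})
print(lag)                # {0: 10, 1: 0}
print(sum(lag.values()))  # 10
```

Whether a given lag is critical depends on the producer rate and on how fresh downstream consumers need the data to be, which is consistent with the "Depends" profile of this entry.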

NET-PART (Infrastructure)

Network Partition / Split-Brain

A network link between data centers or availability zones fails, causing clustered databases or stateful systems to fragment and potentially elect multiple primary nodes.

Typical: MAJOR
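A common guard against the multiple-primary outcome described for NET-PART is a majority quorum: a node may become primary only if it can reach more than half of the cluster, itself included. The sketch below shows that rule in isolation. The function name `may_elect_primary` is hypothetical, and real clustered systems combine such a rule with leases, fencing, or consensus protocols.

```python
def may_elect_primary(reachable_nodes: int, cluster_size: int) -> bool:
    """True only if this side of the partition holds a strict majority.

    `reachable_nodes` counts the local node plus every peer it can still reach.
    """
    return reachable_nodes > cluster_size // 2


print(may_elect_primary(3, 5))  # True
print(may_elect_primary(2, 5))  # False
print(may_elect_primary(2, 4))  # False
```

With an even split of a four-node cluster neither side holds a majority, so under this rule no primary is elected at all: availability is given up to avoid two primaries.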

BGP-LEAK (Networking)

BGP Route Leak / Traffic Hijack

An autonomous system (AS) incorrectly advertises routes to your infrastructure, causing global internet traffic to be misrouted or dropped before reaching your edge.

Typical: MAJOR