Root Cause & Remediation
Cloud provider hardware failure, power event, or network partition within one physical AZ. Services not architected for multi-AZ failover are fully impacted. Stateful databases using single-AZ deployments are the primary casualty.
Remediation steps
- 1Verify the outage scope against the cloud provider's Health Dashboard and Status API.
- 2Trigger your multi-AZ or multi-region failover runbook for affected services.
- 3Promote read replicas in healthy AZs to primary and update DNS/load balancer targets.
- 4Communicate an ETA to the business; begin DORA triage immediately — cross-border financial impact is likely.
- 5Post-incident: enforce IaC policies that prevent single-AZ deployments for CIF-adjacent workloads.