Root Cause & Remediation
Fiber cut, faulty BGP configuration, firewall misconfiguration, or severe packet loss on a cross-region VPC peering connection.
Remediation steps
- 1Verify connectivity between nodes using ping, traceroute, and checking firewall logs.
- 2Identify the quorum state of critical databases (e.g., etcd, ZooKeeper, Consul, PostgreSQL).
- 3If a split-brain occurred, stop traffic to the minority partition to prevent conflicting writes.
- 4Failover the entire workload to the healthy, majority data center.
- 5Perform a careful reconciliation of any writes accepted by the minority partition before it fenced.