DB-REP | Database

Database Replication Lag — Read Replica Stale

The read replica falls significantly behind the primary, causing stale reads for balance enquiries, transaction history, and risk calculations that depend on near-real-time data.

Root Cause & Remediation

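Typical causes are long-running write transactions on the primary that delay WAL application on the replica, network saturation on the replication link, an I/O bottleneck on the replica, or a schema migration that created replication conflicts. In PostgreSQL, long-running queries on the replica itself can also pause WAL replay.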

Remediation steps

1. Monitor pg_stat_replication (PostgreSQL) or SHOW REPLICA STATUS (MySQL; SHOW SLAVE STATUS before 8.0.22) for lag metrics.
2. Identify and terminate long-running write transactions that are holding up WAL replay.
3. Temporarily route all read traffic to the primary if replica lag exceeds your business SLA.
4. Check replica I/O and network capacity; consider provisioning a new replica from a recent snapshot.
5. Review migration scripts for operations that serialise replication (e.g. ALTER TABLE on large tables).
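
As a rough illustration of steps 1 to 3, the sketch below pairs two PostgreSQL queries with a small routing decision. The names choose_read_target and sla_seconds and the five-minute threshold are assumptions made for this example, not part of any library; the queries use only the standard pg_stat_replication and pg_stat_activity views (replay_lag requires PostgreSQL 10 or later).

```python
# Run on the primary: one row per connected replica.
# Note: replay_lag can be NULL when a replica is idle and fully caught up.
PG_LAG_QUERY = """
SELECT application_name,
       EXTRACT(EPOCH FROM replay_lag) AS replay_lag_seconds
FROM pg_stat_replication;
"""

# Run on the primary: transactions that have been open for more than
# five minutes (illustrative threshold, tune to your workload).
PG_LONG_TXN_QUERY = """
SELECT pid, usename, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
"""


def choose_read_target(lag_seconds: float, sla_seconds: float) -> str:
    """Return "replica" while lag is within the SLA, otherwise "primary".

    Hypothetical helper: sla_seconds is whatever staleness your
    business SLA tolerates for reads.
    """
    if lag_seconds < 0 or sla_seconds < 0:
        raise ValueError("lag_seconds and sla_seconds must be non-negative")
    return "replica" if lag_seconds <= sla_seconds else "primary"
```

A caller would feed replay_lag_seconds from the first query into choose_read_target and switch the read pool when the result changes. Any pid returned by the second query should be reviewed before it is passed to pg_terminate_backend, since terminating a transaction rolls back its work.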

DORA Risk Matrix

Typical classification: Context-dependent

Likelihood: Medium
Blast radius: Read-heavy workloads (reporting, balances, KYC checks) degrade; writes are unaffected.
CIF impact: Account balance display, transaction history, and fraud screening may show stale data.
Analyst notes: Classification depends heavily on the lag magnitude and the functions affected. Stale fraud scoring data during high-volume trading can constitute a CIF impact and escalate to MAJOR.
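
The analyst note can be turned into a simple pre-check that flags incidents for closer review. This is only a sketch: the function name needs_major_review, the CIF_FUNCTIONS set, and the tolerance value are hypothetical, and the check does not implement the DORA classification criteria themselves. It only tests whether stale reads beyond a chosen tolerance touch a function the institution treats as a CIF (critical or important function).

```python
from typing import Iterable, Set

# Hypothetical: functions this institution has designated as CIFs.
CIF_FUNCTIONS: Set[str] = {
    "balance_display",
    "transaction_history",
    "fraud_screening",
}


def needs_major_review(
    lag_seconds: float,
    affected_functions: Iterable[str],
    tolerance_seconds: float,
    cif_functions: Set[str] = CIF_FUNCTIONS,
) -> bool:
    """True when lag exceeds tolerance and at least one CIF is affected."""
    touches_cif = any(name in cif_functions for name in affected_functions)
    return touches_cif and lag_seconds > tolerance_seconds
```

For example, needs_major_review(120.0, ["fraud_screening"], 30.0) flags the incident, while the same lag affecting only a reporting workload does not. A positive result means the incident deserves a full assessment, not that it is automatically MAJOR.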
