Root Cause & Remediation
Likely causes of replica lag: long-running write transactions on the primary that delay WAL application on the replica, network saturation on the replication link, an I/O bottleneck on the replica, or a schema migration that created replication conflicts.
Remediation steps
1. Monitor pg_stat_replication (PostgreSQL) or SHOW SLAVE STATUS (MySQL; SHOW REPLICA STATUS from MySQL 8.0.22 onward) for lag metrics.
2. Identify and kill long-running write transactions that are blocking WAL replay.
3. Temporarily route all traffic to the primary if replica lag exceeds your business SLA.
4. Check replica I/O and network capacity; consider provisioning a new replica from a recent snapshot.
5. Review migration scripts for operations that serialise replication (e.g. ALTER TABLE on large tables).
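
The monitoring, long-transaction and routing steps above can be sketched for PostgreSQL roughly as follows. This is a minimal illustration, not a definitive implementation: the two SQL strings use the standard pg_stat_replication and pg_stat_activity views (PostgreSQL 10 or later), while the 5-minute transaction threshold, the ReplicaLag type, the choose_read_target helper and the "primary" label are all hypothetical names and values chosen for this example. Running the queries and feeding their rows into the helper is left to whatever database driver you already use.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Run on the primary: replay lag per connected replica, in bytes and seconds.
REPLICATION_LAG_SQL = """
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
       EXTRACT(EPOCH FROM replay_lag) AS replay_lag_seconds
FROM pg_stat_replication;
"""

# Run on the primary: transactions open longer than an assumed 5-minute limit.
# A pid returned here can be ended with SELECT pg_terminate_backend(pid);
LONG_TRANSACTIONS_SQL = """
SELECT pid, usename, xact_start, state, left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start < now() - interval '5 minutes'
ORDER BY xact_start;
"""


@dataclass(frozen=True)
class ReplicaLag:
    """One row of REPLICATION_LAG_SQL (hypothetical container for this sketch)."""
    name: str
    lag_seconds: Optional[float]  # NULL in PostgreSQL becomes None
    lag_bytes: int


def effective_lag(replica: ReplicaLag) -> Optional[float]:
    """Return lag in seconds, or None when it cannot be determined.

    replay_lag can be NULL on a caught-up, idle replica, so a missing time
    with zero byte lag is treated as 0 seconds. A missing time with a
    non-zero byte lag is treated as unknown (a conservative assumption).
    """
    if replica.lag_seconds is not None:
        return replica.lag_seconds
    return 0.0 if replica.lag_bytes == 0 else None


def choose_read_target(replicas: Iterable[ReplicaLag], sla_seconds: float) -> str:
    """Pick the least-lagged replica within the SLA, else fall back to the primary."""
    within_sla = []
    for replica in replicas:
        lag = effective_lag(replica)
        if lag is not None and lag <= sla_seconds:
            within_sla.append((lag, replica.name))
    if not within_sla:
        return "primary"
    return min(within_sla)[1]
```

For example, with replicas lagging 12 s and 2.5 s and an SLA of 5 s, choose_read_target returns the name of the second replica; tightening the SLA to 1 s makes it return "primary", which corresponds to step 3.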