Root Cause & Remediation
The connection pool (PgBouncer or the application pool) is saturated, typically in combination with long-running queries or a replica failover. Common triggers are missing statement timeouts, runaway analytics queries, and a primary failover storm.
Remediation steps
1. Inspect pg_stat_activity for long-running queries and terminate offenders with pg_terminate_backend().
2. Increase the PgBouncer pool size (default_pool_size) or move to transaction pooling mode (pool_mode = transaction).
3. Add statement_timeout and idle_in_transaction_session_timeout at the role level.
4. Verify that replica promotion completed (pg_is_in_recovery() returns false on the new primary) and that DNS/HAProxy points to the new primary.
5. Add circuit breakers in the application layer to fail fast and shed load.
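
Step 1 could be sketched roughly as follows. This is a minimal illustration, not a definitive tool: it assumes a psycopg-style driver that returns rows as dicts and maps the interval column to datetime.timedelta. The helper name pick_offenders, the five-minute threshold, and the protected-user list are hypothetical choices; only pg_stat_activity, pg_backend_pid(), and pg_terminate_backend() come from PostgreSQL itself.

```python
from datetime import timedelta

# Lists non-idle backends with their current query runtime, longest first.
# Assumed to be run through a psycopg-style driver.
LONG_RUNNING_SQL = """
SELECT pid, usename, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid()
ORDER BY runtime DESC
"""

# Terminates a single backend by pid (%s is a driver placeholder).
# pg_cancel_backend is the gentler alternative that only cancels the query.
TERMINATE_SQL = "SELECT pg_terminate_backend(%s)"


def pick_offenders(rows, threshold=timedelta(minutes=5), protected_users=("postgres",)):
    """Return the pids whose runtime exceeds `threshold`.

    `rows` is an iterable of dicts with at least pid, usename and runtime.
    Backends owned by `protected_users` (a hypothetical allow-list) are skipped.
    """
    offenders = []
    for row in rows:
        if row["usename"] in protected_users:
            continue
        runtime = row["runtime"]
        # runtime is None when query_start is NULL; such rows are ignored.
        if runtime is not None and runtime > threshold:
            offenders.append(row["pid"])
    return offenders
```

Separating selection from termination makes it possible to review the candidate pids before executing TERMINATE_SQL for each of them.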
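
For step 3, the role-level settings are applied with ALTER ROLE ... SET. The sketch below only generates the statements; the function name, the role-name pattern, and the default values of 30 s and 60 s are illustrative assumptions, not recommendations from this runbook.

```python
import re

# Conservative pattern for unquoted role names (an assumption of this sketch);
# anything else is rejected rather than quoted.
_ROLE_RE = re.compile(r"[a-z_][a-z0-9_]*")


def role_timeout_statements(role, statement_timeout_ms=30_000, idle_in_tx_timeout_ms=60_000):
    """Build ALTER ROLE statements that set both timeouts for `role`."""
    if not _ROLE_RE.fullmatch(role):
        raise ValueError(f"unsupported role name: {role!r}")
    for value in (statement_timeout_ms, idle_in_tx_timeout_ms):
        if not isinstance(value, int) or value < 0:
            raise ValueError("timeouts must be non-negative integers (milliseconds)")
    return [
        f"ALTER ROLE {role} SET statement_timeout = '{statement_timeout_ms}ms';",
        f"ALTER ROLE {role} SET idle_in_transaction_session_timeout = '{idle_in_tx_timeout_ms}ms';",
    ]


if __name__ == "__main__":
    # app_user is a placeholder role name.
    for stmt in role_timeout_statements("app_user"):
        print(stmt)
```

Role-level settings take effect when a new session starts, so existing pooled server connections keep their old values until they are recycled.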
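
The circuit breaker in step 5 can be illustrated with a small in-process sketch. The class name, the thresholds, and the injectable clock are assumptions made for the example; it is not thread-safe as written, and a production service would more likely rely on an established resilience library.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of queueing on a saturated pool.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping each database call, for example breaker.call(run_query, sql) with a hypothetical run_query function, lets requests fail immediately while the database recovers instead of waiting on the pool.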