PostgreSQL Failover Replica PG 17 Deep Dive

Most teams I work with still treat PostgreSQL failover the way they did in 2019: a runbook, a pager, and a quiet hope that the logical replication subscribers will pick themselves back up. That is no longer a reasonable posture. PostgreSQL 17, released in September 2024 and now the default major version across most managed clouds in 2026, quietly rewrote the parts of failover that used to hurt the most. Failover slots survive promotion. pg_createsubscriber can turn a physical replica into a logical one without re-snapshotting. pg_upgrade can carry logical replication slots across a major-version upgrade once the old cluster is itself on 17. The synchronous commit settings did not change, but they still decide whether any of this loses data, so they get their own section.

I want to walk through what actually changed, what is still the same, and what production failover looks like when the database underneath is PG 17 rather than PG 12, 14, or 16. This is meant for platform engineers and DBAs who already know what a WAL record is — I am not going to explain replication from first principles, but I am going to be specific about the patterns that survive 3 AM.

Figure: PostgreSQL failover topology

The shape of the problem before PG 17#

Before we get to the new mechanics, it helps to remember why failover used to be miserable.

You had a primary, one or more streaming physical replicas (usually one synchronous, the rest asynchronous), and probably a logical replication topology fan-out to analytics or to a downstream service. The primary died. You promoted a replica. So far, fine — pg_ctl promote is straightforward and recovery.signal plus standby.signal were already understood by anyone who had read the docs twice.

The pain came after.

Your logical replication publications and slots only existed on the dead primary. Subscribers downstream had been streaming from a slot that no longer had a home. The newly-promoted replica had no slot at all because slots were never replicated. So you either re-created the slot on the new primary and lost whatever changes the subscriber had not yet received, or you restored from a logical snapshot taken before the failover and re-subscribed, or, most commonly, you ran a full re-snapshot to the downstream service and waited for hours while it caught up. None of these are graceful.

Beyond slots, the second problem was lag. An asynchronous replica is by definition behind the primary by some unknown amount of WAL. If the primary died after acknowledging a transaction that mattered (payments, inventory decrements, the kind of write where loss is visible) but before the replica had received it, then promotion meant accepting that data loss. The mitigations were either synchronous_commit = on with a synchronous_standby_names policy, which trades latency for durability, or a write-ahead log archive that you could replay forward on the replica before promoting it. The first is operationally heavy, the second is even heavier.
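The size of that gap can be read off the primary before anything goes wrong. A minimal sketch, assuming it runs on the primary and that each standby sets an application_name in its primary_conninfo:

```sql
-- Run on the primary: one row per connected standby.
SELECT application_name,
       state,
       sync_state,   -- async, potential, sync, or quorum
       pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)  AS flush_lag_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication
ORDER BY application_name;
```

For an asynchronous replica, flush_lag_bytes is a rough measure of how much WAL a promotion at that instant would leave behind.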

The third problem was leader election. PostgreSQL has no native consensus layer. If the primary’s network partitioned but its disk was still healthy, you risked a split-brain: two writeable databases, the same logical primary IP, and a bad time reconciling diverging state. Patroni, repmgr, pg_auto_failover, and Stolon existed for exactly this reason. Patroni and Stolon wrap PostgreSQL in a consensus store (etcd, Consul, ZooKeeper, or Kubernetes) and refuse to promote without quorum; repmgr and pg_auto_failover rely on a witness or monitor node instead.

Now: PG 17.

What PG 17 actually changed#

Three changes matter for failover, in order of impact.

Failover slots: synchronization of logical replication slots to standbys. This is the headline feature for anyone running logical replication. In PG 17, you can mark a logical replication slot as failover = true at creation time, and the slot’s state (confirmed_flush_lsn, restart_lsn, catalog_xmin) is synchronized to physical replicas. When you promote a replica, the slot exists on the new primary, advanced to roughly the same position. Subscribers reconnect to the new primary and resume streaming with at most a few seconds of redundant catch-up.

The mechanism is a new background worker on the standby, the slot sync worker, enabled with sync_replication_slots = on. The new pg_sync_replication_slots() function does the same synchronization once, on demand. The standby also needs hot_standby_feedback = on so the primary knows not to vacuum tuples the slot still needs, a physical replication slot named in primary_slot_name, and a dbname in primary_conninfo. On the primary, list that physical slot in synchronized_standby_slots so logical subscribers never get ahead of the standby you intend to promote. There is a real cost, because the standby holds back vacuum on the primary the same way the slot itself would, but for any production logical replication topology, this is the difference between a graceful failover and a multi-hour rebuild.

-- On the primary, create a failover-aware slot
SELECT pg_create_logical_replication_slot(
  'analytics_sub_1',
  'pgoutput',
  false,    -- temporary
  false,    -- twophase
  true      -- failover
);

-- On the standby, enable synchronization
-- (also requires primary_slot_name and a dbname in primary_conninfo)
ALTER SYSTEM SET sync_replication_slots = on;
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();

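After promotion, the slot is already present on the new primary. The subscriber reconnects as soon as its connection string resolves to the new primary, either through DNS or with ALTER SUBSCRIPTION ... CONNECTION.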
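Before relying on a synchronized slot, it is worth checking that the standby actually has it. A minimal sketch, run on the standby, using the readiness expression from the PG 17 documentation as I understand it:

```sql
-- Run on the standby before you need it, not during the incident.
-- failover_ready = true means the slot was synced from the primary,
-- is persistent, and has not been invalidated by a recovery conflict.
SELECT slot_name,
       failover,
       synced,
       (synced AND NOT temporary AND NOT conflicting) AS failover_ready
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

A slot that shows failover = true on the primary but never appears here usually points at a missing prerequisite on the standby rather than at the slot itself.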

pg_createsubscriber: convert a physical replica into a logical one. This is the second large change. Previously, turning a physical standby into a logical subscriber meant creating a subscription with a full initial table copy, which for a large database could take hours or days. pg_createsubscriber, a new utility in PG 17, takes an existing physical standby and converts it in place into a logical subscriber of the primary it was streaming from, reusing the existing data files. It creates the publications, slots, and subscriptions for the databases you name and starts logical replication from a consistent LSN. No re-snapshot, no rebuild. The standby has to be shut down while the tool runs, so plan for a brief pause on that node.

The use case is narrow but powerful: if your reporting database is a physical replica that now needs to deviate from production (extra indexes, a few local tables), you can graduate it to a logical subscriber the moment that deviation becomes load-bearing, without copying the data again.
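pg_createsubscriber checks a handful of settings on both servers before it does anything. A quick way to look at them in advance, as a sketch (the exact thresholds depend on how many databases you convert, so treat the pg_createsubscriber reference page as the authority):

```sql
-- Run on both the source primary and the standby to be converted.
-- The source needs wal_level = logical plus spare replication slots and
-- WAL senders; the target needs spare replication slots, logical
-- replication workers, and worker processes.
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level',
               'max_replication_slots',
               'max_wal_senders',
               'max_logical_replication_workers',
               'max_worker_processes')
ORDER BY name;
```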

Logical replication improvements beyond slots. PG 14 introduced streaming of in-progress transactions to subscribers. PG 15 added two-phase commit support for subscriptions, and PG 16 added parallel apply of streamed transactions. PG 17 improves logical decoding performance for transactions with many subtransactions, and pg_upgrade can now preserve logical replication slots and subscription state, though only when the old cluster is already on 17. What PG 17 does not do is replicate DDL or sequence state: schema changes still have to be applied on both sides, and sequences have to be synchronized by hand at cutover. For teams doing zero-downtime major-version upgrades (primary on PG 15, set up a PG 17 logical subscriber, switch traffic), the faster decoding and the failover slots make the exercise less fragile, but the procedure is the same one you would have run on earlier versions. The logical-replication patterns post on this site has a longer walk-through.

Synchronous commit modes — what to actually pick#

Let me be specific because this is the setting that determines whether failover loses data or not, and it is the setting most teams either ignore or over-tune.

synchronous_commit has five values: off, local, remote_write, on (the default), and remote_apply. There is no remote_flush value; on is the setting that waits for the standby’s flush. From weakest to strongest:

off: commit returns before the WAL is flushed to disk. A crash can lose the most recent commits, though it cannot corrupt the database. Acceptable for analytical staging, ETL ingest, anything you can recompute. Never for OLTP.
local: commit returns when the WAL is fsynced to local disk. Single-machine durability. Acceptable if you accept that primary loss = some data loss.
remote_write: commit returns when the standby has received the WAL and written it to its operating system, but not yet flushed it to disk. Survives a primary crash, but not the loss of the primary together with an OS-level crash on the standby. Reasonable middle ground.
on: commit returns when the WAL is fsynced on the standby. Survives primary loss with zero data loss as long as the standby is healthy. This is the right default for any system where data loss is visible: payments, ledgers, anything with an audit obligation.
remote_apply: commit returns when the standby has applied the WAL to its own table state. Used for read-after-write consistency on the replica. Highest latency.

The recommendation for almost every production OLTP workload in 2026 is synchronous_commit = on with synchronous_standby_names = 'ANY 1 (replica_a, replica_b)'. The ANY 1 quorum is the part most teams miss — it means commit succeeds as long as one of the listed standbys acknowledges, so a single replica’s network blip does not pause writes on the primary. This is supported back to PG 10 but is still under-used.
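As a sketch of that recommendation, assuming replica_a and replica_b are the application_name values your standbys connect with:

```sql
-- Run on the primary.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica_a, replica_b)';
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();

-- With a quorum policy, the listed standbys report sync_state = 'quorum'.
SELECT application_name, sync_state
FROM pg_stat_replication;

-- A single transaction can opt down (or up) without touching the default.
BEGIN;
SET LOCAL synchronous_commit = 'local';  -- e.g. a staging write you can recompute
-- INSERT INTO staging_events ...        -- hypothetical table, for illustration only
COMMIT;
```

The per-transaction override is the part that keeps the global setting strict: the few writes that do not need a remote acknowledgement can say so explicitly.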

Leader election: still your job#

PostgreSQL still has no native leader election. PG 17 did not change this. Your options remain:

Patroni with etcd, Consul, ZooKeeper, or Kubernetes. The most widely-deployed solution in 2026 and the one I recommend for any team running PostgreSQL on bare VMs or in a self-hosted Kubernetes. Patroni handles primary detection, leader election, automatic failover, and integrates cleanly with HAProxy or PgBouncer for connection routing.
repmgr from EnterpriseDB. Simpler, less opinionated, but does not integrate with a distributed consensus store. Reasonable for two-node setups with manual intervention; not what you want for multi-region.
pg_auto_failover from Microsoft. Two-node monitor-driven topology. Compact and operationally simple, but the monitor is itself a single point of failure unless you run it HA.
Stolon from Sorint. Less actively maintained than Patroni but still in use. Similar architectural approach.
Crunchy Postgres for Kubernetes (PGO). Wraps Patroni inside a Kubernetes operator with backups, monitoring, and TLS automation. A common choice for teams already running PG on Kubernetes in 2026.
CloudNativePG: increasingly the preferred Kubernetes operator for new deployments, especially in the EU, in part because it is a CNCF project. Native support for PG 17 features including failover slots.
Managed services: AWS RDS, Aurora, Cloud SQL, Azure Database for PostgreSQL Flexible Server, and the various Crunchy/Aiven/Timescale managed offerings handle this entirely. The tradeoff is reduced operational visibility and slower adoption of new PG features (managed PG 17 support generally arrived months after self-hosted teams had it).

If you are running PG 17 on Kubernetes today, the answer is almost certainly CloudNativePG or PGO. If you are running on VMs, the answer is Patroni. If you are on managed, the failover policy is whatever the vendor implements.

A concrete production runbook#

This is the runbook I use for self-hosted PG 17 with Patroni and one synchronous + two asynchronous replicas. Adjust to your topology.

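Detection. Patroni’s leader lease in etcd expires after ttl seconds (default 30; I use 15 for tighter recovery time, which also means lowering loop_wait and retry_timeout so that loop_wait + 2 * retry_timeout stays within ttl). The healthy standby with the highest LSN bids for leadership.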

Pre-promotion checks. Each candidate compares its WAL position against the other members, and against the old primary if it is still reachable, so that a replica that is too far behind does not win. With synchronous_commit = on and a quorum of ANY 1, at least one of the listed standbys has flushed every acknowledged write, so the most advanced of them is the canonical promotion target.

Promotion. Patroni issues pg_ctl promote on the elected standby. Failover slots are already present courtesy of sync_replication_slots. The standby becomes the new primary.

Connection rerouting. HAProxy health checks see the new primary’s pg_is_in_recovery() = false and shift writes. PgBouncer (if present) gets reloaded to point at the new primary. Application connection pools should be using DNS or service discovery so this happens automatically; if they pin to an IP, you have a separate problem.
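The health check itself boils down to one query. A minimal sketch; how your load balancer runs it (an external check script, or Patroni’s REST API in its place) is an assumption left to your setup:

```sql
-- Safe to run on a primary or a standby.
-- pg_current_wal_lsn() raises an error during recovery, hence the CASE.
SELECT NOT pg_is_in_recovery() AS is_primary,
       CASE WHEN pg_is_in_recovery()
            THEN pg_last_wal_replay_lsn()
            ELSE pg_current_wal_lsn()
       END AS current_lsn;
```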

Subscriber reconnection. Logical replication subscribers (the warehouse, the analytics service, the downstream API) reconnect to the new primary using the synchronized failover slot. There may be a few seconds of duplicate row delivery due to slot position rounding — your subscriber should be idempotent.
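If the subscribers are built-in logical replication subscriptions and their connection strings name a host rather than a floating address, the repointing step might look like this. The subscription name, host, database, and user are placeholders:

```sql
-- Run on each subscriber.
ALTER SUBSCRIPTION analytics_sub_1
  CONNECTION 'host=new-primary.example.internal port=5432 dbname=app user=replicator';

-- The apply worker restarts with the new connection; confirm it is receiving.
SELECT subname, pid, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription
WHERE subname = 'analytics_sub_1';
```

A NULL pid here after a minute or so means the worker is not connected, and the subscriber’s server log will say why.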

Old primary recovery. When the old primary returns, Patroni reinitializes it as a streaming replica using pg_rewind. pg_rewind copies only the blocks that changed after the timelines diverged, so for a large database it usually completes in minutes where a fresh base backup would take hours. It requires wal_log_hints = on or data checksums, and Patroni only uses it when use_pg_rewind is enabled.

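Post-failover audit. Compare positions on both sides. On the new primary, confirmed_flush_lsn in pg_replication_slots shows how far each subscriber had confirmed; on each subscriber, received_lsn and latest_end_lsn in pg_stat_subscription show what it actually received. For physical replicas, compare pg_current_wal_lsn() on the new primary with pg_last_wal_replay_lsn() on each standby. If the synchronous standby was healthy at the time of failure, no acknowledged transaction is missing.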
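A more direct way to put numbers on the audit, assuming built-in logical replication subscribers (third-party consumers track their position differently):

```sql
-- On the new primary: how far each logical slot's consumer is behind.
SELECT slot_name,
       active,
       confirmed_flush_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS behind_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- On each subscriber: the last publisher position that has been applied.
-- A subscription's replication origin is named 'pg_' followed by its OID.
SELECT s.subname, o.remote_lsn
FROM pg_subscription AS s
JOIN pg_replication_origin_status AS o
  ON o.external_id = 'pg_' || s.oid;
```

If remote_lsn on a subscriber is ahead of anything the new primary ever wrote, that subscriber received changes the promoted standby never had, and it needs manual attention.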

The full sequence on a healthy 4-node cluster takes under 30 seconds from primary-loss-detection to fully-routed-writes. That is achievable in 2026; it was not in 2019.

What to verify in your environment this week#

If you are running PG 14, 15, or 16 today, the upgrade-to-17 conversation is worth having now. The failover-slot feature alone is worth the work. Specifically:

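Audit your logical replication topology. Every logical slot that exists today should have failover enabled after the upgrade, either by re-creating it with failover = true or, for a slot owned by a subscription, by setting the subscription’s failover option.
Set sync_replication_slots = on on every standby. Confirm hot_standby_feedback = on, primary_slot_name, and a dbname in primary_conninfo, and accept the modest vacuum-bloat trade.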
Re-read your synchronous_standby_names. If it is empty or set to *, change it. Use ANY 1 (a, b) or FIRST 1 (a, b) depending on which standby you want to be canonical.
Run a failover drill. Promote a standby in a non-production environment. Watch the subscribers reconnect. Time the full sequence. Write down what you actually saw, not what the runbook says.
Decide what pg_createsubscriber enables for you. If you have a physical replica that you wish was logical, this utility shortens a multi-day exercise to an afternoon.
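The first item on that list can start with two short statements. This is a sketch: the subscription name is a placeholder, and the rule that the failover option can only be changed on a disabled subscription is my reading of the PG 17 ALTER SUBSCRIPTION reference, so confirm it against your minor version.

```sql
-- On the publisher (PG 17): logical slots that would not survive a failover.
SELECT slot_name, database, plugin, active
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND NOT failover;

-- On the subscriber, for a slot owned by a subscription.
ALTER SUBSCRIPTION analytics_sub_1 DISABLE;
ALTER SUBSCRIPTION analytics_sub_1 SET (failover = true);
ALTER SUBSCRIPTION analytics_sub_1 ENABLE;
```

Slots created directly with pg_create_logical_replication_slot() for non-subscription consumers still have to be dropped and re-created with the failover argument set.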

What is still missing#

Honest accounting: PG 17 did not fix everything.

There is still no native consensus. Failover slots survive the loss of the primary, but they do not survive a failure that wipes the entire cluster. You still need a real backup-and-restore plan, ideally with pgBackRest or Barman, and you still need to test restores monthly because that is the only way to know they work. Logical replication still cannot replicate large objects (the lo_ API), it still does not replicate sequence state or DDL, and pg_partman-style partition management still requires careful DDL coordination across publishers.
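Sequence values are the one gap that is easy to paper over by hand at cutover. A minimal sketch, assuming the subscriber has the same sequences under the same names:

```sql
-- Run on the old primary once writes have stopped.
-- Emits one setval() statement per sequence; run the output on the subscriber.
SELECT format('SELECT setval(%L, %s, true);',
              quote_ident(schemaname) || '.' || quote_ident(sequencename),
              last_value)
FROM pg_sequences
WHERE last_value IS NOT NULL;  -- sequences that were never used have no value to copy
```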

The releases after 17 continue the trend. PG 18, generally available since September 2025, adds further logical replication improvements such as conflict reporting and replication of generated columns, and PG 19 is in development. Built-in TDE (transparent data encryption) is still not part of core PostgreSQL, so encryption at rest remains a job for the storage layer or a vendor fork. The trajectory is good, but PG 17 is the version where failover stops being miserable.

Where pdpspectra fits#

We build production PostgreSQL platforms for finance, healthcare, and SaaS clients across our four offices — Boston, London, Sydney, and Kathmandu — and PG 17 migrations have been most of the database conversation for the last six months. If you have a logical replication topology that you have been quietly afraid of, or a primary you cannot bring yourself to upgrade because the failover plan is fragile, our data engineering team does this work end-to-end: assessment, upgrade plan, dry-run failover, rollback plan, and the on-call coverage to do the cutover.

Related reading on this site: the logical replication patterns post, the dbt advanced patterns post for downstream warehouse considerations, and the data stack as operational engine post for how this fits into the broader platform story.

Failover should be boring. PG 17 finally makes it so. Tell us about your PostgreSQL environment and we will help you get there.

