Master-Slave Replication Setup for Web Applications
Replication solves two problems: high availability (failover) and read scaling. This guide covers PostgreSQL streaming replication, the standard and well-proven approach.
Architecture
App servers
     │
     ├── writes ──▶ Primary (master)
     │                    │
     │                WAL stream
     │                    ▼
     └── reads ───▶ Replica 1 (hot standby)
                    Replica 2 (hot standby)
A hot standby is a replica that accepts read-only SELECT queries; a cold standby exists only for failover and serves no reads.
Primary Configuration
postgresql.conf on the main server:
# Replication
wal_level = replica # minimum for streaming replication
max_wal_senders = 5 # max concurrent replication connections
wal_keep_size = 1GB # how much WAL to keep on disk for lagging replica
max_replication_slots = 5 # physical replication slots
# Performance
synchronous_commit = on # on = wait for local WAL flush before reporting commit; off is faster but can lose the most recent commits on a crash
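Note that synchronous_commit = on by itself only controls the local WAL flush; replication stays asynchronous until the primary is told which standbys to wait for. A hedged fragment (the standby names are assumptions matching the application_name values used later in this guide):

```
# postgresql.conf on primary: commit waits for at least one named standby
synchronous_standby_names = 'ANY 1 (replica1, replica2)'
# per-session or per-transaction durability can then be tuned:
# SET synchronous_commit = remote_apply;  -- wait until the standby has applied it
```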
pg_hba.conf on the primary must allow the replication user to connect:
# TYPE DATABASE USER ADDRESS METHOD
host replication replicator 10.0.1.0/24 scram-sha-256
Create replication user:
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'strong_password';
Replica Configuration (PostgreSQL 12+)
On the replica, run pg_basebackup for the initial sync:
# On the replica server, with PostgreSQL stopped
pg_basebackup \
  -h 10.0.1.10 \
  -U replicator \
  -D /var/lib/postgresql/14/main \
  -P \
  -X stream \
  -R
# -h: primary IP   -P: show progress
# -X stream: stream WAL alongside the backup
# -R: create standby.signal and write primary_conninfo
The -R flag creates standby.signal and appends primary_conninfo to postgresql.auto.conf. Note that primary_slot_name is written only if you also pass -S replica1_slot to pg_basebackup; otherwise add it manually:
primary_conninfo = 'host=10.0.1.10 port=5432 user=replicator password=strong_password application_name=replica1'
primary_slot_name = 'replica1_slot'
postgresql.conf on replica:
hot_standby = on # accept SELECT on replica
hot_standby_feedback = on # replica reports its oldest running query to the primary, so vacuum does not remove rows the replica still needs (costs some extra bloat on the primary)
max_standby_streaming_delay = 30s
Replication Slots
A slot guarantees the primary will not recycle WAL segments until the replica has received them:
-- On primary
SELECT pg_create_physical_replication_slot('replica1_slot');

-- Check status
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
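If you export restart_lsn values as text into an external monitoring system, you can reproduce pg_wal_lsn_diff outside the database. A minimal sketch; the hex `hi/lo` LSN format is standard, but the function names are mine:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '16/B374D848' to an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_diff(a: str, b: str) -> int:
    """Byte distance a - b, mirroring pg_wal_lsn_diff(a, b)."""
    return lsn_to_bytes(a) - lsn_to_bytes(b)

# Example: how far a slot's restart_lsn trails the current WAL position
print(lsn_diff("0/3000000", "0/2000000"))  # 16777216
```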
The risk with slots: if a replica stays disconnected for a long time, the primary accumulates WAL indefinitely and can fill its disk. Cap the retention:
# postgresql.conf on primary
max_slot_wal_keep_size = 10GB # PostgreSQL 13+: cap WAL retained per slot; beyond this the slot is invalidated
Monitoring Replication Lag
-- On primary: status of all replicas
SELECT
  application_name,
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS replay_lag_size,
  write_lag,
  flush_lag,
  replay_lag
FROM pg_stat_replication;
-- On replica: current lag (an interval; NULL if nothing has been replayed yet,
-- and it keeps growing on an idle primary even when the replica is caught up)
SELECT
  now() - pg_last_xact_replay_timestamp() AS replication_lag,
  pg_is_in_recovery() AS is_replica;
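A tiny alerting helper around the replica-side query above. The database access is left out; the threshold logic is pure, and the names and thresholds are my assumptions:

```python
from datetime import timedelta
from typing import Optional

LAG_WARN = timedelta(seconds=30)
LAG_CRIT = timedelta(minutes=5)

def lag_status(lag: Optional[timedelta]) -> str:
    """Classify replication lag. None means pg_last_xact_replay_timestamp()
    returned NULL (nothing replayed yet); treat that as critical.
    Caveat: on an idle primary the measured lag grows even though the replica
    is caught up, so pair this with a byte-based check on the primary."""
    if lag is None or lag >= LAG_CRIT:
        return "critical"
    if lag >= LAG_WARN:
        return "warning"
    return "ok"

print(lag_status(timedelta(seconds=2)))  # ok
```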
Application Configuration
Route queries at application level: write → primary, read → replica.
For Node.js + Sequelize:
const sequelize = new Sequelize({
  dialect: 'postgres',
  replication: {
    read: [
      { host: '10.0.1.11', username: 'app', password: process.env.DB_PASS, database: 'mydb' },
      { host: '10.0.1.12', username: 'app', password: process.env.DB_PASS, database: 'mydb' },
    ],
    write: {
      host: '10.0.1.10', username: 'app', password: process.env.DB_PASS, database: 'mydb',
    },
  },
  pool: { max: 10, idle: 10000 },
});
For Python (SQLAlchemy) there is no built-in equivalent: use custom routing (separate read and write engines) or a library such as sqlalchemy-rwconn.
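A sketch of the custom-routing idea for Python, kept framework-agnostic so the same shape works with SQLAlchemy engines or raw drivers; the hostnames match the example above, and the class and method names are mine:

```python
import itertools

class ConnectionRouter:
    """Route writes to the primary and round-robin reads across replicas."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin iterator

    def for_query(self, sql: str) -> str:
        # Anything that is not a plain SELECT goes to the primary.
        # In a real app, route by intent and transaction, not by parsing SQL:
        # a SELECT inside a write transaction must also hit the primary.
        if sql.lstrip().lower().startswith("select"):
            return next(self._replicas)
        return self.primary

router = ConnectionRouter("10.0.1.10", ["10.0.1.11", "10.0.1.12"])
print(router.for_query("SELECT * FROM users"))      # a replica host
print(router.for_query("UPDATE users SET ..."))     # 10.0.1.10
```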
PgBouncer as Proxy
It is common practice to place PgBouncer in front of each database server:
App → PgBouncer (primary:6432) → PostgreSQL Primary :5432
App → PgBouncer (replica:6432) → PostgreSQL Replica :5432
PgBouncer multiplexes many client connections onto a small pool of real PostgreSQL connections, which matters once you run hundreds of application workers.
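A minimal pgbouncer.ini for the primary-side proxy (values are illustrative; transaction pooling is the usual choice, but it breaks session-level features such as SET commands and, in older PgBouncer versions, prepared statements):

```
[databases]
mydb = host=10.0.1.10 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; release server connection after each transaction
default_pool_size = 20    ; real connections to PostgreSQL per db/user pair
max_client_conn = 1000    ; app-side connections accepted
```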
Automatic Failover
Patroni is the standard for automatic PostgreSQL failover in production:
# /etc/patroni/config.yml (fragment)
scope: postgres-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.1.10:8008

etcd:
  hosts: 10.0.1.20:2379,10.0.1.21:2379,10.0.1.22:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # 1 MB
  pg_hba:
    - host replication replicator 10.0.1.0/24 scram-sha-256

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.1.10:5432
  data_dir: /var/lib/postgresql/14/main
  parameters:
    max_connections: 200
    wal_level: replica
    max_wal_senders: 5
Patroni uses etcd (or Consul, ZooKeeper) as a distributed lock and consensus store. When the primary fails, it automatically promotes the replica with the least lag (bounded by maximum_lag_on_failover).
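HAProxy (or any load balancer) typically decides where to send traffic by probing Patroni's REST API: GET /primary answers 200 only on the current leader, GET /replica only on healthy standbys. A small probe sketch; the host list and function names are my assumptions:

```python
from urllib import request, error

def is_role(host: str, role: str, port: int = 8008, timeout: float = 2.0) -> bool:
    """True if the Patroni node at `host` currently holds `role`
    ('primary' or 'replica'); the endpoint returns 200 only in that case,
    and 503 (raised as HTTPError) otherwise."""
    try:
        with request.urlopen(f"http://{host}:{port}/{role}", timeout=timeout) as resp:
            return resp.status == 200
    except (error.HTTPError, error.URLError, OSError):
        return False

def find_primary(hosts):
    """Return the first host reporting itself as primary, or None."""
    return next((h for h in hosts if is_role(h, "primary")), None)
```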
Timelines
Manual streaming replication setup (a primary plus one or two replicas, no automatic failover): about 1 day. Installing and configuring Patroni with etcd and HAProxy for automatic failover: 2–3 days.