Implementing Zero-Downtime Database Migration Between Versions
Zero-downtime database migration is upgrading to new PostgreSQL, MySQL or other DBMS version without stopping application. Requires special approach to schema, data transfer and traffic switching.
Zero-Downtime Migration Principles
Any schema change goes through backward-compatible stages:
- Add new (column, table) — application ignores new
- Deploy code that writes to both places
- Migrate existing data in batches
- Deploy code that reads only from new
- Delete old
No DROP COLUMN and RENAME COLUMN in production in one step.
PostgreSQL Major Version Update (pg_upgrade)
Method 1: pg_upgrade with replica
# 1. Install new PostgreSQL version alongside
apt install postgresql-15
# 2. Stop writes (brief downtime for prep)
pg_ctl -D /var/lib/postgresql/14/main stop
# 3. pg_upgrade in --link mode (no file copying)
/usr/lib/postgresql/15/bin/pg_upgrade \
--old-datadir=/var/lib/postgresql/14/main \
--new-datadir=/var/lib/postgresql/15/main \
--old-bindir=/usr/lib/postgresql/14/bin \
--new-bindir=/usr/lib/postgresql/15/bin \
--link \
--check # check without executing first
# 4. Execute upgrade
/usr/lib/postgresql/15/bin/pg_upgrade \
--old-datadir=/var/lib/postgresql/14/main \
--new-datadir=/var/lib/postgresql/15/main \
--old-bindir=/usr/lib/postgresql/14/bin \
--new-bindir=/usr/lib/postgresql/15/bin \
--link
--link mode uses hardlinks instead of copying — for 100GB database takes seconds instead of hours. Downside: can't restart old version after.
Method 2: Logical Replication (true zero-downtime)
-- On old server (PG 13)
CREATE PUBLICATION migration_pub FOR ALL TABLES;
-- On new server (PG 15) — create same schema
pg_dump -s -U postgres myapp | psql -U postgres -h new-server myapp
-- Subscribe to replication
CREATE SUBSCRIPTION migration_sub
CONNECTION 'host=old-server dbname=myapp user=replication password=pass'
PUBLICATION migration_pub;
-- Monitor initial sync progress
SELECT subname, received_lsn, latest_end_lsn
FROM pg_stat_subscription;
After synchronization:
-- Check lag (should be near zero)
SELECT now() - last_msg_receipt_time AS subscription_lag
FROM pg_stat_subscription;
-- Switch: stop writes to old DB, wait for lag = 0
-- Update connection string in app
-- Drop subscription
DROP SUBSCRIPTION migration_sub;
Schema Migrations Without Downtime
Adding NOT NULL column
Can't do in one step — ALTER TABLE locks table during DEFAULT computation.
Correct approach:
-- Step 1: add column with DEFAULT (PostgreSQL 11+ — instant)
ALTER TABLE users ADD COLUMN phone VARCHAR(20) DEFAULT NULL;
-- Step 2: fill data in batches
DO $$
DECLARE
batch_size INT := 1000;
offset_val INT := 0;
BEGIN
LOOP
UPDATE users SET phone = '' WHERE id IN (
SELECT id FROM users WHERE phone IS NULL ORDER BY id LIMIT batch_size
);
EXIT WHEN NOT FOUND;
PERFORM pg_sleep(0.01); -- pause between batches
END LOOP;
END $$;
-- Step 3: add NOT NULL constraint (fast if no NULLs)
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
Renaming Column
-- Step 1: add new column
ALTER TABLE orders ADD COLUMN customer_id BIGINT;
-- Step 2: fill data (+ trigger for new records)
CREATE OR REPLACE FUNCTION sync_customer_id() RETURNS TRIGGER AS $$
BEGIN
NEW.customer_id := NEW.user_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER sync_customer_id_trigger
BEFORE INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION sync_customer_id();
-- Batch-fill existing records
UPDATE orders SET customer_id = user_id WHERE customer_id IS NULL;
-- Step 3: deploy code reading customer_id
-- Step 4: drop old column and trigger
ALTER TABLE orders DROP COLUMN user_id;
DROP TRIGGER sync_customer_id_trigger ON orders;
Tools
gh-ost (GitHub) — online schema migration for MySQL without locks:
gh-ost \
--host=db-master \
--database=myapp \
--table=users \
--alter="ADD INDEX idx_email (email)" \
--execute
pg-osc — equivalent for PostgreSQL.
Flyway / Liquibase — migration version management with undo support.
Testing Migration Plan
# Restore production dump to staging
pg_restore -U postgres -d myapp_staging production.dump
# Test migration plan
psql -U postgres myapp_staging < migration_plan.sql
# Measure execution time
\timing on
\i migration_plan.sql
Execution Time
Zero-downtime PostgreSQL upgrade with logical replication — 2–3 days. Complex schema migration with multiple steps — 1–2 weeks (including testing and staging).







