Zero-downtime schema migration
-
It means changing your database schema without taking the application offline and without breaking running requests.
-
This becomes important in real systems because:
- multiple app servers may be running different versions during deployment
- old and new code can hit DB at same time
- large tables can lock for long time if altered carelessly
-
A safe migration is usually done in phases, not one big change.
Example: Adding a New Columnβ
you have:
users(id, name)
and you want:
users(id, full_name)
instead of name.
Bad approach:
ALTER TABLE users DROP COLUMN name;
ALTER TABLE users ADD COLUMN full_name;
This breaks old code immediately.
Instead use an expand β migrate β contract strategy.
Phase 1: Expand Schema (Backward Compatible)β
Add new column first:
ALTER TABLE users ADD COLUMN full_name TEXT NULL;
Nothing breaks because old code still works.
Now schema is:
users(id, name, full_name)
Phase 2: Dual Writeβ
Update application:
Old:
INSERT INTO users(name)
New:
INSERT INTO users(name, full_name)
Whenever app writes:
- write both columns
Example:
await db.query(`
INSERT INTO users(name, full_name)
VALUES($1, $1)
`);
Now:
- old servers read/write
name - new servers can use
full_name
Both versions coexist safely.
Phase 3: Backfill Existing Dataβ
Existing rows still have:
full_name = NULL
So run background migration:
UPDATE users
SET full_name = name
WHERE full_name IS NULL;
For huge tables:
- do batch updates
- avoid locking millions of rows
Example:
UPDATE users
SET full_name = name
WHERE id BETWEEN 1 AND 10000;
then next batch.
Phase 4: Switch Readsβ
Application now reads from full_name.
Maybe temporarily:
const displayName = row.full_name ?? row.name;
This makes rollback safe too.
Phase 5: Contract Schemaβ
Once:
- all app servers use new code
- backfill completed
- no old readers/writers exist
remove old column:
ALTER TABLE users DROP COLUMN name;
Done without downtime.
Core Principleβ
Every migration step should be:
- backward compatible
- forward compatible
Meaning:
- old app version works with new schema
- new app version works with old schema
Common Zero-Downtime Patternsβ
1. Renaming Columnβ
Databases often lock or break dependencies.
Instead:
- create new column
- dual write
- backfill
- switch reads
- delete old column
2. Changing Data Typeβ
Example:
price VARCHAR -> BIGINT
Do:
- add new column
price_v2 - dual write
- backfill converted data
- switch reads
- drop old column
3. Splitting Tableβ
Old:
users(id, name, address)
New:
users(id, name)
addresses(user_id, address)
Strategy:
- create new table
- dual write both places
- backfill
- switch reads
- remove old column
Why This Is Hard in Distributed Systemsβ
During deployment:
- some pods may run old code
- some pods run new code
For a few minutes both versions coexist.
If migration is not compatible:
- requests fail
- serialization errors happen
- null issues happen
- app crashes
Large Table Migration Challengesβ
For huge tables:
ALTER TABLEcan lock writes- index creation may block
- backfill can overload DB
Solutions:
- batch migration
- online index creation
- throttled workers
- shadow tables
Example in PostgreSQL:
CREATE INDEX CONCURRENTLY idx_users_email
ON users(email);
This avoids blocking writes.
Blue-Green / Rolling Deployment Connectionβ
Zero-downtime migrations are tightly connected with deployment strategies.
During rolling deploy:
- old servers alive
- new servers start gradually
So DB changes must support both.
Thatβs why destructive schema changes are delayed until the end.