mentation
1. Migration Structure
Migrations must be idempotent and versioned. We use TypeScript with pg for execution, ensuring type safety and integration with modern CI/CD pipelines.
2. Phase 1: Expand Migration
Add the new column and index. Do not drop the old column.
// migrations/001_add_full_name_expand.ts
import { Pool } from 'pg';
export async function up(pool: Pool): Promise<void> {
const client = await pool.connect();
try {
await client.query('BEGIN');
// Add column with NULL initially to avoid table rewrite
await client.query(`
ALTER TABLE users
ADD COLUMN IF NOT EXISTS full_name TEXT;
`);
// Create index CONCURRENTLY to avoid locking writes
await client.query(`
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_full_name
ON users (full_name);
`);
await client.query('COMMIT');
} catch (e) {
await client.query('ROLLBACK');
throw e;
} finally {
client.release();
}
}
export async function down(pool: Pool): Promise<void> {
// Safe rollback: drop index and column
const client = await pool.connect();
try {
await client.query('DROP INDEX CONCURRENTLY IF EXISTS idx_users_full_name');
await client.query('ALTER TABLE users DROP COLUMN IF EXISTS full_name');
} finally {
client.release();
}
}
3. Phase 2: Dual-Write Application Logic
The application must handle both schemas during the transition. Use feature flags to control the rollout.
// src/services/userService.ts
import { db } from './db';
interface UserUpdate {
id: string;
firstName: string;
lastName: string;
fullName?: string;
}
export async function updateUser(userId: string, updates: Partial<UserUpdate>): Promise<void> {
// Determine schema version based on feature flag or config
const isNewSchemaEnabled = process.env.FEATURE_NEW_SCHEMA === 'true';
const setClause: string[] = [];
const values: any[] = [];
let paramIndex = 1;
if (updates.firstName !== undefined) {
setClause.push(`first_name = $${paramIndex++}`);
values.push(updates.firstName);
// Dual-write: populate new column if enabled
if (isNewSchemaEnabled) {
const current = await db.query('SELECT last_name FROM users WHERE id = $1', [userId]);
const lastName = current.rows[0]?.last_name ?? '';
setClause.push(`full_name = $${paramIndex++}`);
values.push(`${updates.firstName} ${lastName}`);
}
}
if (updates.lastName !== undefined) {
setClause.push(`last_name = $${paramIndex++}`);
values.push(updates.lastName);
if (isNewSchemaEnabled) {
const current = await db.query('SELECT first_name FROM users WHERE id = $1', [userId]);
const firstName = current.rows[0]?.first_name ?? '';
// Update full_name index if present in setClause logic or separate update
// In high-concurrency, consider using a trigger or async job for derivation
setClause.push(`full_name = $${paramIndex++}`);
values.push(`${firstName} ${updates.lastName}`);
}
}
if (setClause.length === 0) return;
setClause.push(`updated_at = NOW()`);
values.push(userId);
const query = `
UPDATE users
SET ${setClause.join(', ')}
WHERE id = $${paramIndex}
`;
await db.query(query, values);
}
4. Phase 3: Data Backfill
Backfill must be non-blocking. Use keyset pagination with batch limits.
// src/scripts/backfillFullName.ts
import { db } from './db';
const BATCH_SIZE = 1000;
const LOCK_TIMEOUT = '100ms';
export async function runBackfill(): Promise<void> {
// Set low lock timeout to prevent blocking production traffic
await db.query(`SET lock_timeout = '${LOCK_TIMEOUT}'`);
let lastId = '0';
let rowsAffected = 0;
do {
const result = await db.query(`
UPDATE users
SET full_name = CONCAT(first_name, ' ', last_name)
WHERE id > $1
AND full_name IS NULL
AND first_name IS NOT NULL
ORDER BY id ASC
LIMIT $2
RETURNING id
`, [lastId, BATCH_SIZE]);
rowsAffected = result.rowCount || 0;
if (rowsAffected > 0) {
lastId = result.rows[result.rows.length - 1].id;
console.log(`Backfilled batch up to ${lastId}`);
// Yield to replication and reduce load
await new Promise(resolve => setTimeout(resolve, 100));
}
} while (rowsAffected === BATCH_SIZE);
console.log('Backfill complete.');
}
5. Phase 4: Contract Migration
Once all instances run the new code and backfill is done, remove the old schema.
// migrations/002_add_full_name_contract.ts
export async function up(pool: Pool): Promise<void> {
const client = await pool.connect();
try {
await client.query('BEGIN');
// Drop old columns/indexes
await client.query(`
ALTER TABLE users
DROP COLUMN IF EXISTS first_name,
DROP COLUMN IF EXISTS last_name;
`);
// Update index to unique or primary if applicable
await client.query(`
ALTER TABLE users
ALTER COLUMN full_name SET NOT NULL;
`);
await client.query('COMMIT');
} catch (e) {
await client.query('ROLLBACK');
throw e;
} finally {
client.release();
}
}
Architecture Decisions
- Idempotency: All migrations use
IF EXISTS / IF NOT EXISTS. This allows re-running failed migrations without error, crucial for CI/CD resilience.
- Lock Management: Use
CONCURRENTLY for indexes and lock_timeout for data migrations. This prevents migration scripts from becoming denial-of-service attacks.
- Feature Flags: Decouple code deployment from schema activation. This allows rolling out the code that supports the new schema before the migration runs, or vice versa, depending on the change type.
- Backfill Strategy: Keyset pagination (
WHERE id > last_id) is used over OFFSET for performance on large tables. Batch sleeps prevent replication lag spikes.
Pitfall Guide
1. Dropping Columns Before Code Removal
Mistake: Removing a column in the migration while old application instances still query it.
Impact: Immediate 500 Internal Server Error across services during rolling deployments.
Best Practice: Always use the Contract phase only after verifying 100% of traffic hits the new code. Maintain a "grace period" where old columns exist but are unused.
2. Ignoring Table Locks on Large Tables
Mistake: Running ALTER TABLE without considering lock duration on tables with millions of rows.
Impact: Write operations block for the duration of the metadata change. In PostgreSQL, ADD COLUMN with a default value requires a full table rewrite.
Best Practice: Add columns as NULL first. Use SET DEFAULT in a separate statement. For MySQL, verify ALGORITHM=INSTANT support. Use pg_repack or pt-online-schema-change for complex alterations.
3. Non-Idempotent Migrations
Mistake: Writing migrations that fail on second execution (e.g., CREATE TABLE without IF NOT EXISTS).
Impact: CI/CD pipelines fail on retry, leaving the database in an inconsistent state.
Best Practice: Enforce idempotency in migration linters. Every migration should be safe to run multiple times.
4. Migration-Induced Replication Lag
Mistake: Running massive data updates or index creations that saturate the master, causing replicas to fall behind.
Impact: Read replicas serve stale data, causing consistency bugs. Applications reading from replicas may see missing data.
Best Practice: Backfill in small batches with delays. Monitor pg_replication_lag or equivalent. Pause migrations if lag exceeds thresholds.
5. Assuming Transactional DDL Universally
Mistake: Wrapping all DDL in transactions assuming atomicity.
Impact: MySQL does not support transactional DDL for many operations. A failure leaves the schema partially altered.
Best Practice: Check database-specific transactional DDL support. Use explicit transaction blocks only where supported and safe. Implement manual rollback scripts for non-transactional changes.
6. Schema Drift in CI/CD
Mistake: Developers modifying the local database manually without generating migrations.
Impact: Production deployments fail because the migration history doesn't match the desired state.
Best Practice: Enforce migration generation via tooling. CI pipelines should validate that the schema defined in code matches the migration history. Use "diff" tools to detect drift.
7. Lack of Migration Monitoring
Mistake: Treating migrations as fire-and-forget scripts.
Impact: Silent failures, slow migrations impacting performance, or backfills running indefinitely.
Best Practice: Instrument migration runners. Emit metrics for migration duration, row counts, and lock waits. Alert on migrations exceeding time budgets.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Small Table (<100k rows) | Big Bang Migration | Low lock duration; overhead of Expand/Contract is unnecessary. | Low |
| Large Table, Zero Downtime Required | Expand/Contract + Online DDL | Eliminates write blocks; supports concurrent deployments. | Medium (Dev effort) |
| Adding Index to Large Table | CREATE INDEX CONCURRENTLY | Prevents table locks; allows writes during index build. | Low |
| MySQL 8.0+ Instant DDL | ALGORITHM=INSTANT | Metadata-only change; instant execution. | Low |
PostgreSQL ADD COLUMN with Default | Expand/Contract | Avoids full table rewrite; SET DEFAULT is instant. | Medium |
| Multi-Region Database | Dual-Write + Conflict Resolution | Handles latency and consistency across regions. | High |
Configuration Template
Robust migration runner configuration for TypeScript projects using node-pg-migrate patterns.
// db/migrate.ts
import { Pool } from 'pg';
import { runMigration } from 'node-pg-migrate';
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
// Production safety settings
max: 20,
idleTimeoutMillis: 30000,
});
async function main() {
const client = await pool.connect();
try {
// Safety: Prevent long-running locks
await client.query(`SET lock_timeout = '100ms'`);
await client.query(`SET statement_timeout = '60000'`);
await runMigration({
databaseUrl: process.env.DATABASE_URL!,
dir: './migrations',
direction: 'up',
migrationsTable: 'migrations_meta',
verbose: true,
// Dry run capability for CI
dryRun: process.env.DRY_RUN === 'true',
});
console.log('Migration completed successfully.');
} catch (error) {
console.error('Migration failed:', error);
process.exit(1);
} finally {
client.release();
await pool.end();
}
}
main();
Quick Start Guide
-
Initialize Migration Tool:
npm install node-pg-migrate pg
npx node-pg-migrate create init_schema
-
Configure Runner:
Create db/migrate.ts using the Configuration Template above. Ensure DATABASE_URL is set in your environment.
-
Write Expand Migration:
npx node-pg-migrate create add_user_full_name_expand
Implement the up function with ADD COLUMN and CREATE INDEX CONCURRENTLY.
-
Run Migration:
ts-node db/migrate.ts
Monitor output for lock timeouts or errors.
-
Deploy Code & Backfill:
Deploy application code with feature flags. Run backfill script in a controlled job. Once verified, proceed to Contract migration.
Database schema evolution is not a SQL exercise; it is a deployment strategy. Treat schema changes with the same rigor as code releases, enforce backward compatibility, and automate safety checks. The Expand/Contract pattern is the industry standard for a reason: it transforms schema changes from a risk factor into a routine operation.