tion
Building a deterministic database performance testing pipeline requires environment parity, realistic workload simulation, percentile-based metrics, and plan regression detection. The following implementation uses TypeScript, PostgreSQL, and a connection-aware load harness.
Step 1: Environment Parity & Data Volume
Production databases are not empty. They contain skewed data distributions, fragmented indexes, and accumulated bloat. Replicate this in testing using logical dumps with sampled real data or synthetic data generators that match cardinality and value distributions.
// seed.ts - Populate test database with realistic distribution
import { Pool } from 'pg';
const pool = new Pool({
host: process.env.DB_HOST,
port: 5432,
database: 'test_perf',
user: 'perf_user',
password: process.env.DB_PASS,
max: 50, // Match production pool size
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
async function seedRealisticData() {
const client = await pool.connect();
try {
await client.query('BEGIN');
// Insert 1.2M rows with skewed distribution matching prod
await client.query(`
INSERT INTO events (user_id, created_at, payload, status)
SELECT
(random() * 100000)::int,
NOW() - (random() * interval '365 days'),
jsonb_build_object('type', (ARRAY['click','view','purchase'])[floor(random()*3+1)]),
(ARRAY['pending','success','failed'])[floor(random()*3+1)]
FROM generate_series(1, 1200000);
`);
await client.query('COMMIT');
// Analyze to update planner statistics
await client.query('ANALYZE events');
} finally {
client.release();
}
}
seedRealisticData().catch(console.error);
Step 2: Connection-Aware Load Generation
Database stress is not request count. It is concurrent connections, query mix, and pool saturation. The load harness must enforce connection limits, simulate think time, and track pool exhaustion.
// load-test.ts - Concurrent DB load with percentile tracking
import { Pool } from 'pg';
import { performance } from 'perf_hooks';
const pool = new Pool({
host: process.env.DB_HOST,
database: 'test_perf',
user: 'perf_user',
password: process.env.DB_PASS,
max: 30, // Intentionally below production to test saturation
idleTimeoutMillis: 10000,
});
const latencies: number[] = [];
let errors = 0;
let timeouts = 0;
async function executeQuery(query: string, params: any[] = []) {
const start = performance.now();
try {
const client = await pool.connect();
await client.query(query, params);
const duration = performance.now() - start;
latencies.push(duration);
client.release();
} catch (err: any) {
if (err.message.includes('timeout')) timeouts++;
else errors++;
}
}
async function runLoad(durationMs: number, concurrency: number) {
const startTime = performance.now();
const tasks: Promise<void>[] = [];
while (performance.now() - startTime < durationMs) {
for (let i = 0; i < concurrency; i++) {
tasks.push(executeQuery(
'SELECT count(*) FROM events WHERE user_id = $1 AND created_at > $2',
[Math.floor(Math.random() * 100000), new Date(Date.now() - 86400000)]
));
}
await Promise.all(tasks);
tasks.length = 0;
// Simulate realistic think time
await new Promise(res => setTimeout(res, Math.random() * 50));
}
const sorted = latencies.sort((a, b) => a - b);
const p50 = sorted[Math.floor(sorted.length * 0.5)];
const p95 = sorted[Math.floor(sorted.length * 0.95)];
const p99 = sorted[Math.floor(sorted.length * 0.99)];
console.log({
p50: `${p50.toFixed(2)}ms`,
p95: `${p95.toFixed(2)}ms`,
p99: `${p99.toFixed(2)}ms`,
errors,
timeouts,
activeConnections: pool.totalCount - pool.idleCount,
});
}
runLoad(30000, 25).catch(console.error);
Step 3: Query Plan Regression Detection
Latency alone masks inefficient execution. Capture EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) before and after changes. Diff the JSON output to detect sequential scans, missing index usage, or increased buffer hits.
// plan-regression.ts
import { Pool } from 'pg';
const pool = new Pool({ /* config */ });
async function getQueryPlan(query: string, params: any[] = []) {
const client = await pool.connect();
try {
const res = await client.query(
`EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) ${query}`,
params
);
return JSON.parse(res.rows[0]['QUERY PLAN'])[0];
} finally {
client.release();
}
}
// Compare baseline vs current plan
function detectRegression(baseline: any, current: any): string[] {
const issues: string[] = [];
if (baseline.Plan["Total Cost"] < current.Plan["Total Cost"] * 1.1) {
issues.push('Cost increased >10%');
}
if (baseline.Plan["Node Type"] === 'Index Scan' && current.Plan["Node Type"] === 'Seq Scan') {
issues.push('Index dropped or planner switched to sequential scan');
}
return issues;
}
Architecture Decisions & Rationale
- Separate read/write workloads: Mixed workloads mask contention. Test read-heavy queries under connection saturation, and write-heavy transactions under lock contention.
- Connection pool boundaries in tests: Production fails when pool limits are reached, not when CPU hits 100%. Tests must enforce
max connections to surface exhaustion.
- Percentile tracking over averages: p95/p99 latency reflects user experience. Averages smooth out tail latency that causes timeout cascades.
- Plan fingerprinting: Query plans change when statistics update, indexes are rebuilt, or PostgreSQL versions upgrade. Tracking plan structure prevents silent degradation.
- Cache warming: Cold caches inflate latency. Pre-warm shared buffers and OS page cache before measurement to reflect steady-state performance.
Pitfall Guide
- Testing with empty or toy datasets: The query planner optimizes differently for 10k vs 10M rows. Index selectivity, join strategies, and sort algorithms change at scale. Always seed with production-cardinality data matching value distribution.
- Ignoring connection pool limits: Setting
max: 100 in tests when production uses max: 20 masks pool exhaustion. Tests must replicate production pool configuration exactly.
- Relying on average latency: Averages hide tail latency. p95/p99 metrics reveal timeout thresholds, lock wait times, and connection queuing that cause cascading failures.
- Skipping cache warming: Cold database and OS caches produce artificially high latency. Run a warm-up phase that touches all relevant indexes and tables before measurement.
- Not testing schema migrations under load: DDL operations block or degrade concurrent queries. Test
ALTER TABLE, index creation, and partitioning during active load to detect lock contention.
- Assuming dev DB config matches prod: Different
work_mem, shared_buffers, max_parallel_workers, or autovacuum settings change execution plans. Export production SHOW ALL and apply to test environments.
- Measuring only success rates: 99% success with 1% deadlocks or connection timeouts is a production incident. Track error categories: timeouts, lock deadlocks, connection refused, and plan regressions.
Best Practices from Production:
- Run performance tests on every schema change, dependency upgrade, and major deployment.
- Store baseline metrics in versioned artifacts. Compare against them, not against arbitrary thresholds.
- Monitor
pg_stat_statements or equivalent to track query frequency and cumulative cost.
- Enforce connection limits in test harnesses to surface exhaustion before production.
- Combine load testing with plan regression detection. Latency without plan context is incomplete.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Read-heavy API with caching | Baseline regression testing + connection saturation | Caching masks DB load; plan drift causes cache misses | Low: CI/CD gate, minimal infra |
| Write-heavy transactional workload | Production traffic replay + lock contention testing | Write patterns cause row locks, autovacuum pressure, and WAL contention | Medium: Requires staging parity, storage for replay data |
| Schema migration or index change | Plan regression detection + DDL under load | Planner switches to sequential scans, locks block concurrent queries | Low: Automated diff, fast feedback loop |
| Database version upgrade | Full traffic replay + extended warm-up | New planner, statistics format, and autovacuum behavior change execution | High: Requires full staging clone, longer test windows |
| Mixed read/write SaaS platform | Hybrid: traffic replay + baseline regression + connection limits | Realistic concurrency, cache hit ratios, and pool exhaustion detection | Medium-High: Requires orchestrated load generator and metrics pipeline |
Configuration Template
# docker-compose.test.yml
version: '3.8'
services:
db:
image: postgres:15
environment:
POSTGRES_DB: test_perf
POSTGRES_USER: perf_user
POSTGRES_PASSWORD: perf_pass
ports:
- "5432:5432"
command: >
postgres
-c shared_buffers=1GB
-c work_mem=64MB
-c max_connections=30
-c autovacuum=on
-c log_min_duration_statement=50
volumes:
- ./data:/var/lib/postgresql/data
load-generator:
build: .
environment:
DB_HOST: db
DB_PASS: perf_pass
depends_on:
- db
command: npm run test:load
// package.json scripts
{
"scripts": {
"test:seed": "ts-node seed.ts",
"test:load": "ts-node load-test.ts",
"test:plan": "ts-node plan-regression.ts",
"test:ci": "npm run test:seed && npm run test:load && npm run test:plan"
}
}
Quick Start Guide
- Spin up parity environment: Run
docker compose -f docker-compose.test.yml up -d to launch a PostgreSQL instance with production-matched configuration and connection limits.
- Seed realistic data: Execute
npm run test:seed to populate the database with production-cardinality rows and run ANALYZE to update planner statistics.
- Run load and capture metrics: Execute
npm run test:load to simulate concurrent connections, track p95/p99 latency, and report connection utilization and error categories.
- Validate query plans: Execute
npm run test:plan to capture EXPLAIN output, diff against baseline, and flag sequential scans or cost increases.
- Gate deployments: Integrate
npm run test:ci into CI/CD pipelines. Block merges if p95 latency exceeds baseline by >15%, connection utilization hits >90%, or plan regression detects index drops.
Database performance testing is not optional infrastructure. It is the control plane that separates predictable scaling from cascading failures. Implement the hybrid pipeline, enforce connection boundaries, track percentiles, and diff execution plans. The cost of testing is measured in minutes. The cost of skipping it is measured in incidents.