c, and monitoring with application consistency requirements. The following architecture uses a primary-write, read-replica topology with lag-aware routing, implemented on PostgreSQL with logical replication.
Step 1: Define Consistency Boundaries
Map application endpoints to consistency requirements. Classify operations into:
- Strong consistency: Financial transactions, inventory deductions, user authentication
- Bounded consistency: Dashboard metrics, session validation, recommendation feeds
- Eventual consistency: Analytics aggregations, audit logs, cache warmups
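One way to make this classification explicit is a lookup table keyed by endpoint. The following is a minimal sketch: the endpoint paths and the `requiredConsistency` helper are hypothetical names chosen for illustration, and defaulting unknown endpoints to strong consistency is an assumption, not something the architecture prescribes.

```typescript
type Consistency = 'strong' | 'bounded' | 'eventual';

// Hypothetical endpoint paths, used only for illustration.
const endpointConsistency = new Map<string, Consistency>([
  ['/api/payments', 'strong'],
  ['/api/inventory/deduct', 'strong'],
  ['/api/dashboard', 'bounded'],
  ['/api/recommendations', 'bounded'],
  ['/api/analytics/summary', 'eventual'],
]);

// Unclassified endpoints default to 'strong' so that a missing entry
// can never silently downgrade a read to a stale replica.
function requiredConsistency(endpoint: string): Consistency {
  return endpointConsistency.get(endpoint) ?? 'strong';
}
```

Keeping the map in one place also gives reviewers a single file to audit when an endpoint's staleness tolerance changes.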
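Step 2: Configure Logical Replication
Set up a primary node and two read replicas using PostgreSQL logical replication. Logical replication supports table-level selection (and row-level filtering from PostgreSQL 15), can reduce network traffic when only a subset of tables is replicated, and works across different major versions.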
Step 3: Implement Lag-Aware Read Routing
Route reads based on real-time replication lag. The following TypeScript service queries replication statistics and enforces consistency boundaries.
import { Pool } from 'pg';

type Consistency = 'strong' | 'bounded' | 'eventual';

interface ReplicationStatus {
  lag_ms: number | null; // null when the replica reports no replay timestamp
  state: 'streaming' | 'catchup' | 'down';
}

class LagAwareRouter {
  private primaryPool: Pool;
  private replicaPool: Pool;
  // Maximum tolerated replica lag in milliseconds for each consistency level
  private consistencyThresholds: Record<Consistency, number> = {
    strong: 0,
    bounded: 200,
    eventual: Infinity
  };

  constructor(primaryConnectionString: string, replicaConnectionString: string) {
    this.primaryPool = new Pool({ connectionString: primaryConnectionString });
    this.replicaPool = new Pool({ connectionString: replicaConnectionString });
  }

  async getReplicationStatus(): Promise<ReplicationStatus> {
    // pg_last_xact_replay_timestamp() is populated only on a physical (streaming) standby.
    // On a logical subscriber it returns NULL, which this query reports as 'down'.
    // The ::float8 cast makes node-postgres return lag_ms as a number rather than a string.
    const res = await this.replicaPool.query(`
      SELECT
        (EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000)::float8 AS lag_ms,
        CASE
          WHEN pg_last_xact_replay_timestamp() IS NULL THEN 'down'
          WHEN EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) > 5 THEN 'catchup'
          ELSE 'streaming'
        END AS state
    `);
    return res.rows[0];
  }

  async routeRead(requiredConsistency: Consistency): Promise<Pool> {
    const status = await this.getReplicationStatus();
    const threshold = this.consistencyThresholds[requiredConsistency];
    // Fall back to the primary when lag is unknown, the replica is down, or lag exceeds the threshold
    if (status.lag_ms === null || status.state === 'down' || status.lag_ms > threshold) {
      return this.primaryPool;
    }
    return this.replicaPool;
  }
}

// Usage (PRIMARY_CONN_STRING is an assumed variable name; the original snippet configured only the replica)
const router = new LagAwareRouter(
  process.env.PRIMARY_CONN_STRING ?? '',
  process.env.REPLICA_CONN_STRING ?? ''
);

async function fetchUserDashboard(userId: string) {
  const pool = await router.routeRead('bounded');
  const res = await pool.query('SELECT * FROM user_metrics WHERE user_id = $1', [userId]);
  return res.rows[0];
}
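The router above runs a status query against the replica on every routed read, which adds a round trip to each request. A short-lived cache is one possible way to amortize that cost. The sketch below is an assumption layered on top of the design, not part of it: `CachedLookup` is a hypothetical helper, and the clock is injectable only so the behavior can be checked without timers.

```typescript
// Caches the result of a lookup for ttlMs milliseconds.
// T may itself be a Promise, in which case concurrent callers share one in-flight query.
class CachedLookup<T> {
  private cached: { value: T; at: number } | undefined;
  private readonly fetch: () => T;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(fetch: () => T, ttlMs: number, now: () => number = () => Date.now()) {
    this.fetch = fetch;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): T {
    // Serve the cached value while it is younger than the TTL
    if (this.cached !== undefined && this.now() - this.cached.at < this.ttlMs) {
      return this.cached.value;
    }
    const value = this.fetch();
    this.cached = { value, at: this.now() };
    return value;
  }
}
```

Wrapping `getReplicationStatus()` in such a cache means the measured lag can itself be up to `ttlMs` old, so the TTL should stay well below the smallest non-zero consistency threshold. A rejected promise would also stay cached until the TTL expires, which a production version would need to handle.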
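Step 4: Manage Replication Slots
Logical replication requires replication slots, which stop the primary from recycling WAL that the subscriber has not yet confirmed. Cap retention (for example with max_slot_wal_keep_size, available from PostgreSQL 13) to avoid disk exhaustion:
-- CREATE SUBSCRIPTION creates a slot on the primary by default. Create one manually
-- only when the subscription uses create_slot = false and slot_name = 'app_read_slot';
-- otherwise this slot has no consumer and retains WAL indefinitely.
SELECT pg_create_logical_replication_slot('app_read_slot', 'pgoutput');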
-- Monitor slot activity
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
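The rows returned by the monitoring query can be turned into a retention check on the application side. The sketch below assumes LSNs arrive as text in PostgreSQL's `high/low` hexadecimal form; `parseLsn`, `lsnDiffBytes`, and `slotNeedsAttention` are hypothetical helpers, and plain numbers are used on the assumption that WAL positions stay below 2^53 bytes.

```typescript
// Parses a PostgreSQL LSN such as '16/B374D848' into a byte position.
// The text form is two hexadecimal numbers: the high and low 32 bits.
function parseLsn(lsn: string): number {
  const match = /^([0-9A-Fa-f]{1,8})\/([0-9A-Fa-f]{1,8})$/.exec(lsn);
  if (match === null) throw new Error(`Invalid LSN: ${lsn}`);
  const high = parseInt(match[1], 16);
  const low = parseInt(match[2], 16);
  return high * 0x100000000 + low;
}

// Bytes of WAL between two positions, analogous to pg_wal_lsn_diff(a, b).
function lsnDiffBytes(a: string, b: string): number {
  return parseLsn(a) - parseLsn(b);
}

interface SlotRow {
  slot_name: string;
  active: boolean;
  restart_lsn: string | null; // NULL until the slot reserves WAL
}

// Flags a slot that is inactive or that retains more WAL than allowed.
function slotNeedsAttention(slot: SlotRow, currentLsn: string, maxRetainedBytes: number): boolean {
  if (slot.restart_lsn === null) return false;
  const retained = lsnDiffBytes(currentLsn, slot.restart_lsn);
  return !slot.active || retained > maxRetainedBytes;
}
```

The `maxRetainedBytes` budget is a parameter you would derive from available disk space; no specific value is implied here.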
Step 5: Implement Automated Failover
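Use Patroni or pg_auto_failover to automate failover of the primary. Both tools manage physical streaming standbys, so plan for at least one such standby as the promotion candidate alongside the logical read replicas. Configure quorum-based promotion to prevent split-brain during network partitions.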
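The arithmetic behind quorum-based promotion is small enough to show directly. This is only an illustration of the majority rule, not how any particular tool implements it (Patroni, for instance, delegates leader election to a distributed configuration store such as etcd); `hasQuorum` is a hypothetical helper.

```typescript
// A partition may promote a primary only if it can reach a strict majority
// of the configured cluster members (counting itself).
function hasQuorum(reachableMembers: number, clusterSize: number): boolean {
  return reachableMembers > clusterSize / 2;
}

// In a 3-node cluster split 2/1, only the side with two members may promote.
const majoritySide = hasQuorum(2, 3); // true
const minoritySide = hasQuorum(1, 3); // false
```

Because two disjoint partitions cannot both hold a strict majority, at most one side can promote. An even split, such as 2/2 in a 4-node cluster, leaves neither side with quorum, which is why odd member counts are the usual choice.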
Architecture Rationale
This architecture decouples write scalability from read scalability while enforcing consistency boundaries at the application layer. Lag-aware routing bounds read staleness without sacrificing primary write throughput. Logical replication enables selective table replication, reducing network overhead. Slot monitoring catches WAL accumulation before it exhausts disk, and quorum-based failover ensures deterministic promotion. The design prioritizes observability and explicit consistency contracts over implicit infrastructure guarantees.
Pitfall Guide
- Assuming asynchronous replication has zero write latency impact: Async replication offloads WAL shipping to a background process, but network serialization, compression, and disk I/O still consume CPU and bandwidth. Under high write throughput, the primary's WAL writer becomes a bottleneck, increasing transaction commit latency by 10–20% even before replicas fall behind. Mitigate by sizing network bandwidth to 2x peak WAL generation rate and monitoring pg_stat_wal.
- Ignoring replication slot retention: Logical replication slots retain WAL segments until the consumer acknowledges receipt. If a replica disconnects or falls behind, the primary continues accumulating WAL, eventually exhausting disk space. Production clusters have experienced complete outages from unmonitored slots. Implement slot age monitoring, and either drop the slot automatically or cap its retention with max_slot_wal_keep_size when confirmed_flush_lsn stagnates beyond a configurable threshold.
- Routing critical reads to lagging replicas: Applications that blindly round-robin across replicas without checking lag will serve stale data during write bursts. Financial balances, inventory counts, and session tokens can then be read in an outdated state. Always pair read routing with real-time lag verification and fall back to the primary when thresholds are breached.
- Treating replication lag as a static threshold: Lag is not a fixed value; it scales with write volume, network jitter, and replica resource contention. A 100ms threshold that works during off-peak hours will fail during flash sales. Implement dynamic thresholds based on moving averages of write throughput and network latency, or use consistency-bound routing that adapts to current cluster state.
- Underestimating conflict resolution overhead in multi-master: Multi-master replication requires conflict detection and resolution logic. Last-write-wins strategies discard concurrent updates. Vector clock approaches preserve history but increase storage by 15–25%. Custom conflict handlers add application complexity and testing surface area. Only deploy multi-master when write locality requirements justify the operational cost.
- Not testing split-brain scenarios: Network partitions are inevitable. Clusters without quorum configuration will promote multiple primaries, causing data divergence. Test partition scenarios using network simulation tools (e.g., tc or Chaos Mesh) and verify that only one node accepts writes. Document expected behavior and automate recovery procedures.
- Failing to monitor replication topology holistically: Tracking lag in isolation misses systemic issues. Replica CPU saturation, disk I/O contention, and connection pool exhaustion all manifest as increased lag. Implement composite health checks that correlate lag with resource utilization, network throughput, and transaction commit rates.
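The dynamic-threshold idea from the pitfalls above can be sketched with an exponentially weighted moving average. Note the substitution: this sketch smooths observed lag samples rather than write throughput or network latency, and the smoothing factor, multiplier, floor, and ceiling are all assumed parameters rather than recommended values. `Ewma` and `dynamicThresholdMs` are hypothetical names.

```typescript
// Exponentially weighted moving average of a stream of samples.
class Ewma {
  private value: number | undefined;
  private readonly alpha: number;

  constructor(alpha: number) {
    if (!(alpha > 0 && alpha <= 1)) throw new Error('alpha must be in (0, 1]');
    this.alpha = alpha;
  }

  // Folds one sample into the average and returns the updated value.
  update(sample: number): number {
    this.value = this.value === undefined
      ? sample
      : this.alpha * sample + (1 - this.alpha) * this.value;
    return this.value;
  }
}

// Scales the threshold with the recent average lag, clamped to [floorMs, ceilingMs]
// so that the staleness promised to callers never exceeds a hard ceiling.
function dynamicThresholdMs(avgLagMs: number, factor: number, floorMs: number, ceilingMs: number): number {
  return Math.min(ceilingMs, Math.max(floorMs, avgLagMs * factor));
}
```

The ceiling matters more than the average: it is the only part of the threshold that corresponds to a contract with the caller, so it should come from the endpoint's documented staleness tolerance.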
Best Practices from Production:
- Enforce consistency contracts at the API layer, not the database layer
- Use replication slots with automated lifecycle management
- Route reads based on real-time lag, not static configuration
- Test failover procedures quarterly with game-day simulations
- Document expected staleness per endpoint in API specifications
- Monitor WAL generation rate against network capacity
- Implement circuit breakers for replica fallback during sustained lag
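The circuit-breaker practice in the list above can be sketched as a small state machine: after several consecutive lag breaches the breaker opens and all reads go to the primary for a cooldown period, after which a single probe decides whether the replica is usable again. `ReplicaCircuitBreaker` and its parameters are hypothetical, and the clock is injectable only for testing.

```typescript
class ReplicaCircuitBreaker {
  private consecutiveBreaches = 0;
  private openedAt: number | undefined;
  private readonly maxBreaches: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(maxBreaches: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.maxBreaches = maxBreaches;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Records the outcome of one lag check against the replica.
  record(lagWithinThreshold: boolean): void {
    if (lagWithinThreshold) {
      // A healthy check closes the breaker and clears the breach count
      this.consecutiveBreaches = 0;
      this.openedAt = undefined;
      return;
    }
    this.consecutiveBreaches += 1;
    if (this.consecutiveBreaches >= this.maxBreaches) {
      // Open (or re-open) the breaker and restart the cooldown
      this.openedAt = this.now();
    }
  }

  // True when the replica may be tried: the breaker is closed,
  // or the cooldown has elapsed and a probe is allowed.
  allowReplica(): boolean {
    if (this.openedAt === undefined) return true;
    return this.now() - this.openedAt >= this.cooldownMs;
  }
}
```

While the breaker is open, the router can skip the lag query entirely and return the primary pool, which also relieves a struggling replica of monitoring traffic.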
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Financial transactions & inventory | Synchronous or semi-sync with primary-only reads | Guarantees data integrity, prevents double-spending | +25% infrastructure, +15% engineering time |
| Real-time dashboards & session validation | Semi-sync with bounded consistency routing | Balances freshness and latency, tolerates minor staleness | +10% infrastructure, minimal engineering |
| Analytics & audit logging | Async replication with eventual consistency | Maximizes write throughput, accepts 100-500ms lag | Baseline infrastructure, low engineering |
| Geo-distributed SaaS with local writes | Multi-master with conflict resolution | Reduces write latency across regions, maintains availability | +40% infrastructure, +30% engineering |
| High-frequency trading / fraud detection | Synchronous with dedicated replica | Zero tolerance for stale data, requires deterministic failover | +50% infrastructure, +40% engineering |
Configuration Template
postgresql.conf (Primary)
wal_level = logical
max_replication_slots = 4
max_wal_senders = 10
wal_keep_size = 1GB
shared_preload_libraries = 'pg_stat_statements'
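postgresql.conf (Replica)
# The settings below take effect only on physical streaming standbys; a logical subscriber runs as a regular read-write server and does not use them.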
hot_standby = on
max_standby_streaming_delay = 30s
wal_receiver_status_interval = 10s
hot_standby_feedback = on
pg_hba.conf (Both)
# Replication connections
host replication replicator 10.0.0.0/8 scram-sha-256
# Application reads
host all app_user 10.0.0.0/8 scram-sha-256
Replication Slot Setup
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';
SELECT pg_create_logical_replication_slot('app_logical_slot', 'pgoutput');
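-- Monitoring query (run on the primary)
-- pg_last_xact_replay_timestamp() returns NULL on a primary, so progress is measured in WAL bytes instead
SELECT
slot_name,
active,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes,
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS unconfirmed_bytes
FROM pg_replication_slots;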
Quick Start Guide
- Provision topology: Deploy one primary and two read replicas using your preferred orchestration tool. Ensure network latency between nodes is <5ms for semi-sync viability.
- Configure WAL and slots: Apply postgresql.conf settings to primary, create logical replication slot, and grant replication privileges to dedicated user.
- Initialize logical replication: Create publication on primary (CREATE PUBLICATION app_pub FOR TABLE users, orders;), subscribe on replicas (CREATE SUBSCRIPTION app_sub CONNECTION 'host=primary...' PUBLICATION app_pub;).
- Deploy routing service: Integrate the TypeScript lag-aware router into your application. Set consistency thresholds based on endpoint requirements. Validate routing behavior under simulated load.
- Enable monitoring: Deploy composite health checks tracking lag, WAL retention, and resource utilization. Configure alerts for lag >200ms, slot age >1h, and WAL retention >80% disk capacity.
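The three alert conditions from the monitoring step can be expressed as a single pure function over a metrics sample. The sketch below is illustrative: the `ClusterSample` shape and field names are hypothetical, and "slot age" is interpreted here as the time since the slot's confirmed position last advanced, which is an assumption about what the alert is meant to track.

```typescript
interface ClusterSample {
  lagMs: number;              // current replica lag
  slotStalledMs: number;      // time since the slot's confirmed position last advanced (assumed meaning of "slot age")
  walRetainedBytes: number;   // WAL held back by replication slots
  diskCapacityBytes: number;  // capacity of the WAL volume
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Returns one message per breached condition; an empty array means healthy.
function evaluateAlerts(sample: ClusterSample): string[] {
  const alerts: string[] = [];
  if (sample.lagMs > 200) {
    alerts.push('replication lag above 200 ms');
  }
  if (sample.slotStalledMs > ONE_HOUR_MS) {
    alerts.push('slot age above 1 h');
  }
  if (sample.walRetainedBytes > 0.8 * sample.diskCapacityBytes) {
    alerts.push('WAL retention above 80% of disk capacity');
  }
  return alerts;
}
```

Keeping the evaluation free of I/O makes the thresholds easy to unit test and to reuse from whichever metrics collector feeds it.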
Replication is not an infrastructure toggle; it is an architectural contract. Define consistency boundaries, monitor lag dynamically, and route reads intentionally. Systems that treat replication as a first-class design constraint outperform those that treat it as an afterthought.