, capture execution patterns. Enable pg_stat_statements to track query frequency, planning time, and execution time.
-- postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 5000
-- Identify top resource consumers
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Step 2: Memory Allocation Strategy
Memory parameters dictate how PostgreSQL utilizes RAM for caching, sorting, and maintenance. Misconfiguration causes disk spill or memory pressure.
# 25% of total RAM for shared buffers (OS cache handles the rest)
shared_buffers = 8GB
# Per-operation memory for sorts/hashes. Multiply by max_connections for worst-case.
# Keep conservative: 64MBβ256MB depending on connection count
work_mem = 128MB
# Memory for VACUUM, CREATE INDEX, ALTER TABLE
maintenance_work_mem = 2GB
# Planner hint for available OS cache + shared_buffers
effective_cache_size = 24GB
Architecture Decision: Never set shared_buffers above 30% of RAM. PostgreSQL relies on the OS page cache for I/O efficiency. Over-allocating shared buffers forces the OS to evict frequently accessed data, increasing page faults. work_mem is allocated per sort/hash operation per connection. At 500 connections, 256MB work_mem could theoretically consume 128GB. Cap it based on actual sort requirements, verified via EXPLAIN ANALYZE output showing Sort Method: external merge Disk.
Step 3: I/O and Checkpoint Tuning
Checkpoints flush dirty pages to disk. Aggressive checkpoints cause I/O storms; infrequent checkpoints increase crash recovery time and WAL volume.
# Target checkpoint duration to spread I/O
checkpoint_completion_target = 0.9
# WAL retention before checkpoint
max_wal_size = 4GB
min_wal_size = 1GB
# Adjust planner cost for SSDs (default assumes HDD)
random_page_cost = 1.1
effective_io_concurrency = 200
Architecture Decision: Lower random_page_cost to 1.1 on NVMe/SSD storage to encourage index scans over sequential scans. Set checkpoint_completion_target to 0.9 to distribute checkpoint writes across 90% of the checkpoint interval, preventing I/O saturation spikes.
Step 4: Indexing and Query Optimization
Indexes accelerate reads but degrade writes. Targeted indexing beats blanket coverage.
-- Partial index for active records
CREATE INDEX idx_orders_active ON orders (created_at)
WHERE status = 'pending';
-- Covering index to enable index-only scans
CREATE INDEX idx_users_email_name ON users (email) INCLUDE (name, created_at);
-- Verify index usage
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT name, created_at FROM users WHERE email = 'test@example.com';
Architecture Decision: Prioritize partial indexes for filtered datasets (e.g., status != 'archived'). Use INCLUDE columns to satisfy queries entirely from the index leaf, eliminating heap fetches. Avoid indexing columns with low cardinality unless combined with high-selectivity predicates.
Step 5: Connection Management
PostgreSQL forks a process per connection. High max_connections consumes RAM and CPU, degrading performance.
max_connections = 100
Deploy PgBouncer as a connection pooler:
[databases]
appdb = host=127.0.0.1 dbname=appdb
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 500
default_pool_size = 25
reserve_pool = 5
Architecture Decision: Route all application connections through PgBouncer in transaction mode. This decouples application concurrency from PostgreSQL process limits, reducing context-switching overhead and preventing connection storms during traffic spikes.
Step 6: Maintenance and Vacuum Tuning
Dead tuples accumulate from UPDATE/DELETE operations. Unmanaged bloat increases I/O, slows scans, and inflates table size.
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 30s
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.02
Monitor bloat:
SELECT schemaname, relname, n_dead_tup, n_live_tup,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0), 4) AS dead_ratio
FROM pg_stat_user_tables
ORDER BY dead_ratio DESC;
Architecture Decision: Tune autovacuum_vacuum_scale_factor based on table size. High-churn tables (e.g., event logs) benefit from 0.02β0.05. Large historical tables with infrequent updates can tolerate 0.1. Disable autovacuum only for read-only archives; never for transactional tables.
Pitfall Guide
-
Setting shared_buffers to 50%+ of RAM: Forces OS cache eviction, increases page faults, and degrades I/O throughput. PostgreSQL's architecture relies on the OS page cache for efficient read-ahead and write-back. Stick to 25%.
-
Ignoring work_mem multiplication: work_mem is allocated per sort/hash operation per connection. At 300 connections with 256MB work_mem, theoretical peak usage reaches 76GB. Verify actual sort requirements via EXPLAIN ANALYZE and cap conservatively.
-
Over-indexing write-heavy tables: Each index requires maintenance on INSERT/UPDATE/DELETE. Write throughput degrades linearly with index count. Index only columns used in WHERE, JOIN, or ORDER BY clauses. Use pg_stat_user_indexes to identify unused indexes.
-
Disabling or misconfiguring autovacuum: Dead tuple accumulation causes table bloat, increasing sequential scan time and WAL generation. Adjust thresholds per table churn rate. Never disable on transactional workloads.
-
Chasing query micro-optimizations before fixing missing indexes: Rewriting SQL without addressing missing indexes yields marginal gains. Indexing typically delivers 10β100x latency reduction. Validate index coverage first.
-
Using max_connections as a scaling lever: Process-per-connection model consumes RAM and CPU. High connection counts increase lock contention and context switching. Use connection pooling (PgBouncer) instead.
-
Blindly applying pg_tune or AI-generated configs: Automated tools lack workload context. A time-series database requires different checkpoint and vacuum tuning than a session store. Validate all parameters against execution plans and telemetry.
Best Practices from Production:
- Tune incrementally. Change one parameter set, measure impact, then proceed.
- Use
EXPLAIN (ANALYZE, BUFFERS) to validate index usage and memory allocation.
- Monitor
pg_stat_bgwriter for checkpoint frequency and buffer allocation.
- Schedule heavy maintenance (REINDEX, VACUUM FULL) during low-traffic windows.
- Partition tables exceeding 10GB with time-based or range-based keys to improve scan efficiency and maintenance speed.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Read-heavy analytics (80% SELECT) | Tune effective_cache_size, add covering indexes, enable materialized views | Reduces heap fetches and sequential scans | Low (infra unchanged) |
| Write-heavy transactional (60% INSERT/UPDATE) | Optimize autovacuum, reduce indexes, increase max_wal_size, use connection pooling | Minimizes WAL flush contention and index maintenance overhead | Low-Medium (pooler infra) |
| Mixed OLTP/OLAP with latency spikes | Partition large tables, tune work_mem, deploy read replica for reporting | Isolates heavy scans from transactional path | Medium (replica cost) |
| Time-series data > 50GB | BRIN indexes, table partitioning by time, aggressive autovacuum tuning | BRIN leverages physical ordering; partitions limit scan scope | Low (storage optimized) |
| Connection storms during traffic peaks | PgBouncer transaction mode, lower max_connections, circuit breakers | Prevents process exhaustion and CPU thrashing | Low (pooler cost) |
Configuration Template
# Memory
shared_buffers = 8GB
work_mem = 128MB
maintenance_work_mem = 2GB
effective_cache_size = 24GB
# Checkpoints & WAL
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB
wal_buffers = 64MB
# I/O & Planner
random_page_cost = 1.1
effective_io_concurrency = 200
default_statistics_target = 100
# Connections & Pooling
max_connections = 100
superuser_reserved_connections = 3
# Autovacuum
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 30s
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.02
# Observability
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 5000
log_min_duration_statement = 200
log_checkpoints = on
log_connections = on
log_disconnections = on
Quick Start Guide
- Install & Enable Telemetry: Add
pg_stat_statements to shared_preload_libraries, restart PostgreSQL, and run CREATE EXTENSION pg_stat_statements;
- Apply Memory & I/O Config: Update
postgresql.conf with the template values matching your RAM and storage type. Restart the service.
- Deploy Connection Pooler: Install PgBouncer, configure
pool_mode = transaction, point it to PostgreSQL, and redirect application DSNs to 127.0.0.1:6432.
- Validate & Monitor: Run
EXPLAIN (ANALYZE, BUFFERS) on top 5 queries. Check pg_stat_statements for execution time reduction. Monitor pg_stat_bgwriter for checkpoint distribution. Iterate every 48 hours.