rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Filter for queries with `mean_exec_time > 50ms` and `calls > 1000/day`. Export these patterns to build an index requirement matrix.
### Step 2: Select Index Type by Access Pattern
Not all indexes are B-trees. Matching the index type to the query operator drastically reduces planner cost.
- **B-tree**: Default. Supports `=`, `<`, `>`, `BETWEEN`, `LIKE 'prefix%'`, `ORDER BY`. Use for 90% of relational queries.
- **Hash**: Equality-only (`=`). Faster for exact matches, but lacks range support and requires explicit rebuild after crashes in older PostgreSQL versions. Use for high-throughput key lookups.
- **GIN**: Generalized Inverted Index. Optimized for arrays, JSONB, full-text search. Supports `@>`, `?`, `@@`. Higher write cost, but unmatched for document/nested queries.
- **GiST**: Geometric and full-text. Supports complex operators, range queries, and nearest-neighbor searches. Use for spatial or custom data types.
- **BRIN**: Block Range INdexes. For highly correlated data (timestamps, sequential IDs). Minimal storage overhead, but requires monotonic data distribution.
### Step 3: Design Composite Indexes with Selectivity Ordering
Composite indexes follow the left-prefix rule. Column order dictates which query conditions can be satisfied without a filter step.
Rule: Place high-selectivity columns first, equality conditions before range conditions, and `ORDER BY` columns last if they match the index sort order.
```sql
-- Poor: Low selectivity first, range before equality
CREATE INDEX idx_users_bad ON users (status, created_at, department_id);
-- Optimized: High selectivity first, equality before range, sort aligned
CREATE INDEX idx_users_optimized ON users (department_id, status, created_at DESC);
Query compatibility:
WHERE department_id = 5 AND status = 'active' → Uses full index
WHERE department_id = 5 AND created_at > '2024-01-01' → Uses prefix, filters rest
WHERE status = 'active' → Index scan with filter (planner may prefer seq scan)
Step 4: Implement with Production Safety
Never create indexes synchronously on production tables. Use CONCURRENTLY to avoid table locks, and leverage INCLUDE for covering indexes without bloating the index key.
-- Safe index creation
CREATE INDEX CONCURRENTLY idx_orders_customer_date
ON orders (customer_id, order_date DESC)
INCLUDE (total_amount, status);
-- Partial index for active records only
CREATE INDEX CONCURRENTLY idx_subscriptions_active
ON subscriptions (user_id, expires_at)
WHERE status = 'active';
Step 5: Verify with EXPLAIN ANALYZE
Execution plans must be validated against actual data distribution.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT total_amount, status
FROM orders
WHERE customer_id = 12345
AND order_date > '2024-06-01'
ORDER BY order_date DESC;
Look for:
Index Only Scan → Covering index working
Index Scan → Heap fetch required, acceptable if selectivity is high
Seq Scan → Planner rejected index due to low selectivity or missing statistics
Run ANALYZE orders; after bulk operations to refresh planner statistics. Stale statistics are the #1 cause of index abandonment.
Architecture Decisions and Rationale
- Read-heavy vs Write-heavy: Read-optimized architectures tolerate more indexes and covering strategies. Write-heavy systems prioritize partial indexes, BRIN for time-series, and aggressive index consolidation.
- Hot/Cold Data Partitioning: Move historical data to partitioned tables or separate schemas. Index only the hot partition. This reduces bloat and maintenance windows.
- Index-Only Scans: Use
INCLUDE to store frequently accessed columns in the index leaf nodes. This eliminates heap visits, reducing I/O and improving cache locality.
- ORM Integration: Map index strategy to query builder patterns. If using Prisma or Knex, ensure generated queries align with left-prefix rules. Avoid
SELECT * when covering indexes are in place.
Pitfall Guide
1. Indexing Every Column
Why it happens: Developers assume more indexes equal faster queries.
Impact: Write amplification scales linearly with index count. Each INSERT must update every index, increasing WAL volume, checkpoint pressure, and lock contention.
Fix: Index only columns used in WHERE, JOIN, ORDER BY, or GROUP BY. Use pg_stat_user_indexes to identify unused indexes and drop them.
2. Incorrect Composite Column Order
Why it happens: Columns are ordered alphabetically or by perceived importance rather than query patterns.
Impact: Queries that don't match the left prefix force index scans with heavy filter steps, or trigger sequential scans. Planner cost estimates become inaccurate.
Fix: Align column order with actual query predicates. Use pg_stat_statements to extract common WHERE combinations and validate with EXPLAIN.
3. Ignoring Index Bloat
Why it happens: Updates and deletes leave dead tuples in index leaf pages. Vacuum doesn't automatically reclaim index space.
Impact: Bloated indexes increase I/O, reduce cache efficiency, and force the planner to choose sequential scans due to inflated cost estimates.
Fix: Monitor pgstattuple or pg_bloat_check. Schedule REINDEX CONCURRENTLY during low-traffic windows. Tune autovacuum settings for high-churn tables.
4. Overlooking Write Amplification
Why it happens: Focus remains on read latency while write throughput degrades silently.
Impact: Batch inserts slow by 300%+, replication lag increases, and connection pools exhaust due to longer transaction durations.
Fix: Benchmark write performance with indexes in place. Use partial indexes to exclude inactive data. Consider deferred indexing for bulk loads.
5. Using the Wrong Index Type
Why it happens: Default B-tree is applied to JSONB, arrays, or full-text queries.
Impact: Queries fall back to sequential scans or expensive filter operations. GIN/GiST operators remain unused.
Fix: Map operators to index types. Use GIN for @>, ?, @@. Use BRIN for monotonically increasing timestamps. Use Hash only for pure equality workloads.
Why it happens: Developers see Index Scan in EXPLAIN and assume optimization is complete.
Impact: High cardinality queries still trigger massive heap fetches. Random I/O dominates latency.
Fix: Target Index Only Scan for hot queries. Use INCLUDE to cover selected columns. Verify buffer hit ratios with EXPLAIN (BUFFERS).
7. Neglecting Statistics Refresh
Why it happens: Auto-analyze is disabled or misconfigured. Bulk operations bypass statistics updates.
Impact: Planner uses stale row estimates, chooses suboptimal plans, and abandons indexes.
Fix: Run ANALYZE after bulk inserts/updates. Increase default_statistics_target for skewed columns. Monitor pg_stat_user_tables for n_mod_since_analyze.
Production Best Practices
- Maintain index selectivity > 0.1. Columns with <10% unique values rarely benefit from standalone indexes.
- Limit indexes per table to 5-7. Beyond this, write overhead outweighs read gains.
- Use
pg_stat_user_indexes to track idx_scan, idx_tup_read, idx_tup_fetch. Drop indexes with zero scans over 30 days.
- Align index maintenance with deployment cycles. Create indexes in migrations, verify in staging, monitor in production.
- Document index rationale in schema comments. Future engineers need context to avoid accidental drops or duplicates.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| High-frequency equality lookups on UUIDs | Hash index | O(1) lookup, minimal planner overhead | Low storage, moderate write cost |
| Time-series data with monotonic timestamps | BRIN index | Block-level summaries, 90% storage reduction | Near-zero write overhead, requires data correlation |
| JSONB document queries with nested filters | GIN index | Inverted token mapping, supports @> and ? | High write cost, optimal read performance |
| Write-heavy transaction table with active/inactive states | Partial index on status = 'active' | Excludes cold data from index structure | Reduces storage by 40-60%, lowers write amplification |
| Hot reporting query selecting 3-4 columns | Covering index with INCLUDE | Enables index-only scans, eliminates heap fetch | Increases index size by 15-25%, cuts read latency by 60% |
Configuration Template
-- Index strategy template for PostgreSQL
-- Replace table/column names with your schema
-- 1. Composite index with left-prefix alignment
CREATE INDEX CONCURRENTLY idx_composite_template
ON your_table (high_selectivity_col, equality_col, range_col DESC)
INCLUDE (frequently_selected_col1, frequently_selected_col2);
-- 2. Partial index for active data slice
CREATE INDEX CONCURRENTLY idx_partial_template
ON your_table (user_id, created_at)
WHERE status = 'active' AND deleted_at IS NULL;
-- 3. GIN index for JSONB/document queries
CREATE INDEX CONCURRENTLY idx_gin_template
ON your_table USING GIN (metadata jsonb_path_ops);
-- 4. BRIN index for time-series
CREATE INDEX CONCURRENTLY idx_brin_template
ON your_table USING BRIN (created_at) WITH (pages_per_range = 128);
-- Monitoring query for index usage and bloat
SELECT
schemaname, tablename, indexname,
idx_scan, idx_tup_read, idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexname::regclass) DESC;
TypeScript/Prisma alignment example:
// Ensure Prisma queries match left-prefix index order
// Index: (department_id, status, created_at DESC)
// Query must filter department_id first, then status, then range on created_at
const activeUsers = await prisma.user.findMany({
where: {
departmentId: 5,
status: 'active',
createdAt: { gt: new Date('2024-06-01') }
},
orderBy: { createdAt: 'desc' },
select: { id: true, name: true, email: true } // Avoid SELECT * for covering index benefit
});
Quick Start Guide
- Extract query patterns: Run
SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20; and export results to a spreadsheet.
- Map index requirements: For each query, list
WHERE columns, operators, ORDER BY, and selected columns. Group by common predicates.
- Create strategic indexes: Use
CREATE INDEX CONCURRENTLY with composite ordering, partial filters, or INCLUDE clauses based on the mapping.
- Validate execution plans: Run
EXPLAIN (ANALYZE, BUFFERS) on target queries. Confirm Index Only Scan or efficient Index Scan. Run ANALYZE table_name; if planner still chooses sequential scans.
- Monitor for 14 days: Track
idx_scan, storage growth, and write latency. Drop unused indexes. Schedule REINDEX CONCURRENTLY if bloat exceeds 30%.
Indexing is not a configuration toggle. It is a continuous alignment between query behavior, data distribution, and storage architecture. Treat indexes as part of your system's capacity model, not an afterthought. The performance ceiling of your database is defined by how deliberately you structure access paths.