r guess how a query runs; inspect it. Use EXPLAIN ANALYZE to see the actual runtime and row counts versus estimates.
Key metrics to inspect:
- Seq Scan vs. Index Scan: A sequential scan on a large table is a red flag unless the query retrieves >20% of rows.
- Filter vs. Index Condition: Filters applied after row retrieval indicate the index is not being used for the predicate.
- Rows vs. Rows Removed by Filter: A high ratio indicates poor selectivity or missing indexes.
- Sort Operations: Explicit sorts are expensive. Check if the planner can use an index for ordering.
-- PostgreSQL Example
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT user_id, status, created_at
FROM orders
WHERE customer_id = 12345 AND status = 'PENDING';
Step 2: Index Architecture and Selectivity
Indexes are not free; they consume storage and degrade write performance. Design indexes based on query patterns and data selectivity.
Composite Index Order:
For composite indexes, column order is critical. Place high-selectivity equality columns first, followed by range columns.
- Rule:
WHERE col_a = ? AND col_b = ? AND col_c > ? requires index (col_a, col_b, col_b).
- The database can use the index for
col_a, col_b, and col_c. If col_c is first, the index cannot efficiently filter col_a.
Partial Indexes:
For tables with skewed data distributions, use partial indexes to reduce index size and maintenance overhead.
-- Only index pending orders, ignoring completed ones
CREATE INDEX idx_orders_pending
ON orders (customer_id, created_at)
WHERE status = 'PENDING';
Step 3: Covering Indexes for Read-Heavy Paths
Implement covering indexes for critical read paths to eliminate heap lookups. This is essential for high-throughput APIs.
-- Query: SELECT id, name, email FROM users WHERE tenant_id = ? AND active = true;
-- Covering Index includes all selected columns
CREATE INDEX idx_users_covering
ON users (tenant_id, active)
INCLUDE (name, email);
Note: The INCLUDE clause adds columns to the leaf level without adding them to the index key, optimizing storage and avoiding key length limits.
Step 4: SARGability and Query Rewriting
SARGable queries allow the database engine to leverage indexes. Violating SARGability forces full scans regardless of indexes.
Common SARGability Violations:
- Functions on Columns:
WHERE YEAR(created_at) = 2023 prevents index usage.
- Fix:
WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'.
- Implicit Type Conversion: Comparing a string column to an integer forces a cast, bypassing the index.
- Fix: Ensure parameter types match column types exactly.
- Leading Wildcards:
WHERE name LIKE '%smith' cannot use a B-Tree index.
- Fix: Use Full-Text Search or Trigram indexes for substring search.
TypeScript Implementation Pattern:
When using parameterized queries, ensure type safety to prevent implicit casts.
// Bad: User ID passed as string, column is integer.
// Forces cast on column, kills index usage.
const users = await db.query(
'SELECT * FROM users WHERE id = $1',
['12345']
);
// Good: Explicit typing ensures index usage.
const users = await db.query(
'SELECT * FROM users WHERE id = $1',
[12345]
);
Step 5: Join Optimization
Understand join algorithms. PostgreSQL and MySQL use Nested Loop, Hash Join, and Merge Join.
- Nested Loop: Efficient for small result sets or indexed joins.
- Hash Join: Efficient for large unindexed joins but memory-intensive.
- Optimization: Ensure join columns are indexed. If a Hash Join is spilling to disk, increase
work_mem or optimize the join condition.
-- Ensure join keys are indexed
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Pitfall Guide
1. Over-Indexing
Mistake: Creating indexes for every column or query pattern.
Impact: Write performance degrades linearly with index count. Every INSERT, UPDATE, or DELETE must update all indexes. This increases lock contention and WAL generation.
Best Practice: Index only columns used in WHERE, JOIN, ORDER BY, or GROUP BY. Regularly audit unused indexes using pg_stat_user_indexes or sys.dm_db_index_usage_stats.
2. Ignoring Data Distribution
Mistake: Indexing low-selectivity columns like status or gender.
Impact: The planner may ignore the index because scanning the table is cheaper than random I/O lookups. This creates maintenance overhead with no performance gain.
Best Practice: Use partial indexes for low-selectivity columns if filtering for rare values (e.g., WHERE status = 'ERROR').
3. N+1 Query Patterns
Mistake: Fetching parent records and looping to fetch children individually.
Impact: Exponential latency growth. 100 parents result in 101 queries.
Best Practice: Use JOIN or batch loading. In ORMs, use eager loading (include, with) or DataLoader patterns to batch requests.
4. Implicit Type Casting
Mistake: Querying a VARCHAR column with an integer parameter, or vice versa.
Impact: The database applies a function to the column to match types, rendering indexes unusable.
Best Practice: Strictly enforce type matching in application code and database schema. Use TypeScript interfaces that map 1:1 to DB types.
5. SELECT * in Production
Mistake: Selecting all columns regardless of need.
Impact: Increases network payload, memory usage, and breaks covering index optimizations. The database must fetch data from the heap even if an index exists.
Best Practice: Always specify required columns. This enables Index-Only Scans and reduces I/O.
6. Stale Statistics
Mistake: Relying on default autovacuum/analyze settings for volatile tables.
Impact: The query planner makes poor decisions based on outdated row counts or value distributions, choosing sequential scans over indexes.
Best Practice: Monitor statistics freshness. Manually run ANALYZE after bulk loads or schema changes. Tune autovacuum thresholds for high-churn tables.
7. Missing Composite Index Prefix
Mistake: Creating index (B, A) but querying WHERE A = ?.
Impact: The index cannot be used for filtering on A alone due to the B-Tree structure.
Best Practice: Follow the leftmost prefix rule. If queries filter on A alone and A+B, create two indexes or ensure the composite index starts with A.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| High Read / Low Write | Denormalization + Covering Indexes | Reduces joins and heap lookups; maximizes read throughput. | Higher storage cost; negligible compute savings. |
| High Write / Low Read | Minimal Indexes + Batch Inserts | Reduces write amplification and lock contention. | Lower storage; faster writes; read latency may increase. |
| Ad-Hoc Reporting | Columnar Store / OLAP Replica | Optimized for aggregations; isolates reporting load from OLTP. | Separate infrastructure cost; protects primary DB. |
| Real-Time Latency < 10ms | Covering Index + Application Cache | Eliminates disk I/O; serves data from memory. | Memory cost; cache invalidation complexity. |
| Large Table Scans | Table Partitioning | Limits scan scope to relevant partitions. | Management overhead; improved query performance. |
Configuration Template
PostgreSQL Performance Tuning (postgresql.conf)
# Memory Configuration
shared_buffers = 25% of RAM # e.g., 8GB for 32GB instance
work_mem = 64MB # Increase for complex sorts/joins; monitor temp file usage
maintenance_work_mem = 1GB # Speeds up VACUUM and index creation
effective_cache_size = 75% of RAM # Helps planner estimate cache hits
# Query Tuning
random_page_cost = 1.1 # Lower for SSDs; encourages index usage
effective_io_concurrency = 200 # For SSDs; improves async I/O
max_parallel_workers_per_gather = 2 # Enable parallel query execution
# Monitoring
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
Index Creation Script Pattern
-- Template for safe index creation in production
-- Use CONCURRENTLY to avoid locking writes
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_table_covering
ON schema.table (filter_col1, filter_col2)
INCLUDE (select_col1, select_col2)
WHERE status = 'active';
-- Verify index usage after creation
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexname = 'idx_table_covering';
Quick Start Guide
- Instrument: Enable
pg_stat_statements (PostgreSQL) or Performance Schema (MySQL) to capture query metrics.
- Identify: Query the statistics view to find the top 5 queries by
total_exec_time or mean_time.
- Analyze: Run
EXPLAIN (ANALYZE, BUFFERS) on the worst query. Note if it uses a Seq Scan, performs a Sort, or does Heap Lookups.
- Optimize:
- If Seq Scan: Add index on filter columns.
- If Heap Lookups: Add
INCLUDE columns to create a covering index.
- If Sort: Ensure index order matches
ORDER BY or increase work_mem.
- Validate: Re-run
EXPLAIN ANALYZE and compare execution time and I/O metrics. Deploy the change and monitor production metrics.