ze Query Patterns and Selectivity
Before creating an index, analyze the query workload to determine selectivity and cardinality. Selectivity is the ratio of distinct values to total rows. Low selectivity columns (e.g., boolean flags) rarely benefit from B-Tree indexes unless used in partial indexes.
Use TypeScript to automate the analysis of query statistics from PostgreSQL's pg_stat_statements:
import { Client } from 'pg';
interface QueryStats {
query: string;
calls: number;
mean_time: number;
rows: number;
}
async function analyzeIndexCandidates(client: Client): Promise<void> {
const result = await client.query<QueryStats>(`
SELECT query, calls, mean_time, rows
FROM pg_stat_statements
WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database())
ORDER BY mean_time * calls DESC
LIMIT 10;
`);
for (const row of result.rows) {
console.log(`High Impact Query: ${row.query.substring(0, 50)}...`);
console.log(` Calls: ${row.calls}, Avg Time: ${row.mean_time.toFixed(2)}ms`);
// Heuristic: Flag queries with high mean_time and low row return count
// as candidates for index optimization.
}
}
Step 2: Design Composite Indexes with Leftmost Prefix Rule
Composite indexes must follow the leftmost prefix rule. The index is sorted by the first column, then the second, etc. Queries filtering on a subset of columns can only use the index if they match the leftmost columns.
Architecture Decision: Place equality columns before range columns.
- Rationale: B-Tree traversal can efficiently skip branches for equality predicates. Range predicates consume the sort order, preventing the use of subsequent columns for filtering.
- Example:
CREATE INDEX idx ON orders (customer_id, status, created_at);
WHERE customer_id = 1 AND status = 'active' → Uses index.
WHERE customer_id = 1 AND created_at > '2023-01-01' → Uses index.
WHERE status = 'active' → Cannot use index (skips customer_id).
Step 3: Implement Covering Indexes for Read-Heavy Paths
For queries that frequently access the same set of columns, use covering indexes to enable index-only scans. This eliminates the need to fetch tuples from the heap, reducing random I/O.
- PostgreSQL Implementation: Use the
INCLUDE clause.
- MySQL Implementation: Ensure all selected columns are part of the index key.
-- Covering index for frequent dashboard query
-- Avoids heap fetch for total_amount and created_at
CREATE INDEX idx_orders_dashboard
ON orders (customer_id, status)
INCLUDE (total_amount, created_at);
Step 4: Tune Fillfactor for Write-Heavy Tables
The fillfactor parameter controls how full index pages are packed during creation and updates. The default is 100%. For tables with frequent updates, setting a lower fillfactor reserves space on pages for HOT (Heap-Only Tuple) updates, reducing page splits and index bloat.
- Rationale: HOT updates allow row versions to be stored on the same heap page if no indexed columns change. If the page is full, a page split occurs, requiring index updates.
- Configuration:
WITH (fillfactor = 70) reserves 30% free space.
Step 5: Verify with EXPLAIN ANALYZE
Always validate index usage using EXPLAIN ANALYZE. Look for Index Scan or Index Only Scan. If the plan shows Seq Scan, check for implicit type casts or selectivity thresholds that discourage the optimizer.
EXPLAIN ANALYZE
SELECT total_amount, created_at
FROM orders
WHERE customer_id = 12345 AND status = 'active';
Pitfall Guide
1. Indexing Low Cardinality Columns
Adding B-Tree indexes on columns with few distinct values (e.g., is_deleted, status with 3 values) increases storage and write cost without improving read performance. The optimizer often prefers a sequential scan due to the high cost of fetching heap pages for a large percentage of rows.
2. Implicit Type Casts
Comparing a column to a value of a different type forces the database to cast values, often preventing index usage.
- Example:
WHERE varchar_column = 123 triggers a cast on every row, causing a full scan.
- Best Practice: Ensure literal types match column definitions. Use explicit casts if necessary, but prefer schema alignment.
3. Ignoring Write Amplification
Every index adds overhead to INSERT, UPDATE, and DELETE operations. The database must update all relevant index structures, leading to write amplification.
- Best Practice: Audit indexes regularly. Drop unused indexes. Monitor write latency impact using
pg_stat_user_indexes and pg_stat_user_tables.
4. Wrong Column Order in Composite Indexes
Placing range columns before equality columns in a composite index reduces effectiveness.
- Mistake:
CREATE INDEX idx ON orders (created_at, customer_id);
- Impact: Query
WHERE customer_id = 1 AND created_at > '...' cannot efficiently use customer_id for filtering after the range scan on created_at.
- Best Practice: Order columns by equality predicates first, then range/sort predicates.
5. Index Bloat Neglect
Frequent updates and deletes leave dead tuples in index pages. Without maintenance, indexes grow significantly larger than necessary, consuming memory and I/O.
- Best Practice: Configure autovacuum thresholds aggressively for volatile tables. Monitor bloat using
pgstattuple extension.
6. Over-Reliance on ORM Defaults
ORMs often generate indexes with default options that may not suit production workloads.
- Best Practice: Review generated migrations. Customize
fillfactor, include columns, and partial conditions manually for critical paths.
7. Assuming Indexes Always Speed Up ORDER BY
Indexes can eliminate sorting operations only if the query's ORDER BY matches the index order and direction.
- Mistake: Index
(a ASC, b ASC), Query ORDER BY a DESC, b ASC.
- Impact: The optimizer may still perform a sort.
- Best Practice: Create indexes with mixed directions if supported, or ensure query sort order matches index definition.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| High-Frequency Time-Series Writes | LSM-Tree Structure | Sequential writes minimize I/O; compaction handles merges. | Higher storage cost; lower write latency. |
| OLTP with Frequent Updates | B-Tree with fillfactor=70 | Supports random reads/writes; fillfactor reduces page splits. | Balanced read/write; moderate storage. |
| Read-Heavy Analytics on Subset | Partial Index | Index size reduced to relevant rows; faster scans. | Low storage; high read performance. |
| Exact Match Lookups Only | Hash Index | O(1) access time; minimal overhead. | No range support; memory intensive. |
| Multi-Column Filter + Sort | Composite Index (Filter cols, Sort col) | Eliminates sort operation; efficient filtering. | Moderate write cost; optimal read. |
| High Read Volume, Low Write | Covering Index | Avoids heap fetch; index-only scan. | High storage; fastest read latency. |
Configuration Template
Ready-to-use PostgreSQL index configuration for a high-performance OLTP order table.
-- 1. Primary Covering Index for Customer Dashboard
-- Includes non-key columns to avoid heap fetch
CREATE INDEX idx_orders_customer_dashboard
ON orders (customer_id, status)
INCLUDE (total_amount, created_at, shipping_address)
WITH (fillfactor = 80)
WHERE status != 'archived';
-- 2. Partial Index for Pending Payments
-- Targets low cardinality subset efficiently
CREATE INDEX idx_orders_pending_payments
ON orders (payment_due_date)
WHERE status = 'pending_payment';
-- 3. Unique Index with Constraint
-- Enforces uniqueness while supporting fast lookups
CREATE UNIQUE INDEX idx_orders_external_id
ON orders (external_transaction_id);
-- 4. Expression Index for Case-Insensitive Search
CREATE INDEX idx_orders_email_lower
ON orders (LOWER(email));
TypeScript Migration Helper:
// migration-helper.ts
import { Pool } from 'pg';
export async function applyOptimizedIndexes(pool: Pool): Promise<void> {
const client = await pool.connect();
try {
await client.query('BEGIN');
// Apply cover index
await client.query(`
CREATE INDEX IF NOT EXISTS idx_orders_customer_dashboard
ON orders (customer_id, status)
INCLUDE (total_amount, created_at)
WITH (fillfactor = 80)
WHERE status != 'archived';
`);
// Apply partial index
await client.query(`
CREATE INDEX IF NOT EXISTS idx_orders_pending_payments
ON orders (payment_due_date)
WHERE status = 'pending_payment';
`);
await client.query('COMMIT');
console.log('Indexes applied successfully.');
} catch (error) {
await client.query('ROLLBACK');
console.error('Index application failed:', error);
throw error;
} finally {
client.release();
}
}
Quick Start Guide
- Identify Candidates: Run the following query to find tables with high sequential scans and low index usage:
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan * 10
ORDER BY seq_scan DESC;
- Analyze Slow Queries: Use
pg_stat_statements to find queries with high mean execution time on the identified tables.
- Create Index: Based on the query predicates, create a composite index following the leftmost prefix rule. Use
fillfactor for write-heavy tables.
- Verify: Execute
EXPLAIN ANALYZE on the slow query. Confirm the plan uses Index Scan or Index Only Scan and check the actual execution time reduction.
- Monitor: Set up an alert for index bloat and unused indexes using the checklist metrics. Review weekly.