ires partial indexes.
- Sort Requirements: Queries with
ORDER BY can leverage index ordering to avoid expensive sort operations.
2. Index Type Selection
Choose the index structure based on the data type and query operator.
- B-Tree: Default for range queries, equality, and sorting. Use for
=, <, >, BETWEEN, LIKE 'prefix%'.
- Hash: Optimized for equality checks only. Faster than B-Tree for
= but cannot be used for ranges or sorting.
- GIN (Generalized Inverted Index): Essential for JSONB, arrays, and full-text search. Supports containment operators (
@>, &&).
- GiST (Generalized Search Tree): Required for geometric data, range types, and full-text search. Supports overlap and nearest-neighbor searches.
- BRIN (Block Range INdex): Ideal for time-series or naturally sorted data. Stores summaries of block ranges rather than per-row entries. Minimal storage overhead.
3. Composite Index Design Rules
Composite indexes offer the highest ROI but require strict adherence to design rules.
- Left-Prefix Rule: The database can use a composite index
(A, B, C) for queries filtering on A, (A, B), or (A, B, C). It cannot efficiently use it for B or C alone.
- Equality Before Range: Place columns with equality predicates before range predicates.
- Optimal:
CREATE INDEX idx ON table (status, created_at) for WHERE status = 'active' AND created_at > ....
- Suboptimal:
CREATE INDEX idx ON table (created_at, status) forces a scan over all dates then filters status.
- Selectivity Ordering: Within equality columns, order by selectivity descending. Within range columns, the order matters less for filtering but impacts sorting.
4. Covering Indexes and Index-Only Scans
To eliminate heap fetches, create covering indexes that include all columns required by the query.
- Implementation: Use the
INCLUDE clause to add non-key columns to the index leaf pages.
- Benefit: The query executes entirely within the index structure, reducing I/O and CPU usage significantly.
5. Implementation Examples
TypeScript Migration Strategy:
Define indexes declaratively in your migration system to ensure reproducibility and version control.
// src/migrations/20240520_optimize_users.ts
import { Knex } from 'knex';
export async function up(knex: Knex): Promise<void> {
// 1. Composite index for frequent auth lookup
// Strategy: Equality on email (high selectivity), covering for password_hash
await knex.raw(`
CREATE UNIQUE INDEX idx_users_email_covering
ON users (email) INCLUDE (password_hash, updated_at);
`);
// 2. Partial index for active sessions
// Strategy: Avoid indexing inactive rows to reduce bloat and write penalty
await knex.raw(`
CREATE INDEX idx_sessions_active
ON sessions (user_id, expires_at)
WHERE status = 'active';
`);
// 3. BRIN index for time-series logs
// Strategy: Low storage overhead for append-only data
await knex.raw(`
CREATE INDEX idx_logs_timestamp_brin
ON logs USING brin (created_at);
`);
// 4. GIN index for JSONB payload search
// Strategy: Fast containment queries on document store field
await knex.raw(`
CREATE INDEX idx_events_payload_gin
ON events USING gin (payload jsonb_path_ops);
`);
}
export async function down(knex: Knex): Promise<void> {
await knex.raw('DROP INDEX IF EXISTS idx_users_email_covering;');
await knex.raw('DROP INDEX IF EXISTS idx_sessions_active;');
await knex.raw('DROP INDEX IF EXISTS idx_logs_timestamp_brin;');
await knex.raw('DROP INDEX IF EXISTS idx_events_payload_gin;');
}
Query Pattern Validation:
Use TypeScript to wrap query execution with EXPLAIN ANALYZE in development to catch missing indexes.
// src/db/query-analyzer.ts
import { pool } from './connection';
export async function analyzeQuery(sql: string, params: any[]) {
const explainQuery = `EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) ${sql}`;
const result = await pool.query(explainQuery, params);
const plan = result.rows[0]['QUERY PLAN'][0];
// Detect sequential scans on large tables
const hasSeqScan = JSON.stringify(plan).includes('Seq Scan');
if (hasSeqScan) {
console.warn(`[PERF WARNING] Sequential scan detected:\n${JSON.stringify(plan, null, 2)}`);
}
return plan;
}
Pitfall Guide
1. Indexing Low-Cardinality Columns Without Partials
- Mistake: Creating an index on a boolean
is_deleted column. The planner will likely ignore it because the cost of an index scan plus heap fetch exceeds a sequential scan for 50% of rows.
- Fix: Use a partial index:
CREATE INDEX ... WHERE is_deleted = false. This indexes only the relevant subset, making the index small and highly selective.
2. Violating the Left-Prefix Rule
- Mistake: Creating index
(B, A) but querying WHERE A = ? AND B = ?. The database cannot use the index for filtering efficiently.
- Fix: Analyze query predicates and order index columns to match the most common access path. If multiple access paths exist, create separate indexes or a composite that covers the dominant pattern.
3. Write Amplification from Excessive Indexes
- Mistake: Adding an index for every filter condition. Each index adds write overhead (WAL generation, page splits, lock contention).
- Fix: Audit indexes regularly. Remove unused indexes. Consolidate overlapping indexes. Monitor write latency after adding new indexes.
4. Function Calls on Indexed Columns
- Mistake: Querying
WHERE LOWER(email) = 'user@example.com' with an index on email. The function prevents index usage.
- Fix: Use functional indexes:
CREATE INDEX idx_email_lower ON users (LOWER(email)). Alternatively, normalize data at write time to avoid functions in queries.
5. Ignoring Index Bloat
- Mistake: High update/delete activity causes index bloat, where dead tuples occupy space. This degrades cache efficiency and increases I/O.
- Fix: Schedule regular
REINDEX or VACUUM operations. Monitor bloat ratios using system catalogs. Consider pg_repack for zero-downtime maintenance.
6. Indexing Small Tables
- Mistake: Indexing tables with fewer than 100 rows. The overhead of index traversal outweighs the benefit; sequential scan is faster.
- Fix: Exclude small lookup tables from indexing strategies. Let the planner use sequential scans.
7. Missing Statistics
- Mistake: The planner makes poor decisions because table statistics are stale. This often happens after bulk loads or significant data changes.
- Fix: Ensure autovacuum is configured correctly. Run
ANALYZE manually after large data modifications.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| High-Volume Writes | Minimal indexes; BRIN for time-series; Hash for equality lookups | Reduces write amplification and WAL volume | Low storage, High write throughput |
| JSONB Document Queries | GIN index with jsonb_path_ops operator class | Optimizes containment and existence checks | Moderate storage, Fast document reads |
| Geospatial Filtering | GiST index on geometry/point types | Supports R-Tree structure for spatial ops | Moderate storage, Fast spatial queries |
| Ad-hoc Analytics | Columnar store or materialized views; avoid B-Tree | B-Trees inefficient for full scans/aggregations | Higher compute, Lower query latency |
| Authentication Lookups | Unique B-Tree with INCLUDE for session data | Ensures uniqueness and covers query columns | Low latency, High security |
Configuration Template
PostgreSQL Index Management Policy:
-- 1. Identify Unused Indexes
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- 2. Detect Index Bloat
SELECT
schemaname, tablename, indexname,
real_size,
extra_size,
CASE WHEN extra_size > 0 THEN 'True' ELSE 'False' END AS bloat
FROM (
SELECT
schemaname, tablename, indexname,
pg_relation_size(indexrelid) AS real_size,
(pg_relation_size(indexrelid) - (n_tup_ins * 2)) AS extra_size
FROM pg_stat_user_indexes
JOIN pg_class ON pg_class.oid = pg_stat_user_indexes.indexrelid
WHERE n_tup_ins > 0
) AS bloat_data
WHERE extra_size > 1048576; -- Alert if bloat > 1MB
-- 3. Strategic Index Creation Template
-- Usage: Replace placeholders with actual schema details
CREATE INDEX CONCURRENTLY idx_{table}_{columns}
ON {table} ({column_list})
INCLUDE ({include_columns})
WHERE {partial_condition};
Quick Start Guide
- Identify Bottlenecks: Run
SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10; to find slow queries.
- Analyze Plan: Prefix the query with
EXPLAIN (ANALYZE, BUFFERS) and check for Seq Scan or high actual rows vs rows discrepancies.
- Create Index: Apply the composite/partial strategy. Use
CREATE INDEX CONCURRENTLY to avoid locking production writes.
- Verify Improvement: Re-run
EXPLAIN to confirm Index Scan usage and reduced cost.
- Monitor: Watch
pg_stat_user_indexes for idx_scan increments and monitor write latency for degradation.
Conclusion
Effective database indexing is an engineering discipline, not a configuration task. It demands a deep understanding of query patterns, data distribution, and storage internals. By adopting a workload-aware strategyâprioritizing composite structures, partial filters, and covering indexesâteams can achieve optimal performance while minimizing resource consumption. Regular auditing and validation are essential to maintain this efficiency as the application evolves.