a. It defines the partition key and strategy.
CREATE TABLE events (
id BIGSERIAL,
tenant_id UUID NOT NULL,
occurred_at TIMESTAMPTZ NOT NULL,
payload JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
) PARTITION BY RANGE (occurred_at);
Step 3: Create Partitions
Manual creation is error-prone at scale. PostgreSQL's declarative partitioning (PostgreSQL 10+) does not create partitions on its own, so pair it with a management extension or a scheduled job.
-- Range partitions (declarative syntax, PostgreSQL 10+)
CREATE TABLE events_2024_q1 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE events_2024_q2 PARTITION OF events
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
For production, automate partition creation with the pg_partman extension (which ships a background worker) or a scheduled job. Create range partitions ahead of time (typically 2 to 4 quarters) so that inserts never fail for lack of a matching partition.
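As a rough illustration of creating partitions ahead of time, the sketch below generates the DDL for the next few quarterly partitions. The function name `quarterlyPartitionDDL` and the `events_YYYY_qN` naming are assumptions carried over from the example above, not part of any library; a scheduled job would run the returned statements.

```typescript
// Hypothetical helper: builds DDL for `count` quarterly partitions,
// starting with the quarter that contains `from` (evaluated in UTC).
// Identifiers are interpolated directly, so pass only trusted table names.
function quarterlyPartitionDDL(parent: string, from: Date, count: number): string[] {
  const pad = (n: number): string => (n < 10 ? '0' : '') + n;
  const statements: string[] = [];
  let year = from.getUTCFullYear();
  let quarter = Math.floor(from.getUTCMonth() / 3); // 0..3

  for (let i = 0; i < count; i++) {
    const startMonth = quarter * 3 + 1; // 1, 4, 7, or 10
    const endYear = quarter === 3 ? year + 1 : year;
    const endMonth = quarter === 3 ? 1 : startMonth + 3;
    statements.push(
      `CREATE TABLE IF NOT EXISTS ${parent}_${year}_q${quarter + 1} PARTITION OF ${parent} ` +
        `FOR VALUES FROM ('${year}-${pad(startMonth)}-01') TO ('${endYear}-${pad(endMonth)}-01');`
    );
    // Advance to the next quarter, rolling over the year after Q4.
    quarter = (quarter + 1) % 4;
    if (quarter === 0) year += 1;
  }
  return statements;
}
```

Because each statement uses `IF NOT EXISTS`, rerunning the job for an overlapping window should be harmless.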
Step 4: Align Indexes and Constraints
Every partition needs its own index. Since PostgreSQL 11, an index created on the parent is created automatically on each existing and future partition; you can still add extra indexes to individual partitions.
CREATE INDEX idx_events_tenant_occurred ON events (tenant_id, occurred_at DESC);
Constraints like PRIMARY KEY or UNIQUE must include every partition key column. PostgreSQL enforces uniqueness one partition at a time, so only a constraint that contains the partition key can guarantee uniqueness across the whole table.
Step 5: Query Routing & ORM Integration
The query planner prunes partitions when the WHERE clause contains partition key predicates. Without them, the planner scans all partitions.
// Node.js / pg example demonstrating pruning
import { Pool } from 'pg';
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
// Pruning enabled: planner skips irrelevant partitions
const prunedQuery = `
EXPLAIN ANALYZE
SELECT payload FROM events
WHERE tenant_id = $1 AND occurred_at >= $2 AND occurred_at < $3
`;
// Full scan: planner touches all partitions
const fullScanQuery = `
EXPLAIN ANALYZE
SELECT payload FROM events WHERE tenant_id = $1
`;
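// tenantId, start, and end are supplied by the caller
const { rows } = await pool.query(prunedQuery, [tenantId, start, end]);
// Each row of text-format EXPLAIN output is one line of the plan
console.log(rows.map((r) => r['QUERY PLAN']).join('\n'));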
ORMs like Prisma, TypeORM, or Drizzle do not automatically rewrite queries for pruning. You must ensure partition key predicates are included in every targeted query. For TypeScript backends, wrap database access in a repository layer that enforces partition key inclusion.
Architecture Decisions & Rationale
- Why range for time-series? Temporal access patterns dominate backend workloads. Range partitioning aligns with retention policies: expired data is removed by detaching and dropping a whole partition (ALTER TABLE ... DETACH PARTITION, then DROP TABLE) instead of running expensive DELETE operations.
- Why hash for write-heavy tables? Hash distribution spreads rows evenly across partitions, which reduces hotspots. It suits high-throughput event ingestion where queries rarely filter by time.
- Why not partition everything? Partitioning adds planner overhead. Tables under 50M rows rarely benefit. The cost of managing hundreds of partitions outweighs I/O savings.
- Storage vs Compute: Partitioning optimizes compute (CPU/I/O). It does not reduce storage footprint. Compression, columnar storage, or tiered storage handle size reduction.
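To make the retention argument concrete: in PostgreSQL, expiring a range partition means detaching it and then dropping the resulting table. The sketch below generates those statements for every partition that lies entirely before a cutoff. `PartitionInfo` and `retentionDDL` are hypothetical names, and the partition bounds are assumed to be tracked by the caller (or read from the catalog).

```typescript
// Hypothetical description of one range partition.
interface PartitionInfo {
  name: string;     // e.g. "events_2023_q1"
  upperBound: Date; // exclusive upper bound of the partition's range
}

// Returns DETACH + DROP statements for partitions whose data is all older
// than `cutoff`. Identifiers are interpolated directly: trusted names only.
function retentionDDL(parent: string, partitions: PartitionInfo[], cutoff: Date): string[] {
  const statements: string[] = [];
  for (const p of partitions) {
    // The upper bound is exclusive, so a bound equal to the cutoff still
    // means every row in the partition is older than the cutoff.
    if (p.upperBound.getTime() <= cutoff.getTime()) {
      statements.push(`ALTER TABLE ${parent} DETACH PARTITION ${p.name};`);
      statements.push(`DROP TABLE ${p.name};`);
    }
  }
  return statements;
}
```

Detaching first leaves room to archive the table before dropping it, which is one reason to keep the two steps separate.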
Pitfall Guide
1. Partitioning on Low-Cardinality or High-Churn Columns
Partition keys with few distinct values (e.g., status, is_active) create uneven partitions. High-churn columns cause frequent row migrations between partitions, triggering dead tuples and write amplification.
Best Practice: Use columns with high cardinality and stable access patterns. Avoid boolean or enum flags unless combined with a high-cardinality prefix.
2. Ignoring Partition Pruning in Query Design
Queries without a partition key predicate must scan every partition. The result can be slower than the same query on an unpartitioned table because of the added planning and per-partition overhead.
Best Practice: Always include partition key ranges in WHERE clauses. Use EXPLAIN to verify Append nodes are pruned. Enforce this in code reviews and repository layers.
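One way to automate that verification is to request the plan with `EXPLAIN (FORMAT JSON)` and count the relations it touches. The sketch below walks the plan tree using the `Node Type`, `Relation Name`, and `Plans` keys of PostgreSQL's JSON plan format; `scannedRelations` and `assertPruned` are hypothetical helper names, and the threshold is an assumption you would pick per query.

```typescript
// Minimal shape of a node in EXPLAIN (FORMAT JSON) output; only keys used here.
interface PlanNode {
  'Node Type': string;
  'Relation Name'?: string;
  Plans?: PlanNode[];
}

// Collects the distinct relations a plan scans, walking child plans recursively.
function scannedRelations(node: PlanNode, acc: string[] = []): string[] {
  const rel = node['Relation Name'];
  if (rel !== undefined && acc.indexOf(rel) === -1) acc.push(rel);
  for (const child of node.Plans ?? []) scannedRelations(child, acc);
  return acc;
}

// Hypothetical guard: throws when the plan touches more partitions than expected.
function assertPruned(explainJson: Array<{ Plan: PlanNode }>, maxPartitions: number): string[] {
  const relations = scannedRelations(explainJson[0].Plan);
  if (relations.length > maxPartitions) {
    throw new Error(`Plan scans ${relations.length} relations: ${relations.join(', ')}`);
  }
  return relations;
}
```

With node-postgres, the parsed plan typically arrives in the `QUERY PLAN` column of the first result row; treat that as an assumption to confirm against your driver. A check like this fits naturally into an integration test for each repository method.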
3. Misaligned Indexes Across Partitions
Indexes defined only on some partitions make performance inconsistent. The planner still reads every relevant partition, but it falls back to sequential scans on those without a usable index.
Best Practice: Define indexes on the parent table. Verify partition inheritance propagates them. Monitor pg_stat_user_indexes to detect missing or unused indexes per partition.
4. Over-Partitioning
Creating daily partitions for a table with 100k rows/day generates thousands of child tables over a few years of retention, each holding very little data. Planning time and per-connection catalog cache memory grow with the partition count, and VACUUM cycles multiply.
Best Practice: Match partition granularity to query windows. Monthly or quarterly partitions balance I/O reduction with metadata overhead. Use sub-partitioning only when necessary.
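A quick back-of-the-envelope calculation helps when choosing granularity. The sketch below estimates the partition count and average rows per partition for a given ingest rate and retention period; the function name and the 365-day year are simplifying assumptions.

```typescript
type Granularity = 'day' | 'month' | 'quarter';

// Approximate partitions created per year for each granularity.
const PARTITIONS_PER_YEAR: Record<Granularity, number> = { day: 365, month: 12, quarter: 4 };

// Hypothetical estimator: how many partitions exist at full retention,
// and roughly how many rows each one holds.
function estimatePartitioning(
  rowsPerDay: number,
  retentionYears: number,
  granularity: Granularity
): { partitions: number; rowsPerPartition: number } {
  const perYear = PARTITIONS_PER_YEAR[granularity];
  return {
    partitions: Math.ceil(perYear * retentionYears),
    rowsPerPartition: Math.round((rowsPerDay * 365) / perYear),
  };
}
```

For 100k rows/day retained for five years, `estimatePartitioning(100000, 5, 'day')` gives 1,825 partitions of about 100k rows each, while `'month'` gives 60 partitions of roughly 3 million rows each.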
5. Neglecting Maintenance & Statistics
Partitioned tables require updated statistics per partition. Stale stats cause poor query plans. Dead tuples accumulate faster in high-write partitions.
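Best Practice: Schedule ANALYZE on the parent table as well as relying on autovacuum for the partitions; autovacuum analyzes leaf partitions but does not process the partitioned parent itself. Use pg_partman or background workers for automatic maintenance. Monitor n_dead_tup and last_autovacuum metrics.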
6. Assuming Partitioning Solves Concurrency Bottlenecks
Partitioning distributes storage, not locks. High-write tables still contend on sequence generators, constraint checks, and WAL writes.
Best Practice: Use GENERATED ALWAYS AS IDENTITY with caching. Batch inserts. Consider unlogged tables for ephemeral data. Partitioning complements, not replaces, write optimization.
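Batching is the easiest of these to show in code. The sketch below builds one multi-row, parameterized INSERT for the `events` table from Step 2; `EventRow` and `buildBatchInsert` are hypothetical names. The returned `{ text, values }` object matches the query config shape that node-postgres accepts.

```typescript
// Hypothetical row shape matching the events table defined earlier.
interface EventRow {
  tenantId: string;
  occurredAt: Date;
  payload: unknown;
}

// Builds a single INSERT with one placeholder tuple per row.
function buildBatchInsert(rows: EventRow[]): { text: string; values: unknown[] } {
  if (rows.length === 0) throw new Error('No rows to insert');
  const values: unknown[] = [];
  const tuples = rows.map((row, i) => {
    const base = i * 3; // three parameters per row
    values.push(row.tenantId, row.occurredAt, JSON.stringify(row.payload));
    return `($${base + 1}, $${base + 2}, $${base + 3})`;
  });
  return {
    text: `INSERT INTO events (tenant_id, occurred_at, payload) VALUES ${tuples.join(', ')}`,
    values,
  };
}
```

PostgreSQL routes each row to its partition, so a batch may span partitions without extra handling. Keep batches to a few thousand rows so the statement stays well under the protocol's bind-parameter limit.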
7. Forgetting Cross-Partition Aggregations
Aggregates such as COUNT(), SUM(), or GROUP BY that span partitions must scan every matching partition. Without adequate work_mem and parallel query settings, aggregation becomes a bottleneck.
Best Practice: Pre-aggregate in materialized views. Use partition-aware query routing. Tune max_parallel_workers_per_gather and work_mem for analytical workloads.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Time-series telemetry (>1B rows) | Range partitioning by month | Aligns with retention policies; expired data is removed by dropping whole partitions; pruning reduces scan I/O by 70%+ | Storage unchanged; compute costs drop 40-60% |
| Multi-tenant SaaS with isolated queries | List partitioning by tenant_id | Guarantees data isolation; simplifies backup/restore per tenant; planner prunes to single partition | Slight overhead for tenant routing; eliminates cross-tenant scan costs |
| High-write event ingestion | Hash partitioning (8-16 buckets) | Eliminates write hotspots; distributes WAL and lock contention evenly | Higher index maintenance cost; write latency improves 30-50% |
| Complex analytical joins across entities | No partitioning + columnar warehouse | Relational partitioning degrades cross-table joins; analytical workloads require MPP architecture | Migration cost to warehouse; query latency drops 10-100x for analytics |
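For the hash row in the matrix, every remainder of the modulus needs its own partition. The sketch below generates that DDL using PostgreSQL's `FOR VALUES WITH (MODULUS m, REMAINDER r)` syntax (PostgreSQL 11+). It assumes the parent was created with `PARTITION BY HASH (...)`; the `_pN` suffix and the function name are illustrative choices.

```typescript
// Hypothetical helper: one CREATE TABLE per hash bucket.
// Identifiers are interpolated directly, so pass only trusted table names.
function hashPartitionDDL(parent: string, modulus: number): string[] {
  if (!Number.isInteger(modulus) || modulus < 1) {
    throw new Error('modulus must be a positive integer');
  }
  const statements: string[] = [];
  for (let remainder = 0; remainder < modulus; remainder++) {
    statements.push(
      `CREATE TABLE ${parent}_p${remainder} PARTITION OF ${parent} ` +
        `FOR VALUES WITH (MODULUS ${modulus}, REMAINDER ${remainder});`
    );
  }
  return statements;
}
```

Leaving out any remainder means rows that hash to it have no partition and their inserts fail, which is why generating the full set in one pass is safer than writing the statements by hand.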
Configuration Template
PostgreSQL declarative range partitioning with automatic creation via pg_partman (production-ready baseline):
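-- Enable extension in its own schema (the partman.* calls below assume it)
CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;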
-- Create parent table
CREATE TABLE telemetry_data (
id BIGSERIAL,
device_id UUID NOT NULL,
recorded_at TIMESTAMPTZ NOT NULL,
metrics JSONB NOT NULL,
PRIMARY KEY (id, recorded_at)
) PARTITION BY RANGE (recorded_at);
-- Configure pg_partman for monthly partitions
-- (arguments as in pg_partman 5.x; 4.x expects p_type := 'native')
SELECT partman.create_parent(
p_parent_table := 'public.telemetry_data',
p_control := 'recorded_at',
p_type := 'range',
p_interval := '1 month',
p_premake := 3
);
-- Create indexes on parent (propagates automatically)
CREATE INDEX idx_telemetry_device_time ON telemetry_data (device_id, recorded_at DESC);
CREATE INDEX idx_telemetry_metrics_gin ON telemetry_data USING gin (metrics);
-- Background worker setup (add to postgresql.conf)
-- shared_preload_libraries = 'pg_partman_bgw'
-- pg_partman_bgw.interval = 3600
-- pg_partman_bgw.dbname = 'your_db'
-- pg_partman_bgw.role = 'postgres'
TypeScript repository guard enforcing pruning:
import { z } from 'zod';
import { db } from './db';

// Both the device filter and the partition key bounds are mandatory
const PartitionedQuerySchema = z.object({
  deviceId: z.string().uuid(),
  timeRange: z.object({
    start: z.coerce.date(),
    end: z.coerce.date(),
  }),
});

export async function getTelemetry(params: z.input<typeof PartitionedQuerySchema>) {
  // parse() rejects missing or invalid bounds before any SQL runs
  const { deviceId, timeRange } = PartitionedQuerySchema.parse(params);
  // Reject empty or inverted ranges, which would otherwise silently return nothing
  if (timeRange.start.getTime() >= timeRange.end.getTime()) {
    throw new Error('timeRange.start must be earlier than timeRange.end');
  }
  return db.query(`
    SELECT device_id, metrics, recorded_at
    FROM telemetry_data
    WHERE device_id = $1 AND recorded_at >= $2 AND recorded_at < $3
    ORDER BY recorded_at DESC
    LIMIT 1000
  `, [deviceId, timeRange.start, timeRange.end]);
}
Quick Start Guide
- Identify partition key: Run EXPLAIN ANALYZE on your top 5 slowest queries. Extract columns used in WHERE clauses with range or equality filters. Select the column with highest cardinality and temporal/categorical stability.
- Create parent table: Execute CREATE TABLE ... PARTITION BY RANGE/LIST/HASH with your chosen key. Include the key in all PRIMARY KEY and UNIQUE constraints.
- Generate initial partitions: Use pg_partman or manual CREATE TABLE ... PARTITION OF statements. Create at least 2 to 4 future partitions to prevent write failures.
- Validate pruning: Run EXPLAIN on a targeted query. Confirm that only the expected partitions appear under the Append node; partitions pruned at execution time are reported as Subplans Removed: N. Add missing partition key predicates if pruning fails.
- Deploy monitoring: Log pg_stat_user_tables and pg_stat_user_indexes per partition. Alert on n_dead_tup > 100000 or last_autovacuum > 24h. Schedule ANALYZE cron jobs or enable background workers.
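The alert thresholds from the monitoring step can be sketched as a small check over rows read from `pg_stat_user_tables`. The `relname`, `n_dead_tup`, and `last_autovacuum` columns exist in that view; `PartitionStats` and `partitionAlerts` are hypothetical names, and the defaults mirror the thresholds suggested above.

```typescript
// Subset of pg_stat_user_tables columns used by the check.
// Note: node-postgres returns bigint columns such as n_dead_tup as strings
// by default, so convert them with Number() before building this object.
interface PartitionStats {
  relname: string;
  n_dead_tup: number;
  last_autovacuum: Date | null; // null when autovacuum has never run
}

// Returns one alert message per violated threshold.
function partitionAlerts(
  stats: PartitionStats[],
  now: Date,
  maxDeadTuples = 100000,
  maxVacuumAgeHours = 24
): string[] {
  const alerts: string[] = [];
  for (const s of stats) {
    if (s.n_dead_tup > maxDeadTuples) {
      alerts.push(`${s.relname}: n_dead_tup=${s.n_dead_tup}`);
    }
    const ageHours =
      s.last_autovacuum === null
        ? Infinity
        : (now.getTime() - s.last_autovacuum.getTime()) / 3600000;
    if (ageHours > maxVacuumAgeHours) {
      alerts.push(`${s.relname}: no autovacuum in the last ${maxVacuumAgeHours}h`);
    }
  }
  return alerts;
}
```

Old, read-only partitions rarely need vacuuming, so in practice you may want to restrict the autovacuum-age check to partitions that still receive writes.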
Partitioning is a storage layout decision, not a scaling magic wand. Align it with access patterns, enforce pruning at the application layer, and automate lifecycle management. The performance gains compound when the database engine stops scanning irrelevant data blocks.