nsions (e.g., WHERE user_id = ? AND status = ?).
- Sorting requirements.
- Cardinality expectations.
Example: If a dashboard requires SELECT * FROM orders WHERE customer_id = ? ORDER BY created_at DESC, the model must support this without full table scans.
Step 2: Implement Constraints as Code
Database constraints are non-negotiable for data integrity. Never rely solely on application logic. Use schema definitions to enforce:
- Primary Keys: Use sequential integers or UUIDs based on index fragmentation analysis.
- Foreign Keys: Enforce referential integrity with appropriate ON DELETE actions.
- Check Constraints: Validate data ranges and formats at the storage layer.
- Unique Constraints: Prevent duplicates on business keys.
Step 3: Schema Versioning and Migration Strategy
Treat schema changes as immutable migrations. Never alter production tables directly.
- Generate migrations from schema diffs.
- Ensure migrations are backward-compatible where possible (expand/contract pattern).
- Test migrations against production data volumes.
Step 4: Indexing Strategy
Indexes are part of the data model, not an afterthought.
- Composite Indexes: Align index columns with query WHERE and ORDER BY clauses.
- Partial Indexes: Index subsets of data to reduce size (e.g., WHERE status = 'active').
- Functional Indexes: Index computed values for complex queries.
Code Implementation: TypeScript Schema with Drizzle ORM
This example demonstrates a hybrid model for an e-commerce system, emphasizing constraints, comments, and access-pattern optimization.
import { pgTable, uuid, varchar, timestamp, integer, jsonb, index, check } from 'drizzle-orm/pg-core';
import { sql } from 'drizzle-orm';
// Core Entity: Users
// Normalized structure with strict constraints
export const users = pgTable('users', {
  id: uuid('id').defaultRandom().primaryKey(),
  // .unique() already creates a unique index, which also serves the login lookup by email,
  // so no separate index on email is declared.
  email: varchar('email', { length: 255 }).notNull().unique(),
  status: varchar('status', { length: 20 }).notNull(),
  created_at: timestamp('created_at').defaultNow().notNull(),
}, (table) => {
  return {
    // Constraint: Status must be one of a fixed set of values
    statusCheck: check('users_status_check', sql`${table.status} IN ('active', 'suspended', 'deleted')`),
  };
});
// Core Entity: Orders
// Strategic denormalization: includes customer snapshot for dashboard performance
export const orders = pgTable('orders', {
  id: uuid('id').defaultRandom().primaryKey(),
  user_id: uuid('user_id').notNull().references(() => users.id, { onDelete: 'cascade' }),
  // Denormalized fields for access pattern: "List orders for customer dashboard"
  // Avoids join to users table for read-heavy operations
  customer_email_snapshot: varchar('customer_email_snapshot', { length: 255 }).notNull(),
  customer_name_snapshot: varchar('customer_name_snapshot', { length: 255 }).notNull(),
  total_amount_cents: integer('total_amount_cents').notNull(),
  status: varchar('status', { length: 20 }).notNull(),
  metadata: jsonb('metadata').default({}),
  created_at: timestamp('created_at').defaultNow().notNull(),
}, (table) => {
  return {
    // Constraint: Amount must be positive
    amountCheck: check('orders_amount_positive', sql`${table.total_amount_cents} > 0`),
    // Constraint: Valid status transitions can be enforced via triggers or app logic,
    // but the CHECK constraint guarantees a finite set of states.
    statusCheck: check('orders_status_check', sql`${table.status} IN ('pending', 'paid', 'shipped', 'cancelled')`),
    // Index: Composite index for the "Dashboard" query pattern
    // WHERE user_id = ? ORDER BY created_at DESC
    // Note: desc() is a method call; omitting the parentheses passes the function itself.
    userCreatedIdx: index('orders_user_created_idx').on(table.user_id, table.created_at.desc()),
    // Partial Index: Only index non-cancelled orders for reporting
    activeOrdersIdx: index('orders_active_idx')
      .on(table.created_at)
      .where(sql`${table.status} != 'cancelled'`),
  };
});
// Audit Log: Append-only model for compliance
export const auditLogs = pgTable('audit_logs', {
  id: uuid('id').defaultRandom().primaryKey(),
  entity_type: varchar('entity_type', { length: 50 }).notNull(),
  entity_id: uuid('entity_id').notNull(),
  action: varchar('action', { length: 50 }).notNull(),
  payload: jsonb('payload').notNull(),
  performed_by: uuid('performed_by').references(() => users.id),
  created_at: timestamp('created_at').defaultNow().notNull(),
}, (table) => {
  return {
    // Index: Optimizes "Get audit trail for entity"
    entityTrailIdx: index('audit_logs_entity_idx').on(table.entity_type, table.entity_id, table.created_at),
  };
});
Architecture Rationale:
users table: Strict normalization. Email is unique and indexed. Status is constrained.
orders table: Hybrid approach. user_id maintains referential integrity. customer_email_snapshot and customer_name_snapshot are denormalized to satisfy the high-frequency dashboard query without a join. Constraints enforce business rules. Composite index supports the dashboard query pattern. Partial index optimizes reporting.
audit_logs table: Append-only design. No updates or deletes. Indexed for retrieval by entity trail.
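The snapshot columns only help if they are filled in at write time. Below is a minimal sketch of that step, assuming a hypothetical `buildOrderRow` helper and a hypothetical `name` field on the user (the `users` table above does not define one, so a real schema would need a source for it). It also mirrors the positive-amount CHECK so bad input fails before reaching the database; the database constraint remains the final gatekeeper.

```typescript
// Hypothetical row shapes; the field names follow the orders table above.
interface UserRow {
  id: string;
  email: string;
  name: string; // assumption: not part of the users table shown in this article
}

interface NewOrderRow {
  user_id: string;
  customer_email_snapshot: string;
  customer_name_snapshot: string;
  total_amount_cents: number;
  status: 'pending';
}

// Copies the customer's current email and name into the order at creation time.
// Later changes to the user do not alter existing orders.
function buildOrderRow(user: UserRow, totalAmountCents: number): NewOrderRow {
  // Mirrors the orders_amount_positive CHECK; the database still enforces it.
  if (!Number.isInteger(totalAmountCents) || totalAmountCents <= 0) {
    throw new RangeError('total_amount_cents must be a positive integer');
  }
  return {
    user_id: user.id,
    customer_email_snapshot: user.email,
    customer_name_snapshot: user.name,
    total_amount_cents: totalAmountCents,
    status: 'pending',
  };
}

const exampleOrder = buildOrderRow(
  { id: 'user-1', email: 'ada@example.com', name: 'Ada' },
  1999,
);
console.log(exampleOrder.customer_email_snapshot);
```

The trade-off is deliberate: the snapshot records what the customer looked like when the order was placed, so it should not be "synchronized" when the user later changes their email.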
Pitfall Guide
1. Entity-First Modeling
Mistake: Designing tables based on domain nouns (User, Product, Order) without analyzing how data is accessed.
Impact: Results in excessive joins, N+1 query problems, and inability to scale read workloads.
Best Practice: Start with a "Query Matrix." List every critical query and design tables/indexes to satisfy them. Denormalize only where access patterns demand it.
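One way to make a Query Matrix concrete is to keep it as data and derive candidate index columns from it. The sketch below is illustrative only: the `AccessPattern` shape and `suggestIndexColumns` helper are hypothetical names, and the rule it applies (equality filters first, sort column last) is a starting point to confirm with the query planner, not a guarantee.

```typescript
// Hypothetical description of one critical query.
interface AccessPattern {
  name: string;
  table: string;
  equalityFilters: string[]; // columns compared with "="
  sort?: { column: string; direction: 'asc' | 'desc' };
}

// Candidate composite index: equality columns first, then the sort column.
function suggestIndexColumns(pattern: AccessPattern): string[] {
  const columns = [...pattern.equalityFilters];
  if (pattern.sort && !columns.includes(pattern.sort.column)) {
    const { column, direction } = pattern.sort;
    columns.push(direction === 'desc' ? `${column} DESC` : column);
  }
  return columns;
}

const queryMatrix: AccessPattern[] = [
  {
    name: 'customer dashboard',
    table: 'orders',
    equalityFilters: ['user_id'],
    sort: { column: 'created_at', direction: 'desc' },
  },
  { name: 'login lookup', table: 'users', equalityFilters: ['email'] },
];

for (const pattern of queryMatrix) {
  const columns = suggestIndexColumns(pattern).join(', ');
  console.log(`${pattern.table} (${columns}) serves: ${pattern.name}`);
}
```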
2. Weak Constraint Enforcement
Mistake: Relying on application code for validation (e.g., checking email format in Node.js) and omitting database constraints.
Impact: Data corruption when multiple services write to the DB, batch jobs bypass validation, or bugs in application logic.
Best Practice: Enforce NOT NULL, UNIQUE, CHECK, and foreign key constraints at the database layer. The DB is the source of truth; constraints are the final gatekeeper.
3. Ignoring Index Fragmentation and Selectivity
Mistake: Adding indexes blindly or using random UUIDs as primary keys on high-write tables without considering B-Tree fragmentation.
Impact: Write performance degradation due to page splits; indexes that are never used by the query planner.
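Best Practice: Analyze index usage statistics and drop indexes the planner never uses. Use time-ordered identifiers (sequential UUIDs such as UUIDv7, or ULIDs) for high-write tables to reduce fragmentation. In composite indexes, put equality-filter columns first and range or sort columns last; among the equality columns, leading with the most selective one is a reasonable default unless query patterns dictate otherwise.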
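To illustrate why time-ordered identifiers fragment a B-Tree less than random ones, here is a rough sketch of an ID whose leading bits are a millisecond timestamp, so new keys land near the right-hand edge of the index. The `timeOrderedId` function is hypothetical and is not a standards-compliant UUIDv7 or ULID; it uses `Math.random()` only to stay dependency-free, and a real implementation would use a vetted library with a cryptographically secure random source.

```typescript
// Sketch only: 48-bit millisecond timestamp (12 hex chars) + 80 random bits (20 hex chars).
// Fixed-width hex means lexicographic order matches creation-time order
// (IDs created within the same millisecond are not ordered relative to each other).
function timeOrderedId(nowMs: number = Date.now()): string {
  const timestampHex = Math.floor(nowMs).toString(16).padStart(12, '0');
  let randomHex = '';
  for (let i = 0; i < 20; i++) {
    // Assumption: Math.random() is acceptable for a demo; it is not for production IDs.
    randomHex += Math.floor(Math.random() * 16).toString(16);
  }
  return timestampHex + randomHex;
}

const earlier = timeOrderedId(1_700_000_000_000);
const later = timeOrderedId(1_700_000_000_001);
console.log(earlier < later); // the later timestamp sorts after the earlier one
```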
4. The JSON Trap
Mistake: Storing structured data in JSONB columns and querying it without generated columns or functional indexes.
Impact: Full table scans on JSON fields; loss of type safety; inability to enforce structure.
Best Practice: If you query JSON fields, create generated columns for those fields and index them. Use JSON only for truly dynamic, unstructured payloads that are rarely queried by internal fields.
5. Migration Anxiety and "Soft" Schema
Mistake: Avoiding schema changes due to fear of downtime, leading to nullable columns, deprecated fields, and "soft" deletions without cleanup.
Impact: Schema bloat, confusion for developers, increased storage costs, and query complexity.
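Best Practice: Adopt the "Expand/Contract" pattern. Expand: add new columns as nullable, deploy app code that writes to both old and new columns, backfill existing rows, then switch reads to the new columns. Contract: once nothing reads or writes the old columns, drop them. Use tools that support online schema changes. Regularly audit and remove deprecated columns.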
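The ordering of an expand/contract change is easier to review when written down as an explicit plan. The sketch below models a column rename as three phases; `renameColumnPlan` and `MigrationPhase` are hypothetical names, the SQL is generic PostgreSQL, and the identifiers are assumed to be trusted constants from your own migration files, never user input.

```typescript
interface MigrationPhase {
  phase: 'expand' | 'backfill' | 'contract';
  sql: string;
  note: string;
}

// Builds the ordered phases for replacing oldColumn with newColumn.
// Each phase is meant to ship as its own migration, with deploys in between.
function renameColumnPlan(
  table: string,
  oldColumn: string,
  newColumn: string,
  sqlType: string,
): MigrationPhase[] {
  return [
    {
      phase: 'expand',
      sql: `ALTER TABLE ${table} ADD COLUMN ${newColumn} ${sqlType} NULL;`,
      note: 'Then deploy app code that writes both columns.',
    },
    {
      phase: 'backfill',
      sql: `UPDATE ${table} SET ${newColumn} = ${oldColumn} WHERE ${newColumn} IS NULL;`,
      note: 'Run in batches on large tables, then switch reads to the new column.',
    },
    {
      phase: 'contract',
      sql: `ALTER TABLE ${table} DROP COLUMN ${oldColumn};`,
      note: 'Only after no deployed code reads or writes the old column.',
    },
  ];
}

for (const step of renameColumnPlan('users', 'name', 'full_name', 'varchar(255)')) {
  console.log(`[${step.phase}] ${step.sql} -- ${step.note}`);
}
```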
6. Hardcoding Cardinality Assumptions
Mistake: Designing models assuming 1:1 relationships that evolve into 1:N or M:N as business requirements change.
Impact: Schema refactoring becomes necessary; data migration scripts are risky and complex.
Best Practice: Design for flexibility where cardinality is uncertain. Use junction tables for relationships that might become many-to-many. Avoid storing arrays of IDs in a column; use proper relational structures.
7. Neglecting Data Lifecycle and Retention
Mistake: Modeling data as static entities without considering expiration, archival, or compliance requirements.
Impact: Tables grow indefinitely; query performance degrades; compliance violations (GDPR/CCPA) due to inability to purge data.
Best Practice: Implement partitioning for time-series data. Define retention policies. Use soft deletes with deleted_at timestamps for auditability, but ensure archival processes exist. Design models to support "Right to be Forgotten" operations.
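As a rough sketch of how time-based partitioning and retention fit together, the helpers below name monthly partitions, emit PostgreSQL `CREATE TABLE ... PARTITION OF` DDL, and decide whether a partition has aged out. All function names are hypothetical, and the sketch assumes the parent table was declared with `PARTITION BY RANGE (created_at)` and that dates are handled in UTC.

```typescript
// First day (UTC) of the month containing d, shifted by offsetMonths.
function monthStart(d: Date, offsetMonths = 0): Date {
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth() + offsetMonths, 1));
}

function isoDate(d: Date): string {
  return d.toISOString().slice(0, 10); // YYYY-MM-DD
}

// e.g. audit_logs + January 2024 -> audit_logs_2024_01
function partitionName(table: string, d: Date): string {
  const start = monthStart(d);
  const month = String(start.getUTCMonth() + 1).padStart(2, '0');
  return `${table}_${start.getUTCFullYear()}_${month}`;
}

// DDL for the monthly partition covering d (lower bound inclusive, upper bound exclusive).
function createPartitionSql(table: string, d: Date): string {
  const from = isoDate(monthStart(d));
  const to = isoDate(monthStart(d, 1));
  return `CREATE TABLE ${partitionName(table, d)} PARTITION OF ${table} ` +
    `FOR VALUES FROM ('${from}') TO ('${to}');`;
}

// A partition is expired when its upper bound is at or before the retention cutoff.
function isExpired(partitionMonth: Date, now: Date, retentionMonths: number): boolean {
  const cutoff = monthStart(now, -retentionMonths);
  return monthStart(partitionMonth, 1).getTime() <= cutoff.getTime();
}

console.log(createPartitionSql('audit_logs', new Date(Date.UTC(2024, 0, 15))));
```

Dropping or detaching an expired partition is a metadata operation, which is typically far cheaper than deleting the same rows one batch at a time.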
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High Write, Low Read (e.g., IoT telemetry) | Append-only tables, partitioning by time, minimal indexes | Maximizes write throughput; partitioning aids retention | Higher storage cost; lower compute cost |
| Complex Ad-hoc Queries (e.g., Analytics dashboard) | Star schema or Columnar store (e.g., ClickHouse/BigQuery) | Optimized for aggregation; separates OLAP from OLTP | Higher infra cost; requires ETL pipeline |
| Rapid Iteration / Unstructured Data (e.g., Feature flags) | Document store or JSONB with generated columns | Schema flexibility; fast development | Query limitations; potential integrity risks |
| Strict Compliance / Audit (e.g., Financial ledger) | Immutable append-only logs, cryptographic hashing, strict FKs | Auditability; tamper-evidence; data integrity | Higher storage; complex query patterns |
| High Read, Low Write (e.g., Product catalog) | Heavy denormalization, read replicas, caching | Minimizes latency; reduces DB load | Write complexity; synchronization overhead |
Configuration Template
Drizzle Schema Configuration (schema.ts)
Copy this template to enforce best practices across your project.
import { pgTable, uuid, varchar, timestamp, integer, boolean, index, uniqueIndex, check } from 'drizzle-orm/pg-core';
import { sql } from 'drizzle-orm';

// Base table configuration for consistency
export const baseColumns = {
  id: uuid('id').defaultRandom().primaryKey(),
  created_at: timestamp('created_at').defaultNow().notNull(),
  updated_at: timestamp('updated_at').defaultNow().notNull(),
};

// Example: Products table with best practices
export const products = pgTable('products', {
  ...baseColumns,
  sku: varchar('sku', { length: 50 }).notNull(),
  name: varchar('name', { length: 255 }).notNull(),
  price_cents: integer('price_cents').notNull(),
  is_active: boolean('is_active').default(true).notNull(),
  category_id: uuid('category_id').notNull(),
}, (table) => {
  return {
    // Business Rules as Constraints
    // Zero is allowed (free items), so the constraint is named "non_negative"
    priceCheck: check('products_price_non_negative', sql`${table.price_cents} >= 0`),
    // Unique indexes are declared with uniqueIndex(); index() builders have no .unique() method
    skuUnique: uniqueIndex('products_sku_unique').on(table.sku),
    // Access Pattern: Filter by category and active status
    categoryActiveIdx: index('products_category_active_idx')
      .on(table.category_id, table.is_active),
  };
});

// drizzle.config.ts (a separate file from schema.ts): tells drizzle-kit
// where the schema lives and where to write generated migrations
export default {
  schema: "./schema.ts",
  out: "./migrations",
  dialect: "postgresql",
  // Assumption: the connection string is provided in a DATABASE_URL environment
  // variable; drizzle-kit needs credentials to apply migrations
  dbCredentials: { url: process.env.DATABASE_URL! },
};
Quick Start Guide
- Initialize Project: Install the ORM, the migration CLI, and a PostgreSQL driver (pg is used here as an example), then create drizzle.config.ts by hand from the template above.
npm install drizzle-orm pg
npm install -D drizzle-kit
- Define Schema: Create schema.ts using the template. Define tables with constraints and indexes.
- Generate Migration:
npx drizzle-kit generate --name init_schema
- Apply Migration:
npx drizzle-kit migrate
- Verify: Connect to your database with psql and run \d+ table_name to confirm constraints and indexes are applied. Run a sample query with EXPLAIN ANALYZE to verify index usage.
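The EXPLAIN check in the last step can also be automated. The sketch below assumes you have already captured the text-format plan as an array of lines (how you run the query is up to your driver) and uses a hypothetical `findSeqScans` helper to flag sequential scans. The sample plan lines are hand-written for illustration and are not captured output; also note that a sequential scan on a tiny table can be the planner's correct choice, so treat a hit as a prompt to investigate rather than a failure.

```typescript
// Returns the names of tables that the plan reads with a sequential scan.
// Also matches "Parallel Seq Scan on ..." because it contains the same phrase.
function findSeqScans(planLines: string[]): string[] {
  const tables: string[] = [];
  for (const line of planLines) {
    const match = /Seq Scan on (\S+)/.exec(line);
    if (match) {
      tables.push(match[1]);
    }
  }
  return tables;
}

// Hand-written, simplified plan lines (illustrative only).
const indexedPlan = ['Index Scan using orders_user_created_idx on orders'];
const unindexedPlan = ['Sort', '  ->  Seq Scan on orders'];

console.log(findSeqScans(indexedPlan));
console.log(findSeqScans(unindexedPlan));
```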
Conclusion
Data modeling is not a one-time design phase; it is a continuous discipline aligned with evolving access patterns and business requirements. By prioritizing access-pattern-driven design, enforcing constraints at the database layer, and managing schema evolution rigorously, engineering teams can build systems that are performant, maintainable, and resilient. The cost of poor modeling compounds over time; the investment in best practices pays dividends in reduced latency, lower technical debt, and accelerated development velocity.