or token bucket algorithm backed by Redis to enforce per-tenant or per-IP limits. This prevents thundering herds from reaching compute nodes.
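import type { Request, Response, NextFunction } from 'express';
import { RateLimiterRedis, RateLimiterRes } from 'rate-limiter-flexible';
import { redisClient } from './redis';
const rateLimiterRedis = new RateLimiterRedis({
  storeClient: redisClient,
  points: 100, // Max requests per window
  duration: 1, // Window in seconds
  keyPrefix: 'rl_api'
});
export async function rateLimitMiddleware(req: Request, res: Response, next: NextFunction) {
  // Headers can arrive as string[]; normalize so the key is always a string
  const tenantId = req.headers['x-tenant-id'];
  const key = req.ip || (Array.isArray(tenantId) ? tenantId[0] : tenantId) || 'unknown';
  try {
    await rateLimiterRedis.consume(key);
    next();
  } catch (rejRes) {
    // A non-RateLimiterRes rejection is a store error (e.g. Redis down), not a limit hit
    if (!(rejRes instanceof RateLimiterRes)) return next(rejRes);
    const retryAfter = Math.ceil(rejRes.msBeforeNext / 1000);
    res.set('Retry-After', String(retryAfter));
    res.status(429).json({ error: 'Rate limit exceeded', retryAfter });
  }
}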
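Rationale: In-memory limiters count requests per process, so with N nodes behind a load balancer the effective limit becomes N times the configured value. A Redis-backed limiter keeps one shared counter per key and enforces the limit consistently across nodes. The 429 response tells clients how long to wait before retrying, which helps prevent retry storms.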
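To make the token bucket idea mentioned at the start of this section concrete, here is a minimal single-process sketch. It is an illustration only: TokenBucket and its methods are hypothetical names, not part of rate-limiter-flexible, and a distributed version would keep the token count and timestamp in Redis rather than in memory.

```typescript
// Illustrative in-memory token bucket (hypothetical class, single process only).
export class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst up to `capacity` is allowed
    this.lastRefillMs = now();
  }

  /** Spends `cost` tokens and returns true if enough are available; otherwise returns false. */
  tryConsume(cost: number = 1): boolean {
    this.refill();
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }

  /** Milliseconds until `cost` tokens will be available (0 if they are available now). */
  msUntilAvailable(cost: number = 1): number {
    this.refill();
    if (this.tokens >= cost) return 0;
    return Math.ceil(((cost - this.tokens) / this.refillPerSecond) * 1000);
  }

  /** Adds tokens for the time elapsed since the last refill, capped at capacity. */
  private refill(): void {
    const nowMs = this.now();
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
  }
}
```

The capacity bounds the burst size and refillPerSecond bounds the sustained rate; msUntilAvailable is the kind of value a 429 response would report as its retry delay.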
2. Multi-Layer Caching Strategy
Implement L1 (in-process) and L2 (distributed) caching with stale-while-revalidate semantics. This absorbs read-heavy traffic and eliminates redundant database queries.
import NodeCache from 'node-cache';
import { redisClient } from './redis';
const l1Cache = new NodeCache({ stdTTL: 30, checkperiod: 10 });
export async function getCachedOrFetch<T>(key: string, fetchFn: () => Promise<T>, ttl: number = 300): Promise<T> {
  // L1 check (compare against undefined so falsy values such as 0 or '' still count as hits)
  const l1Hit = l1Cache.get<T>(key);
  if (l1Hit !== undefined) return l1Hit;
  // L2 check (plain TTL expiry; no background revalidation happens here)
  const l2Hit = await redisClient.get(key);
  if (l2Hit !== null) {
    const parsed = JSON.parse(l2Hit) as T;
    l1Cache.set(key, parsed); // L1 keeps its own short stdTTL (30s) so it cannot outlive L2
    return parsed;
  }
  // Cache miss: fetch and populate both layers
  const data = await fetchFn();
  await redisClient.set(key, JSON.stringify(data), 'EX', ttl);
  l1Cache.set(key, data);
  return data;
}
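Rationale: The L1 cache eliminates network hops for hot keys, and the L2 cache shares state across nodes. Note that plain TTL expiry is not stale-while-revalidate: once a key expires, every concurrent request misses and calls fetchFn. In production, store a soft expiry alongside the value and serve the stale copy while a single background refresh runs (guarded, for example, by a Redis lock or Lua script). This prevents cache stampedes.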
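The stale-while-revalidate behavior described above can be sketched in process as follows. This is a simplified illustration under stated assumptions: SwrCache, freshMs, and staleMs are hypothetical names, the cache lives in one process, and a distributed variant would need a shared lock so that only one node refreshes a given key.

```typescript
// Hypothetical in-process stale-while-revalidate cache with single-flight refresh.
type SwrEntry<T> = { value: T; freshUntil: number; staleUntil: number };

export class SwrCache<T> {
  private readonly entries = new Map<string, SwrEntry<T>>();
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly freshMs: number;
  private readonly staleMs: number;
  private readonly now: () => number;

  constructor(freshMs: number, staleMs: number, now: () => number = () => Date.now()) {
    this.freshMs = freshMs;
    this.staleMs = staleMs;
    this.now = now;
  }

  async get(key: string, fetchFn: () => Promise<T>): Promise<T> {
    const entry = this.entries.get(key);
    const nowMs = this.now();
    // Fresh: serve directly.
    if (entry && nowMs < entry.freshUntil) return entry.value;
    // Stale but still usable: serve the old value and refresh in the background.
    if (entry && nowMs < entry.staleUntil) {
      void this.refresh(key, fetchFn).catch(() => undefined); // keep serving stale on failure
      return entry.value;
    }
    // Missing or too old: the caller has to wait for the fetch.
    return this.refresh(key, fetchFn);
  }

  /** Starts at most one fetch per key; concurrent callers share the same promise. */
  private refresh(key: string, fetchFn: () => Promise<T>): Promise<T> {
    const pending = this.inFlight.get(key);
    if (pending) return pending;
    const started = fetchFn()
      .then((value) => {
        const nowMs = this.now();
        this.entries.set(key, {
          value,
          freshUntil: nowMs + this.freshMs,
          staleUntil: nowMs + this.freshMs + this.staleMs,
        });
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key);
      });
    this.inFlight.set(key, started);
    return started;
  }
}
```

Because concurrent callers share one in-flight promise per key, an expired hot key triggers a single fetch rather than one per request.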
3. Async Write Boundary
Never block the request cycle on database writes. Route mutations to a message queue. This decouples API latency from persistence performance.
import { Kafka } from 'kafkajs';
const kafka = new Kafka({ clientId: 'api-producer', brokers: ['kafka-1:9092', 'kafka-2:9092'] });
const producer = kafka.producer();
// Connect lazily once and reuse the connection for every request
let producerReady: Promise<void> | undefined;
export async function enqueueMutation(event: { type: string; payload: unknown; idempotencyKey: string }) {
  producerReady ??= producer.connect();
  await producerReady;
  await producer.send({
    topic: 'api-mutations',
    messages: [{ key: event.idempotencyKey, value: JSON.stringify(event) }]
  });
  return { status: 'accepted', eventId: event.idempotencyKey };
}
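Rationale: Keying messages by idempotencyKey sends retries of the same request to the same partition, in order, so a consumer can detect and drop duplicates. Kafka does not deduplicate by key on its own; the consumer must track which keys it has processed. The API returns 202 Accepted immediately. Consumers process writes at database-friendly rates, and the queue absorbs bursts instead of rejecting clients.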
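Consumer-side deduplication could look roughly like the sketch below. The DedupeStore interface, InMemoryDedupeStore, and handleOnce are hypothetical names introduced for illustration; the in-memory store stands in for Redis, where the claim step would typically be a SET with the NX and EX options.

```typescript
// Hypothetical store: claim() resolves to true only for the first caller of a key within its TTL.
export interface DedupeStore {
  claim(key: string, ttlSeconds: number): Promise<boolean>;
  release(key: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without Redis.
export class InMemoryDedupeStore implements DedupeStore {
  private readonly expiresAt = new Map<string, number>();

  async claim(key: string, ttlSeconds: number): Promise<boolean> {
    const nowMs = Date.now();
    const expiry = this.expiresAt.get(key);
    if (expiry !== undefined && expiry > nowMs) return false; // already claimed and not expired
    this.expiresAt.set(key, nowMs + ttlSeconds * 1000);
    return true;
  }

  async release(key: string): Promise<void> {
    this.expiresAt.delete(key);
  }
}

export interface MutationEvent {
  type: string;
  payload: unknown;
  idempotencyKey: string;
}

/** Applies the event at most once per idempotency key; duplicates are skipped. */
export async function handleOnce(
  event: MutationEvent,
  store: DedupeStore,
  apply: (event: MutationEvent) => Promise<void>,
  ttlSeconds: number = 86400,
): Promise<'applied' | 'duplicate'> {
  const firstTime = await store.claim(event.idempotencyKey, ttlSeconds);
  if (!firstTime) return 'duplicate';
  try {
    await apply(event);
  } catch (err) {
    // Give the key back so a redelivered message can be retried.
    await store.release(event.idempotencyKey);
    throw err;
  }
  return 'applied';
}
```

Claim-then-apply is not atomic: a crash between the two steps leaves the key claimed without the write. Where that matters, recording the key in the same database transaction as the write is a stricter alternative.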
4. Database Connection Multiplexing & Read/Write Splitting
At 100M scale, connection exhaustion is the primary failure mode. Use PgBouncer or ProxySQL for connection pooling. Route reads to replicas and writes to primary.
import { Pool, QueryResultRow } from 'pg';
const readPool = new Pool({ connectionString: process.env.DB_READ_URL, max: 50, idleTimeoutMillis: 30000 });
const writePool = new Pool({ connectionString: process.env.DB_WRITE_URL, max: 20, idleTimeoutMillis: 30000 });
// T must extend QueryResultRow to satisfy the pg typings for client.query<T>
export async function executeRead<T extends QueryResultRow>(query: string, params?: unknown[]): Promise<T[]> {
  const client = await readPool.connect();
  try {
    const res = await client.query<T>(query, params);
    return res.rows;
  } finally {
    // Always return the client to the pool, even if the query throws
    client.release();
  }
}
Rationale: Separate pools prevent write-heavy transactions from starving read queries. max: 50 matches the PgBouncer DEFAULT_POOL_SIZE of 50 used in the configuration template. idleTimeoutMillis closes connections that sit idle for 30 seconds, and try/finally guarantees that a client is released even when the query throws.
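The write path is not shown above. A possible counterpart, sketched under assumptions, wraps the statement in a transaction and applies the same try/finally release rule. PoolLike and ClientLike are hypothetical structural types that let the sketch run without a database; a pg Pool such as writePool is assumed to satisfy them.

```typescript
// Hypothetical minimal shapes; a pg Pool / PoolClient is assumed to be compatible.
export interface ClientLike {
  query(text: string, params?: unknown[]): Promise<{ rows: unknown[] }>;
  release(): void;
}
export interface PoolLike {
  connect(): Promise<ClientLike>;
}

/** Runs one statement inside a transaction and always releases the client. */
export async function executeWrite(pool: PoolLike, text: string, params?: unknown[]): Promise<unknown[]> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const res = await client.query(text, params);
    await client.query('COMMIT');
    return res.rows;
  } catch (err) {
    // Roll back, but do not let a failed ROLLBACK hide the original error.
    await client.query('ROLLBACK').catch(() => undefined);
    throw err;
  } finally {
    client.release();
  }
}
```

Keeping BEGIN and COMMIT on the same client matters with PgBouncer in transaction mode, where the server connection is only pinned for the duration of a transaction.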
5. Observability & Circuit Breaking
Inject OpenTelemetry tracing and implement circuit breakers for downstream dependencies. At scale, a single degraded service must not cascade.
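import CircuitBreaker from 'opossum'; // opossum exports the class as its default export
const dbBreaker = new CircuitBreaker(executeRead, {
  timeout: 3000, // Fail calls that take longer than 3s
  errorThresholdPercentage: 50, // Open when 50% of calls fail
  resetTimeout: 10000, // Try a half-open probe after 10s
  volumeThreshold: 20 // Minimum calls before the circuit can open
});
dbBreaker.on('open', () => logger.warn('DB circuit open'));
dbBreaker.on('half-open', () => logger.info('DB circuit half-open'));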
Rationale: Opossum (Circuit Breaker) fails fast when error rates exceed thresholds, allowing downstream services to recover. Combined with distributed tracing, this isolates bottlenecks before they impact the entire API surface.
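To show the closed, open, and half-open states that the events above report, here is a deliberately simplified breaker. It is not how opossum is implemented: SimpleBreaker is a hypothetical class that counts consecutive failures instead of an error percentage over a rolling window, and it does not limit concurrent probes in the half-open state.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Hypothetical teaching sketch of the circuit breaker state machine.
export class SimpleBreaker<A extends unknown[], R> {
  state: BreakerState = 'closed';
  private failures = 0;
  private openedAtMs = 0;
  private readonly action: (...args: A) => Promise<R>;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    action: (...args: A) => Promise<R>,
    failureThreshold: number,
    resetTimeoutMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  async fire(...args: A): Promise<R> {
    if (this.state === 'open') {
      // Fail fast until the reset timeout has elapsed.
      if (this.now() - this.openedAtMs < this.resetTimeoutMs) throw new Error('Circuit open');
      this.state = 'half-open'; // let a trial call through
    }
    try {
      const result = await this.action(...args);
      this.state = 'closed'; // success closes the circuit and clears the failure count
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAtMs = this.now();
      }
      throw err;
    }
  }
}
```

While the circuit is open, fire rejects without invoking the wrapped action, which is what gives the downstream dependency room to recover.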
Pitfall Guide
- Synchronous Database Writes Under Load: Blocking the request cycle on INSERT/UPDATE operations ties API latency to disk I/O and lock contention. At 100M scale, write contention causes queue buildup and timeout cascades. Best practice: Route all mutations through async queues with batched consumers. Return 202 Accepted immediately.
- Ignoring Connection Pool Exhaustion: The pg Pool defaults to 10 connections per instance. Horizontal scaling multiplies connections linearly (20 nodes with a pool of 50 can open 1,000 connections), exhausting the database max_connections (typically 100-500; the PostgreSQL default is 100). Best practice: Use external connection proxies (PgBouncer/ProxySQL) in transaction mode. Pool max should never exceed 50 per node.
- Cache Stampede & Thundering Herd: When a hot key expires, thousands of requests simultaneously hit the database. This creates a read spike that can overwhelm replicas. Best practice: Implement stale-while-revalidate, mutex locking on cache misses, or probabilistic early expiration. Use Redis Lua scripts to atomically check-and-refresh.
- Missing Idempotency Keys: Retries, network partitions, and client SDKs duplicate requests. Without idempotency, mutations execute twice, causing data corruption and billing errors. Best practice: Require Idempotency-Key headers. Store processed keys in Redis with TTL. Reject duplicates before processing.
- Over-Engineering the Read Path: Applying async queues, complex cache hierarchies, and circuit breakers to read endpoints adds latency and operational debt. Reads should be fast, cacheable, and stateless. Best practice: Keep reads synchronous with aggressive caching. Reserve async boundaries for writes and heavy computations.
- Blind Auto-Scaling Without Backpressure: Cloud auto-scalers react to CPU/memory metrics, not queue depth or error rates. Adding more nodes that fail the same way multiplies the problem. Best practice: Scale based on consumer lag, p99 latency, and rejection rates. Implement backpressure at the ingress layer before provisioning compute.
- Neglecting Compression & Payload Optimization: Serializing large JSON payloads consumes bandwidth and CPU. Over 100M requests, 5KB instead of 1KB per response adds roughly 400 GB of unnecessary egress. Best practice: Enable gzip/brotli at the gateway. Use field selection (?fields=id,name,status). Strip metadata from responses.
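The field selection mentioned in the last pitfall (?fields=id,name,status) can be sketched as a small projection helper. selectFields is a hypothetical name, and the sketch only handles top-level fields; nested paths would need additional parsing.

```typescript
/**
 * Returns a copy of `record` containing only the comma-separated fields
 * named in `fieldsParam`. Unknown field names are ignored. When the
 * parameter is missing or empty, the full record is returned.
 */
export function selectFields(
  record: Record<string, unknown>,
  fieldsParam?: string,
): Record<string, unknown> {
  if (!fieldsParam) return record;
  const wanted = fieldsParam
    .split(',')
    .map((field) => field.trim())
    .filter((field) => field.length > 0);
  const projected: Record<string, unknown> = {};
  for (const field of wanted) {
    // Own-property check avoids leaking inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(record, field)) {
      projected[field] = record[field];
    }
  }
  return projected;
}
```

In a request handler this would typically be applied to each row just before serialization, using the raw fields query parameter as the second argument.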
Production Bundle
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Read-heavy (>80% GET) | L1/L2 Cache + Read Replicas | Absorbs traffic at edge, eliminates DB load | Low: Cache nodes cost 10-15% of DB compute |
| Write-heavy (>60% POST/PUT) | Async Queue + Batched Consumers | Decouples latency from persistence, enables backpressure | Medium: Queue infrastructure + consumer compute |
| Mixed traffic with bursty patterns | Rate Limiter + Circuit Breakers + Auto-Scaling on Queue Lag | Prevents cascade failures, scales only when needed | Low-Medium: Pay for burst capacity, not idle |
| Multi-tenant SaaS | Tenant-isolated rate limits + Sharded Redis | Prevents noisy neighbor, ensures fair resource allocation | Medium: Per-tenant caching/routing overhead |
Configuration Template
# docker-compose.yml (core infrastructure)
version: '3.8'
services:
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru --save ""
    ports: ["6379:6379"]
    deploy:
      resources:
        limits: { memory: 2.5G }
  # Added so that the kafka service's depends_on and KAFKA_ZOOKEEPER_CONNECT resolve
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_NUM_PARTITIONS: 6
    depends_on: [zookeeper]
  pg-bouncer:
    image: edoburu/pgbouncer:1.21.0
    environment:
      DATABASE_URL: postgres://user:pass@db:5432/app
      POOL_MODE: transaction
      MAX_CLIENT_CONN: 2000
      DEFAULT_POOL_SIZE: 50
    ports: ["6432:5432"]
  api-gateway:
    image: envoyproxy/envoy:v1.28-latest
    volumes: ["./envoy.yaml:/etc/envoy/envoy.yaml"]
    ports: ["8080:8080"]
# .env.production
REDIS_URL=redis://redis:6379
KAFKA_BROKERS=kafka:9092
DB_READ_URL=postgres://user:pass@pg-bouncer:5432/app?sslmode=disable
DB_WRITE_URL=postgres://user:pass@db-primary:5432/app?sslmode=disable
RATE_LIMIT_POINTS=100
RATE_LIMIT_DURATION=1
CIRCUIT_BREAKER_TIMEOUT=3000
CIRCUIT_BREAKER_THRESHOLD=50
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
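One way to consume these variables is to validate them once at startup instead of reading them ad hoc throughout the service. The sketch below is an assumption about how that might look: AppConfig and loadConfig are hypothetical names, and only the variable names come from the .env file above.

```typescript
export interface AppConfig {
  redisUrl: string;
  kafkaBrokers: string[];
  dbReadUrl: string;
  dbWriteUrl: string;
  rateLimitPoints: number;
  rateLimitDurationSeconds: number;
  circuitBreakerTimeoutMs: number;
  circuitBreakerThresholdPct: number;
  otelEndpoint: string;
}

type Env = Record<string, string | undefined>;

function requireString(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

function requirePositiveInt(env: Env, key: string): number {
  const parsed = Number(requireString(env, key));
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`Environment variable ${key} must be a positive integer`);
  }
  return parsed;
}

/** Parses and validates the variables from .env.production; throws on the first problem. */
export function loadConfig(env: Env): AppConfig {
  return {
    redisUrl: requireString(env, 'REDIS_URL'),
    // KAFKA_BROKERS may hold a comma-separated list of host:port pairs.
    kafkaBrokers: requireString(env, 'KAFKA_BROKERS').split(',').map((broker) => broker.trim()),
    dbReadUrl: requireString(env, 'DB_READ_URL'),
    dbWriteUrl: requireString(env, 'DB_WRITE_URL'),
    rateLimitPoints: requirePositiveInt(env, 'RATE_LIMIT_POINTS'),
    rateLimitDurationSeconds: requirePositiveInt(env, 'RATE_LIMIT_DURATION'),
    circuitBreakerTimeoutMs: requirePositiveInt(env, 'CIRCUIT_BREAKER_TIMEOUT'),
    circuitBreakerThresholdPct: requirePositiveInt(env, 'CIRCUIT_BREAKER_THRESHOLD'),
    otelEndpoint: requireString(env, 'OTEL_EXPORTER_OTLP_ENDPOINT'),
  };
}
```

In the service this would be called once with process.env, so a missing or malformed variable fails the deployment immediately rather than surfacing under load.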
Quick Start Guide
- Spin up infrastructure: Run docker compose up -d redis kafka pg-bouncer. Verify connectivity with redis-cli ping and, from a container on the Compose network, nc -z kafka 9092.
- Deploy the TS service: Clone the scaffolded API repository. Run npm install && npm run build && node dist/server.js. The service auto-connects to Redis, Kafka, and PgBouncer using .env variables.
- Validate rate limiting & caching: Generate load with wrk -t4 -c100 -d10s http://localhost:8080/api/v1/health. wrk sends requests as fast as the server responds rather than at a fixed rate, so a single client should pass the limit quickly. Observe 429 responses once the client exceeds 100 requests per second. Verify L2 cache hits in Redis with redis-cli monitor.
- Test async write path: POST to /api/v1/events with an Idempotency-Key. Confirm 202 response. Check Kafka consumer logs for batched inserts. Verify database writes complete within 2 seconds.
- Enable observability: Open the tracing backend that receives data from the OpenTelemetry Collector. Confirm traces show <50ms for cached reads, <200ms for async writes, and circuit breaker state transitions during simulated failures.