act violations, they become a signal rather than noise. Monitoring systems can aggregate them, distributed tracing can correlate them, and engineering teams can build automated recovery policies around them. The result is a system that fails predictably, fails visibly, and fails recoverably.
Core Solution
Building resilient error boundaries requires a deliberate separation of concerns. The solution rests on four architectural decisions: defining failure contracts, creating an error taxonomy, establishing propagation boundaries, and centralizing response mapping.
Step 1: Define Failure Contracts
Every public function must declare what constitutes success and what constitutes failure. This is not about return types alone; it is about semantic intent. If a function promises to retrieve a resource by a known identifier, a missing resource is a contract violation. If a function promises to validate user input, a validation failure is an expected business state. The contract dictates the propagation mechanism.
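The two contracts described above can be sketched side by side. This is a minimal illustration, assuming a hypothetical in-memory store; `User`, `users`, `getUserById`, `ValidationResult`, and `validateEmail` are names invented for the example and do not come from any library.

```typescript
// Hypothetical domain type and in-memory store, used only for illustration.
interface User {
  id: string;
  email: string;
}

const users = new Map<string, User>([
  ['u1', { id: 'u1', email: 'ada@example.com' }],
]);

// Contract: the caller supplies a known identifier, so a missing record
// is a contract violation and the function throws.
function getUserById(id: string): User {
  const user = users.get(id);
  if (user === undefined) {
    throw new Error(`User not found: ${id}`);
  }
  return user;
}

// Contract: the input is untrusted, so "invalid" is an expected outcome
// and the function returns a result value instead of throwing.
type ValidationResult =
  | { valid: true }
  | { valid: false; reason: string };

function validateEmail(input: string): ValidationResult {
  // Deliberately simplistic check; real validation rules are out of scope.
  return input.includes('@')
    ? { valid: true }
    : { valid: false, reason: 'missing "@"' };
}
```

The signatures alone tell the caller which mechanism to expect: `getUserById` either returns a `User` or throws, while `validateEmail` always returns and encodes failure in its return type.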
Step 2: Create an Error Taxonomy
Custom error classes replace magic strings and boolean flags. They carry structured metadata, preserve stack traces, and enable precise catch filtering. In TypeScript, this looks like a base error class extended by domain-specific failures.
// Base contract violation error
export class AppError extends Error {
  public readonly statusCode: number;
  public readonly errorCode: string;
  // true for anticipated failures, false for programming errors
  public readonly isOperational: boolean;

  constructor(message: string, statusCode: number, errorCode: string, isOperational = true) {
    super(message);
    this.name = this.constructor.name;
    this.statusCode = statusCode;
    this.errorCode = errorCode;
    this.isOperational = isOperational;
    // V8-specific (Node.js, Chromium): omits the constructor frame from the stack trace
    Error.captureStackTrace(this, this.constructor);
  }
}

// Domain-specific failures
export class ResourceNotFoundError extends AppError {
  constructor(resource: string, id: string) {
    super(`${resource} not found: ${id}`, 404, 'RESOURCE_MISSING');
  }
}

export class ExternalServiceTimeoutError extends AppError {
  constructor(service: string, timeoutMs: number) {
    super(`${service} timed out after ${timeoutMs}ms`, 504, 'EXTERNAL_TIMEOUT');
  }
}
Step 3: Implement Propagation Boundaries
Intermediate layers should not catch errors they cannot resolve. They should allow exceptions to bubble upward until they reach a layer with sufficient context to act. This is typically the request handler, message consumer, or background job orchestrator.
// Service layer: focuses on business logic, throws on contract violation.
// OrderRepository, InventoryRepository, OrderItem, and InsufficientInventoryError
// are assumed to be defined elsewhere in the codebase.
export class OrderService {
  constructor(
    private readonly orderRepo: OrderRepository,
    private readonly inventoryRepo: InventoryRepository
  ) {}

  async reserveStock(orderId: string, items: OrderItem[]): Promise<void> {
    // A known order ID that resolves to nothing is a contract violation.
    const order = await this.orderRepo.findById(orderId);
    if (!order) {
      throw new ResourceNotFoundError('Order', orderId);
    }

    const reservation = await this.inventoryRepo.reserve(items);
    if (!reservation.success) {
      throw new InsufficientInventoryError(reservation.missingSkus);
    }
  }
}
// Controller layer: establishes the boundary, catches and translates
import { Request, Response } from 'express';

export class OrderController {
  constructor(private readonly orderService: OrderService) {}

  async handleReserveStock(req: Request, res: Response): Promise<void> {
    try {
      await this.orderService.reserveStock(req.body.orderId, req.body.items);
      res.status(200).json({ status: 'reserved' });
    } catch (err) {
      if (err instanceof AppError) {
        // Known failure: translate its metadata into the HTTP response
        res.status(err.statusCode).json({
          code: err.errorCode,
          message: err.message
        });
      } else {
        // Unexpected failure: log the details server-side, return a generic body
        console.error(err);
        res.status(500).json({ code: 'INTERNAL_ERROR', message: 'Unexpected failure' });
      }
    }
  }
}
Step 4: Centralize Response Mapping
All external-facing boundaries should funnel errors through a unified translation layer. This layer handles structured logging, correlation ID injection, retry policy evaluation, and response formatting. By centralizing this logic, you eliminate duplicated error handling code and ensure consistent observability signals.
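The core of such a translation layer can be kept independent of any web framework. The sketch below is one possible shape, not a prescribed design: `ErrorResponse`, `toErrorResponse`, and the retry rule in `RETRYABLE_STATUSES` are assumptions made for illustration, and `AppError` is a trimmed stand-in for the base class from Step 2.

```typescript
// Trimmed stand-in for the AppError base class from Step 2.
class AppError extends Error {
  readonly statusCode: number;
  readonly errorCode: string;

  constructor(message: string, statusCode: number, errorCode: string) {
    super(message);
    this.name = 'AppError';
    this.statusCode = statusCode;
    this.errorCode = errorCode;
  }
}

// Hypothetical shape that every boundary returns, regardless of transport.
interface ErrorResponse {
  status: number;
  retryable: boolean;
  body: { code: string; message: string; correlationId: string };
}

// Assumed retry policy: only gateway-style failures are worth retrying.
const RETRYABLE_STATUSES = new Set([502, 503, 504]);

function toErrorResponse(err: unknown, correlationId: string): ErrorResponse {
  if (err instanceof AppError) {
    return {
      status: err.statusCode,
      retryable: RETRYABLE_STATUSES.has(err.statusCode),
      body: { code: err.errorCode, message: err.message, correlationId },
    };
  }
  // Unknown failures: hide internals from the caller.
  return {
    status: 500,
    retryable: false,
    body: {
      code: 'INTERNAL_ERROR',
      message: 'An unexpected error occurred',
      correlationId,
    },
  };
}
```

Because the function is pure, an HTTP handler, a message consumer, and a job runner can all call it and then adapt the result to their own transport.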
Architecture Rationale:
- Custom classes over strings: Enable precise type checking, preserve stack traces, and carry metadata without polluting function signatures.
- Boundaries at the edge: Controllers, gateways, and job runners possess HTTP context, user identity, and retry infrastructure. Intermediate services do not.
- Separation of expected vs exceptional: Validation failures and missing optional records return domain types. Infrastructure failures, contract violations, and unrecoverable states throw. This keeps exception volume low and meaningful.
Pitfall Guide
1. Silent Exception Swallowing
Explanation: Catching an error without logging, re-throwing, or handling it masks failures. The application continues in an undefined state, often corrupting data or producing incorrect outputs.
Fix: Always log structured error details before swallowing. If the error cannot be handled locally, re-throw or wrap it in a domain-specific error.
2. Using Exceptions for Control Flow
Explanation: Throwing exceptions for expected business outcomes (e.g., form validation, optional lookups) degrades performance and obscures intent. Exception unwinding is computationally expensive compared to conditional branching.
Fix: Reserve exceptions for contract violations and infrastructure failures. Return domain result types (Result<T, E>, Option<T>, or explicit status objects) for expected outcomes.
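A result type for expected outcomes might look like the following. This is a minimal sketch: `Result`, `ok`, `err`, `FieldError`, `SignupForm`, and `validateSignup` are illustrative names, and the validation rules are placeholders.

```typescript
// Minimal Result type: success and failure are both ordinary return values.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

function ok<T>(value: T): Result<T, never> {
  return { ok: true, value };
}

function err<E>(error: E): Result<never, E> {
  return { ok: false, error };
}

// Hypothetical form and error shapes for the example.
interface FieldError {
  field: string;
  message: string;
}

interface SignupForm {
  email: string;
  age: number;
}

// Invalid input is an expected business state, so nothing is thrown.
function validateSignup(form: SignupForm): Result<SignupForm, FieldError[]> {
  const errors: FieldError[] = [];
  if (!form.email.includes('@')) {
    errors.push({ field: 'email', message: 'must contain "@"' });
  }
  if (form.age < 18) {
    errors.push({ field: 'age', message: 'must be at least 18' });
  }
  return errors.length === 0 ? ok(form) : err(errors);
}

// The caller branches on the result; the compiler narrows each arm.
const outcome = validateSignup({ email: 'not-an-email', age: 16 });
if (!outcome.ok) {
  for (const e of outcome.error) {
    console.log(`${e.field}: ${e.message}`);
  }
}
```

The return type forces callers to check `ok` before touching `value`, which makes the expected failure path visible at compile time.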
3. Returning null or undefined for Fatal Failures
Explanation: Sentinel values force every caller to perform defensive checks. When a fatal failure occurs, returning null silently propagates ambiguity up the stack until it crashes in an unrelated module.
Fix: Throw immediately when a function cannot fulfill its contract. Let the boundary layer decide how to surface the failure.
4. Catching and Re-throwing Without Preserving Context
Explanation: Creating a new error inside a catch block without chaining the original stack trace destroys debugging context. Production incidents become impossible to trace.
Fix: Pass the original error as the cause option of the Error constructor (ES2022+), or use custom error wrapping that preserves the original stack. Never discard the originating exception.
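The sketch below shows one way to keep the originating error attached when wrapping. `ConfigLoadError` and `loadConfig` are hypothetical names. The wrapper stores the original on an explicit `cause` field so the example does not depend on the compile target; on ES2022+ runtimes, `new Error(message, { cause })` records the same information natively.

```typescript
// Wrapper that keeps the originating error reachable via `cause`.
class ConfigLoadError extends Error {
  readonly cause: unknown;

  constructor(message: string, cause: unknown) {
    super(message);
    this.name = 'ConfigLoadError';
    this.cause = cause;
  }
}

// Hypothetical loader: translates a low-level parse failure into a
// domain-level error without discarding the original.
function loadConfig(raw: string): { port: number } {
  try {
    return JSON.parse(raw) as { port: number };
  } catch (original) {
    throw new ConfigLoadError('Could not parse service configuration', original);
  }
}

try {
  loadConfig('{ not json');
} catch (e) {
  if (e instanceof ConfigLoadError) {
    // Both layers remain available to logs and debuggers.
    const inner = e.cause instanceof Error ? e.cause.name : String(e.cause);
    console.error(`${e.name}: ${e.message} (caused by ${inner})`);
  }
}
```

A structured logger can walk the `cause` chain and record every layer, so the domain-level message and the low-level parse failure appear in the same log entry.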
5. Mixing Propagation Strategies in One Module
Explanation: Some functions return error tuples while others throw exceptions. Callers must implement dual handling logic, increasing cognitive load and bug probability.
Fix: Enforce a single propagation strategy per module or bounded context. Document the contract explicitly in API definitions or JSDoc/TSDoc.
6. Terminating the Process in Shared Libraries
Explanation: Calling process.exit(), die(), or equivalent in a library or service module kills the entire runtime. Long-running servers, background workers, and multi-tenant applications cannot recover.
Fix: Libraries should never terminate the host process. Throw structured errors and let the application boundary decide on shutdown, retry, or degradation.
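As a rough illustration of that division of responsibility, the library function below only reports the problem, and the calling code picks the policy. `MissingEnvError`, `requireEnv`, `resolveDatabaseUrl`, and the fallback URL are all hypothetical; the environment is passed in as a plain object to keep the sketch self-contained.

```typescript
class MissingEnvError extends Error {
  readonly variable: string;

  constructor(variable: string) {
    super(`Missing required environment variable: ${variable}`);
    this.name = 'MissingEnvError';
    this.variable = variable;
  }
}

type Env = Record<string, string | undefined>;

// Library code: reports the problem and leaves the decision to the caller.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new MissingEnvError(name);
  }
  return value;
}

// Application boundary: the only place that chooses a policy.
function resolveDatabaseUrl(env: Env): string {
  try {
    return requireEnv(env, 'DATABASE_URL');
  } catch (e) {
    if (e instanceof MissingEnvError) {
      // This caller degrades to a local default. A CLI entry point could
      // instead print the message and set a non-zero exit code.
      return 'postgres://localhost:5432/dev';
    }
    throw e;
  }
}
```

The same `requireEnv` can then be reused by a long-running server, a worker, and a test suite, each with its own reaction to the failure.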
7. Over-Catching with Broad catch (err)
Explanation: Catching all errors indiscriminately handles both operational failures and programming bugs (e.g., TypeError, ReferenceError) identically. This masks developer mistakes and delays fixes.
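Fix: Handle only the error classes you recognize. In JavaScript and TypeScript, where a catch clause cannot filter by type, check with instanceof and re-throw everything else. Allow programming errors to bubble up to global handlers or crash the process for immediate visibility.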
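Since a `catch` clause in TypeScript receives every thrown value, selective handling comes down to an `instanceof` check followed by a re-throw. The sketch below assumes a hypothetical `OperationalError` class and `withFallback` helper.

```typescript
// Hypothetical marker class for failures the caller knows how to recover from.
class OperationalError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'OperationalError';
  }
}

// Runs an operation and substitutes a fallback only for recognized failures.
function withFallback<T>(operation: () => T, fallback: T): T {
  try {
    return operation();
  } catch (e) {
    if (e instanceof OperationalError) {
      // Known, recoverable failure: degrade gracefully.
      return fallback;
    }
    // Anything else (TypeError, ReferenceError, ...) is likely a bug:
    // re-throw so it reaches the global handler unchanged.
    throw e;
  }
}
```

For example, `withFallback(() => readFromCache(key), defaultValue)` would absorb a cache outage modelled as an `OperationalError`, while a typo inside `readFromCache` would still surface as a `TypeError` (both `readFromCache` and `defaultValue` are placeholders here).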
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| User submits invalid form data | Return validation result object | Expected business state; caller can display field-level errors | Low (standard branching) |
| Database connection drops during transaction | Throw DatabaseConnectionError | Contract violation; caller cannot proceed without DB | Medium (requires retry/degradation logic) |
| Optional profile lookup returns empty | Return null or Option type | Expected outcome; business logic handles absence gracefully | Low (no exception overhead) |
| Third-party payment gateway times out | Throw ExternalServiceTimeoutError | Infrastructure failure; requires retry or fallback | Medium-High (depends on retry policy & SLA) |
| Internal invariant broken (e.g., negative balance) | Throw InvariantViolationError | Programming error or data corruption; requires immediate visibility | High (may trigger alerting & rollback) |
Configuration Template
// error-boundary.ts
import { Request, Response, NextFunction } from 'express';
import { AppError } from './errors';
import { logger } from './observability';

// Express identifies error-handling middleware by its four-argument signature,
// so the unused parameters must stay in place.
export function globalErrorHandler(
  err: Error,
  _req: Request,
  res: Response,
  _next: NextFunction
): void {
  const isOperational = err instanceof AppError && err.isOperational;

  // Structured logging with correlation context
  logger.error({
    message: err.message,
    stack: err.stack,
    errorCode: isOperational ? (err as AppError).errorCode : 'UNKNOWN',
    correlationId: res.locals.correlationId,
    isOperational
  });

  // Response mapping
  if (isOperational) {
    const appErr = err as AppError;
    res.status(appErr.statusCode).json({
      code: appErr.errorCode,
      message: appErr.message,
      correlationId: res.locals.correlationId
    });
  } else {
    // Programming errors: hide details, log fully
    res.status(500).json({
      code: 'INTERNAL_ERROR',
      message: 'An unexpected error occurred',
      correlationId: res.locals.correlationId
    });
  }
}

// Usage in Express app (app is the Express application instance).
// Register after all routes and other middleware so it receives their errors.
app.use(globalErrorHandler);
Quick Start Guide
- Initialize error taxonomy: Create src/errors/base.ts with a base AppError class that captures stack traces and carries statusCode, errorCode, and isOperational flags.
- Define domain failures: Extend the base class for each contract violation your services encounter (e.g., ResourceNotFoundError, PaymentDeclinedError).
- Place boundaries: Wrap controller/handler logic in try/catch blocks. Route AppError instances to structured responses; route unknown errors to generic 500 responses with full logging.
- Inject observability: Add correlation ID middleware to requests. Ensure every error log includes the ID, error code, and operational flag for downstream tracing.
- Validate with tests: Write integration tests that force infrastructure failures (mock timeouts, DB drops) and verify that boundaries return correct HTTP status codes and preserve correlation IDs.
Error handling is not a syntax exercise. It is an architectural discipline. When you treat failures as first-class citizens, define clear boundaries, and preserve execution context, your application stops hiding problems and starts communicating them. That shift transforms debugging from a reactive hunt into a proactive observability pipeline.