enTelemetry semantic conventions or a custom JSON schema that includes:
- event_id: Unique UUID for the event.
- timestamp: ISO 8601 with timezone.
- severity: Enum (e.g., CRITICAL, HIGH, MEDIUM, LOW).
- event_type: Categorized action (e.g., auth.login_failure, auth.privilege_escalation, data.access_sensitive).
- actor: User ID, service account, or IP.
- target: Resource identifier.
- correlation_id: Trace ID linking the event across services.
- risk_score: Computed risk level based on heuristics.
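As a rough illustration, the field list above could be expressed as a TypeScript type with a small validator. The names SecurityEvent, validateSecurityEvent, and isSecurityEvent are hypothetical; only the field names come from the schema. This is a sketch under those assumptions, not a complete schema enforcement layer.

```typescript
// Hypothetical type names; only the field names come from the schema above.
type Severity = 'CRITICAL' | 'HIGH' | 'MEDIUM' | 'LOW';

interface SecurityEvent {
  event_id: string;       // UUID
  timestamp: string;      // ISO 8601 with timezone
  severity: Severity;
  event_type: string;     // e.g. auth.login_failure
  actor: string;
  target: string;
  correlation_id: string;
  risk_score: number;
}

const SEVERITIES: readonly string[] = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW'];
const REQUIRED_STRING_FIELDS = [
  'event_id', 'timestamp', 'event_type', 'actor', 'target', 'correlation_id',
] as const;

// Returns a list of schema violations; an empty list means the value conforms.
function validateSecurityEvent(value: unknown): string[] {
  if (typeof value !== 'object' || value === null) {
    return ['event is not an object'];
  }
  const record = value as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of REQUIRED_STRING_FIELDS) {
    if (typeof record[field] !== 'string' || record[field] === '') {
      problems.push(`missing or empty field: ${field}`);
    }
  }
  if (typeof record.timestamp === 'string' && Number.isNaN(Date.parse(record.timestamp))) {
    problems.push('timestamp is not a parseable date');
  }
  if (typeof record.severity !== 'string' || !SEVERITIES.includes(record.severity)) {
    problems.push('severity is not in the enum');
  }
  if (typeof record.risk_score !== 'number' || !Number.isFinite(record.risk_score)) {
    problems.push('risk_score is not a finite number');
  }
  return problems;
}

// Type guard built on the validator.
function isSecurityEvent(value: unknown): value is SecurityEvent {
  return validateSecurityEvent(value).length === 0;
}
```

Returning a list of problems rather than throwing lets the caller decide whether a malformed event should be dropped, logged with a warning, or rejected in tests.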
Step 2: Implement Structured Logger with Sanitization
Use a structured logging library. In TypeScript/Node.js, winston and pino are widely used options. Implement a wrapper that enforces schema compliance and sanitizes sensitive data.
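import winston from 'winston';
import { v4 as uuidv4 } from 'uuid';

// Sanitization rules to reduce PII/secret leakage. These patterns are heuristics, not a complete filter.
const SANITIZATION_RULES = [
  { regex: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, replacement: '[EMAIL_REDACTED]' },
  { regex: /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g, replacement: '[CC_REDACTED]' },
  { regex: /password["\s]*[:=]["\s]*\S+/gi, replacement: 'password=[REDACTED]' }
];

// Keys whose values are redacted outright, whatever they contain (illustrative list; extend as needed).
const SENSITIVE_KEYS = /^(password|passwd|secret|token|authorization|api[_-]?key)$/i;

// Recursively sanitize strings, arrays, and plain objects; other values pass through unchanged.
function sanitize<T>(value: T): T {
  if (typeof value === 'string') {
    const cleaned = SANITIZATION_RULES.reduce<string>(
      (acc, rule) => acc.replace(rule.regex, rule.replacement),
      value
    );
    return cleaned as unknown as T;
  }
  if (Array.isArray(value)) {
    // Keep arrays as arrays instead of turning them into index-keyed objects
    return value.map((item) => sanitize(item)) as unknown as T;
  }
  if (typeof value === 'object' && value !== null) {
    return Object.fromEntries(
      Object.entries(value).map(([key, val]) => [key, SENSITIVE_KEYS.test(key) ? '[REDACTED]' : sanitize(val)])
    ) as unknown as T;
  }
  return value;
}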
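// winston's default (npm) levels do not include 'security', so declare it explicitly.
// Lower numbers mean higher priority, so 'security' passes any configured level.
const SECURITY_LEVELS = { security: 0, error: 1, warn: 2, info: 3, debug: 4 };

const logger = winston.createLogger({
  levels: SECURITY_LEVELS,
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: { service: 'auth-service' },
  transports: [
    new winston.transports.Console(),
    // Add secure transport to centralized aggregation
  ]
});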
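export function logSecurityEvent(
  eventType: string,
  actor: string,
  target: string,
  metadata: Record<string, any>,
  riskScore: number,
  correlationId: string
) {
  // Map the numeric risk score onto the schema's severity enum (thresholds are illustrative).
  const severity =
    riskScore > 8 ? 'CRITICAL' : riskScore > 5 ? 'HIGH' : riskScore > 2 ? 'MEDIUM' : 'LOW';

  const event = {
    // Spread sanitized metadata first so it cannot overwrite the schema fields below.
    ...sanitize(metadata),
    event_id: uuidv4(),
    event_type: eventType,
    severity,
    actor: sanitize(actor),
    target: sanitize(target),
    risk_score: riskScore,
    correlation_id: correlationId
  };

  // The timestamp is added by winston.format.timestamp() in the logger's format chain.
  // level and message come last so metadata keys with the same names cannot replace them.
  logger.log({
    ...event,
    level: 'security',
    message: `Security event: ${eventType}`
  });
}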
Step 3: Middleware Integration for Context Injection
Security events must be captured at request boundaries. Implement middleware that extracts context and logs critical actions.
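import { Request, Response, NextFunction } from 'express';
import { v4 as uuidv4 } from 'uuid';
// logSecurityEvent is the function defined in Step 2.

// Extend Express's Request type so req.correlationId type-checks.
declare global {
  namespace Express {
    interface Request {
      correlationId?: string;
    }
  }
}

// Accept only short, simple IDs from clients so the header cannot inject content into logs.
const CORRELATION_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

export function securityLoggerMiddleware(req: Request, res: Response, next: NextFunction) {
  // Reuse a well-formed incoming correlation ID, otherwise generate one
  const incoming = req.get('x-correlation-id');
  const correlationId = incoming && CORRELATION_ID_PATTERN.test(incoming) ? incoming : uuidv4();
  req.correlationId = correlationId;
  // Echo the ID back so clients and downstream tooling can reference it
  res.setHeader('X-Correlation-ID', correlationId);

  // Track request start time for latency monitoring
  const startTime = Date.now();

  res.on('finish', () => {
    const duration = Date.now() - startTime;
    // Log 4xx/5xx responses as potential security probes
    if (res.statusCode >= 400) {
      logSecurityEvent(
        'http.error_response',
        req.ip || 'unknown',
        req.path,
        { method: req.method, statusCode: res.statusCode, duration },
        res.statusCode === 401 || res.statusCode === 403 ? 6 : 3,
        correlationId
      );
    }
  });

  next();
}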
Step 4: Architecture Decisions
- Centralized Aggregation: Route all logs to a centralized system (e.g., ELK, Splunk, Datadog, or cloud-native equivalents). Local logs are insufficient for cross-service correlation.
- Immutability: Security logs must be stored in Write-Once-Read-Many (WORM) storage or append-only buckets to prevent attackers from covering their tracks by modifying logs.
- Separation of Duties: Access to security logs should be restricted. Application service accounts should only have write access; read access is reserved for security operators.
- Correlation IDs: Enforce correlation IDs across all services to enable distributed tracing of attack chains.
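One way to enforce correlation IDs on outbound calls is to bind the ID to the request's async call chain and read it back when building headers. The sketch below uses Node's built-in AsyncLocalStorage; the names RequestContext, runWithCorrelationId, and outboundHeaders are hypothetical, and the header name X-Correlation-ID is the one used elsewhere in this guide.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical store shape: what each request carries through its async call chain.
interface RequestContext {
  correlationId: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Run a handler with a correlation ID bound to everything it calls, sync or async.
function runWithCorrelationId<T>(correlationId: string, fn: () => T): T {
  return requestContext.run({ correlationId }, fn);
}

// Build headers for an outbound call, adding the current correlation ID if one is bound.
function outboundHeaders(extra: Record<string, string> = {}): Record<string, string> {
  const ctx = requestContext.getStore();
  return ctx ? { ...extra, 'X-Correlation-ID': ctx.correlationId } : { ...extra };
}
```

In an Express app, the middleware would call runWithCorrelationId(correlationId, next), and any HTTP client wrapper would pass outboundHeaders() with each request, so individual handlers never have to thread the ID through by hand.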
Step 5: Alerting and Triage
Configure alerting rules based on structured fields, not text parsing.
- Thresholding: Alert on frequency (e.g., >5 failed logins from same IP in 60 seconds).
- Anomaly Detection: Use statistical baselines to detect deviations in access patterns.
- Risk Scoring: Aggregate risk scores over a time window. If a user's cumulative risk score exceeds a threshold, trigger an automated response (e.g., session revocation).
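The thresholding and risk-aggregation rules above can be sketched in memory as follows. In practice these rules usually live in the SIEM or alerting backend; the class names SlidingWindowCounter and RiskAccumulator and all threshold values are hypothetical and only illustrate the logic.

```typescript
// Hypothetical class: counts events per key inside a sliding time window.
class SlidingWindowCounter {
  private readonly events = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly threshold: number;

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Record an event at nowMs; returns true once the key exceeds the threshold within the window.
  record(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.events.get(key) ?? []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.events.set(key, recent);
    return recent.length > this.threshold;
  }
}

// Hypothetical class: sums risk scores per actor over a time window.
class RiskAccumulator {
  private readonly scores = new Map<string, Array<{ at: number; score: number }>>();
  private readonly windowMs: number;
  private readonly threshold: number;

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Add a score for an actor; reports the windowed total and whether it exceeds the threshold.
  add(actor: string, score: number, nowMs: number): { total: number; exceeded: boolean } {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.scores.get(actor) ?? []).filter((e) => e.at > cutoff);
    recent.push({ at: nowMs, score });
    this.scores.set(actor, recent);
    const total = recent.reduce((sum, e) => sum + e.score, 0);
    return { total, exceeded: total > this.threshold };
  }
}
```

For example, new SlidingWindowCounter(60_000, 5) returns true on the sixth failed login from one IP within 60 seconds. Both classes keep every key in memory indefinitely, so a real deployment would need eviction or an external store.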
Pitfall Guide
- Logging PII and Secrets: Developers often log request bodies or headers containing passwords, tokens, or PII. This violates GDPR/PCI-DSS and creates a liability.
  - Best Practice: Implement strict sanitization at the logging layer. Never log raw request payloads. Use allow-lists for logged fields.
- Log Injection Attacks: Attackers can inject CRLF characters or JSON control characters into log fields to forge log entries or break parsers.
  - Best Practice: Use structured loggers that handle encoding automatically. Validate and escape inputs before logging. Treat log inputs as untrusted data.
- Ignoring Performance Impact: Synchronous logging of high-volume events can block the event loop or saturate I/O, causing denial of service.
  - Best Practice: Use asynchronous, buffered logging. Implement rate limiting for log generation. Drop non-critical logs under load rather than blocking the application.
- Lack of Correlation IDs: Without correlation IDs, it is impossible to trace an attack across microservices. Security teams are left with fragmented logs that require manual reconstruction.
  - Best Practice: Propagate correlation IDs via HTTP headers (e.g., X-Correlation-ID) and context objects in all service calls.
- Alert Fatigue from Low-Fidelity Rules: Alerting on every 404 or generic error generates noise that desensitizes operators.
  - Best Practice: Tune alerts to high-signal events. Use risk scoring and aggregation. Implement tiered alerting where low-risk events are aggregated into daily reports rather than immediate pagers.
- Insufficient Retention and Integrity: Logs rotated too quickly lose forensic value. Logs stored without integrity checks can be tampered with.
  - Best Practice: Define retention policies based on compliance requirements (e.g., 1 year for security logs). Use cryptographic hashing or WORM storage to ensure log integrity.
- Over-Logging vs. Under-Logging: Logging everything increases storage costs and obscures signals; logging too little leaves gaps.
  - Best Practice: Maintain a security event taxonomy. Log all authentication, authorization, data access, and configuration changes. Exclude routine health checks and successful reads of non-sensitive data unless required by compliance.
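The "cryptographic hashing" option for log integrity can be illustrated with a simple hash chain, where each entry's hash covers the previous entry's hash. This is a minimal sketch using Node's built-in crypto module; the names ChainedEntry, appendEntry, and verifyChain are hypothetical. A chain like this is tamper-evident rather than tamper-proof: an attacker who can rewrite the whole file can recompute every hash, so the latest hash would still need to be anchored somewhere the application cannot modify.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical entry shape: each record stores the hash of its predecessor.
interface ChainedEntry {
  payload: string;   // serialized log event
  prevHash: string;  // hash of the previous entry (or the genesis value)
  hash: string;      // SHA-256 over prevHash and payload
}

const GENESIS_HASH = '0'.repeat(64);

function hashEntry(prevHash: string, payload: string): string {
  return createHash('sha256').update(prevHash).update('\n').update(payload).digest('hex');
}

// Append a payload, linking it to the current tail of the chain.
function appendEntry(chain: ChainedEntry[], payload: string): ChainedEntry {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS_HASH;
  const entry: ChainedEntry = { payload, prevHash, hash: hashEntry(prevHash, payload) };
  chain.push(entry);
  return entry;
}

// Returns the index of the first entry that fails verification, or -1 if the chain is intact.
function verifyChain(chain: readonly ChainedEntry[]): number {
  let prevHash = GENESIS_HASH;
  let index = 0;
  for (const entry of chain) {
    if (entry.prevHash !== prevHash || entry.hash !== hashEntry(prevHash, entry.payload)) {
      return index;
    }
    prevHash = entry.hash;
    index += 1;
  }
  return -1;
}
```

Editing or deleting any entry changes the hash every later entry depends on, so verifyChain reports the first position where the recorded history no longer matches.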
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small SaaS / Startup | Cloud-native logging (CloudWatch/Datadog) | Low operational overhead, managed scaling, integrated alerting. | $$ (Pay-as-you-go) |
| Regulated Finance / Healthcare | On-prem SIEM + WORM Storage | Strict compliance, data sovereignty, immutable audit trails required. | $$$$ (High CapEx/OpEx) |
| High-Throughput API Gateway | eBPF + Stream Processing (Kafka/Flink) | Minimal latency impact, high throughput, kernel-level visibility. | $$ (Infrastructure cost) |
| Multi-Cloud / Hybrid | OpenTelemetry Collector + Central Aggregator | Vendor neutrality, consistent instrumentation across environments. | $$$ (Complexity management) |
Configuration Template
Winston Configuration with Security Hardening:
import winston from 'winston';
import { v4 as uuidv4 } from 'uuid';

// 'security' is not one of winston's default levels, so declare it explicitly.
const levels = { security: 0, error: 1, warn: 2, info: 3, debug: 4 };

// Custom formatter for security events
const securityFormatter = winston.format((info) => {
  if (info.level === 'security') {
    info.event_id = info.event_id || uuidv4();
    info.timestamp = new Date().toISOString();
    // Flag events that arrive without a correlation ID
    if (!info.correlation_id) {
      info.correlation_id = 'MISSING_CORRELATION_ID';
    }
  }
  return info;
});

const securityLogger = winston.createLogger({
  levels,
  level: 'security', // emit only security-level events from this logger
  format: winston.format.combine(
    securityFormatter(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  transports: [
    new winston.transports.File({
      filename: 'logs/security.log',
      maxsize: 5242880, // 5 MB per file before rotation
      maxFiles: 5,
      tailable: true
    }),
    // Add secure transport for remote aggregation
  ]
});

export default securityLogger;
Prometheus Alert Rule Example:
groups:
  - name: security_alerts
    rules:
      - alert: HighAuthFailureRate
        expr: rate(auth_login_failures_total[5m]) > 10
        for: 2m
        labels:
          severity: high
        annotations:
          summary: "High authentication failure rate detected"
          description: "Rate of auth failures is {{ $value }} per second for {{ $labels.service }}."
Quick Start Guide
- Initialize Logging Library: Install winston and uuid. Create a securityLogger.ts file with the configuration template above.
- Add Sanitization: Copy the sanitize function and regex rules into your logging utility. Ensure it is applied to all metadata before logging.
- Integrate Middleware: Add the securityLoggerMiddleware to your Express application (the middleware is written against Express types, so Fastify needs an adapter or an equivalent hook). Verify that correlation IDs are generated and attached to responses.
- Instrument Key Events: Identify critical paths (login, password reset, admin actions). Call logSecurityEvent with appropriate risk scores and metadata.
- Verify in Dashboard: Trigger a test event (e.g., failed login). Check your log aggregation dashboard to confirm the event appears with correct structure, severity, and correlation ID. Ensure no PII is visible.
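The final verification step can be partly automated with a smoke check over emitted log lines. The sketch below assumes each log line is one JSON object, as produced by winston.format.json(); the helper name checkLogLine and the pattern list are hypothetical. The patterns are heuristics and can produce false positives or miss leaks.

```typescript
// Hypothetical helper: scans one serialized JSON log line for missing fields and likely PII.
const LEAK_PATTERNS: Array<{ name: string; regex: RegExp }> = [
  { name: 'email', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  { name: 'card number', regex: /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/ },
];
const EXPECTED_FIELDS = ['event_id', 'event_type', 'severity', 'correlation_id'];

// Returns a list of problems found in the line; an empty list means the check passed.
function checkLogLine(line: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return ['line is not valid JSON'];
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return ['line is not a JSON object'];
  }
  const record = parsed as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of EXPECTED_FIELDS) {
    if (typeof record[field] !== 'string' || record[field] === '') {
      problems.push(`missing field: ${field}`);
    }
  }
  // The configuration template writes this placeholder when no correlation ID was supplied.
  if (record.correlation_id === 'MISSING_CORRELATION_ID') {
    problems.push('correlation_id is the placeholder value');
  }
  for (const { name, regex } of LEAK_PATTERNS) {
    if (regex.test(line)) {
      problems.push(`possible ${name} in log line`);
    }
  }
  return problems;
}
```

Running this over a sample of lines from logs/security.log after triggering a test event gives a quick signal before checking the aggregation dashboard by hand.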