erProvider with OTLP Exporter
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  // Resource attributes identify this service in every exported span.
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: 'payment-processor',
    [SEMRESATTRS_SERVICE_VERSION]: '1.4.2',
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces',
    headers: { 'x-api-key': process.env.OTEL_EXPORTER_API_KEY || '' },
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-http': {
        // Skip health checks; the hook must return a boolean.
        ignoreIncomingRequestHook: (req) => req.url?.includes('/health') ?? false,
      },
      '@opentelemetry/instrumentation-fs': { enabled: false },
    }),
  ],
  // Sample 10% of root traces; child spans follow the parent's decision.
  sampler: new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(0.1) }),
});

sdk.start();
Step 3: Context Propagation & Manual Span Creation
Automatic instrumentation captures HTTP/gRPC and database calls. Business logic requires explicit spans to preserve semantic meaning.
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('payment-processor');

// fetchInventory and chargePayment are assumed to be defined elsewhere in the service.
export async function processOrder(orderId: string) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttributes({ 'app.order.id': orderId });
      const inventory = await fetchInventory(orderId); // Auto-instrumented HTTP
      const payment = await chargePayment(inventory.total); // Auto-instrumented gRPC
      span.setStatus({ code: SpanStatusCode.OK });
      return { status: 'completed', transactionId: payment.id };
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      throw err;
    } finally {
      // Always end the span, on success and on failure.
      span.end();
    }
  });
}
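The try/catch/finally pattern in processOrder tends to repeat in every manually instrumented function. One option is to factor it into a helper. The sketch below is an illustration, not part of the OpenTelemetry API: `withSpan`, `SpanLike`, `TracerLike`, and the two status constants are hypothetical names, and the interfaces only mirror the small subset of `@opentelemetry/api` used here so the example runs without the package installed.

```typescript
// Minimal structural types mirroring the subset of @opentelemetry/api used here.
// They are local assumptions so the sketch runs without the package installed.
interface SpanLike {
  recordException(error: Error): void;
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}

interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

const STATUS_OK = 1; // corresponds to SpanStatusCode.OK
const STATUS_ERROR = 2; // corresponds to SpanStatusCode.ERROR

// Runs `work` inside an active span, records the outcome, and always ends the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  work: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      const result = await work(span);
      span.setStatus({ code: STATUS_OK });
      return result;
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      span.recordException(error);
      span.setStatus({ code: STATUS_ERROR, message: error.message });
      throw err; // rethrow so callers still see the original failure
    } finally {
      span.end();
    }
  });
}
```

With a helper like this, processOrder would reduce to a call such as `withSpan(tracer, 'processOrder', async (span) => { ... })`. Whether the real `Tracer` type satisfies `TracerLike` structurally is an assumption to verify against the installed API version.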
Step 4: Async Boundary Handling
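Asynchronous callbacks can lose the active context when no context manager is registered or when work is scheduled outside the tracked async chain. Use context.with() or context.bind() to carry the trace context into promises and timers. Worker threads run in a separate isolate, so the context has to be serialized (for example with propagation.inject()) and extracted on the other side.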
import { context, trace } from '@opentelemetry/api';

const tracer = trace.getTracer('payment-processor');

async function asyncTask() {
  // Capture the context that is active when the timer is scheduled.
  const activeCtx = context.active();
  setTimeout(() => {
    context.with(activeCtx, () => {
      // Trace context preserved across async boundary
      tracer.startActiveSpan('async-cleanup', (span) => {
        // work
        span.end();
      });
    });
  }, 2000);
}
Architecture Decisions & Rationale
- Hybrid Instrumentation: Auto-instrumentation covers roughly 80% of I/O with zero boilerplate. Manual spans enforce domain semantics, so generic http.request spans do not drown out business logic.
- OTLP over HTTP/gRPC: OTLP is the CNCF standard. HTTP/protobuf offers easier load balancer compatibility; gRPC provides higher throughput. Choose based on collector topology.
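- Parent-Based Sampling: TraceIdRatioBasedSampler at 0.1 keeps roughly 10% of root traces, cutting trace volume by about 90%. The ratio applies to error traces as well, so a head sampler alone does not preserve them. ParentBasedSampler makes child spans inherit the parent's sampling decision, which prevents fragmented traces.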
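- Semantic Conventions: Attributes like http.method, db.statement, and error.type follow OTel specs (newer releases of the conventions rename some of these, for example http.method to http.request.method). Custom attributes should be namespaced (app.*, biz.*) to avoid collisions.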
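The interaction between the ratio sampler and the parent-based wrapper can be sketched without the SDK. This is a simplified model, not the SDK's implementation: `ratioDecision` and `parentBasedDecision` are hypothetical names, and comparing the last 8 hex characters of the trace ID against a threshold is an assumption chosen for brevity.

```typescript
// Simplified model of head sampling. Names and the exact threshold rule are
// illustrative assumptions, not the OpenTelemetry SDK's internals.
type Decision = 'RECORD_AND_SAMPLE' | 'DROP';

interface ParentInfo {
  sampled: boolean; // sampling flag carried in the parent's span context
}

// Deterministic ratio decision: the same trace ID always yields the same answer.
function ratioDecision(traceId: string, ratio: number): Decision {
  if (ratio >= 1) return 'RECORD_AND_SAMPLE';
  if (ratio <= 0) return 'DROP';
  // Interpret the last 8 hex characters as an unsigned 32-bit integer.
  const value = parseInt(traceId.slice(-8), 16);
  const threshold = Math.floor(ratio * 0x100000000);
  return value < threshold ? 'RECORD_AND_SAMPLE' : 'DROP';
}

// Parent-based wrapper: child spans inherit the parent's decision;
// only root spans consult the ratio sampler.
function parentBasedDecision(traceId: string, ratio: number, parent?: ParentInfo): Decision {
  if (parent) return parent.sampled ? 'RECORD_AND_SAMPLE' : 'DROP';
  return ratioDecision(traceId, ratio);
}
```

Because the decision for a child depends only on the parent's flag, a trace is either exported whole or dropped whole, which is the property that prevents fragmented traces.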
Pitfall Guide
1. Ignoring Sampling Strategies
Problem: Exporting 100% of traces in high-throughput services inflates storage costs and degrades collector performance.
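Best Practice: Implement head-based sampling for cost control. Use TraceIdRatioBasedSampler for uniform distribution. If error visibility is critical, add tail-based sampling at the collector level, which can retain every error trace that reaches it. The collector only sees spans the SDK actually exported, so on services where no error trace may be lost, keep the head sampling ratio at 100% and let the tail sampler cap the volume.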
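Tail-based retention is normally configured in the collector rather than written by hand, but the decision it makes can be sketched in a few lines. The following is an illustration under stated assumptions: `FinishedSpan`, `groupByTrace`, and `selectTraces` are hypothetical names, and a real collector also has to handle decision timeouts and memory limits that are omitted here.

```typescript
// Hypothetical, minimal shape of a span that has already finished.
interface FinishedSpan {
  traceId: string;
  name: string;
  isError: boolean;
}

// Group finished spans by trace ID, as a collector would buffer them
// while waiting for a trace to complete.
function groupByTrace(spans: FinishedSpan[]): Map<string, FinishedSpan[]> {
  const traces = new Map<string, FinishedSpan[]>();
  for (const span of spans) {
    const bucket = traces.get(span.traceId) ?? [];
    bucket.push(span);
    traces.set(span.traceId, bucket);
  }
  return traces;
}

// Keep every trace that contains an error; keep the rest with
// probability `baselineRatio`. `random` is injectable for testing.
function selectTraces(
  spans: FinishedSpan[],
  baselineRatio: number,
  random: () => number = Math.random,
): string[] {
  const kept: string[] = [];
  groupByTrace(spans).forEach((traceSpans, traceId) => {
    const hasError = traceSpans.some((s) => s.isError);
    if (hasError || random() < baselineRatio) kept.push(traceId);
  });
  return kept;
}
```

The key difference from head sampling is that the decision is made after the whole trace is visible, so the error status of any span can influence it.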
2. Breaking Context Propagation Across Async Boundaries
Problem: Unhandled promises, setTimeout, or worker threads lose the active context, creating orphaned spans and broken trace graphs.
Best Practice: Always bind async callbacks to the active context using context.with() or context.bind(). Use AsyncLocalStorage (Node.js 16+) with OTel's contextManager: new AsyncLocalStorageContextManager() to automate propagation.
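The mechanism that AsyncLocalStorage provides can be seen in isolation with Node's built-in module. This is a minimal sketch of the underlying idea, not of OTel's context manager itself: `DemoContext`, `traceIdSeenInTimer`, and `traceIdOutsideScope` are hypothetical names introduced for illustration.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical stand-in for a trace context; the real OTel Context holds more.
interface DemoContext {
  traceId: string;
}

const storage = new AsyncLocalStorage<DemoContext>();

// Resolves with the trace ID visible inside a timer callback that was
// scheduled from within the `run` scope.
function traceIdSeenInTimer(ctx: DemoContext, delayMs: number): Promise<string | undefined> {
  return storage.run(ctx, () =>
    new Promise<string | undefined>((resolve) => {
      // No explicit binding here: the store follows the async call chain.
      setTimeout(() => resolve(storage.getStore()?.traceId), delayMs);
    }),
  );
}

// Outside any `run` scope there is no active store.
function traceIdOutsideScope(): string | undefined {
  return storage.getStore()?.traceId;
}
```

Explicit context.with() or context.bind() calls remain useful for callbacks that are invoked from outside the original async chain, such as handlers registered on a shared event emitter.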
3. Over-Instrumenting Every Function
Problem: Creating spans for every method call generates noise, increases latency by 5–15%, and obscures meaningful bottlenecks.
Best Practice: Instrument only I/O boundaries, external calls, and critical business transactions. Use span attributes instead of child spans for lightweight metadata. Reserve nested spans for logical grouping, not execution steps.
4. Treating Trace IDs as Correlation IDs
Problem: Trace IDs are randomly generated for observability. Business correlation IDs (order IDs, tenant IDs) require deterministic tracking across systems.
Best Practice: Inject correlation IDs into span attributes (app.correlation.id) and propagate them alongside trace context. Use baggage for cross-service business metadata, but respect HTTP header size limits (typically 8KB).
5. Exporting Raw Traces Without Semantic Conventions
Problem: Custom attributes with inconsistent naming break dashboard queries, alerting rules, and downstream analytics.
Best Practice: Adopt OTel semantic conventions for HTTP, database, and messaging spans. Validate attributes against the OTel spec before deployment. Use a collector processor (attributes or resource) to normalize missing fields.
6. Neglecting Baggage Size Limits
Problem: Baggage propagates key-value pairs across services. Unbounded baggage exceeds header limits, causing HTTP 431 or silent drops.
Best Practice: Limit baggage to 5–7 critical fields. Use compression or reference IDs instead of embedding payloads. Track the serialized baggage size, for example with a custom metric such as otel.baggage.size, to detect overflow.
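A size check of this kind can run before baggage is attached to an outgoing request. The sketch below is an assumption-laden illustration: `serializeBaggage` and `checkBaggage` are hypothetical helpers, the default limits (8192 bytes, 7 entries) are taken from the figures in this guide rather than from any SDK setting, and keys are assumed to be plain ASCII tokens so that string length equals byte length after percent-encoding.

```typescript
// Serialize entries in the general shape of a W3C Baggage header:
// key=value pairs joined by commas, with values percent-encoded.
function serializeBaggage(entries: Record<string, string>): string {
  return Object.keys(entries)
    .map((key) => `${key}=${encodeURIComponent(entries[key])}`)
    .join(',');
}

interface BaggageCheck {
  header: string;
  bytes: number;
  entryCount: number;
  withinBudget: boolean;
}

// Reports whether the serialized baggage stays within the given budget.
// Keys are assumed to be ASCII, so header.length is the byte length.
function checkBaggage(
  entries: Record<string, string>,
  maxBytes = 8192,
  maxEntries = 7,
): BaggageCheck {
  const header = serializeBaggage(entries);
  const entryCount = Object.keys(entries).length;
  const bytes = header.length;
  return {
    header,
    bytes,
    entryCount,
    withinBudget: bytes <= maxBytes && entryCount <= maxEntries,
  };
}
```

A caller could log or count the cases where `withinBudget` is false and fall back to sending a reference ID instead of the full set of entries.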
7. Assuming "Set and Forget"
Problem: Trace data degrades without active governance. Spans accumulate stale attributes, sampling ratios drift, and collector backpressure goes unnoticed.
Best Practice: Implement span attribute validation in CI. Monitor collector health metrics (otelcol_exporter_sent_spans, otelcol_receiver_refused_spans). Review trace graphs weekly to prune low-value spans and enforce semantic standards.
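Attribute validation in CI can start as a simple naming check over the attribute keys a service emits. The following is a sketch under assumptions: `findInvalidAttributeKeys` is a hypothetical helper, the prefix allow-list is an example to be replaced with a team's own namespaces, and how the keys are collected (from tests, from an exported trace sample) is left open.

```typescript
// Example allow-list: custom namespaces plus a few convention namespaces.
// This list is an assumption; adjust it to the conventions actually in use.
const ALLOWED_PREFIXES = ['app.', 'biz.', 'http.', 'db.', 'error.'];

// Lowercase, dot-separated segments such as "app.order.id".
const KEY_PATTERN = /^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$/;

// Returns the keys that are malformed or fall outside the allowed namespaces.
function findInvalidAttributeKeys(
  keys: string[],
  allowedPrefixes: string[] = ALLOWED_PREFIXES,
): string[] {
  return keys.filter((key) => {
    const wellFormed = KEY_PATTERN.test(key);
    const namespaced = allowedPrefixes.some((prefix) => key.indexOf(prefix) === 0);
    return !wellFormed || !namespaced;
  });
}
```

A CI step could fail the build whenever the returned list is non-empty, which surfaces naming drift before it reaches dashboards and alert rules.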
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Startup / MVP | Auto-instrumentation + OTLP to Jaeger | Fastest path to visibility with minimal config | Low ($0–$200/mo self-hosted) |
| High-Throughput SaaS | Hybrid instrumentation + Tail-based sampling | Guarantees error trace retention while capping baseline volume | Medium ($300–$800/mo optimized storage) |
| Regulated / Compliance | Full manual spans + PII stripping processor | Audit-ready trace graphs with automated sensitive data redaction | High ($500–$1.2k/mo + compliance overhead) |
| Polyglot Microservices | OTel Collector sidecar + protocol translation | Normalizes Go, Python, Java, and Node traces into unified backend | Medium ($200–$600/mo collector infra) |
Configuration Template
OpenTelemetry Collector (otel-collector-config.yaml)
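receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
    timeout: 5s
    # send_batch_max_size must not be smaller than send_batch_size (default 8192).
    send_batch_size: 1000
    send_batch_max_size: 1000
  attributes:
    actions:
      - key: app.environment
        value: production
        action: upsert
      - key: http.headers
        action: delete
exporters:
  otlp/jaeger:
    # Jaeger accepts OTLP over gRPC on 4317; 14250 is its legacy jaeger-proto port.
    endpoint: jaeger:4317
    tls:
      insecure: true
  # The debug exporter replaces the deprecated logging exporter.
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      # Run batch last so attribute changes apply before spans are batched.
      processors: [attributes, batch]
      exporters: [otlp/jaeger, debug]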
Node.js SDK Initialization (otel-setup.ts)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';
import { AsyncLocalStorageContextManager } from '@opentelemetry/context-async-hooks';

export function initOpenTelemetry() {
  const sdk = new NodeSDK({
    contextManager: new AsyncLocalStorageContextManager(),
    resource: new Resource({
      [SEMRESATTRS_SERVICE_NAME]: process.env.SERVICE_NAME || 'backend-api',
      [SEMRESATTRS_SERVICE_VERSION]: process.env.SERVICE_VERSION || '0.0.0',
    }),
    traceExporter: new OTLPTraceExporter({
      // `url` is used as-is, so read the traces-specific variable that includes /v1/traces.
      url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT || 'http://localhost:4318/v1/traces',
      // OTEL_EXPORTER_HEADERS is a custom variable expected to hold a JSON object.
      headers: process.env.OTEL_EXPORTER_HEADERS ? JSON.parse(process.env.OTEL_EXPORTER_HEADERS) : {},
    }),
    instrumentations: [
      getNodeAutoInstrumentations({
        '@opentelemetry/instrumentation-http': { enabled: true },
        '@opentelemetry/instrumentation-express': { enabled: true },
        '@opentelemetry/instrumentation-pg': { enabled: true },
        '@opentelemetry/instrumentation-redis': { enabled: true },
      }),
    ],
    sampler: new ParentBasedSampler({
      // OTEL_TRACES_SAMPLER names the sampler type; the ratio lives in OTEL_TRACES_SAMPLER_ARG.
      root: new TraceIdRatioBasedSampler(parseFloat(process.env.OTEL_TRACES_SAMPLER_ARG || '0.1')),
    }),
  });

  sdk.start();

  // Flush pending spans before the process exits.
  process.on('SIGTERM', () => sdk.shutdown().catch(console.error));
  return sdk;
}
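The template passes the environment value straight to parseFloat, which yields NaN for a missing or non-numeric setting and accepts values outside the 0 to 1 range. A small guard can make the ratio predictable. This is a sketch, with `parseSamplingRatio` as a hypothetical helper and the 0.1 fallback taken from the template's default.

```typescript
// Parses a sampling ratio from an environment-style string.
// Falls back when the value is missing or not a finite number,
// and clamps the result to the valid range [0, 1].
function parseSamplingRatio(raw: string | undefined, fallback = 0.1): number {
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  if (!isFinite(value)) return fallback;
  return Math.min(1, Math.max(0, value));
}
```

The result could then be passed to the ratio sampler in place of the raw parseFloat call, so a typo in deployment configuration degrades to the default instead of an undefined sampling rate.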
Quick Start Guide
- Install the API, SDK, exporter, and auto-instrumentation packages used by the template:
npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-proto @opentelemetry/resources @opentelemetry/semantic-conventions @opentelemetry/sdk-trace-base @opentelemetry/context-async-hooks
- Create otel-setup.ts with the configuration template above, then import it and call initOpenTelemetry() at the entry point of your application, before any route or database initialization.
- Run an OTel Collector locally using Docker:
docker run -p 4318:4318 -p 4317:4317 -v "$(pwd)/otel-collector-config.yaml:/etc/otel-collector-config.yaml" otel/opentelemetry-collector:latest --config /etc/otel-collector-config.yaml
- Start your application and verify traces appear in Jaeger/Tempo by querying service.name="your-service" and inspecting the span hierarchy and attributes.