the OTLP exporter, deploying an LLM-aware receiver, and ensuring trace context propagation across agent boundaries.
Step 1: Validate Native Framework Emission
Confirm your framework emits gen_ai.* spans. Modern versions of Spring AI, LangChain4j, Koog, and Python/Go instrumentations output these attributes automatically. No custom instrumentation code is required.
Step 2: Configure the OTLP Exporter
Replace proprietary SDK initialization with standard OpenTelemetry exporter configuration. Use HTTP/protobuf for firewall compatibility or gRPC for high-throughput environments.
Java/Spring Boot Configuration
# application.properties
management.otlp.tracing.endpoint=${OTEL_COLLECTOR_ENDPOINT}/v1/traces
management.otlp.tracing.headers.Authorization=Bearer ${OTEL_AUTH_TOKEN}
management.tracing.sampling.probability=1.0
spring.ai.otel.enabled=true
Kotlin/Koog Implementation
import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.common.AttributeKey
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.exporter.otlp.http.trace.OtlpHttpSpanExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor

fun configureAgentTelemetry(serviceName: String): OpenTelemetry {
    // setEndpoint expects the full URL, including the /v1/traces path.
    val spanExporter = OtlpHttpSpanExporter.builder()
        .setEndpoint(System.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"))
        .addHeader("Authorization", "Bearer ${System.getenv("OTEL_AUTH_TOKEN")}")
        .build()

    // Batch spans before export and tag them with the service name.
    val tracerProvider = SdkTracerProvider.builder()
        .addSpanProcessor(BatchSpanProcessor.builder(spanExporter).build())
        .setResource(Resource.create(Attributes.of(AttributeKey.stringKey("service.name"), serviceName)))
        .build()

    return OpenTelemetrySdk.builder()
        .setTracerProvider(tracerProvider)
        .buildAndRegisterGlobal()
}
Python Instrumentation Setup
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor


def setup_telemetry():
    provider = TracerProvider()
    # An endpoint passed explicitly is used as-is, so it must be the
    # full traces URL (ending in /v1/traces).
    otlp_exporter = OTLPSpanExporter(
        endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
        headers={"Authorization": f"Bearer {os.getenv('OTEL_AUTH_TOKEN')}"},
    )
    provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
    trace.set_tracer_provider(provider)
    # Patch the OpenAI client so its calls emit spans automatically.
    OpenAIInstrumentor().instrument()
    return trace.get_tracer(__name__)
Go OTel Integration
package main

import (
	"context"
	"os"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
	// WithEndpointURL takes a full URL (scheme, host, and path such as /v1/traces).
	// WithEndpoint accepts only host:port and would reject a URL.
	exporter, err := otlptracehttp.New(ctx,
		otlptracehttp.WithEndpointURL(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")),
		otlptracehttp.WithHeaders(map[string]string{
			"Authorization": "Bearer " + os.Getenv("OTEL_AUTH_TOKEN"),
		}),
	)
	if err != nil {
		return nil, err
	}

	// Batch spans before export and attach the service name as a resource attribute.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("ai-agent-service"),
		)),
	)
	otel.SetTracerProvider(tp)
	return tp, nil
}
Step 3: Deploy LLM-Aware Receiver
Standard OTLP collectors (OpenTelemetry Collector, Jaeger, Tempo) will ingest the spans but won't interpret gen_ai.* attributes meaningfully. Deploy a receiver that understands:
- gen_ai.client.chat and gen_ai.client.completion for model routing
- gen_ai.tool.execute for function calling visibility
- gen_ai.usage.input_tokens and gen_ai.usage.output_tokens for cost calculation
- gen_ai.response.finish_reason for failure classification
Step 4: Ensure Trace Context Propagation
AI agents frequently fan out across multiple services, message queues, or async workers. Propagate traceparent headers across HTTP calls and embed tracestate segments for cross-agent identity. Use OpenTelemetry Baggage to carry user/session identifiers without polluting span attributes.
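To make the Baggage part concrete, here is a minimal stdlib-only sketch of the W3C Baggage header format that carries identifiers such as a session ID between services. The helper names (encode_baggage, decode_baggage) and the example keys are hypothetical. In a real service the OpenTelemetry propagators would normally build and parse this header for you.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries):
    """Serialize key/value pairs into a W3C Baggage header value."""
    # Percent-encode values so commas, semicolons, and spaces survive transport.
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header):
    """Parse a Baggage header value into a dict, ignoring member properties."""
    entries = {}
    for member in header.split(","):
        member = member.strip()
        if not member or "=" not in member:
            continue  # skip empty or malformed members
        # Drop optional ";property" metadata that may follow the value.
        pair = member.split(";", 1)[0]
        key, value = pair.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


# Hypothetical identifiers attached by the first agent in the chain.
header = encode_baggage({"session.id": "abc 123", "user.tier": "pro"})
restored = decode_baggage(header)
```

The header travels next to traceparent, so downstream agents can read the session identifier without it being copied onto every span.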
Architecture Rationale
- OTLP over Vendor SDKs: Decouples instrumentation from consumption. Frameworks own signal generation; backends own interpretation.
- HTTP/protobuf Exporter: Simplifies network configuration. Most corporate firewalls allow outbound HTTPS, whereas gRPC requires explicit port allowances.
- 100% Sampling During Development: AI loops and tool-calling failures require full context. Production environments should transition to tail-based sampling to control storage costs.
- Server-Side Cost Mapping: Token counts are framework-agnostic. Pricing tables change frequently. Compute costs at the collector level using dynamic vendor rate cards, not hardcoded in application logic.
Pitfall Guide
1. Under-Sampling AI Traces
Explanation: Applying default 10% sampling to AI workloads destroys debugging capability. A single hallucination or tool loop might occur in the unsampled 90%.
Fix: Use probability=1.0 in staging. In production, implement tail-based sampling that retains traces containing gen_ai.response.finish_reason=error or high token counts.
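The retention rule can be sketched as a plain function that runs once all spans of a trace have arrived. This is an illustration of the decision logic only, not a collector plugin. The span dictionary shape, TOKEN_THRESHOLD, and BASELINE_RATE are assumptions chosen for the example.

```python
import random

TOKEN_THRESHOLD = 8000  # hypothetical cutoff for a "high token count" trace
BASELINE_RATE = 0.05    # hypothetical keep rate for unremarkable traces


def keep_trace(spans, rng=random.random):
    """Decide whether to retain a completed trace (tail-based sampling)."""
    total_tokens = 0
    for span in spans:
        attrs = span.get("attributes", {})
        # Always keep traces in which any model call ended in an error.
        if attrs.get("gen_ai.response.finish_reason") == "error":
            return True
        total_tokens += attrs.get("gen_ai.usage.input_tokens", 0)
        total_tokens += attrs.get("gen_ai.usage.output_tokens", 0)
    # Keep expensive traces regardless of outcome.
    if total_tokens >= TOKEN_THRESHOLD:
        return True
    # Otherwise fall back to a small probabilistic sample.
    return rng() < BASELINE_RATE
```

Passing rng as a parameter keeps the probabilistic branch deterministic in tests. In production the same rules would typically be expressed as policies in the collector's tail-sampling stage.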
2. Leaking PII in Prompt Attributes
Explanation: gen_ai.prompt and gen_ai.completion attributes often contain user data, API keys, or internal documentation. Exporting them unredacted to observability backends can violate GDPR/CCPA and creates compliance risk.
Fix: Implement a SpanProcessor that hashes or redacts sensitive fields before export. Use regex patterns to detect and mask emails, SSNs, or credential formats.
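A minimal sketch of the masking step follows, assuming span attributes are available as a plain dictionary before export. The patterns are illustrative and far from exhaustive, and the "sk-" credential format is only an example. Where this hooks into your SDK (processor, exporter wrapper, or collector) depends on the language and version you run.

```python
import re

# Illustrative patterns only; real deployments need a reviewed, broader set.
PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[API_KEY]"),  # example credential format
]
SENSITIVE_KEYS = ("gen_ai.prompt", "gen_ai.completion")


def mask_text(text):
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def redact_attributes(attributes):
    """Return a copy of the attributes with sensitive string fields masked."""
    cleaned = dict(attributes)
    for key in SENSITIVE_KEYS:
        if isinstance(cleaned.get(key), str):
            cleaned[key] = mask_text(cleaned[key])
    return cleaned
```

Returning a copy instead of mutating in place keeps the original attributes available to any in-process consumer that legitimately needs them.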
3. Mixing Vendor SDKs with Native OTel
Explanation: Teams often keep a legacy observability SDK installed while enabling native framework telemetry. This creates duplicate spans, inflated costs, and conflicting trace IDs.
Fix: Audit pom.xml, build.gradle, requirements.txt, and go.mod. Remove proprietary instrumentation libraries. Rely exclusively on framework-native gen_ai.* emission.
4. Breaking Trace Context Across Async Boundaries
Explanation: Agents frequently delegate work to message queues (Kafka, SQS) or async task runners. If traceparent isn't propagated, the agent workflow fragments into disconnected spans.
Fix: Inject traceparent into message headers. Use OpenTelemetry context propagation libraries to extract and inject context at producer/consumer boundaries.
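The inject/extract pattern looks roughly like this. The sketch uses an in-process queue.Queue as a stand-in for Kafka or SQS and builds the W3C traceparent value by hand to show what travels in the message headers. The function names and message shape are hypothetical, and real code would call the OpenTelemetry propagation API instead.

```python
import queue
import re
import secrets

# traceparent layout: version-traceid-spanid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_span_id():
    return secrets.token_hex(8)  # 8 random bytes, 16 hex characters


def inject(headers, trace_id, span_id, sampled=True):
    """Write the current trace context into outgoing message headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers):
    """Read trace context from incoming headers, or None if absent or invalid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if not match:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": bool(int(flags, 16) & 1),
    }


# Producer side: attach context to the message before enqueueing.
work_queue = queue.Queue()
trace_id = secrets.token_hex(16)
producer_span_id = new_span_id()
message = {"headers": {}, "body": "summarize ticket"}
inject(message["headers"], trace_id, producer_span_id)
work_queue.put(message)

# Consumer side: continue the same trace with a new child span.
received = work_queue.get()
parent = extract(received["headers"])
consumer_span = {
    "trace_id": parent["trace_id"],
    "parent_span_id": parent["parent_span_id"],
    "span_id": new_span_id(),
}
```

Because the consumer reuses the producer's trace ID and records the producer span as its parent, the backend can stitch both sides into one workflow.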
5. Assuming Token Counts Equal Cost
Explanation: gen_ai.usage.input_tokens and output_tokens are raw counts. Pricing varies by model, region, and tier. Hardcoding rates in application code creates stale data and billing discrepancies.
Fix: Map tokens to costs at the collector/backend level. Maintain a versioned pricing registry that updates automatically when vendors adjust rates.
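A versioned pricing registry can be sketched as a list of rate cards with effective dates, looked up at the backend when a span is processed. The model name and every rate below are invented for illustration. The sketch also assumes the span carries a gen_ai.request.model attribute to select the card.

```python
from datetime import date
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# Hypothetical rate cards in USD per one million tokens.
RATE_CARDS = [
    {"model": "example-model", "effective": date(2024, 1, 1),
     "input": Decimal("3.00"), "output": Decimal("15.00")},
    {"model": "example-model", "effective": date(2024, 7, 1),
     "input": Decimal("2.50"), "output": Decimal("10.00")},
]


def rate_for(model, on):
    """Return the most recent rate card for the model effective on the given date."""
    candidates = [c for c in RATE_CARDS if c["model"] == model and c["effective"] <= on]
    return max(candidates, key=lambda c: c["effective"]) if candidates else None


def span_cost(attributes, on):
    """Compute the USD cost of one span, or None when no rate card applies."""
    card = rate_for(attributes.get("gen_ai.request.model"), on)
    if card is None:
        return None
    input_tokens = Decimal(attributes.get("gen_ai.usage.input_tokens", 0))
    output_tokens = Decimal(attributes.get("gen_ai.usage.output_tokens", 0))
    return (input_tokens * card["input"] + output_tokens * card["output"]) / PER_MILLION
```

Keying the lookup on the span's timestamp means historical traces keep the price that applied when they ran, even after a vendor changes its rates.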
6. Hardcoding Credentials in Configuration
Explanation: Hardcoding bearer tokens or API keys in configuration files exposes credentials in version control and container images.
Fix: Use environment variables, secret managers (HashiCorp Vault, AWS Secrets Manager), or workload identity federation. Rotate credentials automatically.
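For the environment-variable route, a small fail-fast helper avoids silently sending a header such as "Bearer None" when the variable is missing. This is a sketch; MissingSecretError, require_env, and build_auth_headers are hypothetical names.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required credential is absent from the environment."""


def require_env(name, environ=os.environ):
    """Return the value of a required variable, or raise if unset or blank."""
    value = environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"{name} is not set")
    return value


def build_auth_headers(environ=os.environ):
    """Build exporter headers from the injected token without logging it."""
    token = require_env("OTEL_AUTH_TOKEN", environ)
    return {"Authorization": f"Bearer {token}"}
```

Calling build_auth_headers() at startup surfaces a missing secret immediately instead of as rejected exports later.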
7. Losing the Tool Execution Span Hierarchy
Explanation: Frameworks emit gen_ai.tool.execute as child spans of gen_ai.client.chat. If parent-child relationships are broken, you lose visibility into which tool was called during which model decision.
Fix: Verify span hierarchy in your backend. Ensure parent_span_id is correctly set during tool invocation. Use span links for cross-agent calls instead of forcing parent-child relationships.
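The hierarchy check can be automated against exported span data. The sketch below assumes each span is a dictionary with name, span_id, and parent_span_id fields, which is a simplification of what a backend query would return. The function name is hypothetical.

```python
def find_orphan_tool_spans(spans):
    """Return IDs of tool spans whose parent is missing or is not a chat span."""
    by_id = {span["span_id"]: span for span in spans}
    orphans = []
    for span in spans:
        if span["name"] != "gen_ai.tool.execute":
            continue
        parent = by_id.get(span.get("parent_span_id"))
        # A tool span is an orphan if its parent is missing from the trace
        # or is something other than the model call that requested the tool.
        if parent is None or parent["name"] != "gen_ai.client.chat":
            orphans.append(span["span_id"])
    return orphans


trace_spans = [
    {"name": "gen_ai.client.chat", "span_id": "a1", "parent_span_id": None},
    {"name": "gen_ai.tool.execute", "span_id": "b2", "parent_span_id": "a1"},
    {"name": "gen_ai.tool.execute", "span_id": "c3", "parent_span_id": "zz"},
]
orphan_ids = find_orphan_tool_spans(trace_spans)
```

Running a check like this in a staging smoke test catches broken parent links before they reach production dashboards.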
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single framework, strict compliance | Native OTel + LLM-aware backend | Eliminates SDK debt, satisfies audit requirements | Low (infrastructure only) |
| Multi-framework polyglot stack | Standard OTLP + unified collector | Prevents SDK sprawl, normalizes telemetry across languages | Medium (collector scaling) |
| High-volume production (>10k req/min) | Tail-based sampling + gRPC exporter | Reduces storage costs while preserving error context | Low (savings on ingestion) |
| On-prem / air-gapped deployment | Self-hosted OTel Collector + local backend | Maintains data sovereignty, avoids cloud egress fees | High (infrastructure ownership) |
Configuration Template
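# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"
processors:
  # Scrub sensitive attributes before batching and export.
  # Uses the attributes processor from the Collector contrib distribution.
  attributes/redact:
    actions:
      - key: gen_ai.prompt
        action: update
        value: "REDACTED"
      - key: gen_ai.completion
        action: update
        value: "REDACTED"
      # Remove direct identifiers entirely.
      - key: user.email
        action: delete
      - key: payment.card_number
        action: delete
  batch:
    timeout: 5s
    send_batch_max_size: 1000
exporters:
  otlp/llm-backend:
    endpoint: "${env:LLM_OBSERVABILITY_ENDPOINT}"
    headers:
      authorization: "Bearer ${env:LLM_OBSERVABILITY_TOKEN}"
    tls:
      insecure: false
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/redact, batch]
      exporters: [otlp/llm-backend]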
Quick Start Guide
- Upgrade Frameworks: Ensure your AI framework is on a version that natively emits gen_ai.* spans (Spring AI 1.0+, LangChain4j 0.35+, Koog 0.8+, OpenLLMetry 0.3+).
- Set Environment Variables: Export OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_AUTH_TOKEN to your runtime environment.
- Initialize Exporter: Add standard OTLP exporter configuration to your application. Remove any proprietary observability SDKs.
- Validate Spans: Run a test agent workflow. Query your LLM-aware backend for gen_ai.client.chat spans. Verify token counts, tool calls, and finish reasons are present.
- Enable Sampling Policy: Configure tail-based sampling rules to retain error traces and high-cost requests. Deploy to production.