
Your AI agent already emits OpenTelemetry. Why aren't you watching it?

By Codcompass Team·2026-05-09·8 min read

Standardizing AI Agent Observability: From Vendor Lock-in to OpenTelemetry `gen_ai.*` Conventions

Current Situation Analysis

Generative AI agents operate on non-deterministic execution paths. Unlike traditional microservices that follow predictable request-response cycles, agents dynamically select models, construct prompts, invoke external tools, and retry on failure. Traditional observability stacks were never designed to capture this cognitive workflow.

When teams deploy LLM agents into production, they quickly encounter a visibility gap. Generic APM platforms track HTTP latency, error rates, and throughput, but they treat the AI layer as a black box. A POST /v1/chat span reveals nothing about which model was selected, how many tokens were consumed, which tools were invoked, or why the planner chose a specific action. The signal is either buried in raw request payloads or discarded entirely.

To bridge this gap, engineering teams historically reached for proprietary observability SDKs. These tools capture the right telemetry but introduce severe architectural debt. They couple your application to a specific vendor, require coordinated upgrades alongside framework releases, and multiply dependency footprints when teams run polyglot stacks. A single organization might use Spring AI for orchestration, LangChain4j for retrieval, and a Python framework for data preprocessing. Each vendor SDK demands its own initialization, configuration, and lifecycle management.

The industry is now shifting toward a standardized approach. The OpenTelemetry community has published the gen_ai.* semantic conventions (still marked as in development, so attribute names can change between releases), and major AI frameworks have adopted them natively. Spring AI 1.0 emits telemetry via Micrometer Observations. LangChain4j exposes the same signals through its ChatModelListener API. Koog 0.8 includes a first-class OpenTelemetry feature for the JVM. Python's OpenLLMetry and OpenInference projects provide instrumentations for Anthropic, OpenAI, LangChain, and LlamaIndex. Go's otel-instrumentation-genai package follows the same pattern.

The telemetry is already on the wire in standard form. The bottleneck is no longer instrumentation; it's reception. Teams need an OTLP endpoint that understands gen_ai.* attributes, reconstructs agent workflows, and surfaces actionable insights without requiring application-level vendor dependencies.

WOW Moment: Key Findings

The transition from proprietary SDKs to standard OpenTelemetry conventions fundamentally changes how AI observability is architected. The table below compares the three dominant approaches currently in production.

Approach	Signal Coverage	Framework Coupling	Implementation Effort	Backend Portability
Generic APM	~15% (HTTP/infra only)	None	Low	High
Proprietary Vendor SDK	~90% (LLM-specific)	High	High	Low
Standard OTel + LLM-Aware Backend	~95% (Full semantic depth)	Zero	Low	High

This matters because the standard approach decouples telemetry generation from telemetry consumption. Frameworks handle signal emission through native contracts. The collector or backend handles interpretation, cost mapping, graph reconstruction, and policy enforcement. Engineering teams can swap backends, upgrade frameworks, or migrate cloud providers without touching application code. The observability layer becomes infrastructure, not application logic.

Core Solution

Implementing standardized AI agent observability requires four architectural steps: validating native emission, configuring the OTLP exporter, deploying an LLM-aware receiver, and ensuring trace context propagation across agent boundaries.

Step 1: Validate Native Framework Emission

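Confirm your framework emits gen_ai.* spans. Modern versions of Spring AI, LangChain4j, Koog, and the Python/Go instrumentations output these attributes automatically. Beyond turning on the framework's built-in hooks or installing the matching instrumentation library, no custom instrumentation code is required.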
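One way to do this confirmation is to run a test workflow with a file or console exporter and check the recorded spans. The sketch below is library-free: it assumes spans have already been flattened into plain dicts with `name` and `attributes` keys, and the required-attribute list is an illustrative subset of the gen_ai.* conventions, not an official conformance check.

```python
# Operations that represent a model call in the gen_ai.* conventions
# (illustrative subset; tool and agent spans are skipped).
LLM_OPERATIONS = {"chat", "text_completion", "generate_content"}

# Attributes we expect every model-call span to carry (illustrative choice).
REQUIRED_LLM_ATTRIBUTES = (
    "gen_ai.request.model",
    "gen_ai.usage.input_tokens",
    "gen_ai.usage.output_tokens",
)


def check_llm_spans(spans: list[dict]) -> dict[str, list[str]]:
    """Map each model-call span name to the expected attributes it lacks."""
    report = {}
    for span in spans:
        attributes = span.get("attributes", {})
        if attributes.get("gen_ai.operation.name") not in LLM_OPERATIONS:
            continue  # not a model call, e.g. execute_tool or invoke_agent
        missing = [key for key in REQUIRED_LLM_ATTRIBUTES if key not in attributes]
        if missing:
            report[span["name"]] = missing
    return report


# Example: one chat span missing its output token count, one tool span (ignored).
spans = [
    {"name": "chat example-model", "attributes": {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": "example-model",
        "gen_ai.usage.input_tokens": 812,
    }},
    {"name": "execute_tool get_weather", "attributes": {
        "gen_ai.operation.name": "execute_tool",
    }},
]
print(check_llm_spans(spans))  # {'chat example-model': ['gen_ai.usage.output_tokens']}
```

An empty report means every model-call span carried the attributes you care about; extend `REQUIRED_LLM_ATTRIBUTES` with whatever your backend needs.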

Step 2: Configure Standard OTLP Exporter

Replace proprietary SDK initialization with standard OpenTelemetry exporter configuration. Use HTTP/protobuf for firewall compatibility or gRPC for high-throughput environments.

Java/Spring Boot Configuration

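# application.properties (Spring Boot 3.x property names)
# Requires spring-boot-starter-actuator, micrometer-tracing-bridge-otel and
# opentelemetry-exporter-otlp on the classpath; Spring AI then records its
# model observations through Micrometer without an extra enable flag.
management.otlp.tracing.endpoint=${OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces
management.otlp.tracing.headers.authorization=Bearer ${OTEL_AUTH_TOKEN}
management.tracing.sampling.probability=1.0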

Kotlin/Koog Implementation

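import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.common.AttributeKey
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.exporter.otlp.http.trace.OtlpHttpSpanExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor

fun configureAgentTelemetry(serviceName: String): OpenTelemetry {
    // setEndpoint expects the full signal URL, so append the OTLP/HTTP traces path
    val spanExporter = OtlpHttpSpanExporter.builder()
        .setEndpoint("${System.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")}/v1/traces")
        .addHeader("Authorization", "Bearer ${System.getenv("OTEL_AUTH_TOKEN")}")
        .build()

    // Merge with the SDK default resource so telemetry.sdk.* attributes are kept
    val resource = Resource.getDefault().merge(
        Resource.create(Attributes.of(AttributeKey.stringKey("service.name"), serviceName))
    )

    val tracerProvider = SdkTracerProvider.builder()
        .addSpanProcessor(BatchSpanProcessor.builder(spanExporter).build())
        .setResource(resource)
        .build()

    // Register globally so libraries that use GlobalOpenTelemetry share this provider
    return OpenTelemetrySdk.builder()
        .setTracerProvider(tracerProvider)
        .buildAndRegisterGlobal()
}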

Python Instrumentation Setup

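import os
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
# From the opentelemetry-instrumentation-openai package (OpenLLMetry)
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

def setup_telemetry(service_name: str = "ai-agent-service"):
    provider = TracerProvider(resource=Resource.create({"service.name": service_name}))
    # An explicit endpoint is used verbatim, so include the /v1/traces path
    base = os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"].rstrip("/")
    otlp_exporter = OTLPSpanExporter(
        endpoint=f"{base}/v1/traces",
        headers={"Authorization": f"Bearer {os.getenv('OTEL_AUTH_TOKEN')}"},
    )
    provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
    trace.set_tracer_provider(provider)

    # Patch the OpenAI client so each call emits a gen_ai.* span
    OpenAIInstrumentor().instrument()
    return trace.get_tracer(__name__)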

Go OTel Integration

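package main

import (
    "context"
    "log"
    "os"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
    exporter, err := otlptracehttp.New(ctx,
        // WithEndpointURL takes a full URL (scheme, host, port, path);
        // WithEndpoint would expect a bare "host:port" instead.
        otlptracehttp.WithEndpointURL(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")+"/v1/traces"),
        otlptracehttp.WithHeaders(map[string]string{
            "Authorization": "Bearer " + os.Getenv("OTEL_AUTH_TOKEN"),
        }),
    )
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("ai-agent-service"),
        )),
    )
    otel.SetTracerProvider(tp)
    return tp, nil
}

func main() {
    ctx := context.Background()
    tp, err := initTracer(ctx)
    if err != nil {
        log.Fatalf("init tracer: %v", err)
    }
    // Shutdown flushes spans still buffered in the batcher before exit.
    defer func() {
        if err := tp.Shutdown(ctx); err != nil {
            log.Printf("tracer shutdown: %v", err)
        }
    }()

    // ... run the agent workflow here ...
}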

Step 3: Deploy LLM-Aware Receiver

Standard OTLP endpoints such as the OpenTelemetry Collector, Jaeger, or Grafana Tempo will ingest the spans but won't interpret gen_ai.* attributes meaningfully. Deploy a receiver that understands:

gen_ai.operation.name (chat, text_completion, and so on) together with gen_ai.request.model and gen_ai.response.model for model routing
execute_tool operations and gen_ai.tool.name for function-calling visibility
gen_ai.usage.input_tokens and gen_ai.usage.output_tokens for cost calculation
gen_ai.response.finish_reasons, span status, and error.type for failure classification
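To illustrate what "understanding" these attributes means on the receiving side, here is a minimal, library-free sketch that rolls a batch of flattened spans up into per-model token totals, outcome counts, and tool-call counts. The dict shape (`status`, `attributes`) and the model and tool names are assumptions for illustration; a real backend would do this over stored traces.

```python
from collections import Counter, defaultdict


def summarize_gen_ai_spans(spans: list[dict]) -> dict:
    """Aggregate gen_ai.* span attributes into tokens, outcomes, and tool usage."""
    tokens = defaultdict(lambda: {"input": 0, "output": 0})
    outcomes = Counter()
    tools = Counter()
    for span in spans:
        attrs = span.get("attributes", {})
        if attrs.get("gen_ai.operation.name") == "execute_tool":
            tools[attrs.get("gen_ai.tool.name", "<unknown>")] += 1
            continue
        # Prefer the model that actually answered over the one requested.
        model = attrs.get("gen_ai.response.model") or attrs.get("gen_ai.request.model")
        if model is None:
            continue  # not a model-call span
        tokens[model]["input"] += attrs.get("gen_ai.usage.input_tokens", 0)
        tokens[model]["output"] += attrs.get("gen_ai.usage.output_tokens", 0)
        if span.get("status") == "ERROR":
            outcomes["error:" + attrs.get("error.type", "unknown")] += 1
        else:
            for reason in attrs.get("gen_ai.response.finish_reasons", ["unknown"]):
                outcomes[reason] += 1
    return {"tokens": dict(tokens), "outcomes": dict(outcomes), "tools": dict(tools)}


spans = [
    {"attributes": {"gen_ai.operation.name": "chat", "gen_ai.request.model": "example-model",
                    "gen_ai.usage.input_tokens": 100, "gen_ai.usage.output_tokens": 20,
                    "gen_ai.response.finish_reasons": ["tool_calls"]}},
    {"attributes": {"gen_ai.operation.name": "execute_tool", "gen_ai.tool.name": "search"}},
    {"attributes": {"gen_ai.operation.name": "chat", "gen_ai.request.model": "example-model",
                    "gen_ai.usage.input_tokens": 150, "gen_ai.usage.output_tokens": 40,
                    "gen_ai.response.finish_reasons": ["stop"]}},
    {"status": "ERROR", "attributes": {"gen_ai.operation.name": "chat",
                                       "gen_ai.request.model": "example-model",
                                       "error.type": "timeout"}},
]
summary = summarize_gen_ai_spans(spans)
print(summary["tokens"])  # {'example-model': {'input': 250, 'output': 60}}
```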

Step 4: Ensure Trace Context Propagation

AI agents frequently fan out across multiple services, message queues, or async workers. Propagate traceparent headers across HTTP calls and embed tracestate segments for cross-agent identity. Use OpenTelemetry Baggage to carry user/session identifiers without polluting span attributes, but keep sensitive values out of it: baggage travels as a plain baggage header to every downstream service the request reaches.
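In application code, the OpenTelemetry Baggage API and the W3C propagators build this header for you. The library-free sketch below only shows the wire format, to make clear what every downstream hop receives. The keys (`session.id`, `tenant`) are hypothetical, and baggage properties (the `;key=value` suffixes) are parsed past but discarded.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict[str, str]) -> str:
    """Serialize key/value pairs into a W3C `baggage` header value (no properties)."""
    # Values are percent-encoded; keys are assumed to be valid header tokens already.
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header: str) -> dict[str, str]:
    """Parse a `baggage` header value into a dict, ignoring ;properties."""
    entries = {}
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        key_value = member.split(";", 1)[0]  # drop metadata properties
        key, sep, value = key_value.partition("=")
        if not sep:
            continue  # malformed list member: skip it rather than fail the request
        entries[key.strip()] = unquote(value.strip())
    return entries


header = encode_baggage({"session.id": "abc 123", "tenant": "acme"})
print(header)  # session.id=abc%20123,tenant=acme
```

Because the value is readable by anything that sees the request, an opaque session ID is a reasonable baggage entry, while an email address is not.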

Architecture Rationale

OTLP over Vendor SDKs: Decouples instrumentation from consumption. Frameworks own signal generation; backends own interpretation.
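HTTP/protobuf Exporter: Simplifies network configuration. OTLP/HTTP passes through the standard HTTPS proxies and load balancers that most corporate networks already allow, whereas OTLP/gRPC needs HTTP/2 end to end (and conventionally port 4317), which some proxies block or downgrade.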
100% Sampling During Development: AI loops and tool-calling failures require full context. Production environments should transition to tail-based sampling to control storage costs.
Server-Side Cost Mapping: Token counts are framework-agnostic. Pricing tables change frequently. Compute costs at the collector level using dynamic vendor rate cards, not hardcoded in application logic.

Pitfall Guide

1. Under-Sampling AI Traces

Explanation: Applying a 10% head-sampling rate (Spring Boot's default for management.tracing.sampling.probability) to AI workloads destroys debugging capability. A single hallucination or tool loop might occur in the unsampled 90%. Fix: Use probability=1.0 in staging. In production, implement tail-based sampling that retains traces containing error spans (span status ERROR, usually with an error.type attribute) or high token counts.
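The tail-based policy in the fix can be sketched as a decision over one complete trace. This is a library-free illustration, assuming spans are flattened dicts sharing a hex `trace_id`; the 20,000-token threshold and 10% baseline are placeholder values. In the OpenTelemetry Collector, the contrib `tail_sampling` processor expresses the same idea declaratively (for example with `status_code`, `numeric_attribute`, and `probabilistic` policies), and it requires every span of a trace to reach the same collector instance.

```python
TRACE_ID_LOW_64 = (1 << 64) - 1


def keep_trace(spans: list[dict], token_threshold: int = 20_000,
               baseline_ratio: float = 0.1) -> bool:
    """Tail-sampling decision for one complete trace."""
    if not spans:
        return False
    # Rule 1: always keep traces where any span ended in error.
    if any(span.get("status") == "ERROR" for span in spans):
        return True
    # Rule 2: always keep expensive traces, summing tokens over all LLM calls.
    total_tokens = 0
    for span in spans:
        attrs = span.get("attributes", {})
        total_tokens += attrs.get("gen_ai.usage.input_tokens", 0)
        total_tokens += attrs.get("gen_ai.usage.output_tokens", 0)
    if total_tokens >= token_threshold:
        return True
    # Rule 3: keep a baseline fraction, decided from the low 64 bits of the
    # trace ID so that repeated decisions for the same trace always agree.
    trace_id = int(spans[0]["trace_id"], 16)
    return (trace_id & TRACE_ID_LOW_64) < baseline_ratio * (TRACE_ID_LOW_64 + 1)
```

Ordering the rules this way means errors and costly runs survive regardless of the baseline ratio, which is the property head sampling cannot give you.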

2. Leaking PII in Prompt Attributes

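Explanation: Prompt and completion content (for example gen_ai.prompt.* and gen_ai.completion.* attributes in OpenLLMetry, or the opt-in gen_ai.input.messages and gen_ai.output.messages in recent versions of the conventions) often contains user data, API keys, or internal documentation. Exporting it unfiltered to observability backends can violate GDPR/CCPA obligations and creates compliance risk. Fix: Hash or redact sensitive fields before export, either in the SDK (a custom SpanProcessor or an exporter wrapper, depending on what your language's SDK allows) or in the Collector. Use regex patterns to detect and mask emails, SSNs, or credential formats.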
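A minimal, library-free sketch of the redaction logic, assuming span attributes are available as a plain dict before export. The regexes are illustrative and deliberately simple, not a vetted PII detector; the key names in `DROP_PREFIXES` and `HASH_KEYS` are examples, and the HMAC key would come from your secret store. Where this runs depends on your SDK: in some SDKs spans are read-only by the time a processor sees them at end, so an exporter wrapper or a Collector-side processor is the practical hook.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; production PII detection needs a reviewed rule set.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "bearer": re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),
}
# Prompt/completion content keys (OpenLLMetry-style and newer semconv names): dropped.
DROP_PREFIXES = ("gen_ai.prompt", "gen_ai.completion",
                 "gen_ai.input.messages", "gen_ai.output.messages")
# Identifiers kept for correlation but replaced with a keyed hash.
HASH_KEYS = {"user.id"}


def redact_attributes(attributes: dict, hash_key: bytes) -> dict:
    """Return a copy of `attributes` that is safer to export."""
    clean = {}
    for key, value in attributes.items():
        if key.startswith(DROP_PREFIXES):
            continue
        if key in HASH_KEYS:
            # HMAC rather than a bare hash, so low-entropy IDs can't be
            # brute-forced without the key.
            digest = hmac.new(hash_key, str(value).encode(), hashlib.sha256)
            clean[key] = digest.hexdigest()[:16]
            continue
        if isinstance(value, str):
            for label, pattern in PATTERNS.items():
                value = pattern.sub(f"[REDACTED:{label}]", value)
        clean[key] = value
    return clean
```

Dropping content keys outright is the conservative default; masking with regexes is a second line of defense for free-text attributes that are not expected to contain PII but sometimes do.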

3. Mixing Vendor SDKs with Native OTel

Explanation: Teams often keep a legacy observability SDK installed while enabling native framework telemetry. This creates duplicate spans, inflated costs, and conflicting trace IDs. Fix: Audit pom.xml, build.gradle, requirements.txt, and go.mod. Remove proprietary instrumentation libraries. Rely exclusively on framework-native gen_ai.* emission.

4. Breaking Trace Context Across Async Boundaries

Explanation: Agents frequently delegate work to message queues (Kafka, SQS) or async task runners. If traceparent isn't propagated, the agent workflow fragments into disconnected spans. Fix: Inject traceparent into message headers. Use OpenTelemetry context propagation libraries to extract and inject context at producer/consumer boundaries.
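In real code, the OpenTelemetry propagation API does this for you (for example `inject` and `extract` in Python's `opentelemetry.propagate` module, given a carrier dict). The library-free sketch below shows what that amounts to on the wire for the W3C `traceparent` header: the producer writes it into the message headers and the consumer continues the same trace with a new span ID. It models message headers as a plain `dict[str, str]`, which is an assumption (Kafka headers, for instance, carry byte values, and SQS uses message attributes), and it only accepts version `00` of the header.

```python
import re
import secrets

TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject_traceparent(headers: dict, trace_id: str, span_id: str,
                       sampled: bool = True) -> None:
    """Producer side: write the current span's context into message headers."""
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def extract_traceparent(headers: dict):
    """Consumer side: return (trace_id, parent_span_id, sampled) or None."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid in W3C Trace Context
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


def start_consumer_span(headers: dict) -> dict:
    """Continue the producer's trace, or start a new one if no context arrived."""
    context = extract_traceparent(headers)
    if context is None:
        trace_id, parent_span_id = secrets.token_hex(16), None
    else:
        trace_id, parent_span_id, _sampled = context
    return {"trace_id": trace_id, "span_id": secrets.token_hex(8),
            "parent_span_id": parent_span_id}


message_headers: dict = {}
inject_traceparent(message_headers, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
consumer_span = start_consumer_span(message_headers)
```

The consumer span shares the producer's trace ID and points at the producer span as its parent, which is exactly what keeps the agent workflow in one trace across the queue.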

5. Assuming Token Counts Equal Cost

Explanation: gen_ai.usage.input_tokens and output_tokens are raw counts. Pricing varies by model, region, and tier. Hardcoding rates in application code creates stale data and billing discrepancies. Fix: Map tokens to costs at the collector/backend level. Maintain a versioned pricing registry that updates automatically when vendors adjust rates.
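A minimal sketch of such a registry, assuming rates are keyed by model and effective date so that historical spans are priced at the rate in force when they ran. The model name and every price below are placeholders, not real vendor rates; region and tier would extend the lookup key in the same way. `Decimal` avoids floating-point drift when summing many small charges.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Rate:
    effective_from: date
    input_per_million: Decimal   # USD per 1M input tokens
    output_per_million: Decimal  # USD per 1M output tokens


# Hypothetical registry: model -> list of dated rates. Figures are placeholders.
PRICING = {
    "example-model": [
        Rate(date(2025, 1, 1), Decimal("3.00"), Decimal("15.00")),
        Rate(date(2025, 6, 1), Decimal("2.50"), Decimal("10.00")),
    ],
}


def rate_for(model: str, on: date) -> Rate:
    """Pick the latest rate whose effective date is on or before `on`."""
    candidates = [r for r in PRICING.get(model, []) if r.effective_from <= on]
    if not candidates:
        raise KeyError(f"no price for {model!r} on {on}")
    return max(candidates, key=lambda r: r.effective_from)


def cost_usd(model: str, input_tokens: int, output_tokens: int, on: date) -> Decimal:
    """Price one span's token usage with the rate valid on its date."""
    rate = rate_for(model, on)
    return (Decimal(input_tokens) * rate.input_per_million
            + Decimal(output_tokens) * rate.output_per_million) / Decimal(1_000_000)
```

Raising on a missing price, instead of silently returning zero, surfaces unpriced models as soon as they appear in traces.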

6. Misconfigured OTLP Authentication

Explanation: Hardcoding bearer tokens or API keys in configuration files exposes credentials in version control and container images. Fix: Use environment variables, secret managers (HashiCorp Vault, AWS Secrets Manager), or workload identity federation. Rotate credentials automatically.
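A small sketch of the "no hardcoded credentials" rule, assuming the token reaches the process either as a file mounted by a secret manager or orchestrator, or as an environment variable. `OTEL_AUTH_TOKEN` matches the variable used elsewhere in this article; `OTEL_AUTH_TOKEN_FILE` is a hypothetical convention. Rebuilding the exporter with freshly read headers is what lets a rotated file-based token take effect.

```python
import os
from pathlib import Path
from typing import Mapping


def otlp_auth_headers(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the OTLP Authorization header from a mounted secret file or env var."""
    # OTEL_AUTH_TOKEN_FILE is a hypothetical convention: a path to a file that a
    # secret manager or orchestrator mounts and rotates. It takes precedence.
    token_file = environ.get("OTEL_AUTH_TOKEN_FILE")
    if token_file:
        token = Path(token_file).read_text(encoding="utf-8").strip()
    else:
        token = environ.get("OTEL_AUTH_TOKEN", "").strip()
    if not token:
        # Fail fast instead of exporting with an empty or placeholder credential.
        raise RuntimeError("no OTLP token: set OTEL_AUTH_TOKEN_FILE or OTEL_AUTH_TOKEN")
    return {"Authorization": f"Bearer {token}"}
```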

7. Ignoring Tool-Call Span Hierarchy

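Explanation: Frameworks emit tool executions as spans with gen_ai.operation.name=execute_tool, typically nested under the agent or workflow span that also contains the chat call that requested the tool. If parent-child relationships are broken, you lose visibility into which tool was called during which model decision. Fix: Verify span hierarchy in your backend. Ensure the active context is carried into tool invocations (including thread pools, coroutines, and async callbacks) so parent_span_id is set correctly. Use span links for cross-agent calls instead of forcing parent-child relationships.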
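A quick way to verify the hierarchy in staging is to look for tool spans that lost their parent. The sketch below assumes spans exported as flattened dicts with `trace_id`, `span_id`, `parent_span_id`, `name`, and `attributes`; a tool span counts as orphaned when it has no parent or its parent is not in the same trace, which is what dropped context typically looks like.

```python
def orphaned_tool_spans(spans: list[dict]) -> list[str]:
    """Names of execute_tool spans whose parent is missing from their trace."""
    known = {(span["trace_id"], span["span_id"]) for span in spans}
    orphans = []
    for span in spans:
        if span.get("attributes", {}).get("gen_ai.operation.name") != "execute_tool":
            continue
        parent = span.get("parent_span_id")
        # A root-level tool span, or one pointing at a parent that was never
        # exported in this trace, means context was dropped somewhere.
        if parent is None or (span["trace_id"], parent) not in known:
            orphans.append(span["name"])
    return orphans


trace = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": None,
     "name": "invoke_agent planner", "attributes": {"gen_ai.operation.name": "invoke_agent"}},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a",
     "name": "chat example-model", "attributes": {"gen_ai.operation.name": "chat"}},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "a",
     "name": "execute_tool search", "attributes": {"gen_ai.operation.name": "execute_tool"}},
    {"trace_id": "t2", "span_id": "d", "parent_span_id": None,
     "name": "execute_tool send_email", "attributes": {"gen_ai.operation.name": "execute_tool"}},
]
print(orphaned_tool_spans(trace))  # ['execute_tool send_email']
```

A tool span deliberately started as a new root and connected through a span link would also be flagged, so such cases may need an allow-list.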

Production Bundle

Action Checklist

Verify framework version supports native gen_ai.* OTel emission
Remove proprietary observability SDKs from dependency manifests
Configure standard OTLP exporter with HTTP/protobuf endpoint
Implement PII redaction processor before span export
Propagate traceparent across all async and HTTP boundaries
Deploy LLM-aware collector/backend with token-to-cost mapping
Enable tail-based sampling for production environments
Validate span hierarchy and attribute completeness in staging

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Single framework, strict compliance	Native OTel + LLM-aware backend	Eliminates SDK debt, satisfies audit requirements	Low (infrastructure only)
Multi-framework polyglot stack	Standard OTLP + unified collector	Prevents SDK sprawl, normalizes telemetry across languages	Medium (collector scaling)
High-volume production (>10k req/min)	Tail-based sampling + gRPC exporter	Reduces storage costs while preserving error context	Low (savings on ingestion)
On-prem / air-gapped deployment	Self-hosted OTel Collector + local backend	Maintains data sovereignty, avoids cloud egress fees	High (infrastructure ownership)

Configuration Template

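# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"

processors:
  # attributes processor: drop prompt/completion content, hash or drop PII keys
  attributes/redact:
    actions:
      - pattern: '^gen_ai\.(prompt|completion|input\.messages|output\.messages)(\..*)?$'
        action: delete
      - key: user.email
        action: hash
      - key: payment.card_number
        action: delete
  batch:
    timeout: 5s
    send_batch_size: 512
    # must be >= send_batch_size (default 8192), or the collector rejects the config
    send_batch_max_size: 1000

exporters:
  otlp/llm-backend:
    endpoint: "${env:LLM_OBSERVABILITY_ENDPOINT}"
    headers:
      authorization: "Bearer ${env:LLM_OBSERVABILITY_TOKEN}"
    tls:
      insecure: false

service:
  pipelines:
    traces:
      receivers: [otlp]
      # redact first, batch last
      processors: [attributes/redact, batch]
      exporters: [otlp/llm-backend]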

Quick Start Guide

Upgrade Frameworks: Ensure your AI framework is on a version that natively emits gen_ai.* spans (Spring AI 1.0+, LangChain4j 0.35+, Koog 0.8+, OpenLLMetry 0.3+).
Set Environment Variables: Export OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_AUTH_TOKEN to your runtime environment.
Initialize Exporter: Add standard OTLP exporter configuration to your application. Remove any proprietary observability SDKs.
Validate Spans: Run a test agent workflow. Query your LLM-aware backend for chat spans (gen_ai.operation.name = chat). Verify token counts, tool calls, and finish reasons are present.
Enable Sampling Policy: Configure tail-based sampling rules to retain error traces and high-cost requests. Deploy to production.
