Step 1: Define the Output Contract
Use a schema library that supports runtime validation, type inference, and custom refinement. Zod is preferred for its TypeScript-native design and explicit error mapping.

    import { z } from "zod";

    export const AnalysisOutputSchema = z.object({
      summary: z.string().min(10).max(500),
      confidence: z.number().min(0).max(1).describe("0-1 confidence score"),
      tags: z.array(z.enum(["urgent", "routine", "informational"])).min(1),
      metadata: z.object({
        source: z.string(),
        timestamp: z.string().datetime(),
        version: z.literal("v2")
      })
    });

    // Static type derived from the schema, so the type and the validator cannot drift apart.
    export type AnalysisOutput = z.infer<typeof AnalysisOutputSchema>;

Step 2: Implement Defensive Parsing
LLMs frequently wrap JSON in markdown code blocks or append trailing text. A robust parser strips any code fence and surrounding prose, isolating the JSON object before validation.

    import { ZodError } from "zod";

    export function extractJson(raw: string): string {
      // Prefer the contents of a markdown code fence when one is present.
      const jsonMatch = raw.match(/```(?:json)?\s*([\s\S]*?)\s*```/);
      const candidate = jsonMatch ? jsonMatch[1] : raw;

      // Trim surrounding prose: keep the span from the first "{" to the last "}".
      const firstBrace = candidate.indexOf("{");
      const lastBrace = candidate.lastIndexOf("}");
      // Also rejects a "}" that appears before the first "{".
      if (firstBrace === -1 || lastBrace < firstBrace) {
        throw new Error("No JSON structure detected in LLM output");
      }
      return candidate.slice(firstBrace, lastBrace + 1);
    }

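
Slicing from the first `{` to the last `}` can span two separate objects, or include a stray brace in trailing prose. One possible refinement, sketched here under the assumption that the first parseable object is the one you want, is to scan for the first brace-balanced span that `JSON.parse` accepts. The name `extractFirstJsonObject` is hypothetical and not part of the pipeline above.

```typescript
// Hypothetical helper: returns the first brace-balanced substring that JSON.parse
// accepts, or null when none exists.
function extractFirstJsonObject(raw: string): string | null {
  for (let start = raw.indexOf("{"); start !== -1; start = raw.indexOf("{", start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < raw.length; i++) {
      const ch = raw[i];
      if (inString) {
        // Inside a string literal, only an unescaped quote ends the string.
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === "{") depth++;
      else if (ch === "}" && --depth === 0) {
        const candidate = raw.slice(start, i + 1);
        try {
          JSON.parse(candidate);
          return candidate;
        } catch {
          break; // Balanced but not valid JSON: try the next opening brace.
        }
      }
    }
  }
  return null;
}
```

Braces inside string values are ignored, so a value such as `"}"` does not end the object early. The cost is a full scan per candidate opening brace, which is usually negligible next to generation latency.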
Step 3: Build the Validation Pipeline
Chain parsing, structural validation, and business rules. Separate semantic checks to avoid blocking latency-critical paths.

    export class LLMOutputValidator<T> {
      constructor(
        private readonly schema: z.ZodType<T>,
        private readonly semanticRules: Array<(data: T) => Promise<string | null>> = []
      ) {}

      async validate(rawOutput: string): Promise<T> {
        // JSON.parse throws a SyntaxError on malformed JSON; it propagates to the caller.
        const parsed: unknown = JSON.parse(extractJson(rawOutput));

        // Structural validation: rethrow the original ZodError to keep field-level issues.
        const structResult = this.schema.safeParse(parsed);
        if (!structResult.success) {
          throw structResult.error;
        }
        const data = structResult.data;

        // Run semantic/business validations in parallel; each rule returns a message or null.
        const semanticErrors = await Promise.all(
          this.semanticRules.map(rule => rule(data))
        );
        const failures = semanticErrors.filter((msg): msg is string => msg !== null);
        if (failures.length > 0) {
          throw new Error(`Semantic validation failed: ${failures.join("; ")}`);
        }
        return data;
      }
    }

Step 4: Implement Retry & Fallback Routing
Validation failures should trigger controlled retries with backoff, not silent degradation. Circuit breakers prevent retry storms.
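
    import { setTimeout as sleep } from "timers/promises";

    export async function validateWithRetry<T>(
      validator: LLMOutputValidator<T>,
      generateOutput: () => Promise<string>,
      maxRetries = 3 // total attempts, including the first
    ): Promise<T> {
      for (let attempt = 1; attempt <= maxRetries; attempt++) {
        try {
          // Regenerate on every attempt: re-validating the same output would fail the same way.
          const raw = await generateOutput();
          return await validator.validate(raw);
        } catch (err) {
          if (attempt === maxRetries) throw err;
          // Exponential backoff capped at 5 s: 2000 ms after the first failure, 4000 ms after the second.
          const delay = Math.min(1000 * Math.pow(2, attempt), 5000);
          await sleep(delay);
        }
      }
      // Reached only if maxRetries is less than 1.
      throw new Error("Unreachable");
    }
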
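
The retry loop above has no circuit breaker of its own. A minimal sketch of one follows, assuming a simple policy: open after a run of consecutive failures, then reject calls until a cooldown elapses. The class name, the `cooldownMs` parameter, and the injectable clock are all hypothetical choices for illustration.

```typescript
// Hypothetical minimal circuit breaker: opens after `threshold` consecutive failures
// and rejects calls until `cooldownMs` has elapsed.
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, mainly so tests need not wait
  }

  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: close the circuit and let calls through again.
      this.openedAt = null;
      this.consecutiveFailures = 0;
      return false;
    }
    return true;
  }

  async run<T>(operation: () => Promise<T>): Promise<T> {
    if (this.isOpen()) throw new Error("Circuit open: skipping LLM call");
    try {
      const result = await operation();
      this.consecutiveFailures = 0; // any success resets the failure streak
      return result;
    } catch (err) {
      if (++this.consecutiveFailures >= this.threshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```

A caller could wrap the whole retry sequence, for example `breaker.run(() => validateWithRetry(validator, generateOutput))`, so that one exhausted retry sequence counts as a single failure. This sketch omits a half-open state, which a production breaker would typically add.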
Architecture Rationale
- Separation of concerns: Parsing handles noise, schema enforces structure, semantic rules enforce business logic. This prevents coupling format drift with domain validation.
- Streaming compatibility: The pipeline can be adapted for SSE by accumulating tokens until a complete JSON object is detectable, then validating incrementally.
- Cost-aware validation: Semantic checks run only after structural validation passes, avoiding expensive LLM-as-judge calls on malformed output.
- Observable failure modes: `ZodError` provides field-level diagnostics. Semantic failures carry an explicit message per failed rule. This enables precise alerting and model fine-tuning feedback loops.
Pitfall Guide
1. Relying on Prompt Compliance Alone
Prompts like "Return strictly valid JSON" reduce but do not eliminate format drift. Models operate on token probabilities, not syntax parsers. Always extract and parse defensively. Relying on prompt compliance alone leads to production failures under load or temperature variation.
2. Validating Only After Full Stream Completion
Streaming responses introduce partial JSON states. Blocking validation until the stream closes increases latency and delays failure detection. Implement incremental JSON boundary detection or use libraries that parse streaming tokens into valid objects. Validate on complete object boundaries, not token counts.
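
Incremental boundary detection can be sketched as a small stateful scanner that is fed chunks as they arrive and reports each top-level object the moment its closing brace is seen. This is an illustrative sketch, not a replacement for a streaming JSON library; the class name `JsonStreamAccumulator` is hypothetical, and the buffer is never trimmed, which a long-lived stream would need.

```typescript
// Hypothetical accumulator: push() takes the next stream chunk and returns the text of
// every top-level JSON object that became complete with that chunk.
class JsonStreamAccumulator {
  private buffer = "";
  private scanned = 0; // index of the next unscanned character in buffer
  private start = -1;  // index where the current top-level object began
  private depth = 0;
  private inString = false;
  private escaped = false;

  push(chunk: string): string[] {
    this.buffer += chunk;
    const complete: string[] = [];
    for (; this.scanned < this.buffer.length; this.scanned++) {
      const ch = this.buffer[this.scanned];
      if (this.inString) {
        // Inside a string literal, only an unescaped quote ends the string.
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
      } else if (ch === '"') {
        // Quotes outside any object belong to surrounding prose and are ignored.
        if (this.depth > 0) this.inString = true;
      } else if (ch === "{") {
        if (this.depth++ === 0) this.start = this.scanned;
      } else if (ch === "}" && this.depth > 0 && --this.depth === 0) {
        complete.push(this.buffer.slice(this.start, this.scanned + 1));
      }
    }
    return complete;
  }
}
```

Each string returned by `push()` can go straight to `JSON.parse` and schema validation, so a malformed object fails as soon as it closes, not when the stream ends.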
3. Overusing LLM-as-Judge for Structural Validation
Using an LLM to validate JSON structure or enum values is computationally wasteful and introduces recursive hallucination risks. Reserve LLM-as-judge for semantic alignment, tone, or factual cross-referencing. Structural validation must be deterministic and schema-enforced.
4. Ignoring Token Budget and Validation Cost
Running multiple validation passes, especially semantic checks, compounds token spend. Cache validation results for identical inputs, batch semantic evaluations, and implement early-exit logic. Track validation token overhead separately from generation tokens to maintain accurate cost attribution.
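
Caching for identical inputs can be as simple as memoizing the validation call by raw output. The sketch below assumes validation is deterministic for a given output, so both successes and failures are safe to reuse; `withValidationCache` and `maxEntries` are hypothetical names.

```typescript
// Hypothetical wrapper: memoizes validation outcomes by raw output so identical
// outputs are validated (and billed) once.
function withValidationCache<T>(
  validate: (raw: string) => Promise<T>,
  maxEntries = 1000
): (raw: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (raw: string) => {
    const hit = cache.get(raw);
    if (hit) return hit;
    // Store the promise itself so concurrent identical requests share one validation run.
    const pending = validate(raw);
    cache.set(raw, pending);
    // Evict the oldest entry (Map preserves insertion order) once over capacity.
    if (cache.size > maxEntries) {
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    return pending;
  };
}
```

If semantic rules can fail transiently, for instance because a judge model times out, rejected promises should be removed from the cache instead of being reused.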
5. Missing Idempotent Retry Strategies
Retrying validation failures without request idempotency causes duplicate side effects (e.g., database writes, notification sends). Ensure retries target only the generation step, not downstream consumers. Use request IDs, idempotency keys, and outbox patterns to decouple validation from business operations.
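
The ordering matters more than the storage. As a rough sketch, retries wrap only generation and validation, and the side effect runs once per request ID afterwards. The in-memory `Set` below stands in for a durable idempotency store, and every name here is hypothetical.

```typescript
// Stand-in for a durable idempotency store (e.g. a table with a unique key constraint).
const processedRequests = new Set<string>();

// Hypothetical handler: the side effect runs at most once per request ID, and only
// after a validated result exists.
async function handleOnce<T>(
  requestId: string,
  generateValidated: () => Promise<T>, // may retry internally; has no side effects
  persist: (data: T) => Promise<void>  // side effect: database write, notification, ...
): Promise<"persisted" | "duplicate"> {
  if (processedRequests.has(requestId)) return "duplicate";
  const data = await generateValidated(); // may throw; nothing has been written yet
  // Re-check: a concurrent call with the same ID may have finished while we awaited.
  if (processedRequests.has(requestId)) return "duplicate";
  processedRequests.add(requestId);
  try {
    await persist(data);
  } catch (err) {
    processedRequests.delete(requestId); // allow a later retry of the write
    throw err;
  }
  return "persisted";
}
```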
6. Hardcoding Business Rules Inside Prompts
Embedding complex constraints in prompts (for example, `If confidence < 0.6, set tags to ["urgent"]`) forces the model to execute conditional logic it is not optimized for. Move business rules to TypeScript validation functions. Prompts should guide style and scope; code should enforce constraints.
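
The confidence rule from the prompt example could live in code as a semantic rule instead. This sketch assumes the rule function shape used for semantic rules, `(data) => Promise<string | null>`, and declares a local type for illustration; `ScoredOutput` and the function name are hypothetical.

```typescript
// Local shape assumed for illustration; it mirrors the fields the prompt rule refers to.
type Tag = "urgent" | "routine" | "informational";
interface ScoredOutput {
  confidence: number;
  tags: Tag[];
}

// Semantic rule: returns an error message when the rule is violated, null when it passes.
async function lowConfidenceMustBeUrgent(data: ScoredOutput): Promise<string | null> {
  const onlyUrgent = data.tags.length === 1 && data.tags[0] === "urgent";
  if (data.confidence < 0.6 && !onlyUrgent) {
    return `confidence ${data.confidence} is below 0.6 but tags are not ["urgent"]`;
  }
  return null;
}
```

The rule is now deterministic and unit-testable, and the prompt no longer has to carry it.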
7. No Fallback for Validation Failures
Letting validation errors propagate unhandled breaks the user experience. Implement graceful degradation: return a structured error object, trigger a fallback model, or queue for human review. Always expose validation status in response headers or telemetry for observability.
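
One way to express that degradation path is a wrapper that never throws and always returns a tagged result. The sketch below is illustrative; the `ValidationOutcome` shape and the `validateOrDegrade` name are assumptions, and the two callbacks stand for whatever primary and fallback generation paths you have.

```typescript
type ValidationOutcome<T> =
  | { status: "ok"; data: T; source: "primary" | "fallback" }
  | { status: "needs_review"; reason: string };

// Hypothetical wrapper: tries the primary path, then a fallback, then degrades to a
// structured result instead of throwing.
async function validateOrDegrade<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>
): Promise<ValidationOutcome<T>> {
  try {
    return { status: "ok", data: await primary(), source: "primary" };
  } catch {
    try {
      return { status: "ok", data: await fallback(), source: "fallback" };
    } catch (err) {
      // Both paths failed: hand the caller a reason it can log or queue for review.
      const reason = err instanceof Error ? err.message : String(err);
      return { status: "needs_review", reason };
    }
  }
}
```

The `status` and `source` fields give telemetry a ready-made validation status to record.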
Best Practices from Production:
- Treat LLM output as untrusted input. Apply the same validation rigor as external API payloads.
- Version your schemas. Model updates change output distributions; schema versioning prevents silent breakage.
- Log validation failure signatures, not full outputs. Enable pattern detection without exposing sensitive data.
- Benchmark validation latency separately. Optimize parsing and schema compilation at startup, not per-request.
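
A failure signature can be built from the path and code of each validation issue, leaving out the offending values. The sketch below uses a minimal local issue shape (Zod issues expose `path` and `code` fields of this kind); the function name and the signature format are hypothetical.

```typescript
// Minimal issue shape assumed for illustration.
interface IssueLike {
  path: Array<string | number>;
  code: string;
}

// Builds a stable, value-free signature such as
// "metadata.timestamp:invalid_type|tags:too_small".
function failureSignature(issues: IssueLike[]): string {
  return issues
    .map(issue => `${issue.path.join(".") || "(root)"}:${issue.code}`)
    .sort() // order-independent, so the same failure set always yields the same string
    .join("|");
}
```

Counting occurrences per signature in your logs is then enough to spot a recurring drift pattern without storing any model output.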
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-throughput public API | Schema-only + streaming parser | Minimal latency, deterministic enforcement, scales horizontally | Low (+5-15ms, near-zero token cost) |
| Critical data pipeline | Schema + semantic rules + LLM-as-judge | Requires factual cross-validation and domain constraint enforcement | Medium (+70-100ms, +10-15% token spend) |
| Interactive chat/UX | Schema + fallback routing + partial validation | Prioritizes responsiveness; validates on complete utterances | Low (+20-30ms, user-perceived latency acceptable) |
| Research/prototyping | Prompt-only + manual spot checks | Speed of iteration outweighs production safety requirements | Negligible (high operational risk, low immediate cost) |
Configuration Template

    // validator.config.ts
    import { z } from "zod";
    import { LLMOutputValidator } from "./llm-validator";

    export const OutputContract = z.object({
      id: z.string().uuid(),
      status: z.enum(["pending", "approved", "rejected"]),
      score: z.number().min(0).max(100),
      reasoning: z.string().min(20).max(1000)
    });

    export const validator = new LLMOutputValidator(OutputContract, [
      async (data) => {
        if (data.status === "rejected" && data.score > 70) {
          return "High score conflicts with rejection status";
        }
        return null;
      }
    ]);

    export const retryConfig = {
      maxAttempts: 3,
      baseDelayMs: 800,
      maxDelayMs: 4000,
      circuitBreakerThreshold: 5 // failures before opening circuit
    };

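
The template does not show how `baseDelayMs` and `maxDelayMs` feed the retry delay. One plausible reading, sketched here as an assumption, is capped exponential backoff starting at the base delay; `backoffDelayMs` and the `RetryDelayConfig` type are hypothetical.

```typescript
interface RetryDelayConfig {
  baseDelayMs: number;
  maxDelayMs: number;
}

// Capped exponential backoff: baseDelayMs * 2^(attempt - 1), never above maxDelayMs.
// `attempt` is 1-based: it is the number of the attempt that just failed.
function backoffDelayMs(attempt: number, config: RetryDelayConfig): number {
  return Math.min(config.baseDelayMs * Math.pow(2, attempt - 1), config.maxDelayMs);
}
```

With the template values this yields 800 ms, 1600 ms and 3200 ms for the first three failures, then 4000 ms from the fourth onward.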
Quick Start Guide
- Install dependencies: `npm install zod`
- Define your schema: Create a Zod object matching your expected LLM output structure with type, range, and enum constraints.
- Wrap your generation call: Pass the raw LLM response to `validator.validate(rawOutput)`, which calls `extractJson()` internally. Use `validateWithRetry()` for production resilience.
- Run a validation test: Feed malformed, partial, and semantically invalid outputs to verify error mapping. Log failure signatures and monitor latency overhead in your observability dashboard.