a schema definition library like Zod (TypeScript) or Pydantic (Python). These libraries provide runtime validation, type inference, and conversion to JSON Schema, which is the lingua franca for LLM structure constraints.
TypeScript Implementation:
```typescript
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

// 1. Define the strict contract
const ProductExtractionSchema = z.object({
  id: z.string().uuid().describe("Unique product identifier"),
  name: z.string().min(1).max(100).describe("Product name"),
  price: z.number().positive().describe("Price in USD"),
  // Caveat: strict modes may reject open-ended records and optional keys;
  // check the provider's supported-schema list before relying on this field.
  attributes: z.record(z.string(), z.any()).optional().describe("Key-value pairs of attributes"),
  category: z.enum(['electronics', 'clothing', 'home', 'other']).describe("Product category"),
  confidence: z.number().min(0).max(1).describe("Extraction confidence score"),
});

// 2. Convert to JSON Schema for the LLM API
const jsonSchema = zodToJsonSchema(ProductExtractionSchema, {
  $refStrategy: 'none',
  // 'openAi' targets OpenAI structured outputs (available in recent
  // zod-to-json-schema releases); use 'jsonSchema7' for other providers.
  target: 'openAi',
});

// Type inference for downstream usage
type ProductExtraction = z.infer<typeof ProductExtractionSchema>;
```
Step 2: Select the Constraint Mechanism
Different providers offer distinct mechanisms for enforcing structure. The choice depends on the model provider and latency requirements.
- JSON Mode / Response Format: Forces the model to output valid JSON. Combined with a schema, this is the baseline for structured output.
- Function Calling / Tool Use: Embeds the schema within a tool definition. The model generates arguments matching the tool's schema. This is highly reliable but may incur slightly higher latency due to the tool-use protocol overhead.
- Grammar-Constrained Decoding: Advanced providers allow specifying a grammar (e.g., JSON Schema as a grammar) that restricts the token sampler. This guarantees structural validity at the token level, preventing invalid JSON generation entirely.
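To make the function-calling option concrete, here is a minimal sketch of a JSON Schema wrapped in a tool definition. The object shape follows the OpenAI Chat Completions `tools` format as an assumption; other providers use different field names. `buildExtractionTool` and the hand-written `productJsonSchema` are hypothetical names used only for illustration.

```typescript
// Minimal JSON Schema shape, for illustration only.
type JsonSchema = {
  type: string;
  properties?: Record<string, unknown>;
  required?: string[];
  additionalProperties?: boolean;
};

// Tool definition in the style of the OpenAI `tools` array (assumed shape).
type ToolDefinition = {
  type: 'function';
  function: {
    name: string;
    description: string;
    parameters: JsonSchema;
    strict: boolean;
  };
};

// Hypothetical helper: the model "calls" this tool with arguments matching the schema.
function buildExtractionTool(
  name: string,
  description: string,
  schema: JsonSchema,
): ToolDefinition {
  return {
    type: 'function',
    function: { name, description, parameters: schema, strict: true },
  };
}

const productJsonSchema: JsonSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    price: { type: 'number' },
  },
  required: ['name', 'price'],
  additionalProperties: false,
};

const productTool = buildExtractionTool(
  'record_product',
  'Record the product extracted from the text.',
  productJsonSchema,
);
```

With the OpenAI SDK, an object like `productTool` would go in the `tools` array of the request. The arguments come back as a JSON string on the tool call, so they still need to be parsed and validated like any other model output.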
Step 3: Implementation with Validation and Retry
Even with native constraints, implement a validation layer. Models may occasionally produce outputs that pass syntax checks but fail semantic validation, or API wrappers may have edge cases. A retry loop with error feedback is essential for production resilience.
```typescript
import OpenAI from 'openai';

const openai = new OpenAI();

export async function extractProduct(text: string): Promise<ProductExtraction> {
  const maxRetries = 3;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: [
          {
            role: 'system',
            content: 'Extract product data from the provided text. Return strictly valid JSON matching the schema.'
          },
          {
            role: 'user',
            content: text
          }
        ],
        // Enforce JSON structure
        response_format: {
          type: 'json_schema',
          json_schema: {
            name: 'product_extraction',
            schema: jsonSchema,
            strict: true, // Critical: Ensures strict adherence to schema
          },
        },
      });

      const content = response.choices[0]?.message?.content;
      if (!content) throw new Error('Empty response from model');

      // 1. Parse JSON
      const parsed = JSON.parse(content);

      // 2. Validate against Zod schema (runtime safety)
      const validationResult = ProductExtractionSchema.safeParse(parsed);
      if (!validationResult.success) {
        throw new Error(`Schema validation failed: ${validationResult.error.message}`);
      }

      return validationResult.data;
    } catch (error) {
      // 3. Retry logic with error feedback
      if (attempt === maxRetries) {
        throw new Error(`Extraction failed after ${maxRetries} attempts: ${(error as Error).message}`);
      }
      // Inject error context into next attempt for self-correction
      console.warn(`Attempt ${attempt} failed: ${(error as Error).message}. Retrying...`);
      // In a full implementation, append the error message to the conversation history
      // or use a specialized retry wrapper.
    }
  }

  throw new Error('Unreachable');
}
```
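The catch block above only logs the failure and leaves the feedback step as a comment. One possible way to fill it in, as a sketch: keep the conversation in an array and, after a failed attempt, append the model's previous output plus a message describing the validation error. `ChatMessage`, `withValidationFeedback`, and the wording of the correction prompt are assumptions, not part of any SDK.

```typescript
type ChatMessage = {
  role: 'system' | 'user' | 'assistant';
  content: string;
};

// Returns a new history; the original array is left untouched so each
// retry can be built from a known-good base conversation.
function withValidationFeedback(
  history: ChatMessage[],
  previousOutput: string,
  errorMessage: string,
): ChatMessage[] {
  return [
    ...history,
    { role: 'assistant', content: previousOutput },
    {
      role: 'user',
      content:
        `Your previous response failed validation:\n${errorMessage}\n` +
        'Return corrected JSON that matches the schema.',
    },
  ];
}

const baseHistory: ChatMessage[] = [
  { role: 'system', content: 'Extract product data from the provided text.' },
  { role: 'user', content: 'Desk lamp, five dollars off, now on sale.' },
];

const retryMessages = withValidationFeedback(
  baseHistory,
  '{"price": -5}',
  'price: Number must be greater than 0',
);
```

On the next loop iteration, `retryMessages` would be sent as the `messages` array in place of the original two-message prompt.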
Architecture Decisions
- Strict Mode Enforcement: Always enable strict: true (or equivalent) in API calls. This prevents the model from adding fields not defined in the schema, which can break downstream deserialization.
- Decoupled Schema Definition: Keep schemas in a shared module used by both the LLM integration and downstream services. This ensures type consistency across the entire data pipeline.
- Validation Layer: Never trust the LLM output implicitly. The safeParse step is non-negotiable. It catches semantic violations that syntax checks miss.
- Error Feedback Loops: Implement a mechanism to pass validation errors back to the model. If a field is missing or malformed, the retry prompt should include the specific validation error, allowing the model to self-correct.
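Strict modes tend to come with requirements on the schema itself. As a sketch, the check below flags object nodes that allow extra keys or leave properties out of `required`, which are the two conditions OpenAI's strict mode is understood to enforce (an assumption worth confirming against the provider's documentation). `SchemaNode` and `findStrictModeIssues` are hypothetical names.

```typescript
type SchemaNode = {
  type?: string | string[];
  properties?: Record<string, SchemaNode>;
  required?: string[];
  additionalProperties?: boolean | SchemaNode;
  items?: SchemaNode;
};

// Walks the schema and collects human-readable problems for each object node.
function findStrictModeIssues(node: SchemaNode, path: string = '$'): string[] {
  const issues: string[] = [];
  if (node.properties) {
    if (node.additionalProperties !== false) {
      issues.push(`${path}: additionalProperties must be false`);
    }
    const required = node.required ?? [];
    for (const key of Object.keys(node.properties)) {
      if (required.indexOf(key) === -1) {
        issues.push(`${path}.${key}: not listed in required`);
      }
      issues.push(...findStrictModeIssues(node.properties[key], `${path}.${key}`));
    }
  }
  if (node.items) {
    issues.push(...findStrictModeIssues(node.items, `${path}[]`));
  }
  return issues;
}

const looseSchema: SchemaNode = {
  type: 'object',
  properties: {
    id: { type: 'string' },
    tags: { type: 'array', items: { type: 'string' } },
  },
  required: ['id'],
};

const strictIssues = findStrictModeIssues(looseSchema);
```

Running a check like this in a unit test surfaces schema problems before the first API call rejects them.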
Pitfall Guide
1. Oversized Enums and Complex Patterns
Mistake: Defining enums with hundreds of values or overly complex regex patterns for string formats.
Impact: Models struggle to select from large enum lists, increasing hallucination rates. Complex regex constraints may be ignored or cause generation stalls.
Best Practice: Keep enums under 20 items where possible. Use descriptive strings and validate formats in the Zod schema rather than relying on the model to generate perfect regex matches.
2. Nested Object Explosion
Mistake: Creating deeply nested schemas with multiple levels of optional arrays and objects.
Impact: Token consumption spikes, and models frequently omit nested fields or misalign JSON brackets.
Best Practice: Flatten schemas where feasible. If nesting is required, use describe annotations to guide the model. Test extraction on edge cases with missing nested data.
3. Ignoring Token Limits in Schemas
Mistake: Embedding massive JSON schemas in the prompt or system message without considering context window limits.
Impact: Truncation of the schema leads to partial enforcement. The model only sees part of the structure and generates invalid output.
Best Practice: Use API-level schema passing (e.g., response_format.json_schema) rather than embedding the schema in text. This optimizes token usage and ensures the model receives the full constraint.
4. Assuming Universal Structured Support
Mistake: Writing code that assumes all models support JSON mode or function calling equally.
Impact: Failures when switching to open-weight models or older API versions.
Best Practice: Abstract the structured output mechanism behind an interface. Implement fallbacks for models lacking native support, such as grammar-constrained decoding via vLLM or TGI, or regex extraction with aggressive validation.
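A minimal sketch of such an interface follows, with a fallback for models that return free text. `StructuredExtractor`, `extractFirstJsonObject`, and `createFallbackExtractor` are hypothetical names. The regex is deliberately naive: it grabs everything from the first `{` to the last `}`, so the result must still go through schema validation.

```typescript
// Callers depend on this interface, not on a specific provider feature.
interface StructuredExtractor<T> {
  extract(text: string): Promise<T>;
}

// Naive fallback: take the span from the first "{" to the last "}" and parse it.
function extractFirstJsonObject(raw: string): unknown {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) {
    throw new Error('No JSON object found in model output');
  }
  return JSON.parse(match[0]);
}

// `complete` stands in for any text-only model call;
// `validate` stands in for a schema check that throws on invalid data.
function createFallbackExtractor<T>(
  complete: (prompt: string) => Promise<string>,
  validate: (value: unknown) => T,
): StructuredExtractor<T> {
  return {
    async extract(text: string): Promise<T> {
      const raw = await complete(text);
      return validate(extractFirstJsonObject(raw));
    },
  };
}

const recovered = extractFirstJsonObject(
  'Sure! Here is the data: {"name": "Lamp", "price": 19.5} Hope that helps.',
) as { name: string; price: number };
```

A native implementation of the same interface would call the provider's structured-output feature directly, and the calling code would not need to know which one it received.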
5. Schema Drift in Production
Mistake: Updating the Zod schema without updating the prompt instructions or test cases.
Impact: The model continues generating old structures, or new fields are ignored.
Best Practice: Integrate schema changes into CI/CD. Run automated tests that verify LLM output against the current schema definition. Use versioned schemas if backward compatibility is required.
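One lightweight way to catch drift in CI, sketched under the assumption that a snapshot of the generated JSON Schema is committed alongside the prompts and tests: serialize both with sorted keys and fail the build when they differ. `stableStringify` and `hasSchemaDrifted` are hypothetical helpers.

```typescript
// Serializes JSON-compatible values with object keys in sorted order,
// so key ordering alone never counts as a change.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>;
    const body = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

function hasSchemaDrifted(current: unknown, snapshot: unknown): boolean {
  return stableStringify(current) !== stableStringify(snapshot);
}

const snapshotSchema = {
  type: 'object',
  properties: { id: { type: 'string' } },
  required: ['id'],
};

const reorderedSchema = {
  required: ['id'],
  properties: { id: { type: 'string' } },
  type: 'object',
};

const extendedSchema = {
  type: 'object',
  properties: { id: { type: 'string' }, sku: { type: 'string' } },
  required: ['id', 'sku'],
};
```

A failing check then forces whoever changed the schema to update the snapshot, the prompt instructions, and the test fixtures in the same change.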
6. Hallucination of Required Fields
Mistake: Marking fields as required in the schema when the source text may not contain that information.
Impact: The model fabricates data to satisfy the schema, leading to data integrity issues.
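Best Practice: Use optional() or nullable() in Zod for fields that may be absent. Some strict modes (OpenAI's, for example) require every key to be listed as required, in which case a nullable field is the way to express "not present". In the system prompt, instruct the model to return null rather than guess when data is unavailable, and make sure downstream code handles nulls gracefully.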
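At the JSON Schema level, handling nulls can look like the sketch below: the field stays in `required`, but its type admits `null`, giving the model an explicit way to say the value is not in the text. In Zod this corresponds to `.nullable()`. The `warrantyMonths` field and the `dropNulls` helper are hypothetical.

```typescript
// Hand-written JSON Schema fragment with a required-but-nullable field.
const warrantySchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    warrantyMonths: {
      type: ['integer', 'null'],
      description: 'Warranty length in months, or null if the text does not state one',
    },
  },
  required: ['name', 'warrantyMonths'],
  additionalProperties: false,
} as const;

type RawProduct = { name: string; warrantyMonths: number | null };
type CleanProduct = { name: string; warrantyMonths?: number };

// Converts the model's explicit nulls into absent keys for downstream code.
function dropNulls(raw: RawProduct): CleanProduct {
  return raw.warrantyMonths === null
    ? { name: raw.name }
    : { name: raw.name, warrantyMonths: raw.warrantyMonths };
}

const withoutWarranty = dropNulls({ name: 'Lamp', warrantyMonths: null });
const withWarranty = dropNulls({ name: 'Drill', warrantyMonths: 24 });
```

The description text matters here: it tells the model that null is the expected answer when the source is silent, which gives it less reason to invent a value.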
7. Lack of Monitoring for Schema Violations
Mistake: Deploying structured extraction without monitoring validation failure rates.
Impact: Silent degradation of data quality. Teams remain unaware of rising error rates until downstream systems crash.
Best Practice: Instrument the validation layer to emit metrics on schema violation types. Set alerts for violation rate thresholds. Log failed validations for analysis and prompt refinement.
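A minimal in-memory sketch of that instrumentation is shown below. In production the counters would feed a real metrics client; `ViolationMetrics`, the `ValidationIssue` shape (modeled loosely on the path and code fields of a Zod issue), and the threshold value are assumptions.

```typescript
type ValidationIssue = { path: (string | number)[]; code: string };

class ViolationMetrics {
  private total = 0;
  private failed = 0;
  private byType: Record<string, number> = {};

  // Call once per extraction; pass an empty array when validation succeeded.
  record(issues: ValidationIssue[]): void {
    this.total += 1;
    if (issues.length === 0) return;
    this.failed += 1;
    for (const issue of issues) {
      const key = `${issue.path.join('.') || '(root)'}:${issue.code}`;
      this.byType[key] = (this.byType[key] ?? 0) + 1;
    }
  }

  failureRate(): number {
    return this.total === 0 ? 0 : this.failed / this.total;
  }

  counts(): Record<string, number> {
    return { ...this.byType };
  }

  shouldAlert(threshold: number): boolean {
    return this.failureRate() > threshold;
  }
}

const metrics = new ViolationMetrics();
metrics.record([]);
metrics.record([{ path: ['price'], code: 'too_small' }]);
metrics.record([]);
metrics.record([{ path: ['price'], code: 'too_small' }, { path: [], code: 'invalid_type' }]);
```

Grouping by field path and issue code shows which part of the schema the model struggles with, which is usually more actionable than a single failure count.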
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-Volume, Cost-Sensitive | Native JSON Mode + Small Model | Low latency, minimal token overhead, high reliability with modern small models. | Low |
| Complex Nested Extraction | Function Calling + Large Model | Better handling of complex schemas and tool-use logic; reduces nesting errors. | Medium |
| Open-Source/Local Models | Grammar-Constrained Decoding (vLLM/TGI) | Enforces structure at the token level without provider lock-in. | Low (Infrastructure) |
| Streaming Requirements | JSON Mode + Streaming Parser | Allows incremental processing while maintaining structure; requires robust streaming JSON parser. | Low |
| Legacy Model Support | Few-Shot + Regex Fallback | Necessary when native structured output is unavailable; higher maintenance. | High |
Configuration Template
schema/product.ts
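```typescript
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

export const ProductSchema = z.object({
  id: z.string(),
  name: z.string(),
  price: z.number(),
  currency: z.string().default('USD'),
  tags: z.array(z.string()).optional(),
});

// No `name` option here: naming the schema makes zod-to-json-schema emit a
// top-level $ref into `definitions`, while OpenAI's structured outputs
// expect an object schema at the root.
export const ProductJsonSchema = zodToJsonSchema(ProductSchema, {
  $refStrategy: 'none',
  target: 'openAi',
});

export type Product = z.infer<typeof ProductSchema>;
```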
clients/llm.ts
```typescript
// Path assumes clients/ and schema/ are sibling directories.
import { ProductJsonSchema } from '../schema/product';

export const llmConfig = {
  model: 'gpt-4o-mini',
  response_format: {
    type: 'json_schema' as const, // keep the literal type for the SDK's typings
    json_schema: {
      name: 'product_extraction',
      schema: ProductJsonSchema,
      strict: true,
    },
  },
  temperature: 0.1, // Low temperature for more consistent extraction
};
```
Quick Start Guide
- Install Dependencies: npm install zod zod-to-json-schema openai
- Define Schema: Create a Zod schema in schema.ts describing your target structure.
- Convert Schema: Use zodToJsonSchema to generate the JSON Schema object.
- Configure API Call: Pass the JSON Schema to the LLM client via response_format with strict: true.
- Validate Output: Parse the response and run schema.safeParse() before using the data. Handle validation errors with a retry loop.
Structured output transforms LLMs from text generators into reliable data extraction engines. By enforcing contracts at the schema level and leveraging native API constraints, developers can eliminate parsing fragility, reduce latency, and build production-grade AI applications with confidence.