Step 1: Enforce the Provider/Domain Boundary
The most critical structural decision is separating provider mechanics from domain intent. The provider layer understands HTTP, authentication, and SDK-specific types. The domain layer understands business rules, content styles, and prompt construction. These concerns must never share a class.
```csharp
public interface IModelGateway
{
    Task<ModelExecutionResult<string>> ExecuteAsync(
        string systemInstruction,
        string userInput,
        CancellationToken ct);
}

public interface IProductCopyEngine
{
    Task<ModelExecutionResult<CopyDraft>> GenerateAsync(
        ProductContext context,
        ContentStyle style,
        CancellationToken ct);
}
```
IModelGateway accepts raw strings and returns a raw string. It knows nothing about products, pricing, or marketing. IProductCopyEngine constructs prompts, applies business constraints, and maps the raw output into a domain object. This separation enables unit testing the domain layer with a mock gateway, swapping providers without touching business logic, and isolating SDK upgrades to a single implementation.
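To illustrate the testing benefit, here is a minimal sketch of a domain engine exercised against a fake gateway. The envelope is reduced to a small record, ContentStyle is omitted, and the shapes of ProductContext and CopyDraft are assumptions made only so the example compiles on its own.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-ins so the sketch compiles alone; ProductContext and CopyDraft shapes are assumed.
public sealed record ModelExecutionResult<T>(bool IsSuccessful, T? Payload, string? DiagnosticMessage = null);
public sealed record ProductContext(string Name);
public sealed record CopyDraft(string Text);

public interface IModelGateway
{
    Task<ModelExecutionResult<string>> ExecuteAsync(string systemInstruction, string userInput, CancellationToken ct);
}

// Test double: records the user input it received and returns a canned reply.
public sealed class FakeModelGateway : IModelGateway
{
    private readonly string _reply;
    public FakeModelGateway(string reply) => _reply = reply;
    public string? LastUserInput { get; private set; }

    public Task<ModelExecutionResult<string>> ExecuteAsync(string systemInstruction, string userInput, CancellationToken ct)
    {
        LastUserInput = userInput;
        return Task.FromResult(new ModelExecutionResult<string>(true, _reply));
    }
}

// Domain layer under test: builds the prompt and maps raw text to a domain object.
public sealed class ProductCopyEngine
{
    private readonly IModelGateway _gateway;
    public ProductCopyEngine(IModelGateway gateway) => _gateway = gateway;

    public async Task<ModelExecutionResult<CopyDraft>> GenerateAsync(ProductContext context, CancellationToken ct)
    {
        var raw = await _gateway.ExecuteAsync("Write product copy.", $"Product: {context.Name}", ct);
        return raw.IsSuccessful && raw.Payload is not null
            ? new ModelExecutionResult<CopyDraft>(true, new CopyDraft(raw.Payload.Trim()))
            : new ModelExecutionResult<CopyDraft>(false, null, raw.DiagnosticMessage);
    }
}
```

A test can then assert on both the mapped CopyDraft and the prompt captured by the fake, with no network access or SDK types involved.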
Step 2: Implement the Unified Result Envelope
Raw SDK responses force every caller to handle network failures, rate limits, and deserialization errors. A unified envelope centralizes error handling, caching metadata, and token accounting.
```csharp
public sealed class ModelExecutionResult<T>
{
    // Private constructor: instances can only come from the factory methods below.
    private ModelExecutionResult() { }

    public bool IsSuccessful { get; init; }
    public T? Payload { get; init; }
    public string? DiagnosticMessage { get; init; }
    public bool WasServedFromCache { get; init; }
    public int ConsumedTokens { get; init; }

    public static ModelExecutionResult<T> Success(T data, int tokens = 0, bool cached = false) =>
        new() { IsSuccessful = true, Payload = data, ConsumedTokens = tokens, WasServedFromCache = cached };

    public static ModelExecutionResult<T> Failure(string reason) =>
        new() { IsSuccessful = false, DiagnosticMessage = reason };
}
```
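Routing construction through the static factory methods keeps invalid states (for example, a success with no payload) from being built, provided the constructor is not publicly accessible. Callers above the provider layer never write try/catch, because the gateway converts SDK exceptions into Failure results. They inspect IsSuccessful and branch accordingly. The same envelope scales from text generation to structured JSON extraction, chat history management, or vector search results.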
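The promise that callers never write try/catch only holds if the provider layer catches for them. The following is a sketch of that boundary, not the article's gateway: ProviderBoundary and GuardAsync are hypothetical names, the envelope is a compact copy, and mapping cancellation to a Failure (rather than rethrowing) is one possible design choice.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Compact copy of the envelope so the sketch compiles alone.
public sealed class ModelExecutionResult<T>
{
    private ModelExecutionResult() { }
    public bool IsSuccessful { get; private init; }
    public T? Payload { get; private init; }
    public string? DiagnosticMessage { get; private init; }

    public static ModelExecutionResult<T> Success(T data) =>
        new() { IsSuccessful = true, Payload = data };
    public static ModelExecutionResult<T> Failure(string reason) =>
        new() { IsSuccessful = false, DiagnosticMessage = reason };
}

public static class ProviderBoundary
{
    // Hypothetical helper: runs an SDK call and converts any exception into a Failure result.
    public static async Task<ModelExecutionResult<T>> GuardAsync<T>(
        Func<CancellationToken, Task<T>> sdkCall, CancellationToken ct)
    {
        try
        {
            return ModelExecutionResult<T>.Success(await sdkCall(ct));
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            return ModelExecutionResult<T>.Failure("Request was cancelled.");
        }
        catch (Exception ex)
        {
            return ModelExecutionResult<T>.Failure($"Provider call failed: {ex.Message}");
        }
    }
}
```

A caller then writes `if (!result.IsSuccessful) { /* show result.DiagnosticMessage */ }` and nothing else; rate limits, timeouts, and deserialization errors all arrive through the same branch.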
Step 3: Centralize Prompt Contracts
Prompt engineering belongs in a single, auditable location. Scattering instructions across controllers or inline HTTP calls creates maintenance debt and security risks. System instructions and user input must be constructed separately: keeping untrusted input out of the system instruction shrinks the prompt-injection surface (it does not eliminate it) and lets tests assert on each part deterministically.
```csharp
internal static class InstructionTemplateFactory
{
    internal static string Resolve(ContentStyle style) => style switch
    {
        ContentStyle.Authoritative =>
            "Act as a senior technical copywriter. Produce precise, fact-bound descriptions. " +
            "Do not invent specifications, awards, or third-party claims.",
        ContentStyle.Conversational =>
            "Adopt a helpful, approachable tone. Focus on user benefits and clarity.",
        _ => throw new ArgumentOutOfRangeException(nameof(style), style, null)
    };
}
```
Temperature, system instructions, and token limits serve three distinct purposes. Temperature controls output variance. System instructions enforce behavioral constraints. Token limits enforce budget boundaries. Confusing these controls leads to unpredictable outputs and unbounded costs.
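One way to keep the three controls from being confused is to carry them as separate, independently validated fields. The sketch below is an illustration only: GenerationOptions is a hypothetical type, and the 0.0 to 2.0 temperature range is the one accepted by OpenAI-style chat APIs, so other providers may need different bounds.

```csharp
#nullable enable
using System;

// Hypothetical request options: each control is its own field with its own check.
public sealed record GenerationOptions(string SystemInstruction, double Temperature, int MaxOutputTokens)
{
    public GenerationOptions Validate()
    {
        // Behavioral constraints live in the system instruction, so it must be present.
        if (string.IsNullOrWhiteSpace(SystemInstruction))
            throw new ArgumentException("A system instruction is required.", nameof(SystemInstruction));

        // Variance control. Assumed range for OpenAI-style APIs; adjust for other providers.
        if (Temperature is < 0.0 or > 2.0)
            throw new ArgumentOutOfRangeException(nameof(Temperature), Temperature, "Expected 0.0 to 2.0.");

        // Budget control: an unset (zero) limit is rejected rather than treated as unlimited.
        if (MaxOutputTokens <= 0)
            throw new ArgumentOutOfRangeException(nameof(MaxOutputTokens), MaxOutputTokens, "Must be positive.");

        return this;
    }
}
```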
Step 4: Wire Cancellation and Caching
Asynchronous AI calls must propagate CancellationToken through every layer. A token accepted at the controller but not forwarded to the SDK call creates a silent cost leak. The client disconnects, but the backend continues consuming tokens and billing the account.
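The propagation rule can be sketched with a stand-in for the SDK call. Everything here is illustrative: the layer methods are hypothetical, and Task.Delay plays the role of a network call that honors the token it is given.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationChain
{
    // Stand-in for the SDK call; Task.Delay observes the token the way a real HTTP call would.
    private static async Task<string> FakeSdkCallAsync(string prompt, CancellationToken ct)
    {
        await Task.Delay(TimeSpan.FromSeconds(5), ct);
        return $"completion for: {prompt}";
    }

    // Provider layer: forwards the token to the SDK.
    public static Task<string> ProviderAsync(string prompt, CancellationToken ct) =>
        FakeSdkCallAsync(prompt, ct);

    // Domain layer: forwards the same token to the provider.
    public static Task<string> DomainAsync(string productName, CancellationToken ct) =>
        ProviderAsync($"Describe {productName}", ct);

    // Simulates a client disconnecting after 50 ms; returns true if the chain stopped early.
    public static async Task<bool> SimulateDisconnectAsync()
    {
        using var requestAborted = new CancellationTokenSource(TimeSpan.FromMilliseconds(50));
        try
        {
            await DomainAsync("kettle", requestAborted.Token);
            return false;
        }
        catch (OperationCanceledException)
        {
            return true;
        }
    }
}
```

If any layer dropped the token (for example, by passing CancellationToken.None to the next call), the simulated request would run for the full five seconds even though nobody is waiting for the answer.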
Caching operates at the provider layer using deterministic key generation. Identical instruction and input pairs return cached results, reducing latency and cost.
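```csharp
// Requires: using System.Security.Cryptography; using System.Text;
// string.GetHashCode() is randomized per process on .NET Core, so derive the key from a SHA-256 digest instead.
private static string GenerateCacheKey(string instruction, string input) =>
    "llm:cache:" + Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes($"{instruction.Length}:{instruction}{input}")));
```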
The WasServedFromCache flag travels through the result envelope to the presentation layer, enabling UI indicators that verify caching behavior without external telemetry. Caching should be controlled via feature flags, not hardcoded booleans, allowing runtime toggling without redeployment.
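A caching gateway can be sketched as a decorator around the real provider. This is an illustration under several assumptions: the result type is reduced to a small record, a ConcurrentDictionary stands in for IMemoryCache (so there is no expiry), the key is a SHA-256 digest chosen because it is stable across processes, and the feature flag is modeled as a plain delegate.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Simplified envelope and gateway so the sketch compiles alone.
public sealed record GatewayResult(bool IsSuccessful, string? Payload, bool WasServedFromCache = false);

public interface IModelGateway
{
    Task<GatewayResult> ExecuteAsync(string systemInstruction, string userInput, CancellationToken ct);
}

// Stand-in for the real provider; counts calls so cache hits are observable.
public sealed class CountingGateway : IModelGateway
{
    public int Calls { get; private set; }
    public Task<GatewayResult> ExecuteAsync(string systemInstruction, string userInput, CancellationToken ct)
    {
        Calls++;
        return Task.FromResult(new GatewayResult(true, $"reply to {userInput}"));
    }
}

// Hypothetical decorator: checks the cache first, then falls through to the inner gateway.
public sealed class CachingModelGateway : IModelGateway
{
    private readonly IModelGateway _inner;
    private readonly Func<bool> _cachingEnabled; // e.g. backed by a feature flag
    private readonly ConcurrentDictionary<string, string> _cache = new();

    public CachingModelGateway(IModelGateway inner, Func<bool> cachingEnabled)
    {
        _inner = inner;
        _cachingEnabled = cachingEnabled;
    }

    public async Task<GatewayResult> ExecuteAsync(string systemInstruction, string userInput, CancellationToken ct)
    {
        if (!_cachingEnabled())
            return await _inner.ExecuteAsync(systemInstruction, userInput, ct);

        var key = Key(systemInstruction, userInput);
        if (_cache.TryGetValue(key, out var hit))
            return new GatewayResult(true, hit, WasServedFromCache: true);

        var result = await _inner.ExecuteAsync(systemInstruction, userInput, ct);
        if (result.IsSuccessful && result.Payload is not null)
            _cache[key] = result.Payload; // only successful responses are cached
        return result;
    }

    // Length-prefixing the instruction keeps ("ab", "c") and ("a", "bc") from sharing a key.
    private static string Key(string instruction, string input) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes($"{instruction.Length}:{instruction}{input}")));
}
```

Because the flag is read on every call, flipping it at runtime takes effect immediately, and failures are never cached, so a transient provider error does not get replayed to later callers.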
Step 5: Register and Validate at Startup
All AI infrastructure belongs at the application root, not scoped to UI areas or feature folders. Cross-cutting capabilities must be registered through a single extension method to prevent duplicate service instances and lifetime mismatches.
```csharp
public static class AiInfrastructureExtensions
{
    public static IServiceCollection AddModelServices(this IServiceCollection services, IConfiguration config)
    {
        var settings = config.GetSection("ModelProvider").Get<ModelSettings>()
            ?? throw new InvalidOperationException("Model configuration is missing or malformed.");

        services.AddSingleton(settings);
        services.AddScoped<IModelGateway, AzureOpenAiGateway>();
        services.AddScoped<IProductCopyEngine, ProductCopyEngine>();
        return services;
    }
}
```
Configuration validation must occur at startup. A missing environment variable or malformed JSON should fail fast with a clear diagnostic, not produce a zeroed-out settings object that crashes three layers downstream during request execution.
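A null check on the bound object catches a missing section but not a missing key inside it. The sketch below shows one way to validate individual values before registration. The ModelSettings shape is an assumption that mirrors the configuration keys used in this article, and ModelSettingsValidator is a hypothetical helper, not part of any framework.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Assumed shape, mirroring the ModelProvider configuration keys.
public sealed class ModelSettings
{
    public string? Endpoint { get; init; }
    public string? DeploymentName { get; init; }
    public string? ApiKey { get; init; }
    public int MaxTokens { get; init; }
}

public static class ModelSettingsValidator
{
    // Collects every problem so a single failed startup reports all of them.
    public static IReadOnlyList<string> Validate(ModelSettings s)
    {
        var errors = new List<string>();
        if (!Uri.TryCreate(s.Endpoint, UriKind.Absolute, out var uri) || uri.Scheme != Uri.UriSchemeHttps)
            errors.Add("ModelProvider:Endpoint must be an absolute https URL.");
        if (string.IsNullOrWhiteSpace(s.DeploymentName))
            errors.Add("ModelProvider:DeploymentName is required.");
        if (string.IsNullOrWhiteSpace(s.ApiKey))
            errors.Add("ModelProvider:ApiKey is required.");
        if (s.MaxTokens <= 0)
            errors.Add("ModelProvider:MaxTokens must be positive.");
        return errors;
    }

    // Throws with the full list of problems; returns the settings unchanged when valid.
    public static ModelSettings EnsureValid(ModelSettings s)
    {
        var errors = Validate(s);
        if (errors.Count > 0)
            throw new InvalidOperationException("Invalid model configuration: " + string.Join(" ", errors));
        return s;
    }
}
```

Calling `ModelSettingsValidator.EnsureValid(settings)` immediately after binding, and before `AddSingleton(settings)`, turns a silently zeroed MaxTokens into a startup failure with a readable message.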
Pitfall Guide
1. Blurring Provider and Domain Logic
Explanation: Combining SDK calls, prompt construction, and business mapping into a single class creates tight coupling. Swapping providers or adjusting business rules requires rewriting the entire service.
Fix: Enforce a strict seam. The provider handles transport and SDK types. The domain handles prompt assembly, validation, and result mapping. Test the domain with a mock provider.
2. Silent Configuration Deserialization
Explanation: ASP.NET Core's configuration binder reports no error when a key is missing or misnamed; it simply leaves the property at its default value. This leads to zeroed-out token limits, null API keys, or empty endpoint URLs that only surface under load.
Fix: Validate configuration immediately after binding. Use ?? throw or explicit null checks. Log startup diagnostics. Fail fast before the first request arrives.
3. Unbounded Async Chains
Explanation: Accepting CancellationToken at the controller but not forwarding it to the SDK call creates cost leakage. Disconnected clients still trigger full API executions.
Fix: Propagate the token through every await. Verify the chain: Controller → Domain Service → Provider → SDK. When the caller's token must be combined with a second source, such as a timeout, use using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, sdkCts.Token); and pass linkedCts.Token to the external call.
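A common reason to link tokens is to bound an external call with a timeout while still honoring the caller's cancellation. The following is a minimal sketch; TimeoutGuard and WithTimeoutAsync are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutGuard
{
    // Runs an SDK call with a token that fires on caller cancellation OR when the timeout elapses.
    public static async Task<T> WithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> sdkCall, TimeSpan timeout, CancellationToken ct)
    {
        using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        linkedCts.CancelAfter(timeout);

        // Await inside the using scope so the source is not disposed while the call is running.
        return await sdkCall(linkedCts.Token);
    }
}
```

The linked source is disposed when the method exits, which releases its registration on the caller's token.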
4. Inheritance-Based Result Models
Explanation: Deriving domain objects from infrastructure wrappers drags metadata like IsSuccessful, DiagnosticMessage, and ConsumedTokens into persistence layers, DTOs, and service boundaries. These concerns belong to the execution envelope, not the domain payload.
Fix: Use composition. The domain object contains only business data. The wrapper contains execution metadata. Return ModelExecutionResult<DomainObject>, not DomainObject : ModelExecutionResult.
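The composition approach can be sketched as follows. The CopyDraft fields and the DraftStore helper are hypothetical, and the envelope is reduced to a record for brevity.

```csharp
#nullable enable

// Domain payload: business data only, safe to persist or map to a DTO.
public sealed record CopyDraft(string Headline, string Body);

// Execution envelope (simplified): metadata lives here and wraps the payload.
public sealed record ModelExecutionResult<T>(
    bool IsSuccessful, T? Payload, int ConsumedTokens = 0, string? DiagnosticMessage = null);

public static class DraftStore
{
    // Persistence accepts the bare payload, so token counts and error text cannot leak into storage.
    public static string ToStorageRow(CopyDraft draft) => $"{draft.Headline}|{draft.Body}";

    // The envelope is unwrapped at the boundary; failures produce nothing to store.
    public static string? SaveIfSuccessful(ModelExecutionResult<CopyDraft> result) =>
        result.IsSuccessful && result.Payload is not null ? ToStorageRow(result.Payload) : null;
}
```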
5. Treating Temperature as a Safety Mechanism
Explanation: Developers often lower temperature to prevent hallucinations or enforce constraints. Temperature only reduces output variance; it does not enforce factual accuracy or behavioral rules.
Fix: Use system instructions for constraints, temperature for style variance, and token limits for budget control. Audit prompts for explicit negative constraints rather than relying on sampling parameters.
6. Scoping Cross-Cutting AI Services to UI Areas
Explanation: Placing AI services inside MVC Areas or feature folders implies UI-driven boundaries. AI capabilities span search, recommendations, support, and content generation. UI grouping creates artificial coupling and complicates future feature expansion.
Fix: Register AI infrastructure at the application root. Use dependency injection to expose capabilities to any layer. Keep UI routing separate from service architecture.
7. Late-Stage Config Validation
Explanation: Validating API keys, endpoint URLs, or model deployments only during the first request masks configuration errors until production traffic hits. This causes cascade failures and poor developer experience.
Fix: Validate during Program.cs execution. Check endpoint reachability, key format, and model availability. Fail startup with actionable diagnostics. Use health checks to verify runtime connectivity.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Early prototyping | gpt-4o-mini with aggressive caching | Fast iteration, predictable costs, sufficient quality for tone/style tasks | ~$0.15 per 1M input tokens (list price; output tokens cost more); caching can reduce effective cost by 60-80% when inputs repeat |
| High-stakes factual generation | gpt-4o with strict system constraints | Higher reasoning accuracy, better instruction following, reduced hallucination | ~$2.50 per 1M input tokens (list price; output tokens cost more); requires tighter token budgeting |
| Internal tooling / low traffic | On-premise or open-weight model via local gateway | Zero API cost, full data control, predictable latency | Infrastructure overhead; requires GPU/CPU provisioning |
| Multi-turn conversational features | Persistent ChatHistory with session-scoped storage | Maintains context, avoids re-sending the same context in every prompt, improves UX | Context window costs scale with history length; implement truncation policies |
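As an illustration of a truncation policy for the multi-turn row, the sketch below keeps the system message plus the newest turns that fit a token budget. The ChatMessage record is hypothetical, the system message is assumed to be first in the history, and the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer.

```csharp
using System;
using System.Collections.Generic;

public sealed record ChatMessage(string Role, string Content);

public static class HistoryTruncation
{
    // Rough heuristic (about 4 characters per token); swap in a real tokenizer for accurate budgets.
    public static int EstimateTokens(string text) => Math.Max(1, text.Length / 4);

    // Keeps the system message (assumed to be first) plus the newest turns that fit the budget.
    public static IReadOnlyList<ChatMessage> Truncate(IReadOnlyList<ChatMessage> history, int tokenBudget)
    {
        if (history.Count == 0) return history;

        var system = history[0];
        var remaining = tokenBudget - EstimateTokens(system.Content);
        var kept = new List<ChatMessage>();

        // Walk backwards from the newest message and stop at the first one that does not fit.
        for (var i = history.Count - 1; i >= 1; i--)
        {
            var cost = EstimateTokens(history[i].Content);
            if (cost > remaining) break;
            remaining -= cost;
            kept.Add(history[i]);
        }

        kept.Reverse();
        kept.Insert(0, system);
        return kept;
    }
}
```

Dropping the oldest turns first is the simplest policy; summarizing them instead is another option with different cost and quality trade-offs.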
Configuration Template
```json
{
  "ModelProvider": {
    "Endpoint": "https://your-resource.openai.azure.com/",
    "DeploymentName": "gpt-4o-mini",
    "ApiKey": "",
    "MaxTokens": 1024,
    "Temperature": 0.7,
    "EnableCaching": true,
    "CacheExpirationMinutes": 60,
    "CostTrackingEnabled": true
  }
}
```
Note: ApiKey must be supplied via User Secrets locally and Application Settings in production. Never commit secrets to source control or configuration files.
Quick Start Guide
- Install dependencies: Add Azure.AI.OpenAI and Microsoft.Extensions.Caching.Memory to your project.
- Create the gateway: Implement IModelGateway using OpenAIClient with the ChatCompletions API. Wire in the CancellationToken and the cache lookup.
- Register services: Call builder.Services.AddModelServices(builder.Configuration) in Program.cs. Verify startup validation passes.
- Build the domain engine: Implement IProductCopyEngine to assemble prompts, call the gateway, and map results to domain objects.
- Wire the endpoint: Create a controller action that accepts content parameters, passes HttpContext.RequestAborted as the cancellation token, and returns ModelExecutionResult<T> to the client.
This architecture transforms AI from an experimental API consumer into a production-grade subsystem. By enforcing boundaries, centralizing contracts, and validating early, you gain testability, cost visibility, and deployment resilience. The model choice matters less than the structural discipline surrounding it.