s within the execution loop.
- Resilience Integration: Integrate Polly or the Microsoft.Extensions.Resilience library. Background services must handle transient failures without crashing the host.
- Periodic Execution: For timer-based services, prefer PeriodicTimer (available in .NET 6+) over Task.Delay to prevent drift and handle cancellation more efficiently.
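To make the drift claim concrete, here is a minimal sketch that times two loops doing the same simulated work. It assumes a console project on .NET 6 or later; SimulateWorkAsync, RunWithDelayAsync, and RunWithTimerAsync are hypothetical helper names, and the 40 ms of work and 100 ms interval are arbitrary illustration values.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

var interval = TimeSpan.FromMilliseconds(100);
var delayElapsed = await RunWithDelayAsync(5, interval, CancellationToken.None);
var timerElapsed = await RunWithTimerAsync(5, interval, CancellationToken.None);

Console.WriteLine($"Task.Delay loop:    {delayElapsed.TotalMilliseconds:F0} ms");
Console.WriteLine($"PeriodicTimer loop: {timerElapsed.TotalMilliseconds:F0} ms");

// Hypothetical stand-in for real work that takes roughly 40 ms.
static Task SimulateWorkAsync(CancellationToken ct) => Task.Delay(40, ct);

// Task.Delay starts counting after the work finishes,
// so every iteration costs (work time + interval) and the schedule drifts.
static async Task<TimeSpan> RunWithDelayAsync(int iterations, TimeSpan interval, CancellationToken ct)
{
    var sw = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++)
    {
        await SimulateWorkAsync(ct);
        await Task.Delay(interval, ct);
    }
    return sw.Elapsed;
}

// PeriodicTimer ticks on a fixed period measured from its creation,
// so the work time is absorbed into the interval instead of being added to it.
static async Task<TimeSpan> RunWithTimerAsync(int iterations, TimeSpan interval, CancellationToken ct)
{
    var sw = Stopwatch.StartNew();
    using var timer = new PeriodicTimer(interval);
    for (var i = 0; i < iterations && await timer.WaitForNextTickAsync(ct); i++)
    {
        await SimulateWorkAsync(ct);
    }
    return sw.Elapsed;
}
```

The Task.Delay loop should take noticeably longer for the same number of iterations, because each pass adds the work time on top of the interval. Exact timings depend on the machine and timer resolution.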
2. Implementation Pattern
The following implementation demonstrates a resilient worker with scoped resolution, structured logging, metrics, and a resilience pipeline.
using System.Diagnostics;
using System.Diagnostics.Metrics;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Polly;
using Polly.Registry;

namespace Codcompass.Workers;

/// <summary>
/// Production-grade background service with resilience, scoping, and observability.
/// </summary>
public class ResilientWorker : BackgroundService
{
    private readonly IServiceScopeFactory _scopeFactory;
    private readonly ILogger<ResilientWorker> _logger;
    private readonly ResiliencePipeline _pipeline;
    private readonly Meter _meter;
    private readonly Counter<long> _processedCount;
    private readonly Histogram<double> _processingDuration;

    public ResilientWorker(
        IServiceScopeFactory scopeFactory,
        ILogger<ResilientWorker> logger,
        ResiliencePipelineProvider<string> pipelineProvider,
        IMeterFactory meterFactory)
    {
        _scopeFactory = scopeFactory;
        _logger = logger;
        // AddResiliencePipeline registers pipelines by key, so resolve the named pipeline from the provider
        _pipeline = pipelineProvider.GetPipeline("worker-pipeline");
        // IMeterFactory is available from the host on .NET 8+; the name must match AddMeter in Program.cs
        _meter = meterFactory.Create("Codcompass.Workers");

        // Initialize metrics (the second positional argument is the unit, so name the description explicitly)
        _processedCount = _meter.CreateCounter<long>("worker.processed.total", description: "Total items processed");
        _processingDuration = _meter.CreateHistogram<double>("worker.processing.duration", unit: "ms", description: "Processing duration");
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _logger.LogInformation("ResilientWorker starting execution.");

        // Use PeriodicTimer for drift-free periodic execution
        using var timer = new PeriodicTimer(TimeSpan.FromSeconds(10));

        try
        {
            // WaitForNextTickAsync throws OperationCanceledException once stoppingToken is cancelled
            while (await timer.WaitForNextTickAsync(stoppingToken))
            {
                try
                {
                    await ProcessBatchAsync(stoppingToken);
                }
                catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
                {
                    // Expected during shutdown
                    break;
                }
                catch (Exception ex)
                {
                    // Catch-all to prevent host crash; log and continue
                    _logger.LogError(ex, "Unhandled exception in worker loop. Continuing execution.");
                    // Backoff to prevent tight error loops
                    await Task.Delay(TimeSpan.FromSeconds(30), stoppingToken);
                }
            }
        }
        catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
        {
            // Shutdown was requested while waiting for a tick or backing off
        }

        _logger.LogInformation("ResilientWorker execution loop terminated.");
    }

    private async Task ProcessBatchAsync(CancellationToken stoppingToken)
    {
        // await using disposes the scope and its scoped services as soon as the batch ends
        await using var scope = _scopeFactory.CreateAsyncScope();

        // Resolve scoped service within the scope
        var processor = scope.ServiceProvider.GetRequiredService<IItemProcessor>();
        var sw = Stopwatch.StartNew();

        // Execute business logic with resilience pipeline
        await _pipeline.ExecuteAsync(async ct => await processor.ProcessAsync(ct), stoppingToken);
        sw.Stop();

        // Record metrics
        _processedCount.Add(1);
        _processingDuration.Record(sw.Elapsed.TotalMilliseconds);
        _logger.LogDebug("Batch processed successfully in {Duration}ms.", sw.Elapsed.TotalMilliseconds);
    }
}
3. Registration and Configuration
Register the service and configure resilience in Program.cs.
using System.Data.Common;
using Codcompass.Workers;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;

var builder = Host.CreateApplicationBuilder(args);

// Register scoped services
builder.Services.AddScoped<IItemProcessor, ItemProcessor>();

// Configure Resilience Pipeline
builder.Services.AddResiliencePipeline("worker-pipeline", pipelineBuilder =>
{
    pipelineBuilder
        .AddRetry(new RetryStrategyOptions
        {
            BackoffType = DelayBackoffType.Exponential,
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(1),
            ShouldHandle = new PredicateBuilder().Handle<DbException>()
        })
        .AddCircuitBreaker(new CircuitBreakerStrategyOptions
        {
            // Polly v8 selects the failures to count through ShouldHandle
            ShouldHandle = new PredicateBuilder().Handle<DbException>(),
            FailureRatio = 0.4,
            SamplingDuration = TimeSpan.FromSeconds(30),
            MinimumThroughput = 10,
            BreakDuration = TimeSpan.FromSeconds(15)
        });
});

// Register Background Service
builder.Services.AddHostedService<ResilientWorker>();

// Configure OpenTelemetry/Metrics (the meter name must match the one the worker creates)
builder.Services.AddOpenTelemetry()
    .WithMetrics(b => b.AddMeter("Codcompass.Workers"));

var host = builder.Build();
host.Run();
4. Graceful Shutdown Logic
When shutdown is requested, the generic host calls StopAsync on each hosted service. BackgroundService.StopAsync cancels the stoppingToken that was passed to ExecuteAsync and then waits for the task returned by ExecuteAsync to finish. The implementation must:
- Check stoppingToken.IsCancellationRequested in loops.
- Await operations that respect the token (e.g., Task.Delay(token), HttpClient.SendAsync(request, token)).
- Avoid fire-and-forget tasks during shutdown.
- Complete in-flight work if possible, but prioritize termination to meet orchestration deadlines.
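One way to balance "complete in-flight work" against "prioritize termination" is a bounded drain window: the in-flight operation runs on its own token, which is cancelled only a fixed time after shutdown is requested. The sketch below uses only the base class library; RunInFlightWorkAsync and the window lengths are hypothetical, and Task.Delay stands in for the real operation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Case 1: shutdown arrives mid-work, but the work fits inside the drain window.
using var stopSoon = new CancellationTokenSource(TimeSpan.FromMilliseconds(50));
var finished = await RunInFlightWorkAsync(
    TimeSpan.FromMilliseconds(150), TimeSpan.FromSeconds(1), stopSoon.Token);

// Case 2: the work would outlive the drain window, so it is abandoned.
using var stopAgain = new CancellationTokenSource(TimeSpan.FromMilliseconds(50));
var abandoned = !await RunInFlightWorkAsync(
    TimeSpan.FromSeconds(5), TimeSpan.FromMilliseconds(100), stopAgain.Token);

Console.WriteLine($"finished within window: {finished}, abandoned after window: {abandoned}");

// Returns true if the work completed, false if the drain window expired first.
static async Task<bool> RunInFlightWorkAsync(
    TimeSpan workDuration, TimeSpan drainWindow, CancellationToken stoppingToken)
{
    using var drainCts = new CancellationTokenSource();

    // When shutdown is requested, start the countdown instead of cancelling immediately.
    using var registration = stoppingToken.Register(() => drainCts.CancelAfter(drainWindow));

    try
    {
        // Stand-in for the real in-flight operation; it observes the drain token.
        await Task.Delay(workDuration, drainCts.Token);
        return true;
    }
    catch (OperationCanceledException)
    {
        return false;
    }
}
```

The drain window should stay shorter than the host's own shutdown budget (HostOptions.ShutdownTimeout) and the orchestrator's deadline, such as the Kubernetes termination grace period, otherwise the process is killed before the window expires.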
Pitfall Guide
1. Capturing Scoped Services
Mistake: Injecting DbContext or other scoped services directly into the ResilientWorker constructor.
Impact: The background service is registered as a singleton, so any scoped service it captures lives as long as the application (a captive dependency). A captured DbContext holds its database connection and change-tracking state indefinitely, leading to memory leaks and stale data.
Fix: Inject IServiceScopeFactory and create a scope per unit of work inside ExecuteAsync.
2. Ignoring Cancellation Tokens
Mistake: Using Task.Delay(1000) without passing the token, or performing long-running synchronous operations.
Impact: The application cannot shut down gracefully. Docker/Kubernetes kill the container after a timeout, potentially corrupting data or dropping in-flight messages.
Fix: Always pass stoppingToken to delay methods and async I/O operations.
3. Unhandled Exceptions Crashing the Host
Mistake: Allowing exceptions in ExecuteAsync to propagate uncaught.
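Impact: Since .NET 6, the default BackgroundServiceExceptionBehavior is StopHost: when ExecuteAsync throws, the host logs the exception and shuts the application down. On earlier versions the exception was swallowed and the service silently stopped running.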
Fix: Wrap loop bodies in try-catch. Log exceptions and implement backoff strategies. Only throw if the service is in an unrecoverable state.
4. Sync-over-Async Blocking
Mistake: Calling .Result or .Wait() on async operations within the background service.
Impact: Thread pool starvation. The background service blocks threads waiting for I/O, reducing throughput for other services and potentially deadlocking the application.
Fix: Use await consistently. Refactor legacy sync code to async equivalents.
5. Missing Health Checks
Mistake: Relying solely on process existence for health monitoring.
Impact: A hung or stalled worker still looks alive because its process is running, so the orchestrator never restarts it. Conversely, a crude external probe can restart a worker that is merely busy.
Fix: Implement IHealthCheck. Check internal state, such as "last successful processing time" or "queue depth," rather than just "is the service running."
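A "last successful processing time" check could look roughly like the sketch below. It assumes the Microsoft.Extensions.Diagnostics.HealthChecks package; WorkerHeartbeat and WorkerHealthCheck are hypothetical names, and the 60-second threshold is an illustrative value rather than a recommendation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical shared state: the worker calls MarkSuccess() after each successful batch.
public sealed class WorkerHeartbeat
{
    private long _lastSuccessTicks = DateTimeOffset.UtcNow.UtcTicks;

    public void MarkSuccess() =>
        Interlocked.Exchange(ref _lastSuccessTicks, DateTimeOffset.UtcNow.UtcTicks);

    public TimeSpan TimeSinceLastSuccess =>
        DateTimeOffset.UtcNow - new DateTimeOffset(Interlocked.Read(ref _lastSuccessTicks), TimeSpan.Zero);
}

// Reports unhealthy when no batch has succeeded within the stale threshold.
public sealed class WorkerHealthCheck : IHealthCheck
{
    // Illustrative threshold; in practice this would come from configuration.
    private static readonly TimeSpan StaleThreshold = TimeSpan.FromSeconds(60);

    private readonly WorkerHeartbeat _heartbeat;

    public WorkerHealthCheck(WorkerHeartbeat heartbeat) => _heartbeat = heartbeat;

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        var age = _heartbeat.TimeSinceLastSuccess;
        var result = age <= StaleThreshold
            ? HealthCheckResult.Healthy($"Last successful batch {age.TotalSeconds:F0}s ago.")
            : HealthCheckResult.Unhealthy($"No successful batch for {age.TotalSeconds:F0}s.");
        return Task.FromResult(result);
    }
}
```

Under these assumptions, the heartbeat would be registered as a singleton (builder.Services.AddSingleton<WorkerHeartbeat>()) so the worker and the check share one instance, and the check would be added with builder.Services.AddHealthChecks().AddCheck<WorkerHealthCheck>("worker").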
6. Resource Leaks in Disposables
Mistake: Creating disposables (e.g., HttpClient, Stream) inside the loop without disposal.
Impact: Handle exhaustion and memory pressure.
Fix: Use using statements or await using for async disposables. Reuse HttpClient via IHttpClientFactory.
7. Over-Engineering Simple Tasks
Mistake: Using a full BackgroundService with complex resilience for a task that runs once at startup or requires cron scheduling.
Impact: Unnecessary complexity and resource usage.
Fix: Use IHostedService for startup tasks. Use Quartz.NET or Hangfire for complex scheduling, retries, and persistence requirements.
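For a run-once startup task, implementing IHostedService directly can be enough. The following is a minimal sketch, assuming the Microsoft.Extensions.Hosting abstractions; StartupWarmupTask is a hypothetical name and the delay is a placeholder for real work such as warming a cache.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Runs once while the host starts; there is no loop, timer, or resilience pipeline.
public sealed class StartupWarmupTask : IHostedService
{
    private readonly ILogger<StartupWarmupTask> _logger;

    public StartupWarmupTask(ILogger<StartupWarmupTask> logger) => _logger = logger;

    public async Task StartAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation("Running startup warm-up.");

        // Placeholder for the real one-off work.
        await Task.Delay(TimeSpan.FromMilliseconds(100), cancellationToken);

        _logger.LogInformation("Startup warm-up finished.");
    }

    // Nothing to clean up for a one-off task.
    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}
```

It would be registered like any other hosted service, with builder.Services.AddHostedService<StartupWarmupTask>(). Note that by default the host awaits StartAsync before starting the next hosted service, so slow work here delays application startup.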
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Simple periodic task | BackgroundService with PeriodicTimer | Low overhead, native integration, sufficient for most cases. | Low |
| Message queue consumer | BackgroundService with dedicated client library | Tightly coupled to queue semantics; requires high throughput and manual ack handling. | Medium |
| Complex scheduling/Cron | Quartz.NET or Hangfire | Persistent job store, misfire handling, cron expressions, dashboard. | Medium-High |
| Event-driven processing | Azure Functions / AWS Lambda | Serverless scaling, pay-per-use, managed triggers. | Variable |
| Durable workflows | Durable Functions / Temporal | State persistence, replayability, complex orchestration logic. | High |
Configuration Template
{
  "BackgroundServices": {
    "ResilientWorker": {
      "IntervalSeconds": 10,
      "ShutdownTimeoutSeconds": 30,
      "HealthCheck": {
        "StaleThresholdSeconds": 60,
        "Enabled": true
      },
      "Resilience": {
        "Retry": {
          "MaxAttempts": 3,
          "DelaySeconds": 1,
          "BackoffType": "Exponential"
        },
        "CircuitBreaker": {
          "FailureRatio": 0.4,
          "SamplingDurationSeconds": 30,
          "BreakDurationSeconds": 15
        }
      }
    }
  }
}
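The template above could be bound to a strongly typed options class along these lines. This is a sketch, assuming the JSON lives in appsettings.json of a generic-host project; ResilientWorkerOptions and its nested types are hypothetical names whose properties mirror the JSON keys, and the defaults repeat the template values.

```csharp
using System;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Options;

var builder = Host.CreateApplicationBuilder(args);

// Bind the section and fail fast at startup if a value is unusable.
builder.Services
    .AddOptions<ResilientWorkerOptions>()
    .Bind(builder.Configuration.GetSection(ResilientWorkerOptions.SectionName))
    .Validate(o => o.IntervalSeconds > 0, "IntervalSeconds must be positive.")
    .ValidateOnStart();

using var host = builder.Build();
var options = host.Services.GetRequiredService<IOptions<ResilientWorkerOptions>>().Value;
Console.WriteLine($"Interval: {options.IntervalSeconds}s, max retries: {options.Resilience.Retry.MaxAttempts}");

// Hypothetical options types; property names match the JSON keys.
public sealed class ResilientWorkerOptions
{
    public const string SectionName = "BackgroundServices:ResilientWorker";

    public int IntervalSeconds { get; set; } = 10;
    public int ShutdownTimeoutSeconds { get; set; } = 30;
    public HealthCheckOptions HealthCheck { get; set; } = new();
    public ResilienceOptions Resilience { get; set; } = new();
}

public sealed class HealthCheckOptions
{
    public int StaleThresholdSeconds { get; set; } = 60;
    public bool Enabled { get; set; } = true;
}

public sealed class ResilienceOptions
{
    public RetryOptions Retry { get; set; } = new();
    public CircuitBreakerOptions CircuitBreaker { get; set; } = new();
}

public sealed class RetryOptions
{
    public int MaxAttempts { get; set; } = 3;
    public int DelaySeconds { get; set; } = 1;
    public string BackoffType { get; set; } = "Exponential";
}

public sealed class CircuitBreakerOptions
{
    public double FailureRatio { get; set; } = 0.4;
    public int SamplingDurationSeconds { get; set; } = 30;
    public int BreakDurationSeconds { get; set; } = 15;
}
```

A worker could then take IOptions<ResilientWorkerOptions> in its constructor and build its PeriodicTimer from IntervalSeconds instead of a hard-coded value.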
Quick Start Guide
- Create Worker Project:
  dotnet new worker -n Codcompass.Worker
  cd Codcompass.Worker
- Add Dependencies:
  dotnet add package Microsoft.Extensions.Resilience
  dotnet add package OpenTelemetry.Extensions.Hosting
  dotnet add package OpenTelemetry.Exporter.Prometheus.AspNetCore
- Implement Service:
  Replace Worker.cs with the ResilientWorker pattern from the Core Solution. Inject IServiceScopeFactory, ILogger, and the resilience pipeline.
- Configure Program.cs:
  Register the resilience pipeline, scoped services, and the hosted service. Add OpenTelemetry metrics configuration.
- Run and Verify:
  dotnet run
  Verify that the logs show startup and periodic processing and that metrics are exposed. Test shutdown with Ctrl+C to confirm graceful termination.