hree pillars: registration, execution, and response mapping.
Step 1: Install and Register the Health Checks Pipeline
Add the package to your project:
dotnet add package Microsoft.Extensions.Diagnostics.HealthChecks
In Program.cs, register the health checks service and attach dependencies:
builder.Services.AddHealthChecks()
    .AddCheck<DatabaseHealthCheck>("db", tags: new[] { "dependency" })
    .AddCheck<CacheHealthCheck>("cache", tags: new[] { "dependency" })
    // AddUrlGroup is provided by the separate AspNetCore.HealthChecks.Uris package
    .AddUrlGroup(new Uri("https://api.external-service.com/heartbeat"), name: "external-api", tags: new[] { "external" });
Step 2: Implement Custom IHealthCheck Classes
Custom checks must implement IHealthCheck and respect cancellation tokens. Avoid blocking I/O. Use HttpClient with explicit timeouts or IDbConnection with command timeouts.
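using System.Data;
using Dapper; // ExecuteAsync on IDbConnection is assumed to be Dapper's extension method
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class DatabaseHealthCheck : IHealthCheck
{
    private readonly IDbConnection _connection;
    private readonly TimeSpan _timeout;

    public DatabaseHealthCheck(IDbConnection connection, IConfiguration config)
    {
        _connection = connection;
        _timeout = TimeSpan.FromSeconds(config.GetValue<int>("HealthChecks:DbTimeout", 3));
    }

    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Combine the caller's token with a per-check timeout.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(_timeout);
        try
        {
            // Pass the linked token so the timeout actually cancels the query.
            var command = new CommandDefinition("SELECT 1", commandTimeout: (int)_timeout.TotalSeconds, cancellationToken: cts.Token);
            await _connection.ExecuteAsync(command);
            return HealthCheckResult.Healthy("Database responsive");
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // The per-check timeout fired, not the caller's cancellation.
            return HealthCheckResult.Unhealthy("Database check timed out");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return HealthCheckResult.Unhealthy("Database check failed", ex);
        }
    }
}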
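Step 2 also mentions HttpClient with explicit timeouts. As a minimal sketch of that variant, the same linked-token pattern might look like the block below. The class name ExternalApiHealthCheck, the 5-second budget, and the choice to report non-success responses as Unhealthy are assumptions made for illustration; the HttpClient is assumed to be supplied through dependency injection (for example with AddHttpClient<ExternalApiHealthCheck>()).

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check: name, URL, and timeout budget are illustrative assumptions.
public sealed class ExternalApiHealthCheck : IHealthCheck
{
    private static readonly Uri Heartbeat = new("https://api.external-service.com/heartbeat");
    private static readonly TimeSpan Budget = TimeSpan.FromSeconds(5);
    private readonly HttpClient _http;

    public ExternalApiHealthCheck(HttpClient http) => _http = http;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Link the caller's token with this check's own time budget.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(Budget);
        try
        {
            // Only the response headers are needed to judge reachability.
            using var response = await _http.GetAsync(
                Heartbeat, HttpCompletionOption.ResponseHeadersRead, cts.Token);

            return response.IsSuccessStatusCode
                ? HealthCheckResult.Healthy("External API reachable")
                : HealthCheckResult.Unhealthy($"External API returned {(int)response.StatusCode}");
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // The budget elapsed; the caller did not cancel.
            return HealthCheckResult.Unhealthy("External API check timed out");
        }
        catch (HttpRequestException ex)
        {
            return HealthCheckResult.Unhealthy("External API unreachable", ex);
        }
    }
}
```

Taking the HttpClient as a constructor argument keeps the check easy to exercise in tests with a stub HttpMessageHandler, so no real network call is needed.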
Step 3: Map Liveness, Readiness, and Startup Endpoints
Orchestration platforms require distinct paths for liveness, readiness, and startup probes. ASP.NET Core maps these with predicates that select which registered checks run, typically by name or tag.
var app = builder.Build();

// Startup probe: runs only checks tagged "startup"; orchestrators such as Kubernetes stop polling it after the first success
app.MapHealthChecks("/healthz/startup", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("startup"),
    ResponseWriter = WriteResponseAsync
});

// Liveness probe: checks process state, ignores dependencies
app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    Predicate = _ => false, // Runs no checks; reports Healthy whenever the app can answer the request
    ResponseWriter = WriteResponseAsync
});

// Readiness probe: runs every registered check, blocks traffic if unhealthy
app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
    Predicate = _ => true,
    ResponseWriter = WriteResponseAsync
});
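To see what each predicate selects without starting a web server, the same filters can be passed straight to HealthCheckService. The following is a rough console sketch, not part of the original setup: the check names "db" and "warmup" and the simulated outage are assumptions, and it presumes the Microsoft.Extensions.Diagnostics.HealthChecks and Microsoft.Extensions.Logging packages are referenced.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var services = new ServiceCollection();
services.AddLogging(); // the default HealthCheckService resolves an ILogger
services.AddHealthChecks()
    // Illustrative checks: one failing dependency, one healthy startup check.
    .AddCheck("db", () => HealthCheckResult.Unhealthy("simulated outage"), tags: new[] { "dependency" })
    .AddCheck("warmup", () => HealthCheckResult.Healthy(), tags: new[] { "startup" });

await using var provider = services.BuildServiceProvider();
var health = provider.GetRequiredService<HealthCheckService>();

// The same predicates used for the three endpoints.
var live = await health.CheckHealthAsync(_ => false);
var startup = await health.CheckHealthAsync(r => r.Tags.Contains("startup"));
var ready = await health.CheckHealthAsync(_ => true);

Console.WriteLine($"live:    {live.Status} ({live.Entries.Count} checks run)");
Console.WriteLine($"startup: {startup.Status} ({startup.Entries.Count} checks run)");
Console.WriteLine($"ready:   {ready.Status} ({ready.Entries.Count} checks run)");
```

With these assumed registrations, the liveness report contains no entries and aggregates to Healthy, while the readiness report includes the failing "db" check and aggregates to Unhealthy. A liveness endpoint built this way only proves that the process can answer HTTP requests.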
Step 4: Implement a Custom Response Writer
The default response writer returns only the aggregate status as plain text, which gives operators no per-check detail. A custom writer can return a compact JSON payload and map status codes explicitly.
// Declared as a static local function so it can sit in Program.cs next to the endpoint mappings.
// Requires: using System.Text.Json; using Microsoft.Extensions.Diagnostics.HealthChecks;
static Task WriteResponseAsync(HttpContext context, HealthReport report)
{
    context.Response.ContentType = "application/json";

    var statusCode = report.Status switch
    {
        HealthStatus.Healthy => StatusCodes.Status200OK,
        HealthStatus.Degraded => StatusCodes.Status200OK, // Or 503 depending on orchestration policy
        HealthStatus.Unhealthy => StatusCodes.Status503ServiceUnavailable,
        _ => StatusCodes.Status503ServiceUnavailable
    };
    context.Response.StatusCode = statusCode;

    // Keep the payload small: overall status plus one line per check.
    var payload = new
    {
        status = report.Status.ToString(),
        totalDuration = report.TotalDuration.TotalMilliseconds,
        checks = report.Entries.Select(e => new
        {
            name = e.Key,
            status = e.Value.Status.ToString(),
            duration = e.Value.Duration.TotalMilliseconds,
            description = e.Value.Description
        })
    };

    return JsonSerializer.SerializeAsync(context.Response.Body, payload, cancellationToken: context.RequestAborted);
}
Architecture Decisions and Rationale
- Separation of Probe Semantics: Liveness, readiness, and startup probes serve different control plane functions. Liveness should never depend on external resources. Readiness should reflect traffic routing capability. Startup should provide a grace window for initialization. Mapping them to distinct paths prevents orchestration misinterpretation.
- Timeout Isolation: Each health check receives a CancellationToken and applies its own timeout. This bounds how long a slow database can delay the health report. The CancellationTokenSource.CreateLinkedTokenSource pattern ensures the caller's cancellation still propagates into the check.
- Tag-Based Filtering: Using tags allows dynamic probe composition without duplicating registration logic. The Predicate delegate evaluates checks at runtime, enabling lightweight liveness probes and comprehensive readiness probes from the same service.
- Response Minimization: Orchestrators parse status codes, not JSON payloads. Returning only essential metadata reduces serialization overhead and network transfer. Custom writers also enable compliance with internal API contracts or security scanners.
- Dependency Caching: For expensive checks (e.g., third-party APIs), cache results for 5-10 seconds using IMemoryCache or IDistributedCache. Health checks should reflect recent state, not trigger real-time requests on every probe.
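The dependency caching idea can be sketched as a small decorator around any IHealthCheck. This is one possible shape under stated assumptions, not a built-in feature: CachedHealthCheck, its constructor parameters, and the cache key are hypothetical names, and the IMemoryCache is assumed to come from Microsoft.Extensions.Caching.Memory.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical decorator: reuses the inner check's last result until it expires.
public sealed class CachedHealthCheck : IHealthCheck
{
    private readonly IHealthCheck _inner;
    private readonly IMemoryCache _cache;
    private readonly TimeSpan _ttl;
    private readonly string _cacheKey;

    public CachedHealthCheck(IHealthCheck inner, IMemoryCache cache, TimeSpan ttl, string cacheKey)
    {
        _inner = inner;
        _cache = cache;
        _ttl = ttl;
        _cacheKey = cacheKey;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Serve the stored result while it is still fresh.
        if (_cache.TryGetValue(_cacheKey, out HealthCheckResult cached))
        {
            return cached;
        }

        // Otherwise run the real check and remember its result for the TTL.
        var result = await _inner.CheckHealthAsync(context, cancellationToken);
        _cache.Set(_cacheKey, result, _ttl);
        return result;
    }
}
```

Two probes that arrive at the same moment on a cold cache may both run the inner check; for a 5-10 second window that is usually acceptable. Failures are cached as well, so a short TTL keeps recovery visible quickly.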
Pitfall Guide
1. Synchronous Blocking Calls in Async Context
Developers frequently use .Result or .Wait() inside health checks. This blocks thread pool threads and, under load, can starve the pool so that all requests queue. Health checks must be fully asynchronous and respect CancellationToken. Always use await and configure command/HTTP timeouts explicitly.
2. Monolithic Dependency Evaluation
A single health check that validates the database, cache, message queue, and external API creates a broad failure surface. If the cache is temporarily unreachable, the entire service appears unhealthy. Split checks by dependency, tag them, and compose them via predicates. Use HealthCheckResult.Degraded for non-critical failures to allow traffic routing while signaling partial availability.
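One way to let a non-critical dependency report Degraded is to have the check return the failure status configured on its registration instead of hardcoding Unhealthy. The sketch below assumes a hypothetical CacheHealthCheck whose ping delegate stands in for a real cache client call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check: the ping delegate is a placeholder for a real cache probe.
public sealed class CacheHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task<bool>> _ping;

    public CacheHealthCheck(Func<CancellationToken, Task<bool>> ping) => _ping = ping;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Use whatever failure status the registration asked for (for example Degraded).
        var failureStatus = context.Registration.FailureStatus;
        try
        {
            return await _ping(cancellationToken)
                ? HealthCheckResult.Healthy("Cache responsive")
                : new HealthCheckResult(failureStatus, "Cache did not respond");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return new HealthCheckResult(failureStatus, "Cache check failed", ex);
        }
    }
}
```

Registered with failureStatus: HealthStatus.Degraded, a cache outage then lowers the aggregate status to Degraded rather than Unhealthy, so the response writer can keep returning 200 while still signaling partial availability. The severity decision lives in the registration, not in the check.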
3. Ignoring Startup Grace Periods
Applications often take 10-30 seconds to initialize connection pools, load configuration, or warm up caches. If liveness probes start counting failures immediately, the orchestrator can restart the pod before it ever serves traffic; a failing readiness probe, by contrast, only keeps the pod out of rotation. Implement a startup probe with a higher failure threshold and longer initial delay. Map it to /healthz/startup and exclude it from readiness predicates.
4. Hardcoded Timeouts and Unbounded Retries
Health checks that retry indefinitely or rely on the default HttpClient timeout (100 seconds) will stall the health endpoint. Configure an explicit timeout for each check, for example through the timeout argument on the check registration, and bind the values from appsettings.json. Propagate the CancellationToken so that cancellation flows through the entire call stack.
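As a hedged illustration of a per-registration timeout, the console sketch below registers a deliberately slow check with a one-second limit and runs it through HealthCheckService. The check name "slow-dependency" and the 30-second delay are assumptions chosen to force the timeout; in a real application the limit would be read from configuration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var services = new ServiceCollection();
services.AddLogging(); // the default HealthCheckService resolves an ILogger
services.AddHealthChecks()
    .AddAsyncCheck(
        "slow-dependency",
        async ct =>
        {
            // Simulates a dependency that hangs; honors the token it is given.
            await Task.Delay(TimeSpan.FromSeconds(30), ct);
            return HealthCheckResult.Healthy();
        },
        tags: new[] { "dependency" },
        timeout: TimeSpan.FromSeconds(1)); // per-registration time limit

await using var provider = services.BuildServiceProvider();
var health = provider.GetRequiredService<HealthCheckService>();

var report = await health.CheckHealthAsync();
var entry = report.Entries["slow-dependency"];
Console.WriteLine($"{entry.Status}: {entry.Description}");
```

Because the check passes the token to Task.Delay, the health check service can cancel it after about one second and record the registration's failure status instead of waiting for the full delay. A check that ignores its token would not be interrupted this way.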
5. Returning 200 for Unhealthy States
Some teams return 200 OK with a JSON body indicating failure to avoid triggering orchestrator restarts. This breaks control plane semantics. Kubernetes, ECS, and Consul rely on HTTP status codes to make routing and lifecycle decisions. Return 503 for unhealthy states. If you need to signal degradation without restarting, use 200 with a Degraded status and configure your orchestrator to handle it appropriately.
6. Exposing Health Endpoints Publicly
Health endpoints often leak internal architecture, dependency versions, and connection strings. Restrict paths using routing predicates, host filtering, or middleware. In production, disable /health endpoints in public-facing routes or require internal network access. Use RequireHost or custom middleware to enforce environment-specific visibility.
7. Coupling Probes to Logging and Telemetry
Health checks are not logging endpoints. Embedding structured logging, metrics emission, or telemetry collection inside the probe path adds latency and couples lifecycle signaling to observability pipelines. Log health check failures separately using a background service or dedicated diagnostic endpoint. Keep probes lean and deterministic.
Production Best Practices
- Cache dependency state for 5-10 seconds using IMemoryCache to reduce load on upstream systems.
- Implement circuit breaker patterns in health checks for external APIs to prevent cascade failures.
- Use AddCheck<T>() with scoped/transient lifetimes carefully; prefer singleton checks with injected IServiceProvider for expensive dependencies.
- Validate health check payloads in CI/CD pipelines using integration tests that simulate dependency failures.
- Monitor probe latency separately from application metrics. High health check latency often indicates thread pool starvation or connection pool exhaustion.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single monolith deployment | Unified /health with dependency tags | Simplifies operations; orchestrator restarts are acceptable | Low |
| Kubernetes microservices | Separate /healthz/live, /healthz/ready, /healthz/startup | Aligns with K8s probe semantics; prevents false restarts | Medium |
| High-throughput API gateway | Dependency caching + degraded status routing | Maintains traffic flow during transient failures; reduces probe overhead | Low |
| Legacy migration to cloud | Add startup probe + extend failure threshold | Prevents premature restarts during initialization; smooths migration | Low |
| Multi-region active-active | Distributed cache-backed health state + region-specific predicates | Ensures consistent routing decisions across regions; avoids split-brain | High |
Configuration Template
appsettings.json:
{
  "HealthChecks": {
    "DbTimeout": 3,
    "CacheTimeout": 2,
    "ExternalApiTimeout": 5,
    "CacheDurationSeconds": 10,
    "EnableStartupProbe": true,
    "StartupFailureThreshold": 10,
    "StartupPeriodSeconds": 30
  },
  "AllowedHosts": "*"
}
Program.cs (core registration):
builder.Services.AddHealthChecks()
    .AddCheck<DatabaseHealthCheck>("db", tags: new[] { "dependency" })
    .AddCheck<CacheHealthCheck>("cache", tags: new[] { "dependency" })
    .AddCheck<ExternalApiHealthCheck>("external", tags: new[] { "external", "startup" });

var app = builder.Build();

app.MapHealthChecks("/healthz/startup", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("startup"),
    ResponseWriter = WriteResponseAsync
});

app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    Predicate = _ => false,
    ResponseWriter = WriteResponseAsync
});

app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
    Predicate = _ => true,
    ResponseWriter = WriteResponseAsync
});

app.Run();
Quick Start Guide
- Install the health checks package: dotnet add package Microsoft.Extensions.Diagnostics.HealthChecks
- Register checks in Program.cs using AddHealthChecks().AddCheck<T>() and tag by dependency type
- Map three endpoints: /healthz/startup (initialization), /healthz/live (process state), /healthz/ready (traffic routing)
- Implement a custom ResponseWriter that returns 200 for healthy/degraded and 503 for unhealthy states
- Configure orchestrator probes to target the correct paths, set appropriate failure thresholds, and enable startup grace periods
Health checks are control plane signals, not diagnostic endpoints. Treat them as such, and your orchestration platform will manage failures predictably rather than reactively.