standard ASP.NET Core diagnostics and metrics pipelines.
Core Solution
Implementing production-grade rate limiting in ASP.NET Core requires three architectural decisions: policy definition, partitioning strategy, and state persistence. The framework provides a declarative API that separates these concerns cleanly.
Step 1: Register Rate Limiting Services
Register the rate limiting services in the DI container, then add the middleware to the request pipeline. Named policies are reusable templates that specify the algorithm, window size, permit count, and queue behavior.
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddRateLimiter(options =>
{
options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
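// Named policy "global": one fixed window per client IP. Despite the name, it applies only to
// endpoints that opt in with RequireRateLimiting("global"); options.GlobalLimiter covers every request.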
options.AddPolicy("global", httpContext =>
RateLimitPartition.GetFixedWindowLimiter(
partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
factory: _ => new FixedWindowRateLimiterOptions
{
AutoReplenishment = true,
PermitLimit = 100,
Window = TimeSpan.FromMinutes(1)
}));
});
var app = builder.Build();
app.UseRateLimiter();
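To see what the "global" policy does per client, the console sketch below rebuilds the same per-IP fixed window outside the web host with PartitionedRateLimiter.Create and the same RateLimitPartition.GetFixedWindowLimiter call the policy returns. It assumes a .NET 7 or later console app (System.Threading.RateLimiting ships with the runtime from .NET 7); the IP strings are made-up documentation addresses standing in for Connection.RemoteIpAddress.

```csharp
using System;
using System.Threading.RateLimiting;

// One fixed-window limiter per client IP: 100 requests per minute each,
// mirroring the "global" policy above.
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    clientIp => RateLimitPartition.GetFixedWindowLimiter(
        partitionKey: clientIp,
        factory: _ => new FixedWindowRateLimiterOptions
        {
            AutoReplenishment = true,
            PermitLimit = 100,
            Window = TimeSpan.FromMinutes(1)
        }));

// Hypothetical client A sends 101 requests inside one window.
int acceptedA = 0, rejectedA = 0;
for (int i = 0; i < 101; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire("203.0.113.10");
    if (lease.IsAcquired) acceptedA++; else rejectedA++;
}

// Hypothetical client B lands in its own partition with a fresh budget.
using RateLimitLease otherClient = limiter.AttemptAcquire("198.51.100.7");

Console.WriteLine($"client A: {acceptedA} accepted, {rejectedA} rejected");
Console.WriteLine($"client B first request accepted: {otherClient.IsAcquired}");
```

Client A's 101st request is rejected, while client B is unaffected because each partition key gets its own limiter.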
Step 2: Choose a Limiter Algorithm
The framework supports multiple limiter types. Choose based on traffic characteristics:
- FixedWindow: Predictable, simple, but prone to boundary spikes.
- SlidingWindow: Smoother rate enforcement, slightly higher memory overhead.
- TokenBucket: Ideal for bursty traffic with sustained average limits.
- ConcurrencyLimiter: Strictly limits simultaneous active requests.
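The difference between the window-based limiters and ConcurrencyLimiter is easiest to see with requests that never overlap. In this sketch (assuming a .NET 7+ console app), each simulated request acquires a lease and disposes it before the next one starts: the fixed window counts every request in its window, while the concurrency limiter gets its permit back when the lease is disposed.

```csharp
using System;
using System.Threading.RateLimiting;

// Rate limiter: at most 1 request per minute.
using var window = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 1,
    Window = TimeSpan.FromMinutes(1)
});

// Concurrency limiter: at most 1 request in flight at any moment.
using var concurrency = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 1
});

int windowAccepted = 0, concurrencyAccepted = 0;
for (int i = 0; i < 10; i++)
{
    // Each "request" completes (disposes its lease) before the next begins.
    using (RateLimitLease lease = window.AttemptAcquire())
    {
        if (lease.IsAcquired) windowAccepted++;
    }
    using (RateLimitLease lease = concurrency.AttemptAcquire())
    {
        if (lease.IsAcquired) concurrencyAccepted++;
    }
}

Console.WriteLine($"fixed window: {windowAccepted}/10, concurrency: {concurrencyAccepted}/10");
```

With a limit of 1, the fixed window admits one of the ten requests and the concurrency limiter admits all ten, because none of them run at the same time.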
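For example, a token bucket policy partitioned by API key, registered inside the AddRateLimiter callback from Step 1:
options.AddPolicy("api-billing", httpContext =>
{
// A missing header yields "", so all callers without a key share one bucket.
var apiKey = httpContext.Request.Headers["X-API-Key"].ToString();
return RateLimitPartition.GetTokenBucketLimiter(
partitionKey: apiKey,
factory: _ => new TokenBucketRateLimiterOptions
{
AutoReplenishment = true,
TokenLimit = 50,
TokensPerPeriod = 5,
ReplenishmentPeriod = TimeSpan.FromSeconds(1)
});
});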
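The token bucket numbers in the api-billing policy mean a single key can burst up to TokenLimit (50) requests at once and then sustain roughly TokensPerPeriod per ReplenishmentPeriod (5 requests per second). The console sketch below checks the burst half with a standalone TokenBucketRateLimiter (assuming .NET 7+). AutoReplenishment is switched off here only so the refill timer cannot fire mid-demo; it is not a production setting.

```csharp
using System;
using System.Threading.RateLimiting;

// Same capacity as "api-billing": 50 tokens, 5 added per second.
// AutoReplenishment = false keeps this demo deterministic (no refill timer).
using var bucket = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    AutoReplenishment = false,
    TokenLimit = 50,
    TokensPerPeriod = 5,
    ReplenishmentPeriod = TimeSpan.FromSeconds(1)
});

// An instantaneous burst of 60 requests from one API key.
int accepted = 0, rejected = 0;
for (int i = 0; i < 60; i++)
{
    using RateLimitLease lease = bucket.AttemptAcquire();
    if (lease.IsAcquired) accepted++; else rejected++;
}

Console.WriteLine($"burst: {accepted} accepted, {rejected} rejected");
```

The bucket starts full, so the first 50 requests of the burst succeed and the remaining 10 are rejected; with auto-replenishment on, capacity then returns at 5 tokens per second.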
Step 3: Apply Policies to Endpoints
Use endpoint metadata to attach policies selectively. Avoid global application unless the entire surface requires uniform throttling.
app.MapGet("/api/data", (HttpContext ctx) =>
{
return Results.Ok(new { data = "sensitive payload" });
}).RequireRateLimiting("api-billing");
// "concurrent-uploads" must also be registered inside AddRateLimiter; the middleware
// throws for endpoints that name a policy it does not know.
app.MapPost("/api/upload", (IFormFile file, HttpContext ctx) =>
{
// Handle upload
return Results.Ok();
}).RequireRateLimiting("concurrent-uploads");
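The "concurrent-uploads" policy is referenced above but not defined in this section. One hypothetical shape for it is a ConcurrencyLimiter with a small queue, which ASP.NET Core can register via options.AddConcurrencyLimiter("concurrent-uploads", ...). The console sketch below uses assumed values (2 uploads in flight, 1 waiting, OldestFirst) to show how such a limiter admits, queues, rejects, and hands off permits; it assumes .NET 7+.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.RateLimiting;

// Assumed settings for "concurrent-uploads": 2 active uploads, 1 queued.
using var uploads = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,
    QueueLimit = 1,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

RateLimitLease first = await uploads.AcquireAsync();
RateLimitLease second = await uploads.AcquireAsync();

// Third upload: no free permit, so it waits in the queue.
ValueTask<RateLimitLease> queued = uploads.AcquireAsync();
bool waitingBeforeRelease = !queued.IsCompleted;

// Fourth upload: the queue is full, so it is rejected immediately.
RateLimitLease overflow = await uploads.AcquireAsync();

// Finishing the first upload hands its permit to the queued request.
first.Dispose();
RateLimitLease third = await queued;

Console.WriteLine($"queued request waited: {waitingBeforeRelease}");
Console.WriteLine($"overflow acquired: {overflow.IsAcquired}, queued acquired: {third.IsAcquired}");

second.Dispose();
third.Dispose();
overflow.Dispose();
```

A queue bounds memory while still smoothing short bursts; for long-running uploads, keeping QueueLimit small avoids holding many idle connections open.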
Step 4: Implement Distributed State Persistence
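In-memory limiters reset when a node restarts and diverge across scaled-out instances. The limiters built into System.Threading.RateLimiting keep their counters in process memory only, so cluster-wide consistency requires a limiter backed by a shared store such as Redis.
// Requires the Microsoft.Extensions.Caching.StackExchangeRedis package.
// This registers an IDistributedCache; it does not change where the built-in limiters keep state.
builder.Services.AddStackExchangeRedisCache(options =>
{
options.Configuration = builder.Configuration.GetConnectionString("Redis");
});
builder.Services.AddRateLimiter(options =>
{
options.OnRejected = async (context, cancellationToken) =>
{
// Use the RetryAfter metadata when the limiter attaches it to the rejected lease.
if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter))
{
context.HttpContext.Response.Headers.RetryAfter = ((int)retryAfter.TotalSeconds).ToString();
}
await context.HttpContext.Response.WriteAsync(
"Rate limit exceeded. Please retry later.",
cancellationToken);
};
// Shared enforcement needs a custom RateLimiter or PartitionedRateLimiter implementation
// backed by the shared store, or a third-party Redis-backed limiter library.
});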
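Conceptually, a distributed fixed window replaces the in-process counter with an atomic increment on a shared key per partition and window, the pattern commonly built on Redis INCR plus a key expiry. The sketch below only illustrates that logic: a ConcurrentDictionary stands in for the shared store, TryAcquire is a hypothetical helper rather than an ASP.NET Core API, and a production version would wrap it in a custom RateLimiter that talks to the real store.

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in for a shared store such as Redis. In production every node would reach
// the same store over the network; here both simulated nodes share one dictionary.
var sharedStore = new ConcurrentDictionary<string, long>();

// Hypothetical helper: fixed-window check against a shared counter,
// similar in spirit to INCR on "ratelimit:{partition}:{window}" plus an expiry.
bool TryAcquire(string partitionKey, DateTimeOffset timestamp, int permitLimit, TimeSpan window)
{
    long windowIndex = timestamp.ToUnixTimeSeconds() / (long)window.TotalSeconds;
    string key = $"ratelimit:{partitionKey}:{windowIndex}";
    // AddOrUpdate retries its update on contention, so concurrent increments are not lost.
    long count = sharedStore.AddOrUpdate(key, 1L, (_, current) => current + 1);
    return count <= permitLimit;
}

var requestTime = new DateTimeOffset(2024, 1, 1, 12, 0, 0, TimeSpan.Zero);
int accepted = 0;
for (int node = 0; node < 2; node++)      // node A, then node B
{
    for (int i = 0; i < 90; i++)          // 90 requests from the same user on each node
    {
        if (TryAcquire("user-42", requestTime, permitLimit: 100, window: TimeSpan.FromMinutes(1)))
            accepted++;
    }
}

Console.WriteLine($"accepted across both nodes: {accepted}");
```

Because every node increments the same key, the 180 requests split across the two simulated nodes yield exactly 100 acceptances. Rejected requests still increment the counter in this sketch, which is harmless within a window since the key expires with it.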
Architecture Rationale
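- Early Pipeline Placement: Call app.UseRateLimiter() as early as its policies allow, so rejected requests consume as few resources as possible. Endpoint-specific policies (RequireRateLimiting) are resolved from the endpoint selected by routing, so the limiter must run after UseRouting; policies that partition on the authenticated user also need authentication to run first.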
- Partitioning by Context: IP addresses alone fail behind NAT/CDNs. Partition by API key, JWT subject, or tenant ID when available. Fall back to IP + User-Agent hash for unauthenticated traffic.
- Queue vs Rejection: Use QueueProcessingOrder.OldestFirst with QueueLimit to absorb burst traffic instead of immediately rejecting. This improves client experience and reduces retry storms.
- Metrics Integration: Rate limit rejections should emit structured logs and OpenTelemetry spans. Correlate limit hits with authentication failures to detect credential stuffing patterns.
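The partitioning guidance above can be expressed as a small key-selection function. In this sketch SelectPartitionKey is a hypothetical helper, the precedence (API key, then subject, then tenant, then IP plus a User-Agent hash) is one reasonable reading of that guidance, and in a real policy callback the inputs would come from headers and claims.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: prefer a stable identity, fall back to IP + User-Agent hash.
string SelectPartitionKey(string? apiKey, string? subject, string? tenantId, string? clientIp, string? userAgent)
{
    if (!string.IsNullOrEmpty(apiKey)) return $"key:{apiKey}";
    if (!string.IsNullOrEmpty(subject)) return $"sub:{subject}";
    if (!string.IsNullOrEmpty(tenantId)) return $"tenant:{tenantId}";

    // Unauthenticated traffic: combine the IP with a short User-Agent hash so that
    // different clients behind one NAT address are less likely to share a bucket.
    byte[] uaHash = SHA256.HashData(Encoding.UTF8.GetBytes(userAgent ?? string.Empty));
    return $"anon:{clientIp ?? "unknown"}:{Convert.ToHexString(uaHash, 0, 8)}";
}

string withKey = SelectPartitionKey("k-123", "user-1", null, "203.0.113.10", "curl/8.0");
string curlClient = SelectPartitionKey(null, null, null, "203.0.113.10", "curl/8.0");
string browser = SelectPartitionKey(null, null, null, "203.0.113.10", "Mozilla/5.0");

Console.WriteLine(withKey);
Console.WriteLine(curlClient);
Console.WriteLine(browser);
```

Because the User-Agent is client-controlled, this fallback spreads honest NAT users across buckets but does not stop a determined client from rotating it; the strict unauthenticated limit still has to carry that case.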
Pitfall Guide
1. Partitioning by IP in CDN/Proxy Environments
Shared corporate networks, mobile carriers, and CDN edge nodes route thousands of users through a single public IP. Rate limiting by RemoteIpAddress will block entire organizations. Always extract forwarded headers (X-Forwarded-For, CF-Connecting-IP) and validate proxy trust lists before partitioning.
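In ASP.NET Core the built-in way to do this is the forwarded headers middleware (app.UseForwardedHeaders with ForwardedHeadersOptions.KnownProxies or KnownNetworks), which rewrites Connection.RemoteIpAddress before the limiter runs. The sketch below isolates the trust check it relies on, using a hypothetical ResolveClientIp helper and an assumed single trusted proxy at 10.0.0.5: X-Forwarded-For is honored only when the immediate sender is trusted, and only hops vouched for by trusted proxies are accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Assumed trust list: only this proxy may vouch for a client address.
var trustedProxies = new HashSet<IPAddress> { IPAddress.Parse("10.0.0.5") };

// Hypothetical helper: walk X-Forwarded-For right to left while the current hop
// is a trusted proxy; the first address not behind a trusted proxy is the client.
IPAddress ResolveClientIp(IPAddress remoteIp, string? forwardedFor)
{
    // A header sent by an untrusted peer is ignored entirely.
    if (!trustedProxies.Contains(remoteIp) || string.IsNullOrWhiteSpace(forwardedFor))
        return remoteIp;

    IPAddress current = remoteIp;
    foreach (string hop in forwardedFor.Split(',').Select(h => h.Trim()).Reverse())
    {
        if (!trustedProxies.Contains(current)) break;                  // stop at the first untrusted hop
        if (!IPAddress.TryParse(hop, out IPAddress? parsed)) break;    // malformed entry: stop
        current = parsed;
    }
    return current;
}

IPAddress proxy = IPAddress.Parse("10.0.0.5");
IPAddress viaProxy = ResolveClientIp(proxy, "203.0.113.10");
IPAddress spoofed = ResolveClientIp(proxy, "192.0.2.99, 203.0.113.10");   // client-injected first entry
IPAddress direct = ResolveClientIp(IPAddress.Parse("198.51.100.7"), "192.0.2.99");

Console.WriteLine($"{viaProxy} {spoofed} {direct}");
```

In the spoofed case the client prepended its own entry, but only the address appended by the trusted proxy is used, so the partition key cannot be forged through the header.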
2. In-Memory State in Scaled Deployments
Each node maintains independent counters. A user hitting Node A for 90 requests and Node B for 90 requests bypasses a 100-request limit. Horizontal scaling without distributed state guarantees inconsistent enforcement and unpredictable behavior during rolling deployments.
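The bypass is easy to reproduce with two independent FixedWindowRateLimiter instances standing in for two nodes behind a round-robin load balancer (a simplifying assumption; real balancers may route differently). The sketch assumes a .NET 7+ console app.

```csharp
using System;
using System.Threading.RateLimiting;

// Each node builds its own in-memory limiter with the same 100-per-minute policy.
FixedWindowRateLimiter NewNodeLimiter() => new(new FixedWindowRateLimiterOptions
{
    PermitLimit = 100,
    Window = TimeSpan.FromMinutes(1)
});

using FixedWindowRateLimiter nodeA = NewNodeLimiter();
using FixedWindowRateLimiter nodeB = NewNodeLimiter();

// Round-robin: one user's 180 requests alternate between the two nodes.
int accepted = 0;
for (int i = 0; i < 90; i++)
{
    using RateLimitLease a = nodeA.AttemptAcquire();
    using RateLimitLease b = nodeB.AttemptAcquire();
    if (a.IsAcquired) accepted++;
    if (b.IsAcquired) accepted++;
}

Console.WriteLine($"accepted {accepted} of 180 against a 100-request limit");
```

All 180 requests are accepted, since each node only ever sees 90.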
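3. Omitting Retry-After Headers
Clients that receive a 429 without Retry-After often retry immediately, and many clients doing so at once create a thundering herd. Populate the header with the time until permits are expected to be available (the lease's RetryAfter metadata, when the limiter provides it), and build exponential backoff into client SDKs.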
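Built-in limiters can attach a RetryAfter value to a rejected lease, readable through RateLimitLease.TryGetMetadata(MetadataName.RetryAfter, ...). The sketch below turns that metadata into a header value with a hypothetical RetryAfterHeader helper; the 60-second fallback is an assumed default for limiters that supply no metadata, and the example assumes .NET 7+.

```csharp
using System;
using System.Globalization;
using System.Threading.RateLimiting;

using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 1,
    Window = TimeSpan.FromSeconds(30)
});

using RateLimitLease granted = limiter.AttemptAcquire();
using RateLimitLease denied = limiter.AttemptAcquire();

// Hypothetical helper: convert RetryAfter metadata into a Retry-After header value,
// falling back to a default when the limiter does not supply one.
string RetryAfterHeader(RateLimitLease lease, TimeSpan fallback)
{
    TimeSpan delay = lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter)
        ? retryAfter
        : fallback;
    // Retry-After is expressed in whole seconds; round up so clients do not retry early.
    return ((int)Math.Ceiling(delay.TotalSeconds)).ToString(CultureInfo.InvariantCulture);
}

string header = RetryAfterHeader(denied, TimeSpan.FromSeconds(60));
Console.WriteLine($"granted: {granted.IsAcquired}, denied: {denied.IsAcquired}, Retry-After: {header}");
```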
4. Applying Limits After Authentication/Authorization
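If rate limiting executes after middleware that validates tokens or queries databases, rejected requests still consume CPU, memory, and I/O. Place the limiter at the earliest stage its partition key allows: IP- or API-key-based policies can run before authentication, while policies keyed on the authenticated user cannot, so consider pairing them with a cheaper IP-based limit in front.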
5. Overcomplicating with Custom Middleware
Teams frequently build custom IAsyncActionFilter or Middleware classes that parse request bodies, cache counters in MemoryCache, and manually return 429. This duplicates framework functionality, bypasses optimized limiter algorithms, and introduces race conditions under high concurrency.
6. Blind Monitoring
Rate limit hits are silent by default. Without structured logging or metrics, teams cannot distinguish between legitimate traffic spikes and abuse campaigns. Missing this visibility delays incident response and obscures capacity planning.
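As a starting point for visibility, each RateLimiter exposes counters through GetStatistics() (available since .NET 7): successful and failed lease totals plus the current available permits and queue length. The console sketch below reads them after a burst; exporting them to structured logs or OpenTelemetry is left to your existing telemetry setup.

```csharp
using System;
using System.Threading.RateLimiting;

using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 3,
    Window = TimeSpan.FromMinutes(1)
});

// Five requests against a limit of three: three succeed, two fail.
for (int i = 0; i < 5; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire();
}

// GetStatistics returns null only for limiters that do not track statistics.
RateLimiterStatistics stats = limiter.GetStatistics()!;
Console.WriteLine(
    $"successful={stats.TotalSuccessfulLeases} failed={stats.TotalFailedLeases} " +
    $"available={stats.CurrentAvailablePermits} queued={stats.CurrentQueuedCount}");
```

On .NET 8 and later, ASP.NET Core also emits rate limiting metrics through its built-in meters; check the framework's metrics documentation for the exact instrument names before wiring dashboards to them.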
Best Practices from Production
- Use TokenBucket for public APIs with bursty usage patterns.
- Implement tiered limits: unauthenticated (strict), authenticated (moderate), enterprise (relaxed).
- Combine rate limiting with circuit breakers for downstream dependency protection.
- Test limit boundaries under load using realistic traffic profiles, not synthetic constant-rate generators.
- Rotate partition keys gracefully; avoid hard dependencies on mutable headers.
Production Bundle
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single-node staging or low-traffic internal API | In-memory FixedWindow | Zero infrastructure dependency, sub-0.2ms overhead | Negligible |
| Public API with bursty traffic & CDN | TokenBucket + IP/User-Agent partition | Absorbs spikes, prevents boundary clumping | Low (compute) |
| Multi-tenant SaaS with authenticated users | SlidingWindow + JWT Subject partition | Consistent per-user limits, scales horizontally | Medium (Redis/SQL) |
| Enterprise gateway handling 50k+ RPS | TokenBucket + Distributed Redis store | Predictable throughput, cluster-wide consistency | Medium-High (infrastructure) |
| File upload / long-polling endpoints | ConcurrencyLimiter + QueueLimit | Prevents thread pool exhaustion, buffers active connections | Low (memory) |
Configuration Template
// Program.cs
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddRateLimiter(options =>
{
options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
options.OnRejected = async (context, cancellationToken) =>
{
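// Prefer the limiter's RetryAfter metadata; fall back to 30 seconds when none is provided.
var retryAfter = context.Lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan metadataDelay)
? metadataDelay
: TimeSpan.FromSeconds(30);
var retrySeconds = (int)Math.Ceiling(retryAfter.TotalSeconds);
context.HttpContext.Response.Headers.RetryAfter = retrySeconds.ToString();
context.HttpContext.Response.ContentType = "application/json";
await context.HttpContext.Response.WriteAsync(
$"{{\"error\":\"rate_limit_exceeded\",\"retry_after\":{retrySeconds}}}",
cancellationToken);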
};
// Unauthenticated: strict, IP-based
options.AddPolicy("unauthenticated", httpContext =>
{
var ip = httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";
return RateLimitPartition.GetFixedWindowLimiter(ip, _ => new()
{
AutoReplenishment = true,
PermitLimit = 60,
Window = TimeSpan.FromMinutes(1)
});
});
// Authenticated: moderate, user-based
options.AddPolicy("authenticated", httpContext =>
{
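// User is populated only if authentication runs before UseRateLimiter; some handlers
// remap "sub" to ClaimTypes.NameIdentifier. Fall back to the client IP rather than
// putting every unauthenticated caller into one shared "anonymous" bucket.
var userId = httpContext.User.FindFirst("sub")?.Value
?? httpContext.User.FindFirst(System.Security.Claims.ClaimTypes.NameIdentifier)?.Value
?? $"ip:{httpContext.Connection.RemoteIpAddress}";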
return RateLimitPartition.GetTokenBucketLimiter(userId, _ => new()
{
AutoReplenishment = true,
TokenLimit = 300,
TokensPerPeriod = 10,
ReplenishmentPeriod = TimeSpan.FromSeconds(1)
});
});
// Internal services: high throughput, key-based
options.AddPolicy("internal", httpContext =>
{
// Requests without X-Service-Key all share the "" partition; validate keys before trusting them.
var serviceKey = httpContext.Request.Headers["X-Service-Key"].ToString();
return RateLimitPartition.GetSlidingWindowLimiter(serviceKey, _ => new()
{
AutoReplenishment = true,
PermitLimit = 1000,
Window = TimeSpan.FromMinutes(5),
SegmentsPerWindow = 5
});
});
});
var app = builder.Build();
// Keep early in the pipeline: after routing, and after authentication if a policy reads User
app.UseRateLimiter();
app.MapGet("/api/public", () => Results.Ok()).RequireRateLimiting("unauthenticated");
app.MapGet("/api/user", () => Results.Ok()).RequireRateLimiting("authenticated");
app.MapPost("/api/internal/sync", () => Results.Ok()).RequireRateLimiting("internal");
app.Run();
Quick Start Guide
- Add the package: On .NET 7 and later, the rate limiting middleware ships in the ASP.NET Core shared framework, so no additional package is required.
- Register services: Call builder.Services.AddRateLimiter(options => { ... }) and define at least one named policy with partitioning logic.
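- Insert middleware: Add app.UseRateLimiter() after app.UseRouting() (endpoint-specific policies depend on the selected endpoint) and ahead of expensive middleware; if a policy partitions on the authenticated user, place it after app.UseAuthentication().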
- Attach to endpoints: Use .RequireRateLimiting("policyName") on minimal API endpoints and route groups, or [EnableRateLimiting("policyName")] on controllers, actions, and Razor Pages.
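- Validate: Run curl -i http://localhost:5000/api/public repeatedly. With the template's "unauthenticated" policy of 60 requests per minute, the 61st request within a window should return 429. Verify the Retry-After header and confirm downstream resources are not consumed on rejection.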