How Far Can Kestrel Actually Go?

Kestrel is one of the reasons modern ASP.NET Core feels so different from the old .NET web stack. You can put a small Minimal API in front of it, run a load test, and get numbers that would have sounded unrealistic years ago. It is fast, lightweight, cross-platform and built directly into ASP.NET Core. It supports HTTP/1.1, HTTP/2, HTTP/3, HTTPS, WebSockets, gRPC, SignalR and the normal middleware pipeline most of us use every day. That can make Kestrel look like the whole performance story. In practice, Kestrel is the server accepting connections, parsing HTTP, handling protocol details and passing work into your application.

So the real question is this: how far can Kestrel go before the rest of the system becomes the limiting factor? The answer is further than most business applications will ever need, but only if you respect the layers around it. Kestrel is capable of handling a serious amount of traffic. A normal production system usually falls over somewhere else first.

Kestrel is the front door

Kestrel sits at the boundary between the network and your ASP.NET Core application. It accepts connections, handles HTTP protocol work, applies configured limits and gives the request to the ASP.NET Core pipeline. After that, your application decides how expensive the request becomes.

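A clean model runs from the client, through an optional edge or proxy layer, into Kestrel, then the ASP.NET Core pipeline, then your application code and its dependencies.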

In a small app, you might collapse some of those boxes. Kestrel can be used directly as an internet-facing server, and that's a supported hosting model. In many real production systems, you still put something in front of it. That front layer might be Azure Front Door, Application Gateway, Nginx, Envoy, YARP, an AKS ingress controller or a platform load balancer. That doesn't mean Kestrel is weak. It means there are jobs you often want handled before traffic reaches your application process. TLS termination, WAF rules, DDoS protection, request filtering, host routing, load balancing, connection draining and certificate management are infrastructure concerns as much as application concerns.

If Kestrel is the only thing exposed, Kestrel owns the whole public surface. If a proxy sits in front, the proxy can absorb some of that responsibility and forward a cleaner stream of requests into the app.

What Kestrel is actually good at

Kestrel is optimised for the repetitive work that has to happen for every HTTP server. Accepting connections. Reading bytes. Parsing requests. Writing responses. Supporting modern HTTP protocols. Handling keep-alive connections. Integrating with the ASP.NET Core pipeline without dragging in an old heavyweight hosting model. That last point is easy to miss. Kestrel is part of the ASP.NET Core hosting model. It works with endpoint routing, dependency injection, and the normal deployment model you use for .NET services.

A tiny endpoint shows how little code you need above it.

var builder = WebApplication.CreateSlimBuilder(args);

builder.WebHost.ConfigureKestrel(options =>
{
    options.AddServerHeader = false;
});

var app = builder.Build();

app.MapGet("/ping", () => Results.Text("ok"));

app.Run();

That endpoint can be very fast because the application is barely doing anything. It allocates very little, does no database work, performs no auth, writes no verbose logs and returns a tiny response. Kestrel will usually have plenty of headroom in that test.

Now compare it with a normal production endpoint.

app.MapPost("/orders", async (
    CreateOrderRequest request,
    IUserContext userContext,
    IValidator<CreateOrderRequest> validator,
    AppDbContext dbContext,
    ILogger<Program> logger,
    CancellationToken stopToken) =>
{
    var validationResult = await validator.ValidateAsync(request, stopToken);

    if (!validationResult.IsValid)
    {
        return Results.ValidationProblem(validationResult.ToDictionary());
    }

    var order = new Order
    {
        CustomerId = userContext.CustomerId,
        Reference = request.Reference,
        Amount = request.Amount,
        CreatedAtUtc = DateTimeOffset.UtcNow
    };

    dbContext.Orders.Add(order);

    await dbContext.SaveChangesAsync(stopToken);

    logger.LogInformation("Created order {OrderId}", order.Id);

    return Results.Created($"/orders/{order.Id}", new { order.Id });
});

This endpoint is doing real work. It validates the request, resolves scoped services, uses EF Core, writes to a database, allocates response objects and emits logs. None of that is wrong. It just means a load test is no longer measuring Kestrel on its own. It is measuring the whole request path. That distinction saves you from chasing the wrong problem.

The first wall is usually connection pressure

At low traffic, connection handling is invisible. At high traffic, connection shape starts to matter. HTTP/1.1, HTTP/2 and HTTP/3 behave differently under load. HTTP/1.1 relies heavily on connection reuse, but a single connection generally handles one active request at a time. HTTP/2 multiplexes many concurrent streams over one connection, which can reduce connection overhead but introduces its own flow-control and stream-limit concerns. HTTP/3 uses QUIC over UDP, removes TCP-level head-of-line blocking and can help on mobile or lossy networks, but it also depends on platform, firewall, router and proxy support.

This is why "requests per second" is too vague on its own. Ten thousand requests per second over a small number of warm HTTP/2 connections is very different from ten thousand requests per second with constant new TLS handshakes over short-lived HTTP/1.1 connections.
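To make that difference concrete, a rough back-of-envelope sketch can estimate how many connections a given load needs. It uses Little's Law (requests in flight is roughly arrival rate times average latency) and assumes one in-flight request per HTTP/1.1 connection and up to a configured number of streams per HTTP/2 connection. The 50 ms latency and the 100-stream figure are illustrative assumptions, not measurements, and the helper names are made up for this sketch.

```csharp
using System;

// Little's Law: average requests in flight = arrival rate * average latency.
static double RequestsInFlight(double requestsPerSecond, TimeSpan averageLatency) =>
    requestsPerSecond * averageLatency.TotalSeconds;

// Minimum connections needed when each connection carries at most
// `concurrentRequestsPerConnection` requests at once (1 for HTTP/1.1
// without pipelining, the stream limit for HTTP/2).
static int MinimumConnections(double inFlight, int concurrentRequestsPerConnection) =>
    (int)Math.Ceiling(inFlight / concurrentRequestsPerConnection);

// Illustrative assumptions: 10,000 requests per second at 50 ms average latency.
var inFlight = RequestsInFlight(10_000, TimeSpan.FromMilliseconds(50));

Console.WriteLine($"Requests in flight: {inFlight}");
Console.WriteLine($"HTTP/1.1 connections needed: {MinimumConnections(inFlight, 1)}");
Console.WriteLine($"HTTP/2 connections needed (100 streams each): {MinimumConnections(inFlight, 100)}");
```

Under those assumptions the same request rate needs hundreds of HTTP/1.1 connections but only a handful of HTTP/2 connections, which is why the protocol mix belongs in the test description.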

A better load test describes the traffic shape.

Requests per second
Concurrent connections
Requests per connection
Protocol version
TLS enabled or disabled
Payload size
Response size
Keep-alive behaviour
Client location
Network path

You can run an API that looks excellent with keep-alive enabled and then watch it struggle when clients constantly open new connections. You can run a service that behaves well on HTTP/2 and then discover that a proxy downgraded everything to HTTP/1.1. You can enable HTTP/3 and still find that much of your traffic uses HTTP/1.1 or HTTP/2 because of client and network support.

Kestrel gives you the protocol support. The architecture decides whether the traffic reaches Kestrel in a healthy shape.

Kestrel limits are guardrails

A common mistake is treating server limits as restrictions you remove when traffic grows. In reality, good limits protect the process. Kestrel has configurable limits for open connections, upgraded connections such as WebSockets, request body size, request headers, keep-alive timeout and other protocol-specific behaviours. Leaving everything effectively unlimited can be dangerous because the app process becomes the place where every bad traffic pattern gets converted into memory pressure, socket pressure or thread pressure.

A production service should normally set limits intentionally.

using Microsoft.AspNetCore.Server.Kestrel.Core;

var builder = WebApplication.CreateSlimBuilder(args);

builder.WebHost.ConfigureKestrel(options =>
{
    options.AddServerHeader = false;

    options.Limits.MaxConcurrentConnections = 20_000;
    options.Limits.MaxConcurrentUpgradedConnections = 5_000;

    options.Limits.KeepAliveTimeout = TimeSpan.FromSeconds(60);
    options.Limits.RequestHeadersTimeout = TimeSpan.FromSeconds(15);

    options.Limits.MaxRequestBodySize = 1 * 1024 * 1024;

    options.Limits.Http2.MaxStreamsPerConnection = 100;
    options.Limits.Http2.InitialConnectionWindowSize = 128 * 1024;
    options.Limits.Http2.InitialStreamWindowSize = 96 * 1024;
});

Those numbers are examples, not defaults you should copy blindly. The right values depend on workload, payload size, node size, memory, client behaviour and whether the service handles short requests, uploads, streaming, WebSockets or gRPC. The important point is that limits are part of resilience. If you accept infinite connections, huge bodies, slow clients and unlimited upgraded connections, Kestrel may faithfully accept work that the rest of your system has no chance of surviving.

HTTP/3 is useful, but it is not a free speed button

HTTP/3 is one of the more interesting parts of modern Kestrel. It uses QUIC rather than TCP, and QUIC combines transport and encryption handshakes. It can reduce connection setup cost, avoid TCP-level head-of-line blocking and behave better when networks are lossy or clients move between networks.

For Kestrel, HTTP/3 also has practical requirements. It depends on MsQuic and platform support. It requires HTTPS. It should usually be enabled alongside HTTP/1.1 and HTTP/2 because not every client, router, firewall or proxy path will support it cleanly.

A reasonable Kestrel endpoint configuration.

using Microsoft.AspNetCore.Server.Kestrel.Core;

var builder = WebApplication.CreateSlimBuilder(args);

builder.WebHost.ConfigureKestrel(options =>
{
    options.ListenAnyIP(5001, listenOptions =>
    {
        listenOptions.Protocols = HttpProtocols.Http1AndHttp2AndHttp3;
        listenOptions.UseHttps();
    });
});

var app = builder.Build();

app.MapGet("/", () => "Hello over HTTP/1.1, HTTP/2 or HTTP/3");

app.Run();

That configuration says the service can speak all three major HTTP versions. It does not guarantee every request will use HTTP/3. The first request normally arrives over HTTP/1.1 or HTTP/2, then the alt-svc header can tell the client that HTTP/3 is available. Some clients will upgrade. Some will not. Some infrastructure paths will block UDP or fail to pass HTTP/3 traffic properly. So HTTP/3 should be treated as an option you test under your own traffic pattern. It can help, especially for certain client and network conditions. It can also add complexity if your load balancer, ingress or observability tooling does not handle it well.
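One way to check what a path really negotiates is to ask for HTTP/3 from a client and look at the version on the response. The sketch below uses HttpClient with HttpVersionPolicy.RequestVersionOrLower, so the client may fall back to HTTP/2 or HTTP/1.1. The helper names and the localhost address are hypothetical, and HTTP/3 support in HttpClient depends on the platform, so treat this as a probe rather than a guarantee.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Builds a request that asks for HTTP/3 but lets the handler fall back to
// HTTP/2 or HTTP/1.1 when HTTP/3 cannot be negotiated.
static HttpRequestMessage CreateHttp3PreferredRequest(Uri uri) =>
    new(HttpMethod.Get, uri)
    {
        Version = HttpVersion.Version30,
        VersionPolicy = HttpVersionPolicy.RequestVersionOrLower
    };

// Sends the request and returns the protocol version the response actually used.
static async Task<Version> GetNegotiatedVersionAsync(HttpClient client, Uri uri)
{
    using var request = CreateHttp3PreferredRequest(uri);
    using var response = await client.SendAsync(request);
    return response.Version;
}

if (args.Length > 0)
{
    // Pass the URL of an endpoint you control as the first argument.
    using var client = new HttpClient();
    var version = await GetNegotiatedVersionAsync(client, new Uri(args[0]));
    Console.WriteLine($"Negotiated HTTP version: {version}");
}
else
{
    // No URL supplied: show what would be requested without touching the network.
    using var probe = CreateHttp3PreferredRequest(new Uri("https://localhost:5001/"));
    Console.WriteLine($"Would request HTTP/{probe.Version} with policy {probe.VersionPolicy}");
}
```

Running the probe from different networks and through the real ingress path shows how much traffic can actually reach Kestrel over HTTP/3.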

TLS changes the numbers

A local plaintext benchmark can make almost anything look impressive. Real public traffic usually uses TLS, and TLS has a cost. TLS affects connection setup, CPU usage, certificate configuration, ALPN protocol negotiation and sometimes where traffic can be inspected or routed. If the load balancer terminates TLS, Kestrel may receive plain HTTP from the trusted internal network. If Kestrel terminates TLS itself, the .NET process handles that work directly. Both are valid choices, but they are different designs.

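A common production layout terminates TLS at the load balancer or ingress and forwards plain HTTP to Kestrel over the trusted internal network.

A different layout carries TLS all the way through, so Kestrel terminates it inside the .NET process.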

The first model centralises certificate handling and may simplify application deployment. The second keeps end-to-end TLS closer to the application process and may be useful in some zero-trust or platform-specific designs. At high scale, you should test the model you actually run. Plain HTTP numbers from a laptop benchmark tell you very little about TLS termination, ALPN, certificate chains, connection reuse and real network latency.

A reverse proxy can make Kestrel easier to scale

Kestrel can be internet facing, but many serious deployments still use a reverse proxy or managed ingress in front of it. That front layer can handle host routing, port sharing, TLS certificates, static filtering, WAF rules, connection draining, client IP forwarding, gzip or Brotli decisions, request buffering policies, blue-green routing, canary traffic and platform specific health checks. Kestrel then receives traffic that has already passed through a controlled boundary. The catch is that reverse proxies also introduce failure types. They can buffer request bodies and hide backpressure. They can set lower timeouts than your app expects. They can downgrade protocols. They can remove headers. They can break WebSockets. They can pass the wrong scheme and client IP unless forwarded headers are configured.

ASP.NET Core needs to know when it is behind a proxy.

using Microsoft.AspNetCore.HttpOverrides;

var builder = WebApplication.CreateSlimBuilder(args);

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders =
        ForwardedHeaders.XForwardedFor |
        ForwardedHeaders.XForwardedProto;

    options.KnownNetworks.Clear();
    options.KnownProxies.Clear();
});

var app = builder.Build();

app.UseForwardedHeaders();

app.MapGet("/client", (HttpContext context) =>
{
    return new
    {
        Scheme = context.Request.Scheme,
        RemoteIp = context.Connection.RemoteIpAddress?.ToString()
    };
});

app.Run();

In a locked-down production setup, you would usually configure known proxies or known networks rather than clearing them broadly. The example shows the shape, not a final security posture. The key point is simple: once a proxy sits in front, Kestrel no longer sees the original internet request directly. Your app must be told which headers to trust, which networks are allowed to set them and how routing should behave.
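A more restrictive variant might look like the sketch below. It assumes a single proxy at the hypothetical address 10.0.0.10 and one proxy hop, so adjust both to match the real topology.

```csharp
using System.Net;
using Microsoft.AspNetCore.HttpOverrides;

var builder = WebApplication.CreateSlimBuilder(args);

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders =
        ForwardedHeaders.XForwardedFor |
        ForwardedHeaders.XForwardedProto;

    // Hypothetical proxy address: trusted in addition to the loopback defaults,
    // so forwarded headers from other peers are ignored.
    options.KnownProxies.Add(IPAddress.Parse("10.0.0.10"));

    // Process only one forwarded hop between the client and the app.
    options.ForwardLimit = 1;
});

var app = builder.Build();

app.UseForwardedHeaders();

app.Run();
```

With this shape, forwarded headers from an unknown peer are not applied, and the app keeps the scheme and remote address of the direct connection instead.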

The app code usually breaks before Kestrel

When a .NET API slows down under load, Kestrel is often the first suspect because it is the visible server. In many cases, Kestrel is just the messenger. Blocking code is one of the fastest ways to damage throughput. Task.Result, Task.Wait(), synchronous database calls, synchronous file IO, long CPU work on request threads and accidental sync-over-async can cause thread pool starvation. Newer .NET versions react better than older ones, but the runtime cannot turn blocking work into scalable async work for you.

This is the kind of endpoint that looks harmless in a code review and ugly under pressure.

app.MapGet("/slow", (IExternalPriceClient client) =>
{
    // .Result blocks a thread pool thread until the remote call completes.
    var price = client.GetPriceAsync().Result;

    return Results.Ok(price);
});

The async version at least gives the runtime a chance to use threads efficiently.

app.MapGet("/prices/{productId:int}", async (
    int productId,
    IExternalPriceClient client,
    CancellationToken stopToken) =>
{
    var price = await client.GetPriceAsync(productId, stopToken);

    return price is null
        ? Results.NotFound()
        : Results.Ok(price);
});

That doesn't make the external dependency fast. It just avoids pinning a thread while the app waits. The same idea applies to database work. A slow query will still be slow when called asynchronously. Async prevents wasted threads, but it does not remove database pressure, bad indexes, lock contention or connection pool exhaustion.

Middleware adds up

Every middleware component sits in the request path, and each one has a cost. Most of those are fine when used deliberately. Problems start when every endpoint pays for work it doesn't need. A health endpoint used by a load balancer should be cheaper than a customer API endpoint. A public cached read endpoint may not need the same policy stack as a write endpoint. A high-throughput internal endpoint might use a completely different route group from the management API.

Minimal APIs make it easy to express those boundaries.

var app = builder.Build();

app.MapGet("/healthz", () => Results.Ok("ok"))
    .DisableAntiforgery();

var publicApi = app.MapGroup("/api/public");

publicApi.MapGet("/catalogue/{id:int}", async (
    int id,
    ICatalogueCache cache,
    CancellationToken stopToken) =>
{
    var item = await cache.GetAsync(id, stopToken);

    return item is null
        ? Results.NotFound()
        : Results.Ok(item);
});

var privateApi = app.MapGroup("/api/private")
    .RequireAuthorization();

privateApi.MapPost("/orders", async (
    CreateOrderRequest request,
    IOrderService service,
    CancellationToken stopToken) =>
{
    var result = await service.CreateAsync(request, stopToken);

    return Results.Created($"/api/private/orders/{result.Id}", result);
});

app.Run();

That kind of separation lets you keep the hot path small without weakening the rest of the app.

Logging can quietly become part of the bottleneck

Logging is useful until it becomes per-request allocation and IO pressure. At extreme throughput, logging every request body, serialising large objects into structured logs, creating high-cardinality labels or writing synchronously can hurt the API badly. The better pattern is to log outcomes, identifiers and unusual behaviour. Keep normal success-path logging cheap. Use metrics for volume and latency. Use traces when you need request-level investigation. Use sampling where appropriate.

Source generated logging helps reduce overhead on hot paths.

public static partial class LogMessages
{
    [LoggerMessage(
        EventId = 1001,
        Level = LogLevel.Warning,
        Message = "Rejected request for tenant {TenantId} because the payload was too large")]
    public static partial void RejectedLargePayload(
        this ILogger logger,
        string tenantId);
}

Then call it without building the message yourself.

logger.RejectedLargePayload(request.TenantId);

This is the kind of optimisation that only becomes interesting when an endpoint is genuinely hot. For normal admin screens, readability wins. For a path taking tens or hundreds of thousands of calls per second, allocations and formatting overhead deserve attention.

Response size can beat request count

A million tiny responses and ten thousand large responses stress the system differently. Kestrel might handle the request count, while the network becomes the limit because every response is too large. For example, this endpoint is cheap in routing terms but potentially expensive in payload terms.

app.MapGet("/customers", async (
    AppDbContext dbContext,
    CancellationToken stopToken) =>
{
    var customers = await dbContext.Customers
        .AsNoTracking()
        .ToListAsync(stopToken);

    return Results.Ok(customers);
});

The problem is not Kestrel. The problem is that the endpoint may load too much data, allocate a large object graph, serialise a huge JSON response and push a lot of bytes through the network.

A more controlled version projects the shape and pages the result.

app.MapGet("/customers", async (
    int? page,
    int? pageSize,
    AppDbContext dbContext,
    CancellationToken stopToken) =>
{
    // Nullable query parameters are optional, so a request without them
    // gets a default page instead of a 400 response. The default page
    // size of 50 is an arbitrary choice.
    var pageNumber = Math.Max(page ?? 1, 1);
    var size = Math.Clamp(pageSize ?? 50, 1, 100);

    var customers = await dbContext.Customers
        .AsNoTracking()
        .OrderBy(customer => customer.Id)
        .Skip((pageNumber - 1) * size)
        .Take(size)
        .Select(customer => new CustomerListItem(
            customer.Id,
            customer.DisplayName))
        .ToListAsync(stopToken);

    return Results.Ok(customers);
});

When people talk about server throughput, they often focus on request count. The network cares about bytes. The serialiser cares about object shape. The GC cares about allocations. The client cares about latency. You need all of those views.

WebSockets and upgraded connections are a different workload

Kestrel can handle WebSockets and other upgraded connections, but persistent connections change the economics. A normal HTTP request arrives, does work and leaves. A WebSocket connection stays open. That means memory, connection tracking, heartbeat behaviour, proxy timeouts, reconnect storms and client backpressure become part of capacity planning.

This is why upgraded connections have a separate Kestrel limit.

builder.WebHost.ConfigureKestrel(options =>
{
    options.Limits.MaxConcurrentUpgradedConnections = 10_000;
});

That value should be based on actual memory per connection, message rate and node size. Ten thousand mostly idle WebSockets and ten thousand WebSockets receiving constant fan-out messages are completely different workloads. SignalR makes this easier to build, but it does not erase the cost of holding connections. At higher connection counts, Azure SignalR Service or another managed real time gateway can make more sense than asking every API pod to hold persistent connections itself.
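As a rough sizing sketch, the limit can be derived from a memory budget and a measured per-connection cost. Every number below is an illustrative assumption, and the helper name is made up for this example. The per-connection figure in particular has to come from profiling your own workload.

```csharp
using System;

// Estimates how many upgraded connections fit in the memory set aside for them.
// Every input is something you would measure or choose for your own workload.
static long EstimateUpgradedConnectionLimit(
    long containerMemoryBytes,
    long reservedForAppBytes,
    long measuredBytesPerConnection)
{
    if (measuredBytesPerConnection <= 0)
    {
        throw new ArgumentOutOfRangeException(nameof(measuredBytesPerConnection));
    }

    // Whatever is left after the app's own working set is the connection budget.
    var budget = Math.Max(containerMemoryBytes - reservedForAppBytes, 0);
    return budget / measuredBytesPerConnection;
}

// Illustrative numbers only: 1 GiB container, 512 MiB kept for the app itself,
// 32 KiB observed per mostly idle connection.
var limit = EstimateUpgradedConnectionLimit(
    containerMemoryBytes: 1024L * 1024 * 1024,
    reservedForAppBytes: 512L * 1024 * 1024,
    measuredBytesPerConnection: 32L * 1024);

Console.WriteLine($"Suggested MaxConcurrentUpgradedConnections ceiling: {limit}");
```

A busy fan-out workload would use a much larger per-connection figure, which pulls the ceiling down quickly.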

Containers make the limits more visible

Kestrel might be capable of handling more work than your container is allowed to use. If the container has a small CPU limit, high request volume can produce throttling even when the node has spare CPU. If the memory limit is low, GC behaviour can change because the process has less room to work with. If the pod is killed under memory pressure, the problem may look like random application instability when the real cause is capacity.

A production Kubernetes deployment usually needs resource requests and limits that reflect the workload.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hot-api
spec:
  replicas: 6
  selector:
    matchLabels:
      app: hot-api
  template:
    metadata:
      labels:
        app: hot-api
    spec:
      containers:
        - name: api
          image: example.azurecr.io/hot-api:1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "1000m"
              memory: "512Mi"
            limits:
              cpu: "2000m"
              memory: "1Gi"
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: 8080
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz/live
              port: 8080
            periodSeconds: 10

The exact values are workload specific. The important bit is to test with the same CPU and memory limits you intend to run. A local test on a developer machine does not tell you how a two-CPU container behaves under pod networking, service mesh sidecars, ingress hops and real TLS.
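A quick way to confirm what the runtime believes about its environment is to print a few values at startup. This is a minimal sketch using standard runtime APIs. The exact numbers depend on the host, the container limits and the GC settings in effect.

```csharp
using System;
using System.Runtime;

// Logical processors the runtime will schedule on. In a container with a CPU
// limit this is typically derived from the limit rather than the host core count.
var processors = Environment.ProcessorCount;

// Memory the GC considers available to the process, which takes a container
// memory limit into account when one is set.
var gcInfo = GC.GetGCMemoryInfo();
var availableMiB = gcInfo.TotalAvailableMemoryBytes / (1024 * 1024);

Console.WriteLine($"Processors visible to .NET: {processors}");
Console.WriteLine($"Memory available to the GC: {availableMiB} MiB");
Console.WriteLine($"Server GC enabled: {GCSettings.IsServerGC}");
```

If those values differ between the load-test environment and production, the test results are describing a different system.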

Horizontal scale changes the problem

A single powerful node can be useful, but high throughput systems usually scale Kestrel horizontally. Ten pods each handling 20,000 requests per second is easier to reason about than one process trying to handle 200,000 requests per second on its own. Horizontal scaling introduces its own issues. Load distribution must be even. Health checks must remove bad instances quickly. Rolling deployments must drain connections. Sticky sessions may be required for some real-time workloads. Shared dependencies must scale with the API tier. A database that could handle one pod may collapse when twenty pods all increase concurrency at the same time.

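The shape becomes this: one load balancer, many Kestrel instances, and a single set of shared dependencies such as the database, the logging pipeline and downstream APIs behind them.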

If every pod is allowed to open hundreds of database connections, scaling the API tier can overload the database faster. If every pod writes logs aggressively, the logging pipeline can become the bottleneck. If every pod calls the same downstream API, you can trigger rate limits or dependency failure.
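The database case is simple arithmetic, sketched below. The pod count, the pool cap and the database budget are illustrative assumptions, and the helper names are made up for this example.

```csharp
using System;

// Worst-case connections the API tier can open against one database.
static int WorstCaseConnections(int pods, int maxPoolSizePerPod) =>
    checked(pods * maxPoolSizePerPod);

// Largest per-pod pool size that keeps the whole tier inside the database budget.
static int PoolSizePerPod(int databaseConnectionBudget, int pods)
{
    if (pods <= 0)
    {
        throw new ArgumentOutOfRangeException(nameof(pods));
    }

    // Never go below one connection per pod.
    return Math.Max(databaseConnectionBudget / pods, 1);
}

// Illustrative numbers: 20 pods, a pool cap of 100 per pod, and a database
// that is assumed to tolerate 500 connections from this service.
Console.WriteLine($"Worst case: {WorstCaseConnections(20, 100)} connections");
Console.WriteLine($"Pool size per pod to stay within budget: {PoolSizePerPod(500, 20)}");
```

Running the numbers before raising the replica count shows whether the pool size has to shrink as the tier grows.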

Kestrel can scale out nicely. Your shared dependencies need the same attention.

Backpressure beats optimistic overload

A good high-throughput service refuses excess work before it becomes unhealthy. Kestrel limits are one layer. Rate limiting is another. Queue depth checks, circuit breakers, bulkheads and dependency health checks also help. The goal is to stop the service from accepting work it can't complete.

A simple rate limiter can protect an endpoint from traffic bursts.

using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateSlimBuilder(args);

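builder.Services.AddRateLimiter(options =>
{
    // The middleware rejects with 503 by default, so ask for 429 explicitly.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddFixedWindowLimiter("hot-path", limiter =>
    {
        limiter.PermitLimit = 10_000;
        limiter.Window = TimeSpan.FromSeconds(1);
        limiter.QueueLimit = 0;
        limiter.AutoReplenishment = true;
    });
});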

var app = builder.Build();

app.UseRateLimiter();

app.MapGet("/hot", () => Results.Ok("ok"))
    .RequireRateLimiting("hot-path");

app.Run();

This example is deliberately simple. Real systems often rate limit per tenant, per API key, per route, per region or per product tier. The important design choice is that the service has a controlled failure mode. Returning 429 Too Many Requests is better than accepting everything and timing out half the fleet.
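A per-tenant variant could partition the limiter by a tenant identifier, as in the sketch below. The X-Tenant-Id header, the "anonymous" fallback partition, the route and the limit of 100 per second are all assumptions for illustration. A real system would take the tenant from whatever its authentication provides.

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateSlimBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // Reject with 429 rather than the middleware's default status code.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddPolicy("per-tenant", httpContext =>
    {
        // Hypothetical header; use whatever identifies a tenant in your system.
        var tenantId = httpContext.Request.Headers["X-Tenant-Id"].ToString();
        var partitionKey = string.IsNullOrEmpty(tenantId) ? "anonymous" : tenantId;

        // Each distinct partition key gets its own fixed window counter.
        return RateLimitPartition.GetFixedWindowLimiter(
            partitionKey,
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 100,
                Window = TimeSpan.FromSeconds(1),
                QueueLimit = 0
            });
    });
});

var app = builder.Build();

app.UseRateLimiter();

app.MapGet("/tenant-hot", () => Results.Ok("ok"))
    .RequireRateLimiting("per-tenant");

app.Run();
```

Partitioning this way means one noisy tenant exhausts its own window without consuming the budget of the others.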

How to measure Kestrel under real pressure

A useful load test needs more than one number. Start with the smallest endpoint to establish a baseline. Then add the real middleware. Then add JSON. Then add auth. Then add dependency calls. Then add the database. Each stage tells you where the cost appears.

A simple benchmark.

bombardier -c 1000 -d 60s https://localhost:5001/ping

For Linux-based testing, wrk is also useful.

wrk -t16 -c1000 -d60s https://api.example.com/ping

The result should be treated carefully. If the load generator CPU is maxed out, you are benchmarking the client. If the test runs from the same machine as the API, you are hiding real network behaviour. If TLS is disabled, you are testing a different system. If the test only hits /ping, you have measured a protocol and routing baseline rather than the application.

During the test, watch the process and the platform.

dotnet-counters monitor --process-id <pid> --counters System.Runtime,Microsoft.AspNetCore.Hosting

For deeper investigations, collect traces rather than guessing.

dotnet-trace collect --process-id <pid>

A trace can show where time is being spent. That's usually more useful than arguing about whether Kestrel, EF Core, JSON or the database is the real problem.

What I would tune first

I wouldn't start by tweaking obscure Kestrel settings. I would start by proving where the bottleneck lives. The first pass is to keep the request path small. Remove unnecessary middleware from hot endpoints. Avoid sync-over-async. Keep response objects tight. Use source-generated JSON for known hot models. Avoid per-request logging noise. Make database calls explicit and measured. Set sane Kestrel limits. Put a clear edge or ingress layer in front. Use rate limiting before the app gets sick.

The second pass is protocol and infrastructure. Confirm whether traffic is HTTP/1.1, HTTP/2 or HTTP/3. Check whether TLS terminates at the edge, the proxy or Kestrel. Verify keep-alive behaviour. Check reverse proxy timeouts. Confirm forwarded headers. Make sure health checks and connection draining work. Check container CPU throttling and memory limits.

The third pass is runtime diagnostics. Watch allocation rate, GC, thread pool queue length, active requests, failed requests, socket usage, network throughput and downstream dependency latency. Once you know what is actually failing, the optimisation work becomes far less random.
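Some of those signals can also be read from inside the process, which is handy for a quick diagnostic endpoint or a startup log line. The sketch below samples a few of them with standard runtime APIs. The allocation loop is only a stand-in for real request work, and the Snapshot helper is a name invented for this example.

```csharp
using System;
using System.Threading;

// Captures a few runtime signals: thread pool queue length, thread count,
// total bytes allocated so far and the number of gen 0 collections.
static (long PendingWorkItems, int ThreadCount, long AllocatedBytes, int Gen0Collections) Snapshot() =>
    (ThreadPool.PendingWorkItemCount,
     ThreadPool.ThreadCount,
     GC.GetTotalAllocatedBytes(),
     GC.CollectionCount(0));

var before = Snapshot();

// Stand-in for real request work: allocate some short-lived buffers.
for (var i = 0; i < 1_000; i++)
{
    _ = new byte[1024];
}

var after = Snapshot();

Console.WriteLine($"Thread pool queue length: {after.PendingWorkItems}");
Console.WriteLine($"Thread pool threads: {after.ThreadCount}");
Console.WriteLine($"Bytes allocated during sample: {after.AllocatedBytes - before.AllocatedBytes}");
Console.WriteLine($"Gen 0 collections during sample: {after.Gen0Collections - before.Gen0Collections}");
```

A thread pool queue that keeps growing while CPU stays low is a common sign of blocking code rather than a server limit.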

The honest ceiling

Kestrel can go very far. For a small endpoint that does almost nothing, it can handle impressive throughput, especially when scaled across multiple instances. For a real business endpoint, the ceiling is usually set by the work behind Kestrel. A read endpoint backed by memory cache can go much further than one backed by SQL on every request. A tiny JSON response can go much further than a large object graph. An async endpoint can go much further than one that blocks request threads. HTTP/2 or HTTP/3 traffic with reused connections can behave very differently from constant new HTTP/1.1 connections. A well-configured ingress can help, while a badly configured one can hide the real bottleneck.

The useful conclusion is more specific than "Kestrel is fast". We already know that. Kestrel gives .NET a strong front door, but it will happily expose every poor decision behind that door once traffic gets serious. If you want to know how far Kestrel can actually go, build a thin baseline endpoint and test it. Then add the real production path one piece at a time. The moment the numbers collapse, you have found the part of the system that needs your attention.

Most of the time, it won't be Kestrel.
