The GC Wall

A .NET API can be fast, clean and perfectly reasonable at normal traffic levels, then start falling apart when load increases. The strange part is that nothing obvious has changed. The database still looks fine. CPU might only be high in bursts. The endpoint code still looks simple. There are no dramatic exceptions in the logs. Yet p95 and p99 latency start drifting upwards, pods begin using more memory than expected, and the service feels unstable under load.

That is often the point where allocation pressure has become the bottleneck. This is one of the more interesting failure types in .NET because the runtime is doing exactly what it was designed to do. The garbage collector is protecting you from manual memory management. Most of the time, it does that brilliantly. The problem appears when your API creates so much short-lived garbage that the runtime spends too much time cleaning up after every request. At small scale, those allocations are invisible. At serious scale, they become a tax on every core in the system.

The mistake is waiting until memory looks broken before caring about allocations. In high-throughput ASP.NET Core systems, allocation rate is a throughput limit. If an endpoint allocates 20 KB per request and you push it to 50,000 requests per second, the service is allocating roughly 1 GB per second. That does not mean the process keeps 1 GB per second forever, but it does mean the garbage collector has a huge amount of work to keep up with. At 100,000 requests per second, that same endpoint is allocating around 2 GB per second. Your business logic may be simple, but your runtime is now running a memory recycling plant at industrial speed.

20 KB per request x 50,000 requests per second = 1,000,000 KB per second
1,000,000 KB per second is roughly 1 GB per second of allocation pressure

This is the GC wall. You don't hit it because .NET is slow. You hit it because your code is asking the runtime to allocate and collect far more memory than the endpoint appears to need.

What the GC wall looks like

The GC wall rarely starts as an obvious out-of-memory problem. It usually starts as latency. Gen 0 collections become frequent, which may be fine for a while. Some objects survive long enough to move into Gen 1. A smaller number survive into Gen 2. Large buffers, big arrays and large serialised payloads are allocated directly on the Large Object Heap. As pressure grows, the runtime needs more CPU time for collection and compaction. Your request handlers are still running, but more of the process is now dedicated to cleaning up allocations created by previous requests.

The symptoms are easy to confuse with other problems. You might see high CPU during load tests, but the endpoint does not appear CPU-heavy. You might see memory climb and drop in waves. You might see latency spikes without a matching increase in database duration. You might see Kubernetes pods restarted because their memory limit is too tight for the allocation pattern. You might see the request queue increase even though average latency still looks acceptable. Average latency hides this problem. p99 exposes it. A service can appear healthy at 30 ms average latency while a meaningful number of users are waiting 800 ms because collections, scheduling and queueing are creating tail latency.

The GC is not the villain

The .NET garbage collector is generational. New objects start in Gen 0. Objects that survive a collection can move to Gen 1 and then Gen 2. This design works well because most request-related objects should die quickly. A typical ASP.NET Core request creates temporary state, uses it, returns a response, and most of that state becomes unreachable. The model breaks down when the amount of temporary state becomes excessive, or when supposedly temporary objects survive longer than expected. That can happen because they are captured by closures, held by async state machines, stored in logs, accumulated in lists, buffered into memory, or referenced by longer-lived objects. The runtime can collect dead objects. It can't guess that your code did not really need to allocate them in the first place.
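
Promotion is easy to observe in isolation. The following is a minimal console sketch, assuming a recent .NET runtime with default GC settings. GC.Collect() is called here only to make promotion visible and is not something to do in request code. The variable names are illustrative.

```csharp
using System;

// An object that stays reachable through this local for the whole program.
var survivor = new object();
int atBirth = GC.GetGeneration(survivor);      // new small objects start in Gen 0

GC.Collect();                                  // forced collection, for demonstration only
int afterFirst = GC.GetGeneration(survivor);   // a survivor is usually promoted one generation

GC.Collect();
int afterSecond = GC.GetGeneration(survivor);  // and usually promoted again

Console.WriteLine($"At birth:          Gen {atBirth}");
Console.WriteLine($"After one GC:      Gen {afterFirst}");
Console.WriteLine($"After a second GC: Gen {afterSecond}");

GC.KeepAlive(survivor);                        // keep the reference alive until here
```

On a default runtime the object typically moves from Gen 0 to Gen 1 to Gen 2. The same thing happens to request state that is still referenced when a collection runs, which is why longer requests tend to promote more objects.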

The Large Object Heap deserves special attention. Objects of 85,000 bytes or more are treated as large objects by the runtime. In API code, this usually means arrays, buffers, large strings, large JSON payloads, big byte[] values, or memory-backed streams. Large objects are expensive to move, so the Large Object Heap is not compacted by default and is only collected together with Gen 2. They therefore behave differently from small short-lived objects. If your service repeatedly creates large arrays or buffers under load, you can create a different kind of pressure from ordinary Gen 0 churn.
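
A quick way to see the threshold in action is to ask the runtime which generation a fresh array belongs to. This is a small sketch, assuming default GC settings. The array sizes are arbitrary values chosen to sit clearly on either side of the 85,000-byte threshold.

```csharp
using System;

var small = new byte[1_000];      // well below the threshold: small object heap
var large = new byte[100_000];    // above 85,000 bytes: Large Object Heap

int smallGeneration = GC.GetGeneration(small);
int largeGeneration = GC.GetGeneration(large);

// Large objects are reported as belonging to the oldest generation,
// because they are only collected when a Gen 2 collection runs.
Console.WriteLine($"1,000-byte array:   Gen {smallGeneration}");
Console.WriteLine($"100,000-byte array: Gen {largeGeneration} (max is {GC.MaxGeneration})");

GC.KeepAlive(small);
GC.KeepAlive(large);
```

The practical consequence is that a buffer allocated per request at this size is never cheap Gen 0 garbage. It is Gen 2 work from the moment it is created.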

A clean endpoint can still allocate too much

This endpoint looks normal. I've seen plenty of code like this in real systems.

app.MapGet("/orders/{customerId:int}", async (
    int customerId,
    OrdersDbContext db,
    ILogger<Program> logger,
    CancellationToken stopToken) =>
{
    var orders = await db.Orders
        .Where(order => order.CustomerId == customerId)
        .OrderByDescending(order => order.CreatedUtc)
        .Take(50)
        .ToListAsync(stopToken);

    logger.LogInformation($"Loaded {orders.Count} orders for customer {customerId}");

    var response = orders
        .Select(order => new OrderSummaryResponse(
            order.Id,
            order.Reference,
            $"{order.Currency} {order.Amount:N2}",
            order.CreatedUtc.ToString("O")))
        .ToArray();

    return Results.Ok(response);
});

There is nothing outrageous here. It uses async EF Core, limits the result set, maps to a response model and returns JSON. At normal traffic levels, this may be completely fine. Under heavy load, the allocation profile becomes more important.

The query materialises entities into a list. The log message uses string interpolation before the logging framework can decide whether the message should be written. The response mapping creates new objects. The formatted amount creates strings. The date formatting creates strings. ToArray() creates another allocation. JSON serialisation then walks the response and writes the output. Each piece is small enough to ignore alone. The combination becomes expensive when multiplied by tens of thousands of requests per second.

A more careful version avoids some of that cost without making the code unreadable.

app.MapGet("/orders/{customerId:int}", async (
    int customerId,
    OrdersDbContext db,
    ILogger<Program> logger,
    CancellationToken stopToken) =>
{
    var response = await db.Orders
        .AsNoTracking()
        .Where(order => order.CustomerId == customerId)
        .OrderByDescending(order => order.CreatedUtc)
        .Take(50)
        .Select(order => new OrderSummaryResponse(
            order.Id,
            order.Reference,
            order.Currency,
            order.Amount,
            order.CreatedUtc))
        .ToListAsync(stopToken);

    OrderLog.LoadedOrders(logger, response.Count, customerId);

    return Results.Ok(response);
});

public sealed record OrderSummaryResponse(
    long Id,
    string Reference,
    string Currency,
    decimal Amount,
    DateTimeOffset CreatedUtc);

public static partial class OrderLog
{
    [LoggerMessage(
        EventId = 1001,
        Level = LogLevel.Information,
        Message = "Loaded {OrderCount} orders for customer {CustomerId}")]
    public static partial void LoadedOrders(
        ILogger logger,
        int orderCount,
        int customerId);
}

This version projects directly from the database query into the response shape. It avoids entity tracking for a read-only path. It returns raw values rather than preformatted strings, which lets the serialiser do its normal job. It uses source-generated logging, so the message template is not parsed and value types are not boxed in the same way as the normal logging extension path. The endpoint is still ordinary C#. It has simply stopped creating some avoidable garbage.

Source-generated JSON helps more than people think

Serialisation is often blamed late because it feels like framework plumbing. In high-throughput APIs, JSON can become a meaningful part of CPU and allocation cost. Reflection-heavy serialisation paths, repeated options construction, large DTO graphs and unnecessary formatting all add up.

A common mistake is creating serialiser options inside request code.

app.MapGet("/status", () =>
{
    var options = new JsonSerializerOptions(JsonSerializerDefaults.Web)
    {
        WriteIndented = false
    };

    return Results.Json(new StatusResponse("ok", DateTimeOffset.UtcNow), options);
});

That is unnecessary work per request. In a hot path, options should be configured once. For known response types, source-generated JSON gives the runtime more information at compile time and reduces runtime discovery work.

using System.Text.Json.Serialization;

var builder = WebApplication.CreateSlimBuilder(args);

builder.Services.ConfigureHttpJsonOptions(options =>
{
    options.SerializerOptions.TypeInfoResolverChain.Insert(
        0,
        ApiJsonContext.Default);
});

var app = builder.Build();

app.MapGet("/status", () => new StatusResponse("ok", DateTimeOffset.UtcNow));

app.Run();

public sealed record StatusResponse(string Status, DateTimeOffset ServerTimeUtc);

[JsonSerializable(typeof(StatusResponse))]
[JsonSerializable(typeof(OrderSummaryResponse))]
public partial class ApiJsonContext : JsonSerializerContext
{
}

This kind of change will not rescue bad architecture, but it can remove repeated work from an endpoint that is already hot. The best performance work usually looks boring. You remove one avoidable cost, test again, then remove the next one.

String handling is a quiet allocation machine

Strings are immutable. That is usually a good thing. It also means that parsing and formatting can create far more allocations than the code suggests.

Take a simple comma-separated header value.

app.MapGet("/search", (HttpRequest request) =>
{
    var raw = request.Headers["X-Tags"].ToString();
    var tags = raw.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

    return Results.Ok(new { Count = tags.Length });
});

Again, this is fine in normal code. On a hot path, Split creates an array and separate strings. If the endpoint only needs to validate or count values, that is more allocation than needed.

app.MapGet("/search", (HttpRequest request) =>
{
    ReadOnlySpan<char> raw = request.Headers["X-Tags"].ToString().AsSpan();
    var count = 0;

    while (!raw.IsEmpty)
    {
        var commaIndex = raw.IndexOf(',');
        var current = commaIndex < 0 ? raw : raw[..commaIndex];

        if (!current.Trim().IsEmpty)
        {
            count++;
        }

        if (commaIndex < 0)
        {
            break;
        }

        raw = raw[(commaIndex + 1)..];
    }

    return Results.Ok(new TagCountResponse(count));
});

public sealed record TagCountResponse(int Count);

I would not write every endpoint like this. Most APIs do not need span-based parsing in normal business code. The point is that allocation-free techniques exist when a path is truly hot. Use them where measurement shows a real benefit. Keep the rest of the code readable.

Large payloads need a different mindset

Small per-request allocations create churn. Large allocations create heavier pressure. The obvious examples are file uploads, image processing, exported reports, large JSON documents and APIs that buffer full request or response bodies in memory.

This is the kind of code that looks innocent during development and painful under load.

app.MapPost("/upload", async (
    IFormFile file,
    IFileStore fileStore,
    CancellationToken stopToken) =>
{
    using var memoryStream = new MemoryStream();
    await file.CopyToAsync(memoryStream, stopToken);

    var bytes = memoryStream.ToArray();
    await fileStore.SaveAsync(file.FileName, bytes, stopToken);

    return Results.Accepted();
});

This buffers the file into memory, then creates another array with ToArray(). A few small files may be fine. Many concurrent uploads will push the service hard. A better approach streams the body through the system and avoids keeping the entire file in managed memory.

app.MapPost("/upload", async (
    HttpRequest request,
    IFileStore fileStore,
    CancellationToken stopToken) =>
{
    if (!request.HasFormContentType)
    {
        return Results.BadRequest();
    }

    var form = await request.ReadFormAsync(stopToken);
    var file = form.Files.GetFile("file");

    if (file is null || file.Length == 0)
    {
        return Results.BadRequest();
    }

    await using var stream = file.OpenReadStream();
    await fileStore.SaveAsync(file.FileName, stream, stopToken);

    return Results.Accepted();
});

public interface IFileStore
{
    Task SaveAsync(string fileName, Stream content, CancellationToken stopToken);
}

For serious upload systems, you would go further. You would stream directly to object storage, calculate checksums as bytes pass through, apply size limits, avoid double buffering, scan asynchronously where appropriate, and keep the request path as small as the product allows. The central idea is simple. Large payloads should move through the service, rather than live inside it.

ArrayPool is useful, but it is easy to misuse

ArrayPool<T> is one of the first tools people reach for when they learn about allocation pressure. It can help when code repeatedly creates temporary arrays. It also introduces lifetime responsibility. Once you rent a buffer, you must return it. Once returned, you must never read from it again. If the buffer may contain sensitive data, clear it before returning it.

using System.Buffers;

public static async Task<byte[]> ReadSmallPrefixAsync(
    Stream stream,
    int length,
    CancellationToken stopToken)
{
    var rented = ArrayPool<byte>.Shared.Rent(length);

    try
    {
        var read = await stream.ReadAsync(rented.AsMemory(0, length), stopToken);
        return rented.AsSpan(0, read).ToArray();
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(rented, clearArray: true);
    }
}

That example still returns a new array because the caller needs ownership of the data after the method returns. Pooling helped with the temporary read buffer, but the final result has to be safe. Returning rented arrays from APIs is usually a bad idea unless the ownership model is extremely clear.

A more natural use is internal processing where the buffer never escapes the method.

public static async Task<long> CountBytesAsync(
    Stream stream,
    CancellationToken stopToken)
{
    var buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);

    try
    {
        long total = 0;

        while (true)
        {
            var read = await stream.ReadAsync(buffer.AsMemory(0, buffer.Length), stopToken);

            if (read == 0)
            {
                return total;
            }

            total += read;
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}

This is the right shape. The buffer is rented, used and returned inside a clear boundary. No caller can accidentally hold it after it has gone back to the pool.

Object pooling has a cost

Object pooling sounds like an automatic win. It is not. A pool keeps objects around, which means lower allocation churn can come with higher retained memory. A pool also adds complexity because every object needs a clean reset boundary. If a pooled object carries state from one request into another, you now have a correctness bug rather than a performance issue. Use pooling for objects that are expensive to allocate or initialise, used frequently, and easy to reset. StringBuilder is a classic example because it owns an internal buffer. For ordinary small objects, pooling can be slower and messier than letting the GC handle them.

using System.Text;
using Microsoft.Extensions.ObjectPool;

public sealed class PooledStringBuilderPolicy : PooledObjectPolicy<StringBuilder>
{
    private const int MaximumRetainedCapacity = 4096;

    public override StringBuilder Create() => new(capacity: 256);

    public override bool Return(StringBuilder builder)
    {
        if (builder.Capacity > MaximumRetainedCapacity)
        {
            return false;
        }

        builder.Clear();
        return true;
    }
}

builder.Services.AddSingleton<ObjectPool<StringBuilder>>(serviceProvider =>
{
    var provider = new DefaultObjectPoolProvider();
    return provider.Create(new PooledStringBuilderPolicy());
});

public sealed class ReferenceFormatter(ObjectPool<StringBuilder> pool)
{
    public string Format(string prefix, long id)
    {
        var builder = pool.Get();

        try
        {
            builder.Append(prefix);
            builder.Append('-');
            builder.Append(id);
            return builder.ToString();
        }
        finally
        {
            pool.Return(builder);
        }
    }
}

Notice the capacity guard. Without it, one unusually large request can leave a massive internal buffer in the pool. That can make memory usage look strange long after the request has finished.

Logging can allocate even when you think it is disabled

Logging is essential. High-volume logging in a hot path can still hurt you. Expensive arguments may be evaluated before the logging provider decides whether to write anything. String interpolation creates the string immediately. Serialising a full object for a debug log can allocate heavily even when debug logging is disabled.

logger.LogDebug($"Processing payment {payment.Id} with payload {JsonSerializer.Serialize(payment)}");

That line performs work before the logger gets a chance to filter it. A safer shape is to guard expensive logging or use source-generated logging for common messages.

if (logger.IsEnabled(LogLevel.Debug))
{
    logger.LogDebug("Processing payment {PaymentId} with payload {Payload}",
        payment.Id,
        JsonSerializer.Serialize(payment));
}

For hot messages, use source-generated logging.

public static partial class PaymentLog
{
    [LoggerMessage(
        EventId = 2001,
        Level = LogLevel.Debug,
        Message = "Processing payment {PaymentId}")]
    public static partial void ProcessingPayment(
        ILogger logger,
        Guid paymentId);
}

PaymentLog.ProcessingPayment(logger, payment.Id);

This keeps structured logging while reducing runtime overhead. It also forces you to define the messages you actually care about rather than spraying string templates through every endpoint.

Exceptions are especially expensive as control flow

Exceptions allocate, and capturing a stack trace costs CPU time on top. Throwing exceptions as part of normal request flow is a reliable way to create avoidable pressure.

This style is common in service code.

public async Task<Customer> GetCustomerAsync(
    int customerId,
    CancellationToken stopToken)
{
    var customer = await _db.Customers.FindAsync([customerId], stopToken);

    if (customer is null)
    {
        throw new CustomerNotFoundException(customerId);
    }

    return customer;
}

That may be fine when missing customers are exceptional. It's a bad fit when the endpoint commonly receives unknown IDs. Use result shapes for expected outcomes.

public async Task<Customer?> GetCustomerAsync(
    int customerId,
    CancellationToken stopToken)
{
    return await _db.Customers.FindAsync([customerId], stopToken);
}

app.MapGet("/customers/{customerId:int}", async (
    int customerId,
    CustomerService customers,
    CancellationToken stopToken) =>
{
    var customer = await customers.GetCustomerAsync(customerId, stopToken);

    return customer is null
        ? Results.NotFound()
        : Results.Ok(customer);
});

Reserve exceptions for genuinely exceptional paths. The clue is in the name.
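
The difference can be measured rather than assumed. The sketch below compares the managed bytes allocated by a throwing lookup and a null-returning lookup. FindOrThrow, FindOrNull and MeasureAllocatedBytes are hypothetical helpers written for this illustration, not part of any library. GC.GetAllocatedBytesForCurrentThread() is a real runtime API that reports bytes allocated on the calling thread.

```csharp
using System;

// Hypothetical lookup that reports "not found" by throwing.
static int FindOrThrow(int id) =>
    throw new InvalidOperationException($"Customer {id} not found");

// Hypothetical lookup that reports "not found" as an ordinary result.
static int? FindOrNull(int id) => null;

// Runs the action repeatedly and returns the bytes allocated on this thread.
static long MeasureAllocatedBytes(Action action, int iterations)
{
    action(); // warm up so one-off JIT and initialisation costs are excluded

    long before = GC.GetAllocatedBytesForCurrentThread();
    for (var i = 0; i < iterations; i++)
    {
        action();
    }
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

const int Iterations = 10_000;

long throwingBytes = MeasureAllocatedBytes(() =>
{
    try { FindOrThrow(42); }
    catch (InvalidOperationException) { /* expected outcome handled as an exception */ }
}, Iterations);

long resultBytes = MeasureAllocatedBytes(() => { _ = FindOrNull(42); }, Iterations);

Console.WriteLine($"Throwing path: {throwingBytes / Iterations} bytes per call");
Console.WriteLine($"Result path:   {resultBytes / Iterations} bytes per call");
```

The exact figures depend on runtime version and build configuration, so treat them as relative. The throwing path also pays for stack unwinding, which this measurement does not capture because it only counts allocated bytes.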

Async state also has a memory profile

Async is still the right model for I/O-heavy ASP.NET Core applications. Blocking threads under load is usually worse. But async code is not magic. State machines, captured variables, closures and continuations can all contribute to allocation pressure.

A small example is a local async function that captures request state unnecessarily.

app.MapGet("/customers/{customerId:int}/score", async (
    int customerId,
    IScoreService scores,
    CancellationToken stopToken) =>
{
    async Task<CustomerScoreResponse> LoadScoreAsync()
    {
        var score = await scores.GetScoreAsync(customerId, stopToken);
        return new CustomerScoreResponse(customerId, score);
    }

    return Results.Ok(await LoadScoreAsync());
});

That local async function is not needed. The simpler version is easier for people and the runtime.

app.MapGet("/customers/{customerId:int}/score", async (
    int customerId,
    IScoreService scores,
    CancellationToken stopToken) =>
{
    var score = await scores.GetScoreAsync(customerId, stopToken);
    return Results.Ok(new CustomerScoreResponse(customerId, score));
});

This isn't about micro-optimising every line of C#. It's about avoiding patterns that quietly multiply under load.

Infrastructure can make GC problems worse

Allocation pressure is code-level behaviour, but the infrastructure decides how much room the runtime has to absorb it. A .NET API running in a large VM with generous memory may hide allocation churn for a long time. The same API in a tightly limited Kubernetes pod can start struggling much earlier. Containers make memory limits explicit. The GC sees those limits and adjusts its behaviour around them. That is good, but it also means the memory limit is part of the performance design. A pod with a 512 MB memory limit running a high-throughput API has much less headroom for request buffers, JSON serialisation, socket buffers, native memory, JIT memory, thread stacks and the managed heap. When you size the pod too tightly, the app may spend more time collecting and less time serving.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
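spec:
  replicas: 6
  # apps/v1 Deployments need a selector that matches the pod template labels.
  # The label value here is illustrative.
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: orders-api
          image: example.azurecr.io/orders-api:1.0.0
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "1Gi"
          env:
            # GC settings passed as environment variables are read as hexadecimal.
            # 0x46 is 70 percent; a plain "70" would be read as 0x70, which is 112.
            - name: DOTNET_GCHeapHardLimitPercent
              value: "0x46"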

You should not set GC knobs blindly. The default runtime behaviour is usually a strong starting point. The point is to treat memory limits as a performance control, not only a cost control. If every pod is close to its memory limit during normal traffic, scale-out may add replicas while every replica still spends too much time fighting the same allocation profile.

CPU limits matter too. Server GC is designed for throughput on server workloads. If a container is heavily CPU-throttled, the runtime has less room to collect efficiently while also serving requests. You can end up with a strange feedback loop where CPU throttling increases latency, longer requests keep objects alive for longer, more objects survive into older generations, and GC pressure gets worse.

That loop is one reason memory problems often appear as latency problems first.
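
Before tuning anything, it helps to confirm what the runtime actually sees inside the container. This is a small sketch that could run at startup, using runtime APIs that exist in current .NET versions. The values it prints depend entirely on the environment, so none are shown here.

```csharp
using System;
using System.Runtime;

GCMemoryInfo info = GC.GetGCMemoryInfo();

bool isServerGc = GCSettings.IsServerGC;              // Server GC or Workstation GC
int processors = Environment.ProcessorCount;          // reflects the container CPU limit
long availableBytes = info.TotalAvailableMemoryBytes; // memory the GC believes it may use
long heapBytes = info.HeapSizeBytes;                  // heap size at the last collection, 0 before the first

Console.WriteLine($"Server GC:           {isServerGc}");
Console.WriteLine($"GC latency mode:     {GCSettings.LatencyMode}");
Console.WriteLine($"Processors visible:  {processors}");
Console.WriteLine($"GC available memory: {availableBytes / (1024 * 1024)} MiB");
Console.WriteLine($"Heap at last GC:     {heapBytes / (1024 * 1024)} MiB");
```

If the available memory figure does not match what you expected from the pod limit or a heap hard limit setting, that is worth resolving before drawing conclusions from a load test.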

How to measure it properly

Start with counters before taking dumps and traces. Counters let you watch the service under load and see whether GC pressure lines up with latency.

dotnet-counters ps

dotnet-counters monitor \
  --process-id <pid> \
  --counters System.Runtime,Microsoft.AspNetCore.Hosting,Microsoft-AspNetCore-Server-Kestrel

The counters I would watch first are allocation rate, GC heap size, Gen 0 count, Gen 1 count, Gen 2 count, LOH size, GC fragmentation, total pause time by GC, request rate, current requests, request queue length and thread pool queue length. You are looking for correlation. If allocation rate jumps with request rate and p99 latency jumps shortly after, you have a serious clue. If Gen 2 count rises during the load test and latency spikes with it, keep digging.
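
Some of the same signals can also be sampled from inside the process, which is handy in a test harness. The sketch below takes a snapshot before and after a workload and prints the difference. Snapshot is a hypothetical helper and the loop is a stand-in workload. GC.GetTotalPauseDuration() requires .NET 7 or later.

```csharp
using System;

// Hypothetical helper: captures collection counts, allocated bytes and GC pause time.
static (int Gen0, int Gen1, int Gen2, long AllocatedBytes, TimeSpan Pause) Snapshot() =>
    (GC.CollectionCount(0),
     GC.CollectionCount(1),
     GC.CollectionCount(2),
     GC.GetTotalAllocatedBytes(precise: false),
     GC.GetTotalPauseDuration());

var before = Snapshot();

// Stand-in workload: creates many short-lived strings.
long checksum = 0;
for (var i = 0; i < 200_000; i++)
{
    var text = $"value-{i}";
    checksum += text.Length;
}

var after = Snapshot();

long allocatedBytes = after.AllocatedBytes - before.AllocatedBytes;
int gen0Collections = after.Gen0 - before.Gen0;
int gen2Collections = after.Gen2 - before.Gen2;
TimeSpan pause = after.Pause - before.Pause;

Console.WriteLine($"Allocated:  {allocatedBytes / 1024} KB (checksum {checksum})");
Console.WriteLine($"Gen 0 GCs:  {gen0Collections}");
Console.WriteLine($"Gen 2 GCs:  {gen2Collections}");
Console.WriteLine($"GC pause:   {pause.TotalMilliseconds:F2} ms");
```

These figures are process-wide, so in a real service they include every concurrent request. They are useful for correlation over time, not for attributing cost to a single endpoint.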

For deeper investigation, collect traces and dumps.

dotnet-trace collect \
  --process-id <pid> \
  --providers Microsoft-Windows-DotNETRuntime:0x1C000080018:5

dotnet-gcdump collect \
  --process-id <pid> \
  --output gc-dump.gcdump

Counters tell you that memory pressure exists. Traces and dumps help show where it comes from. At that point you can look for hot allocation sites, unexpectedly retained objects, large arrays, excessive strings, high exception counts, heavy serialisation paths and objects surviving longer than the request.

Benchmark the real endpoint, then make it worse on purpose

The fastest way to understand the GC wall is to build a small benchmark and deliberately add allocations. Start with a minimal endpoint, then add string formatting, JSON payloads, logging, LINQ, exceptions and large buffers. Watch allocation rate and latency as each cost is added.

var builder = WebApplication.CreateSlimBuilder(args);
var app = builder.Build();

app.MapGet("/baseline", () => Results.Ok(new BaselineResponse("ok")));

app.MapGet("/allocating", () =>
{
    var values = Enumerable.Range(1, 100)
        .Select(number => $"value-{number}")
        .ToArray();

    return Results.Ok(new AllocatingResponse(values));
});

app.Run();

public sealed record BaselineResponse(string Status);
public sealed record AllocatingResponse(string[] Values);

Run a short test against each endpoint.

wrk -t8 -c512 -d60s http://localhost:5000/baseline

wrk -t8 -c512 -d60s http://localhost:5000/allocating

Keep dotnet-counters running beside the test. The exact numbers will depend on hardware, OS, .NET version and payload shape. The pattern matters more than the absolute result. When two endpoints do similar business work but one allocates far more per request, the difference becomes visible as traffic increases.
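
For a rough per-call figure without a load generator, the bodies of the two endpoints can be compared directly. This sketch measures only the handler logic, not Kestrel, routing or JSON serialisation, so it understates the real per-request cost. AllocatedBytesPerCall is a hypothetical helper written for this illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper: average managed bytes allocated per call on this thread.
static long AllocatedBytesPerCall(Func<object> work, int iterations = 1_000)
{
    GC.KeepAlive(work()); // warm up so one-off costs are excluded

    long before = GC.GetAllocatedBytesForCurrentThread();
    for (var i = 0; i < iterations; i++)
    {
        GC.KeepAlive(work());
    }
    long after = GC.GetAllocatedBytesForCurrentThread();

    return (after - before) / iterations;
}

// Same work as the /baseline handler body: a constant string.
long baselineBytes = AllocatedBytesPerCall(() => "ok");

// Same work as the /allocating handler body: 100 formatted strings plus an array.
long allocatingBytes = AllocatedBytesPerCall(() =>
    Enumerable.Range(1, 100)
        .Select(number => $"value-{number}")
        .ToArray());

Console.WriteLine($"Baseline body:   {baselineBytes} bytes per call");
Console.WriteLine($"Allocating body: {allocatingBytes} bytes per call");
```

Multiplying the per-call figure by the target request rate gives a first estimate of the allocation rate the GC will have to absorb, in the same way as the 20 KB example earlier.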

What to change first

Don't start by replacing ordinary code with unsafe tricks. Start with the small wins. Project directly into response models rather than loading large entities and reshaping them afterwards. Use AsNoTracking() on read-only EF Core queries. Avoid creating serialiser options per request. Avoid formatting values into strings when the client can receive typed values. Remove accidental ToList(), ToArray() and string.Join() calls from hot paths. Avoid exceptions for expected outcomes. Stop logging full payloads during normal operation. Stream large bodies. Use source-generated JSON and source-generated logging where the endpoint is hot enough to justify it.

Then measure again.

If allocation rate is still high, look at spans, memory pooling, array pooling and custom parsing. These tools are powerful, but they make code harder to reason about. They belong in carefully chosen places, with tests around ownership and lifetime. A senior engineer does not make the whole codebase ugly for a theoretical win. They make the hot path simple, measured and predictable.

The better mental model

At high throughput, every request leaves a memory footprint. Some of that footprint is useful. Some of it is accidental. The useful part is the data you genuinely need to process and return. The accidental part is everything created because the code took the easiest route: extra lists, intermediate arrays, repeated formatting, unnecessary strings, buffered streams, avoidable closures, broad object graphs and logging work nobody reads. The GC is incredibly good at cleaning up normal managed memory. It is still paid work. When your API is quiet, the bill is tiny. When your API is under heavy load, the bill can become one of the largest costs in the process.

The real skill is knowing when to care. Most endpoints should stay simple. Some endpoints become important enough that allocation rate deserves the same attention as database duration, CPU usage and response latency. Once an endpoint sits on the critical path for a high-traffic system, memory becomes architecture.

The GC wall is rarely caused by one terrible line of code. It is usually caused by hundreds of reasonable allocations multiplied by serious traffic. That's why it catches people out. The code looks fine, the framework is doing its job, and the database is still alive. Then p99 latency starts to drift and nobody can explain why.

When that happens, stop guessing. Measure allocation rate. Watch Gen 2 collections. Check LOH size. Look at pause time. Compare the clean endpoint with the real one. Then remove the allocations that don't need to exist. You don't need to write C# like a systems programmer everywhere. But when a .NET API is pushed to extremes, the runtime details become part of the design. The engineers who understand that usually fix performance problems faster than the teams still staring at average response time and wondering why production feels slow.
