The hidden .NET memory leak

Memory leaks in .NET do not always look like memory leaks. Most developers picture the obvious version. Something gets added to a static list. Nothing ever removes it. Memory grows forever. The process eventually falls over. That version exists, but it is not the one that usually gets missed in production.

The more awkward version is when every object has a reason to still be alive. Nothing is technically lost. Nothing looks obviously broken in code review. The garbage collector is doing what it should. The problem is that your application keeps giving objects longer lifetimes than they were meant to have. That is where a lot of .NET memory issues hide.

The GC cannot collect objects you are still holding

The .NET garbage collector is good, but it is not magic. It can reclaim objects that are no longer reachable. If your code still has a path to an object, the GC has to treat it as live. That path can be direct, like a static dictionary. It can also be indirect, through an event handler, a closure, a long lived service, a queue, a timer, a cache, or an object graph hanging off a singleton. This is why some leaks do not look like leaks.

The memory is not unmanaged. The objects are not lost. The process is simply retaining too much application state. You see it in production as steady memory growth, more Gen2 collections, longer pauses, growing container memory, and eventually restarts. The first instinct is often to blame the GC. In many cases, the GC is only reporting the shape of your object lifetimes back to you.

Static caches are the classic trap

A static cache is easy to justify. You have expensive reference data. You do not want to load it repeatedly. You put it somewhere global. It works. Then the cache starts accepting dynamic data. Customer specific data. User specific data. Tenant specific data. Request shaped data. Data with no expiry. Data where the key space grows over time. The cache was added for performance, but it becomes a memory retention mechanism.

public static class CustomerCache
{
    private static readonly Dictionary<Guid, CustomerSnapshot> Customers = new();

    public static CustomerSnapshot GetOrAdd(Guid customerId, Func<CustomerSnapshot> factory)
    {
        if (Customers.TryGetValue(customerId, out var customer))
        {
            return customer;
        }

        customer = factory();
        Customers[customerId] = customer;

        return customer;
    }
}

This code is simple, but it has no limit. Every customer added to the dictionary can stay there for the lifetime of the process. If CustomerSnapshot contains orders, permissions, addresses, preferences, or other nested objects, the retained memory can be much larger than the dictionary suggests. The safer version is not just "use a cache library". The safer version is to decide what the cache is allowed to hold, how long it is allowed to hold it, and what happens when the system is under pressure.

IMemoryCache gives you expiry and size controls, but only if you actually use them.

public sealed class CustomerSnapshotCache(IMemoryCache cache)
{
    public async Task<CustomerSnapshot> GetOrCreate(
        Guid customerId,
        Func<CancellationToken, Task<CustomerSnapshot>> factory,
        CancellationToken stopToken)
    {
        var cacheKey = $"customer-snapshot:{customerId}";
        var snapshot = await cache.GetOrCreateAsync(cacheKey, async entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10);
            entry.SlidingExpiration = TimeSpan.FromMinutes(2);
            entry.Size = 1;

            return await factory(stopToken);
        });

        return snapshot ?? throw new InvalidOperationException("Customer snapshot could not be loaded.");
    }
}

This still needs a configured size limit on the cache. Without that, Size = 1 does not protect anything. The point is not that this specific code solves every case. The point is that memory needs an exit plan. A cache without expiry, bounds, or ownership rules is just a long lived collection with a nicer name.
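
As a rough sketch of what "a configured size limit" can look like, the snippet below builds a MemoryCache with SizeLimit set and stores entries that each count as one unit. It assumes the Microsoft.Extensions.Caching.Memory package is referenced. BoundedCacheSketch, Create, and Put are hypothetical names used only for illustration, and the ten minute expiry simply mirrors the earlier example.

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

// Hypothetical helper, not part of any library.
public static class BoundedCacheSketch
{
    // Builds a cache whose entries may add up to at most `maxUnits` of Size.
    public static MemoryCache Create(long maxUnits) =>
        new MemoryCache(new MemoryCacheOptions { SizeLimit = maxUnits });

    // Once SizeLimit is set, every entry has to declare a Size.
    // Here each entry counts as one unit, so the limit acts as an entry count.
    public static void Put(IMemoryCache cache, string key, object value) =>
        cache.Set(key, value, new MemoryCacheEntryOptions
        {
            Size = 1,
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
        });
}
```

With a limit of 2, a third Put is not stored, because an entry that would push the cache past SizeLimit is rejected rather than added. In an ASP.NET Core app the same limit can be set at registration with services.AddMemoryCache(options => options.SizeLimit = 10_000), where the number is only a placeholder.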

Event handlers can keep entire object graphs alive

Event subscriptions are one of the easiest leaks to miss. An object subscribes to an event on a longer lived object. The longer lived object now holds a reference to the subscriber through the delegate. If the subscriber is never unsubscribed, it stays alive. This is especially common in desktop apps, background services, hosted components, domain event dispatchers, and custom in process pub/sub patterns.

public sealed class ReportSession
{
    private readonly ReportProgressNotifier notifier;

    public ReportSession(ReportProgressNotifier notifier)
    {
        this.notifier = notifier;
        this.notifier.ProgressChanged += OnProgressChanged;
    }

    private void OnProgressChanged(object? sender, ProgressChangedEventArgs args)
    {
        // Update session state
    }
}

If ReportProgressNotifier is a singleton and ReportSession is created many times, every session can stay alive through the event subscription. The leak is not visible from the session alone. The session does not store itself anywhere. The reference is held by the publisher.

A better version makes the lifetime explicit.

public sealed class ReportSession : IDisposable
{
    private readonly ReportProgressNotifier notifier;

    public ReportSession(ReportProgressNotifier notifier)
    {
        this.notifier = notifier;
        this.notifier.ProgressChanged += OnProgressChanged;
    }

    private void OnProgressChanged(object? sender, ProgressChangedEventArgs args)
    {
        // Update session state
    }

    public void Dispose()
    {
        notifier.ProgressChanged -= OnProgressChanged;
    }
}

This is boring code, but boring code is often what prevents production memory growth. If you subscribe to something longer lived than you, you need a matching unsubscribe path. If the subscription is hidden behind a helper, the helper needs the same discipline.
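
One way to convince yourself that the publisher really is the thing holding the subscriber is a small probe with a WeakReference. The sketch below uses hypothetical Publisher, Subscriber, and SubscriptionProbe types rather than the classes above. It forces a collection, which is acceptable in a test or experiment but not something to do in production code.

```csharp
using System;
using System.Runtime.CompilerServices;

public sealed class Publisher
{
    public event EventHandler? Changed;
}

public sealed class Subscriber : IDisposable
{
    private readonly Publisher publisher;

    public Subscriber(Publisher publisher)
    {
        this.publisher = publisher;
        this.publisher.Changed += OnChanged;
    }

    private void OnChanged(object? sender, EventArgs args) { }

    public void Dispose() => publisher.Changed -= OnChanged;
}

public static class SubscriptionProbe
{
    // Created in a separate, non-inlined method so no local in the caller
    // keeps the subscriber alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference CreateSubscriber(Publisher publisher, bool dispose)
    {
        var subscriber = new Subscriber(publisher);
        if (dispose)
        {
            subscriber.Dispose();
        }

        return new WeakReference(subscriber);
    }

    // Returns true if the subscriber is still reachable after a full collection.
    public static bool IsRetained(bool dispose)
    {
        var publisher = new Publisher();
        var weak = CreateSubscriber(publisher, dispose);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var alive = weak.IsAlive;
        GC.KeepAlive(publisher); // the publisher outlives the check
        return alive;
    }
}
```

Calling SubscriptionProbe.IsRetained(dispose: false) should report the subscriber as still alive, while IsRetained(dispose: true) should report it as collected. The only difference between the two runs is the unsubscribe.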

Closures can retain more than you think

Closures are useful, but they can quietly keep objects alive. A lambda captures a variable. That variable becomes part of a generated closure object. If the lambda is stored in a long lived place, everything it captured can become long lived too.

The mistake is usually not capturing a string or an integer. The mistake is capturing something large without noticing.

public sealed class ExportScheduler
{
    private readonly List<Func<CancellationToken, Task>> jobs = new();

    public void Schedule(ExportRequest request)
    {
        jobs.Add(async stopToken =>
        {
            await ProcessExport(request, stopToken);
        });
    }

    private static Task ProcessExport(ExportRequest request, CancellationToken stopToken)
    {
        return Task.CompletedTask;
    }
}

If ExportRequest contains uploaded data, parsed documents, user context, or a large object graph, the scheduled delegate retains all of it. The list only shows delegates. The retained memory sits behind the capture.

A cleaner approach is to capture only the data needed later.

public sealed class ExportScheduler
{
    private readonly List<Func<CancellationToken, Task>> jobs = new();

    public void Schedule(ExportRequest request)
    {
        var exportId = request.ExportId;

        jobs.Add(stopToken => ProcessExport(exportId, stopToken));
    }

    private static Task ProcessExport(Guid exportId, CancellationToken stopToken)
    {
        return Task.CompletedTask;
    }
}

This changes the lifetime of the data. The scheduled job keeps the identifier, not the entire request. That distinction is important in high throughput systems. Capturing a request object feels harmless when the object is small. Later the request grows, someone adds metadata, parsed content, or validation results, and the memory profile changes without the scheduling code changing at all.
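
A related subtlety is worth a sketch. With the current C# compiler, lambdas that capture locals declared in the same scope share one generated closure object, so a long lived lambda that only reads a small value can still keep a large local alive if a sibling lambda captured it. This is compiler behaviour rather than a language guarantee. ClosureCaptureSketch and its members are hypothetical names for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ClosureCaptureSketch
{
    // `buffer` and `id` are declared in the same scope, so both are hoisted
    // into the same closure object. The returned lambda only reads `id`,
    // but it references that shared object, which also holds `buffer`.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Func<int> BuildShared(out WeakReference bufferRef)
    {
        var buffer = new byte[1_000_000];
        var id = 42;
        bufferRef = new WeakReference(buffer);

        Func<int> shortLived = () => buffer.Length; // captures buffer
        _ = shortLived();

        return () => id; // captures id, shares the closure object
    }

    // Here the long lived lambda is created in its own method, so its
    // closure object contains only `id`.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Func<int> BuildSeparated(out WeakReference bufferRef)
    {
        var buffer = new byte[1_000_000];
        bufferRef = new WeakReference(buffer);

        Func<int> shortLived = () => buffer.Length;
        _ = shortLived();

        return MakeIdReader(42);
    }

    private static Func<int> MakeIdReader(int id) => () => id;

    // Returns true if the buffer is still reachable while the job is alive.
    public static bool BufferSurvivesGc(bool shared)
    {
        WeakReference bufferRef;
        Func<int> job;
        if (shared)
        {
            job = BuildShared(out bufferRef);
        }
        else
        {
            job = BuildSeparated(out bufferRef);
        }

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var alive = bufferRef.IsAlive;
        GC.KeepAlive(job);
        return alive;
    }
}
```

The practical takeaway is the same as above: when a delegate is going to be stored somewhere long lived, build it from the smallest values it needs, ideally in a place where nothing large is in scope.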

Background queues can retain work forever

Queues are useful because they decouple work. They are also dangerous because queued work is retained work. An in memory queue does not just store jobs. It stores whatever the job object references. If producers are faster than consumers, memory grows. If the queue is unbounded, the process becomes the buffer for the whole system.

public sealed class EmailQueue
{
    private readonly Channel<EmailWorkItem> channel = Channel.CreateUnbounded<EmailWorkItem>();

    public ValueTask Enqueue(EmailWorkItem item, CancellationToken stopToken)
    {
        return channel.Writer.WriteAsync(item, stopToken);
    }

    public IAsyncEnumerable<EmailWorkItem> ReadAll(CancellationToken stopToken)
    {
        return channel.Reader.ReadAllAsync(stopToken);
    }
}

Unbounded channels can be fine for small internal coordination. They are risky when work arrives from users, APIs, message brokers, timers, or external systems. If EmailWorkItem contains attachments, parsed body content, HTML, headers, and extracted metadata, each queued item can be large. The queue depth becomes a memory graph.

A bounded channel forces the system to make a decision when it cannot keep up.

public sealed class EmailQueue
{
    private readonly Channel<EmailWorkItem> channel = Channel.CreateBounded<EmailWorkItem>(
        new BoundedChannelOptions(capacity: 500)
        {
            FullMode = BoundedChannelFullMode.Wait,
            SingleReader = false,
            SingleWriter = false
        });

    public ValueTask Enqueue(EmailWorkItem item, CancellationToken stopToken)
    {
        return channel.Writer.WriteAsync(item, stopToken);
    }

    public IAsyncEnumerable<EmailWorkItem> ReadAll(CancellationToken stopToken)
    {
        return channel.Reader.ReadAllAsync(stopToken);
    }
}

This does not remove the need for proper queueing infrastructure. It simply prevents the process from pretending it has infinite memory. For serious background work, I would rather keep large payloads out of memory altogether. Store the blob, queue a reference, and let the worker load the data when it is ready to process it.
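
A minimal sketch of both ideas together, assuming a hypothetical EmailReference record that carries only an identifier and a blob location. TryEnqueue uses TryWrite, which returns false on a full channel in Wait mode, so the caller can reject or defer the work instead of buffering it.

```csharp
using System;
using System.Threading.Channels;

// Hypothetical work item: a reference to stored data, not the data itself.
public sealed record EmailReference(Guid EmailId, Uri BlobUri);

public sealed class EmailReferenceQueue
{
    private readonly Channel<EmailReference> channel;

    public EmailReferenceQueue(int capacity)
    {
        channel = Channel.CreateBounded<EmailReference>(
            new BoundedChannelOptions(capacity)
            {
                FullMode = BoundedChannelFullMode.Wait
            });
    }

    // Returns false when the queue is full so the caller has to decide
    // what to do: reject the request, retry later, or persist elsewhere.
    public bool TryEnqueue(EmailReference item) => channel.Writer.TryWrite(item);

    // Non-blocking read, mainly useful for tests and simple polling.
    public bool TryDequeue(out EmailReference? item) => channel.Reader.TryRead(out item);
}
```

Whether a full queue should translate into waiting, an error to the caller, or a dropped item depends on the workload. The important part is that the choice is made in code rather than by the process running out of memory.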

Singleton services can accidentally own request data

Dependency injection makes lifetimes look clean, but it also makes lifetime mistakes easy to hide. A singleton service lives for the lifetime of the application. If it stores request specific data, that data can live for the lifetime of the application too.

public sealed class CurrentUserStore
{
    private readonly Dictionary<string, UserContext> users = new();

    public void Set(string correlationId, UserContext user)
    {
        users[correlationId] = user;
    }

    public UserContext? Get(string correlationId)
    {
        return users.GetValueOrDefault(correlationId);
    }
}

If this service is registered as a singleton, every entry can remain until the process exits unless something removes it. The class name sounds harmless. The lifetime is the problem. The same issue shows up when singleton services capture scoped services, HttpContext, request bodies, claims principals, or per request options. A request lifetime should stay inside the request. If data needs to outlive the request, store the smallest durable representation you need. That usually means an identifier, a status row, or a small immutable record, not an entire request context.
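
The container can catch one form of this mistake for you. The sketch below assumes the Microsoft.Extensions.DependencyInjection package and uses two hypothetical types, RequestState and ReportRegistry. With ValidateScopes enabled, resolving a singleton that depends on a scoped service throws instead of silently capturing it.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical per-request data.
public sealed class RequestState { }

// Hypothetical singleton that wrongly takes a scoped dependency.
public sealed class ReportRegistry
{
    public ReportRegistry(RequestState state)
    {
        State = state;
    }

    public RequestState State { get; }
}

public static class LifetimeValidationSketch
{
    // Returns true if the container refused to build the singleton.
    public static bool SingletonCapturingScopedIsRejected()
    {
        var services = new ServiceCollection();
        services.AddScoped<RequestState>();
        services.AddSingleton<ReportRegistry>();

        using var provider = services.BuildServiceProvider(
            new ServiceProviderOptions { ValidateScopes = true });

        try
        {
            provider.GetRequiredService<ReportRegistry>();
            return false;
        }
        catch (InvalidOperationException)
        {
            // The container reports that a scoped service cannot be
            // consumed from a singleton.
            return true;
        }
    }
}
```

ASP.NET Core turns scope validation on by default in the Development environment. It only covers constructor dependencies, though. It will not notice a singleton that stores request data in a dictionary, as CurrentUserStore does.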

Timers can hold services alive

Timers are another common source of hidden retention. A timer holds a callback. The callback often captures this. If the timer is not disposed, the object can stay alive. If the callback creates scopes or starts async work incorrectly, the retained graph can grow again.

public sealed class RefreshingLookupClient
{
    private readonly Timer timer;

    public RefreshingLookupClient()
    {
        timer = new Timer(_ => Refresh(), null, TimeSpan.Zero, TimeSpan.FromMinutes(5));
    }

    private void Refresh()
    {
        // Refresh lookup data
    }
}

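This example has several problems. The timer is never disposed, so it keeps firing and keeps the client alive. The callback is synchronous and runs on a thread pool thread, where an unhandled exception will take down the process. System.Threading.Timer does not wait for one callback to finish before starting the next, so if Refresh takes longer than the interval, calls overlap and work can pile up. In modern .NET, PeriodicTimer inside a hosted service is usually easier to reason about.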

public sealed class LookupRefreshWorker(
    IServiceScopeFactory scopeFactory,
    ILogger<LookupRefreshWorker> logger) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stopToken)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromMinutes(5));

        while (await timer.WaitForNextTickAsync(stopToken))
        {
            try
            {
                using var scope = scopeFactory.CreateScope();

                var refreshService = scope.ServiceProvider.GetRequiredService<ILookupRefreshService>();

                await refreshService.Refresh(stopToken);
            }
            catch (OperationCanceledException) when (stopToken.IsCancellationRequested)
            {
                break;
            }
            catch (Exception ex)
            {
                logger.LogError(ex, "Lookup refresh failed.");
            }
        }
    }
}

This makes the lifetime clearer. The timer belongs to the worker. The scoped service belongs to one iteration. Cancellation is respected. Iterations in this loop cannot overlap, because each refresh is awaited before the next tick is observed. A refresh that outlasts the interval still needs thought, but the ownership is far less vague.

Large objects make retention more painful

Not all retained objects hurt equally. A few small objects retained for too long may never become a production issue. Large arrays, strings, byte buffers, parsed documents, images, and serialised payloads are different. By default, objects of 85,000 bytes or more are allocated on the Large Object Heap. The LOH is only collected as part of Gen2 collections and is not compacted by default, so retained large objects are slow to reclaim and can leave the heap fragmented. If your app retains large buffers through queues, caches, closures, logs, or long lived services, memory pressure can climb quickly.

This happens often in document pipelines, email ingestion, file uploads, image handling, and AI processing flows. The code can look innocent because the type is just byte[], string, or MemoryStream.

public sealed record DocumentWorkItem(
    Guid DocumentId,
    string FileName,
    byte[] FileData,
    string ExtractedText);

A few of these are fine. Thousands waiting in memory are not. For larger workloads, the work item should usually carry a reference to stored data rather than the data itself.

public sealed record DocumentWorkItem(
    Guid DocumentId,
    string FileName,
    Uri BlobUri);

That small modelling decision changes the behaviour of the whole pipeline. The queue now retains metadata instead of retaining the full file and extracted text.
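
To get a feel for which payloads count as large, the sketch below asks the runtime which generation a freshly allocated byte array belongs to. It relies on the default LOH threshold of 85,000 bytes and on GC.GetGeneration reporting Large Object Heap allocations as the highest generation. LargeObjectSketch is a hypothetical name.

```csharp
using System;

public static class LargeObjectSketch
{
    // Allocates a byte array of the given length and reports the
    // generation the runtime assigns to it immediately afterwards.
    public static int GenerationOfNewArray(int length)
    {
        var buffer = new byte[length];
        return GC.GetGeneration(buffer);
    }

    // True when a new array of this length is treated as a large object.
    public static bool IsLargeObject(int length) =>
        GenerationOfNewArray(length) == GC.MaxGeneration;
}
```

A 1,000 byte array normally starts in generation 0, while a 100,000 byte array is reported in the highest generation straight away. A single uploaded file or extracted document can easily cross that line, which is why a queue of FileData arrays behaves so differently from a queue of identifiers.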

Memory leaks often show up as lifetime bugs

The hardest part about these problems is that the code often has no single dramatic flaw. The cache was added for speed. The event was added for decoupling. The closure was added for convenience. The queue was added for resilience. The singleton was added because the service looked stateless. The timer was added because something needed to run every few minutes. The issue is lifetime. Something short lived gets attached to something long lived. Something large gets stored where only something small was needed. Something unbounded gets fed by production traffic. Something that should expire never does.

That is the pattern to look for. When memory grows in a .NET process, I would not start by asking why the GC is failing. I would ask what the application is still holding, who is holding it, and whether that owner should have such a long lifetime.

What I would measure first

In production, I would look at memory growth over time, Gen2 collection frequency, Large Object Heap size, allocation rate, thread count, queue depth, cache entry counts, and container memory limits. The important part is the relationship between those numbers. If allocation rate is high but memory returns to baseline, you may have an allocation problem rather than a retention problem. If memory keeps climbing after Gen2 collections, something is staying alive. If queue depth and memory rise together, queued work is probably part of the story. If LOH size climbs during document processing, large payloads are likely being retained for too long.

Tools like dotnet-counters, dotnet-gcdump, dotnet-dump, Visual Studio, JetBrains dotMemory, and PerfView can help. The tool matters less than the question you ask with it. You are looking for roots. What is keeping the object alive? That answer tells you whether you have a GC issue or an ownership issue. Most of the time, it is ownership.
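
If you want a few of these numbers from inside the process, for logging or a health endpoint, the runtime exposes them directly. The sketch below is a hypothetical MemoryProbe that reads values from the GC class. It assumes .NET 5 or later, where GCMemoryInfo.GenerationInfo is available and index 3 is documented as the Large Object Heap.

```csharp
using System;

// Hypothetical snapshot type for illustration.
public sealed record MemorySample(
    long ManagedBytes,
    long HeapSizeBytes,
    long LohBytesAfterLastGc,
    int Gen2Collections);

public static class MemoryProbe
{
    public static MemorySample Capture()
    {
        // Describes the most recent garbage collection.
        var info = GC.GetGCMemoryInfo();

        return new MemorySample(
            ManagedBytes: GC.GetTotalMemory(forceFullCollection: false),
            HeapSizeBytes: info.HeapSizeBytes,
            LohBytesAfterLastGc: info.GenerationInfo[3].SizeAfterBytes,
            Gen2Collections: GC.CollectionCount(2));
    }
}
```

Sampling this on an interval and plotting it next to queue depth and cache entry counts gives a crude version of the comparison described above. It tells you that something is being retained, not what. Finding the root still needs a dump or a profiler.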

The practical fix

The practical fix is not to avoid caches, queues, events, closures, timers, or singleton services. You need those patterns. The fix is to make lifetime a design decision. Caches need expiry and bounds. Queues need capacity and backpressure. Event subscriptions need unsubscribe paths. Closures should capture the smallest useful data. Singleton services should avoid request state. Timers need clear disposal and cancellation. Large payloads should be stored outside memory when they do not need to be processed immediately.

None of this is glamorous. But it's the difference between a service that uses memory and a service that slowly collects its own history. That's the real trap with .NET memory leaks. The object is often still reachable. The code is often doing what it was told to do. The leak is the lifetime you accidentally designed.
