Build a Chaos Proxy in C#

Senior Software Engineer specialising in cloud architecture, distributed systems, and modern .NET development, with over two decades of experience designing and delivering enterprise platforms in financial, insurance, and high-scale commercial environments. My focus is on building systems that are reliable, scalable, and maintainable over the long term. I’ve led modernisation initiatives moving legacy platforms to cloud-native Azure architectures, designed high-throughput streaming solutions to eliminate performance bottlenecks, and implemented secure microservices environments using container-based deployment models and event-driven integration patterns. From an architecture perspective, I have strong practical experience applying approaches such as Vertical Slice Architecture, Domain-Driven Design, Clean Architecture, and Hexagonal Architecture. I’m particularly interested in modular system design that balances delivery speed with long-term sustainability, and I enjoy solving complex problems involving distributed workflows, performance optimisation, and system reliability. I enjoy mentoring engineers, contributing to architectural decisions, and helping teams simplify complex systems into clear, maintainable designs. I’m always open to connecting with other engineers, architects, and technology leaders working on modern cloud and distributed system challenges.

Most API clients are tested against the best possible version of the downstream system. The downstream responds quickly. The JSON is valid. The status code makes sense. The connection stays open. The retry policy is never really challenged. Then the code goes near production and suddenly the client has to deal with the internet being the internet.

A local chaos proxy is a small tool that sits between your application and a real downstream API. Your app still makes HTTP calls. The downstream still exists. The difference is that the proxy gets a chance to make the call awkward before your client sees the response. It can add latency. It can return a random 503. It can cut the connection. It can return broken JSON. It can force your client code to prove that its timeouts, retries, logging, and cancellation behaviour are not just decorative.

That's what we're going to build in C#.

The idea

The proxy runs locally as an ASP.NET Core app. Your application would normally call this:

https://api.example.com/customers/123

During local testing, it calls this instead:

http://localhost:5050/customers/123

The proxy receives the request, applies a matching chaos rule, forwards the request to the real API, then streams the response back. For example, a request to /payments might have a 20% chance of returning a failure. A request to /customers might be delayed by up to two seconds. A request to /quotes might sometimes return invalid JSON. This gives you something more useful than a mock. A mock usually returns the exact behaviour you remembered to configure. A proxy lets the real API behave badly in ways your client still has to survive.

Create the project

Start with a plain ASP.NET Core app.

dotnet new web -n ChaosProxy
cd ChaosProxy

Add a simple configuration section to appsettings.json:

{
  "ChaosProxy": {
    "TargetBaseUrl": "https://api.example.com",
    "Rules": [
      {
        "PathPrefix": "/customers",
        "FailureRate": 0.10,
        "MalformedJsonRate": 0.05,
        "Latency": {
          "MinMs": 200,
          "MaxMs": 2000
        },
        "StatusCodes": [500, 502, 503]
      },
      {
        "PathPrefix": "/payments",
        "FailureRate": 0.25,
        "DropConnectionRate": 0.05,
        "StatusCodes": [429, 500, 503]
      }
    ]
  }
}

The rates are simple probabilities between 0 and 1. A value of 0.25 means roughly one in four matching requests will get that behaviour. Keep the first version simple. Path prefix matching is enough. You don't need a full rules engine to learn whether your client behaves properly when a downstream call starts lying to it.

Model the rules

Create a few small option types:

public sealed class ChaosProxyOptions
{
    public required string TargetBaseUrl { get; init; }
    public List<ChaosRule> Rules { get; init; } = [];
}

public sealed class ChaosRule
{
    public required string PathPrefix { get; init; }
    public double FailureRate { get; init; }
    public double MalformedJsonRate { get; init; }
    public double DropConnectionRate { get; init; }
    public List<int> StatusCodes { get; init; } = [500];
    public LatencyRange? Latency { get; init; }
}

public sealed class LatencyRange
{
    public int MinMs { get; init; }
    public int MaxMs { get; init; }
}

This is enough to describe the failures we care about for now. The lack of cleverness is deliberate. The proxy should stay small: a local tool that makes your API client deal with failure earlier.

Add the forwarding endpoint

A proxy needs to accept any path and forward it to the configured downstream API. In Program.cs, configure the options and an HttpClient:

using Microsoft.Extensions.Options;

var builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<ChaosProxyOptions>(
    builder.Configuration.GetSection("ChaosProxy"));

builder.Services.AddHttpClient("proxy", client =>
{
    client.Timeout = Timeout.InfiniteTimeSpan;
});

var app = builder.Build();

app.Map("/{**path}", async (
    HttpContext context,
    IHttpClientFactory httpClientFactory,
    IOptions<ChaosProxyOptions> options) =>
{
    var proxyOptions = options.Value;
    var rule = ChaosRules.Match(context.Request.Path, proxyOptions.Rules);

    if (rule is not null)
    {
        await ChaosEngine.ApplyLatencyAsync(rule, context.RequestAborted);

        if (ChaosEngine.ShouldDropConnection(rule))
        {
            context.Abort();
            return;
        }

        if (ChaosEngine.ShouldFail(rule))
        {
            context.Response.StatusCode = ChaosEngine.PickStatusCode(rule);
            await context.Response.WriteAsync(
                "Injected failure from local chaos proxy",
                context.RequestAborted);
            return;
        }
    }

    var targetUri = TargetUriBuilder.Build(
        proxyOptions.TargetBaseUrl,
        context.Request.Path,
        context.Request.QueryString);

    var client = httpClientFactory.CreateClient("proxy");

    using var proxyRequest = ProxyRequestBuilder.Create(
        context.Request,
        targetUri);

    using var proxyResponse = await client.SendAsync(
        proxyRequest,
        HttpCompletionOption.ResponseHeadersRead,
        context.RequestAborted);

    context.Response.StatusCode = (int)proxyResponse.StatusCode;

    ProxyHeaders.CopyResponseHeaders(proxyResponse, context.Response);

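    if (rule is not null && ChaosEngine.ShouldReturnMalformedJson(rule))
    {
        // The copied headers describe the real downstream body. Drop the ones
        // that would no longer match the truncated payload written below.
        context.Response.ContentLength = null;
        context.Response.Headers.Remove("Content-Encoding");
        context.Response.ContentType = "application/json";
        await context.Response.WriteAsync("{ \"customerId\": ", context.RequestAborted);
        return;
    }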

    await proxyResponse.Content.CopyToAsync(
        context.Response.Body,
        context.RequestAborted);
});

app.Run();

There are three important details in that endpoint. First, it applies latency, dropped connections, and injected status codes before forwarding. That lets us delay the request, drop the connection, or fail fast before the downstream receives anything. Second, it uses ResponseHeadersRead. That avoids buffering the whole response in memory before we start sending it back to the caller. Third, it passes context.RequestAborted down into the outbound call. If the original caller gives up, the proxy should stop working on its behalf.

A local tool that ignores cancellation can hide the same problem you’re trying to expose.

Build the target URI

The target URI builder is intentionally small:

public static class TargetUriBuilder
{
    public static Uri Build(
        string targetBaseUrl,
        PathString path,
        QueryString queryString)
    {
        var baseUri = new Uri(targetBaseUrl.TrimEnd('/') + "/");
        var relativePath = path.Value?.TrimStart('/') ?? string.Empty;
        var relativeUri = relativePath + queryString.Value;

        return new Uri(baseUri, relativeUri);
    }
}

If the proxy receives this:

http://localhost:5050/customers/123?includeOrders=true

It forwards this:

https://api.example.com/customers/123?includeOrders=true

The app under test only needs one change: point its downstream base URL at the proxy instead of the real service.
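The slash trimming in TargetUriBuilder matters more than it looks once the target base URL carries a path of its own. The sketch below is only an illustration: the UriCombineDemo class and the /v1 base path are assumptions, not part of the proxy. It compares the normalised combination with a naive one that passes both values straight to the Uri constructor.

```csharp
using System;

// Hypothetical demo class. The "/v1" base path used in the notes below is an assumed example.
public static class UriCombineDemo
{
    // Same approach as TargetUriBuilder: the base ends with "/",
    // and the relative part has no leading "/".
    public static Uri Normalised(string baseUrl, string pathAndQuery) =>
        new Uri(
            new Uri(baseUrl.TrimEnd('/') + "/"),
            pathAndQuery.TrimStart('/'));

    // Naive version: both values are passed through unchanged.
    // A relative reference starting with "/" replaces the whole base path.
    public static Uri Naive(string baseUrl, string pathAndQuery) =>
        new Uri(new Uri(baseUrl), pathAndQuery);
}
```

With a base of https://api.example.com/v1 and an incoming /customers/123?includeOrders=true, the normalised version keeps the /v1 segment, while the naive version resolves to https://api.example.com/customers/123?includeOrders=true and silently drops it.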

Copy the request

The request builder copies the HTTP method, URI, body, and headers from the incoming request into a new HttpRequestMessage.

public static class ProxyRequestBuilder
{
    private static readonly HashSet<string> HopByHopHeaders = new(
        StringComparer.OrdinalIgnoreCase)
    {
        "Connection",
        "Keep-Alive",
        "Proxy-Authenticate",
        "Proxy-Authorization",
        "TE",
        "Trailer",
        "Transfer-Encoding",
        "Upgrade",
        "Host"
    };

    public static HttpRequestMessage Create(HttpRequest request, Uri targetUri)
    {
        var proxyRequest = new HttpRequestMessage
        {
            Method = new HttpMethod(request.Method),
            RequestUri = targetUri
        };

        if (request.ContentLength > 0 || request.Headers.ContainsKey("Transfer-Encoding"))
        {
            proxyRequest.Content = new StreamContent(request.Body);
        }

        foreach (var header in request.Headers)
        {
            if (HopByHopHeaders.Contains(header.Key))
            {
                continue;
            }

            if (!proxyRequest.Headers.TryAddWithoutValidation(
                    header.Key,
                    header.Value.ToArray()))
            {
                proxyRequest.Content?.Headers.TryAddWithoutValidation(
                    header.Key,
                    header.Value.ToArray());
            }
        }

        return proxyRequest;
    }
}

The hop-by-hop headers are skipped because they describe one HTTP connection, not the next HTTP request the proxy is creating. Host is not strictly hop-by-hop, but it is skipped as well so that HttpClient sets it from the target URI. This is one of those details that looks fussy until it breaks something. A proxy sits in the middle of two separate connections. It should not blindly copy every connection-level instruction from one side to the other.
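If you want to go one step further, the HTTP semantics specification (RFC 9110) also lets a sender list extra per-connection header names inside the Connection header itself. The sketch below is an optional refinement rather than something the proxy above does: ConnectionHeaderParser is a hypothetical helper that collects those names so they could be skipped alongside the fixed list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the proxy code above.
public static class ConnectionHeaderParser
{
    // Takes the value(s) of the incoming Connection header and returns the
    // header names it nominates as hop-by-hop, compared case-insensitively.
    public static HashSet<string> NominatedHeaders(IEnumerable<string> connectionValues)
    {
        var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (var value in connectionValues)
        {
            // A single header value can hold a comma-separated list of tokens.
            var tokens = value.Split(
                ',',
                StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

            foreach (var token in tokens)
            {
                names.Add(token);
            }
        }

        return names;
    }
}
```

A request with Connection: keep-alive, X-Custom-Hop would yield a set containing keep-alive and X-Custom-Hop, and the copy loop could skip any header whose name appears in that set. For a local tool this is probably more care than you need on day one.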

Copy the response headers

The response path needs similar care:

public static class ProxyHeaders
{
    private static readonly HashSet<string> SkippedResponseHeaders = new(
        StringComparer.OrdinalIgnoreCase)
    {
        "Transfer-Encoding"
    };

    public static void CopyResponseHeaders(
        HttpResponseMessage proxyResponse,
        HttpResponse response)
    {
        foreach (var header in proxyResponse.Headers)
        {
            if (!SkippedResponseHeaders.Contains(header.Key))
            {
                response.Headers[header.Key] = header.Value.ToArray();
            }
        }

        foreach (var header in proxyResponse.Content.Headers)
        {
            if (!SkippedResponseHeaders.Contains(header.Key))
            {
                response.Headers[header.Key] = header.Value.ToArray();
            }
        }
    }
}

This version keeps the proxy understandable. You could make header handling more complete later, but for this post we want a sharp little development tool rather than a production reverse proxy.

Match the rule

Rule matching can stay basic:

public static class ChaosRules
{
    public static ChaosRule? Match(PathString path, IReadOnlyList<ChaosRule> rules)
    {
        var value = path.Value ?? string.Empty;
        return rules.FirstOrDefault(rule =>
            value.StartsWith(rule.PathPrefix, StringComparison.OrdinalIgnoreCase));
    }
}

This means a rule for /payments also matches /payments/123 and /payments/search. That’s normally good enough for local resilience testing. Once the proxy helps you find real issues, you can decide whether you need method matching, header matching, or separate request and response behaviours. Don’t start there. Start with the smallest thing that makes your client uncomfortable.
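One caveat with a plain StartsWith check: a rule for /payments also matches /paymentshistory, because the comparison knows nothing about path segments. If that ever bites, a segment-aware check is a small change. The sketch below uses a hypothetical SegmentPrefix helper that works on plain strings. ASP.NET Core's PathString.StartsWithSegments offers a similar check if you would rather not write your own.

```csharp
using System;

// Hypothetical helper: prefix matching that respects "/" segment boundaries.
public static class SegmentPrefix
{
    public static bool Matches(string path, string prefix)
    {
        // Treat "/payments" and "/payments/" as the same prefix.
        var trimmed = prefix.TrimEnd('/');

        if (!path.StartsWith(trimmed, StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        // Either the path is exactly the prefix, or the next character
        // starts a new segment.
        return path.Length == trimmed.Length || path[trimmed.Length] == '/';
    }
}
```

With this check, /payments and /payments/123 still match a /payments rule, but /paymentshistory does not.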

Add the chaos engine

The chaos engine is just probability checks and a little delay logic.

public static class ChaosEngine
{
    public static async Task ApplyLatencyAsync(
        ChaosRule rule,
        CancellationToken cancellationToken)
    {
        if (rule.Latency is null)
        {
            return;
        }

        var min = Math.Max(0, rule.Latency.MinMs);
        var max = Math.Max(min, rule.Latency.MaxMs);
        var delay = Random.Shared.Next(min, max + 1);

        await Task.Delay(delay, cancellationToken);
    }

    public static bool ShouldFail(ChaosRule rule) =>
        Happens(rule.FailureRate);

    public static bool ShouldReturnMalformedJson(ChaosRule rule) =>
        Happens(rule.MalformedJsonRate);

    public static bool ShouldDropConnection(ChaosRule rule) =>
        Happens(rule.DropConnectionRate);

    public static int PickStatusCode(ChaosRule rule)
    {
        if (rule.StatusCodes.Count == 0)
        {
            return StatusCodes.Status500InternalServerError;
        }

        var index = Random.Shared.Next(rule.StatusCodes.Count);
        return rule.StatusCodes[index];
    }

    private static bool Happens(double probability)
    {
        if (probability <= 0)
        {
            return false;
        }

        if (probability >= 1)
        {
            return true;
        }

        return Random.Shared.NextDouble() < probability;
    }
}

This gives you four useful behaviours. Latency exposes bad timeout settings. Injected status codes expose weak retry policies. Dropped connections expose clients that assume every failure comes back as a clean HTTP response. Malformed JSON exposes parsing code that treats successful status codes as proof that the body is safe. That last one is worth testing. A 200 OK with broken JSON is still a failed call from your application's point of view.
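One thing to keep in mind when you pick rates: the endpoint checks them in order (drop, then injected failure, then malformed JSON), so each later check only applies to requests that survived the earlier ones. The sketch below works out the effective share of each outcome. EffectiveRates is a hypothetical helper for reasoning about a rule, not part of the proxy, and it ignores failures that come from the real downstream.

```csharp
using System;

// Hypothetical helper: effective outcome probabilities for one matching request,
// assuming the check order used in the endpoint (drop, then fail, then malformed).
public static class EffectiveRates
{
    public static (double Drop, double Fail, double Malformed, double Clean) For(
        double dropRate,
        double failureRate,
        double malformedRate)
    {
        var drop = Clamp(dropRate);

        // Only requests that were not dropped reach the failure check.
        var fail = (1 - drop) * Clamp(failureRate);

        // Only requests that were neither dropped nor failed get forwarded,
        // and only forwarded requests can have their body replaced.
        var malformed = (1 - drop - fail) * Clamp(malformedRate);

        var clean = 1 - drop - fail - malformed;
        return (drop, fail, malformed, clean);
    }

    private static double Clamp(double probability) => Math.Clamp(probability, 0, 1);
}
```

For the /payments rule from the configuration (DropConnectionRate 0.05, FailureRate 0.25, no malformed JSON), that works out to about 5% dropped, 23.75% injected status codes, and 71.25% forwarded untouched.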

Run it locally

Run the proxy on a local port:

dotnet run --urls http://localhost:5050

Then point your application at the proxy:

{
  "CustomersApi": {
    "BaseUrl": "http://localhost:5050"
  }
}

Your application thinks it is calling the downstream service. The proxy forwards the call. Every now and then, it makes the world worse. That’s exactly what we want.

Test it with a normal HTTP client

Imagine a typed client like this:

public sealed class CustomersClient(HttpClient httpClient)
{
    public async Task<CustomerDto?> GetCustomerAsync(
        int id,
        CancellationToken cancellationToken)
    {
        using var response = await httpClient.GetAsync(
            $"/customers/{id}",
            cancellationToken);

        response.EnsureSuccessStatusCode();

        return await response.Content.ReadFromJsonAsync<CustomerDto>(
            cancellationToken);
    }
}

That code looks normal. It might even be fine for some internal calls. Put the chaos proxy in front of it and you learn more quickly. What happens when the response takes three seconds? What happens when the connection drops? What happens when the server returns 429? What gets logged? Does the caller see a useful error? Does the cancellation token actually stop the work? Those are the questions that matter once the client becomes part of a real workflow.
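A first step towards answering those questions is telling the failure modes apart. The sketch below is one possible mapping, not a definitive one: CallFailure and CallFailureClassifier are hypothetical names, and the mapping assumes .NET 5 or later, where an HttpClient timeout surfaces as a TaskCanceledException wrapping a TimeoutException and EnsureSuccessStatusCode sets HttpRequestException.StatusCode.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical categories for logging and error handling.
public enum CallFailure
{
    CallerCancelled,
    Timeout,
    BadStatus,
    Transport,
    InvalidBody,
    Unknown
}

public static class CallFailureClassifier
{
    public static CallFailure Classify(Exception exception, CancellationToken callerToken) =>
        exception switch
        {
            // The calling workflow gave up, so this is not a downstream problem.
            OperationCanceledException when callerToken.IsCancellationRequested
                => CallFailure.CallerCancelled,

            // HttpClient.Timeout elapsed (assumes .NET 5+ behaviour).
            TaskCanceledException { InnerException: TimeoutException }
                => CallFailure.Timeout,

            // Assumption: any other cancellation came from a timeout-style token.
            OperationCanceledException => CallFailure.Timeout,

            // EnsureSuccessStatusCode threw for a non-success response.
            HttpRequestException { StatusCode: not null } => CallFailure.BadStatus,

            // No status code: connection refused, reset, or dropped.
            HttpRequestException => CallFailure.Transport,

            // The response arrived but the body could not be parsed.
            JsonException => CallFailure.InvalidBody,

            _ => CallFailure.Unknown
        };
}
```

Wrapping the body of GetCustomerAsync in a try/catch that logs the classified result alongside the path and elapsed time gives you something to look at when the proxy starts injecting failures.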

Add retries carefully

It’s tempting to slap a retry policy on every outbound call and call it resilience. That can make things worse. Retrying a safe lookup is usually fine. Retrying a payment submission is a different conversation. If the downstream might have accepted the request before the connection failed, your retry could create a duplicate action unless the operation is idempotent. That’s where the chaos proxy becomes useful. It doesn't just prove that retries happen. It forces you to look at whether the operation is safe to retry in the first place.

For local testing, you can add rules like this:

{
  "PathPrefix": "/payments",
  "FailureRate": 0.25,
  "DropConnectionRate": 0.10,
  "StatusCodes": [429, 500, 503]
}

Then watch how your app behaves. If it blindly retries everything, you have learned something. If it gives up too early, you have learned something else. If the logs can’t tell you what happened, that’s a separate problem the proxy has just made visible.
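One way to keep the retry decision explicit is to put it in a single function instead of scattering it across policies. The sketch below is a minimal example under stated assumptions: RetryDecision is a hypothetical helper, the set of retryable status codes is my choice rather than a rule, and operationIsIdempotent stands in for whatever your API offers to make a command safe to repeat.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper: decides whether one more attempt is reasonable.
public static class RetryDecision
{
    public static bool ShouldRetry(
        HttpMethod method,
        HttpStatusCode? status,      // null means no response was received at all
        int attempt,                 // 1-based number of the attempt that just failed
        int maxAttempts,
        bool operationIsIdempotent = false)
    {
        if (attempt >= maxAttempts)
        {
            return false;
        }

        // Reads are assumed safe to repeat. Anything else needs an explicit opt-in,
        // because a dropped connection does not prove the command was not applied.
        var safeToRepeat =
            method == HttpMethod.Get ||
            method == HttpMethod.Head ||
            method == HttpMethod.Options ||
            operationIsIdempotent;

        if (!safeToRepeat)
        {
            return false;
        }

        // Assumed set of transient outcomes. Adjust it for your downstream.
        return status is null
            or HttpStatusCode.TooManyRequests
            or HttpStatusCode.BadGateway
            or HttpStatusCode.ServiceUnavailable
            or HttpStatusCode.GatewayTimeout;
    }
}
```

With this shape, a GET that hits a 503 gets another attempt, while a POST to /payments whose connection dropped does not unless the caller has explicitly marked the operation as idempotent.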

Make the response worse after the real API succeeds

One useful trick is returning malformed JSON after the downstream has already responded successfully. In the code above, the proxy forwards the request and receives a real response. Then this block can replace the response body:

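if (rule is not null && ChaosEngine.ShouldReturnMalformedJson(rule))
{
    // The copied headers describe the real downstream body. Drop the ones
    // that would no longer match the truncated payload written below.
    context.Response.ContentLength = null;
    context.Response.Headers.Remove("Content-Encoding");
    context.Response.ContentType = "application/json";
    await context.Response.WriteAsync("{ \"customerId\": ", context.RequestAborted);
    return;
}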

That sounds artificial, but it catches a real class of bug. Plenty of client code treats HTTP success as the end of the story. It then assumes the response body can be parsed, mapped, and trusted. A bad response body should be handled as deliberately as a bad status code. This is especially useful when your app receives JSON from systems you don’t own.
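On the client side, handling a bad body deliberately can be as simple as treating a parse failure as its own outcome. The sketch below is a minimal example, assuming System.Text.Json and a hypothetical CustomerDto shape (the real DTO is not shown in this post). CustomerParser is likewise a made-up name.

```csharp
using System.Text.Json;

// Hypothetical DTO shape, assumed for illustration only.
public sealed record CustomerDto(int CustomerId, string? Name);

public static class CustomerParser
{
    // Web defaults: camelCase property names, case-insensitive matching.
    private static readonly JsonSerializerOptions Options =
        new(JsonSerializerDefaults.Web);

    // Returns false instead of throwing when the body is not usable JSON.
    public static bool TryParse(string body, out CustomerDto? customer)
    {
        try
        {
            customer = JsonSerializer.Deserialize<CustomerDto>(body, Options);

            // A literal "null" body deserialises to null, which is also not usable.
            return customer is not null;
        }
        catch (JsonException)
        {
            customer = null;
            return false;
        }
    }
}
```

The truncated body the proxy writes, { "customerId": , makes TryParse return false, which gives the caller a clear place to log the failure and decide what to do next.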

What this exposes quickly

The first thing you will usually find is timeout confusion. Some clients have no explicit timeout. Some have several timeouts fighting each other. Some set a timeout on HttpClient and then forget that the calling workflow has its own cancellation token. Others use retries in a way that turns a short downstream incident into a much longer user-facing delay.

The second thing you will find is weak logging. When a call fails, you need enough information to understand the failure without dumping secrets into your logs. The method, path, status code, elapsed time, retry attempt, and correlation ID are usually more useful than a giant exception blob.

The third thing you will find is unsafe retry behaviour. A lookup can normally be retried. A command needs more thought. If a request changes state in another system, a dropped connection doesn't prove the operation failed. It only proves you didn't receive the answer.

That distinction is where a lot of production bugs live.
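Going back to the timeout confusion for a moment: one way to stop timeouts fighting each other is to derive a single token per call from the caller's token and a per-call limit. The sketch below is a minimal example of that pattern. CallTimeout is a hypothetical helper, and converting the cancellation into a TimeoutException is a design choice I am assuming, not something the proxy requires.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: one per-call timeout that still honours the caller's token.
public static class CallTimeout
{
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> call,
        TimeSpan perCallTimeout,
        CancellationToken callerToken)
    {
        // The linked source cancels when the caller cancels OR the timeout elapses.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(perCallTimeout);

        try
        {
            return await call(cts.Token);
        }
        catch (OperationCanceledException) when (!callerToken.IsCancellationRequested)
        {
            // The caller did not cancel, so the per-call limit is what fired.
            throw new TimeoutException(
                $"Call exceeded {perCallTimeout.TotalMilliseconds} ms.");
        }
    }
}
```

A call such as CallTimeout.RunAsync(ct => client.GetCustomerAsync(123, ct), TimeSpan.FromSeconds(2), cancellationToken) then fails with a TimeoutException when the proxy adds too much latency, and with an ordinary cancellation when the workflow itself gives up.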

A few useful extensions

Once the basic proxy works, you can add a request log so every forwarded call records the method, path, selected rule, injected behaviour, status code, and elapsed time. You can also add response templating. For example, a rule could return a realistic 429 body with a Retry-After header. That gives you a better local test for rate limiting behaviour.

Another useful addition is deterministic chaos. Instead of using pure randomness, you can fail every fifth request or fail requests with a specific header. That makes demos and automated tests easier to repeat.
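The "fail every fifth request" variant needs little more than a thread-safe counter. The sketch below shows one way to do it. EveryNthTrigger is a hypothetical class, and how you wire it into a rule (for example, one instance per rule) is left open.

```csharp
using System;
using System.Threading;

// Hypothetical trigger: fires on every n-th call, safely across concurrent requests.
public sealed class EveryNthTrigger
{
    private readonly int _n;
    private long _count;

    public EveryNthTrigger(int n)
    {
        if (n <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(n), "n must be positive.");
        }

        _n = n;
    }

    // Interlocked keeps the count correct when requests arrive in parallel.
    public bool ShouldTrigger() => Interlocked.Increment(ref _count) % _n == 0;
}
```

With new EveryNthTrigger(5), calls one to four return false and the fifth returns true, then the pattern repeats. Because the pattern is fixed, a test can assert exactly which request should fail.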

The final feature I would add is a small dashboard. It doesn’t need to be fancy. A single page showing recent requests and injected failures is enough to make the proxy feel like a real development tool.

Where this should stop

This is a local development tool. Treat it that way. Don’t put it in the production path. Don’t turn it into your company's unofficial gateway. Don’t keep adding features until you’ve accidentally built a worse version of tools that already exist.

The value is in the feedback loop. Run your app locally. Route one downstream through the proxy. Make the client deal with latency, broken responses, and unreliable connections. Fix what breaks. That’s enough.

A local chaos proxy is useful because it makes failure cheap. You don’t have to wait for a real downstream outage to discover that your timeout is too high, your retry policy is unsafe, your logs are too vague, or your JSON parsing has no useful failure path. You can find those problems on your own machine. That’s the kind of local tooling I like. Small enough to understand. Annoying enough to reveal the truth. Practical enough to keep around.