Testing payload drift in .NET with PayloadGremlin

In the last post, we built a small chaos proxy in C#. That proxy sat between our application and a downstream API. It made the network awkward on purpose. Sometimes it added latency. Sometimes it returned a bad status code. Sometimes it dropped the connection. Sometimes it sent malformed JSON back after the downstream had already responded. That kind of local tool is useful because it makes failure cheap. You don't have to wait for a real outage to learn that your timeout is wrong, your retry policy is unsafe, or your logs don't tell you enough.
But there's another kind of failure that the proxy only touches at the edge. The HTTP call works. The status code is fine. The response body is valid JSON. The message arrives on the queue. The webhook posts successfully. Nothing looks broken from the transport layer. Then your application tries to use the payload.
That's where a lot of production bugs live.
A number arrives as a string. A date changes format. A required field is suddenly null. An enum gets a value your code hasn't seen before. A legacy system sends "true" instead of true. A downstream adds new properties. A casing change slips through because another serialiser got introduced. This is the part that made me build PayloadGremlin.
I wanted a simple, lightweight way to take a known-good JSON payload and generate realistic bad variants from it. I wasn't looking for a big fuzzing framework. I didn't want to wire up a full chaos engineering platform. I wanted something I could drop into ordinary .NET tests and use against the payloads already sitting in the repo.
PayloadGremlin makes good JSON payloads misbehave on purpose.
Repository: github.com/kearns2000/payloadgremlin
The chaos proxy tested the road
The chaos proxy post was about the road between your app and another system. Can your client handle a slow response? Can it handle a 503? Does it retry safely? Does it pass cancellation tokens through properly? Does it log enough when the downstream lies to it? That's important because a lot of API client code is written as if the network is a polite function call. It isn't. But even when the road is fine, the cargo can still be wrong.
A clean 200 OK doesn't mean the body contains what your application expects. A queue message being delivered doesn't mean the shape is safe. A webhook signature being valid doesn't mean the payload won't break your mapper. That's the next layer of resilience. The proxy helps you test unreliable communication. PayloadGremlin helps you test unreliable contracts.
The hidden problem: realistic JSON drift
Most .NET API tests use payloads that are too clean. The request has the right fields. The response has the right types. Dates are formatted exactly how the DTO expects them. Numbers are numbers. Booleans are booleans. Required properties are present. Everything is neat. That's useful, but it can create a false sense of safety. The problem isn't that teams never write negative tests. They do. The problem is that hand-written negative tests usually cover the cases somebody already thought about. You test that customerId is missing. You test that premium is negative. You test that startDate is invalid.
Then production sends you a case where premium is "1200.5" because some older system exports everything as text. Or startDate becomes "01/01/2026" because somebody changed an integration mapping. Or a field that was always an object becomes an empty array. These are not dramatic failures. They're small contract shifts. That's why they're dangerous.
They often get past the first layer of the system. The body parses. The message is accepted. The workflow starts. Then the failure appears deeper in the app, where the error is harder to understand and the recovery path is messier. PayloadGremlin is aimed at that space. It doesn't try to generate random nonsense. It starts from a valid payload and creates believable mutations. The result is close enough to your real contract to be useful, but awkward enough to expose brittle assumptions.
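To make one of those small shifts concrete, here is a minimal sketch of how System.Text.Json reacts when a number arrives as a string. It does not use PayloadGremlin. The `QuoteDto` type and the `DriftProbe` helper are hypothetical names for illustration, and the sketch assumes .NET 6 or later.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO for illustration; not part of PayloadGremlin.
public sealed class QuoteDto
{
    public string? CustomerId { get; set; }
    public decimal Premium { get; set; }
}

public static class DriftProbe
{
    // Default number handling: a decimal property must be a JSON number.
    public static readonly JsonSerializerOptions Strict = new()
    {
        PropertyNameCaseInsensitive = true
    };

    // Opts in to reading numbers from JSON strings.
    public static readonly JsonSerializerOptions Lenient = new()
    {
        PropertyNameCaseInsensitive = true,
        NumberHandling = JsonNumberHandling.AllowReadingFromString
    };

    // Returns the parsed premium, or null if the serializer rejected the payload.
    public static decimal? TryReadPremium(string json, JsonSerializerOptions options)
    {
        try
        {
            return JsonSerializer.Deserialize<QuoteDto>(json, options)?.Premium;
        }
        catch (JsonException)
        {
            return null;
        }
    }
}
```

For `{"customerId":"123","premium":"1200.5"}`, the strict options reject the payload and the lenient options read `1200.5`. It's worth knowing which one you're running with: `JsonSerializerDefaults.Web`, which ASP.NET Core uses, allows quoted numbers, so the same drifted payload can be rejected by one component and accepted by another.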
Install PayloadGremlin
PayloadGremlin is a small .NET library, so the setup is simple.
dotnet add package PayloadGremlin
Then give it a payload and ask for generated cases.
using PayloadGremlin;

var json = """
{
  "customerId": "123",
  "premium": 1200.50,
  "active": true,
  "startDate": "2026-01-01"
}
""";

var gremlin = global::PayloadGremlin.PayloadGremlin.Create(options =>
{
    options.WithSeed(12345);
    options.UseProfile(GremlinProfile.RealisticApiDrift);
    options.MaxCases(10);
});

var result = gremlin.Generate(json);

foreach (var testCase in result.Cases)
{
    Console.WriteLine(testCase.Name);
    Console.WriteLine(testCase.Payload);
}
That's the basic idea.
You keep a known-good fixture. PayloadGremlin turns it into a set of mutated cases. You feed those cases into your parser, endpoint, client, or message consumer. The test decides what safe behaviour means. Sometimes the mutated payload should still be accepted. Sometimes it should be rejected with a clear validation error. What you don't want is an accidental 500, a half-written record, or a broken message moving deeper into the system.
Start with a fixture that's important
I wouldn't start by pointing PayloadGremlin at every JSON file in a codebase. Start with one payload that would hurt if it broke. That might be a payment request, a quote submission, a claims notification, an order import, or a webhook from a system you don't control. Pick something with business consequences. Pick something where a bad assumption would be expensive.
Say this is the payload your API receives:
{
  "customerId": "CUST-123",
  "premium": 1200.50,
  "currency": "EUR",
  "active": true,
  "startDate": "2026-01-01"
}
Your happy path test probably posts that exact JSON and proves the endpoint works.
PayloadGremlin lets you reuse it.
public static IEnumerable<object[]> GremlinCases()
{
    var json = File.ReadAllText("Fixtures/create-policy.json");

    var result = global::PayloadGremlin.PayloadGremlin.Create(options =>
    {
        options.WithSeed(42);
        options.UseProfile(GremlinProfile.RealisticApiDrift);
        options.MaxCases(25);
    }).Generate(json);

    return result.Cases.Select(testCase =>
        new object[] { testCase.Name, testCase.Payload });
}
That turns one clean fixture into a small suite of payload drift cases. The seed is important. You want repeatable tests. If a case fails in CI, you need to reproduce the same payload locally. Random failures that can't be replayed are a fast way to make a team hate this kind of testing. PayloadGremlin's seed support keeps the output deterministic for the same input, configuration, and seed.
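The reason a seed makes cases replayable can be shown with a small sketch. This is not PayloadGremlin's implementation. `SeededPicker` is a hypothetical helper that only illustrates the general technique: a pseudo-random generator built from the same seed makes the same sequence of choices on the same runtime.

```csharp
using System;
using System.Collections.Generic;

public static class SeededPicker
{
    // Picks `count` items from `candidates` using a generator built from `seed`.
    // The same candidates, seed, and count always yield the same picks.
    public static List<string> Pick(IReadOnlyList<string> candidates, int seed, int count)
    {
        if (candidates.Count == 0)
        {
            throw new ArgumentException("At least one candidate is required.", nameof(candidates));
        }

        var random = new Random(seed);
        var picked = new List<string>(count);
        for (var i = 0; i < count; i++)
        {
            picked.Add(candidates[random.Next(candidates.Count)]);
        }
        return picked;
    }
}
```

Calling `Pick` twice with seed 42 returns identical lists, which is the property that lets a CI failure be replayed on a laptop. Swap the seed for `Random.Shared` and that guarantee disappears.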
Testing an ASP.NET Core endpoint
The most obvious use is an API integration test. Take a real endpoint. Post mutated payloads into it. Assert that the response is controlled.
using System.Net;
using System.Text;
using Microsoft.AspNetCore.Mvc.Testing;
using PayloadGremlin;
using Xunit;

public sealed class CreatePolicyEndpointTests
    : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly HttpClient _client;

    public CreatePolicyEndpointTests(WebApplicationFactory<Program> factory)
    {
        _client = factory.CreateClient();
    }

    [Theory]
    [MemberData(nameof(GremlinCases))]
    public async Task CreatePolicy_handles_realistic_payload_drift(
        string name,
        string payload)
    {
        using var content = new StringContent(
            payload,
            Encoding.UTF8,
            "application/json");

        var response = await _client.PostAsync("/policies", content);

        // Success or a deliberate rejection is fine; anything else is uncontrolled.
        Assert.True(
            response.StatusCode is HttpStatusCode.Accepted
                or HttpStatusCode.BadRequest
                or HttpStatusCode.UnprocessableEntity,
            $"Unexpected status code {(int)response.StatusCode} for case {name}");
    }

    public static IEnumerable<object[]> GremlinCases()
    {
        var json = File.ReadAllText("Fixtures/create-policy.json");

        var result = global::PayloadGremlin.PayloadGremlin.Create(options =>
        {
            options.WithSeed(42);
            options.UseProfile(GremlinProfile.RealisticApiDrift);
            options.MaxCases(25);
        }).Generate(json);

        return result.Cases.Select(testCase =>
            new object[] { testCase.Name, testCase.Payload });
    }
}
The assertion is deliberately not saying every case must pass. That would be the wrong lesson. Some mutated payloads should be rejected. If a required field is missing, a 400 Bad Request might be correct. If a value is present but invalid for the business workflow, 422 Unprocessable Entity might be better. The exact status codes depend on your API design. The important thing is that the endpoint stays in control.
A bad payload shouldn't escape as a vague 500. It shouldn't create a record with broken values. It shouldn't pass model binding and fail three layers later with an exception that means nothing to the caller. You're testing whether your API has a clear failure path.
Testing deserialisers and mappers
You don't always need the full ASP.NET Core pipeline. Sometimes the fragile part is a parser, mapper, or client response model. That's especially true when you consume APIs owned by another team or another company.
A small test can still be valuable.
[Theory]
[MemberData(nameof(GremlinCases))]
public void Policy_payload_parser_handles_drift_without_throwing(
    string name,
    string payload)
{
    var result = PolicyPayloadParser.TryParse(payload, out var policy);

    Assert.True(
        result.IsSuccess || result.IsValidationFailure,
        $"Unexpected parser result for gremlin case: {name}");
}
That might look too simple, but it catches a real class of bug. A parser shouldn't throw a random exception because a number became a string. It should either support that input or reject it in a way the rest of the application understands. This is especially useful for typed clients. A lot of client code treats a successful HTTP response as the hard part. It calls EnsureSuccessStatusCode(), then reads JSON into a DTO, then assumes the DTO is safe. That works until the response body is valid JSON with a shape the DTO doesn't quite understand.
The chaos proxy can return broken JSON. PayloadGremlin goes further by generating valid JSON that has drifted away from the shape your client expected. That difference is important. Broken JSON tests your syntax failure path. Drifted JSON tests your contract failure path.
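The parser in that test isn't shown, so here is one way it could look. This is a sketch under assumptions: `PolicyPayloadParser`, `ParseResult`, and `ParsedPolicy` are hypothetical types shaped to match the test, and the rules (a non-empty string `customerId`, a numeric `premium`) are invented for illustration. It assumes .NET 6 or later.

```csharp
using System.Text.Json;

// Hypothetical types that match the shape used in the parser test.
public sealed record ParsedPolicy(string CustomerId, decimal Premium);

public readonly record struct ParseResult(bool IsSuccess, bool IsValidationFailure, string? Error)
{
    public static ParseResult Ok() => new(true, false, null);
    public static ParseResult Invalid(string error) => new(false, true, error);
}

public static class PolicyPayloadParser
{
    public static ParseResult TryParse(string payload, out ParsedPolicy? policy)
    {
        policy = null;
        try
        {
            using var document = JsonDocument.Parse(payload);
            var root = document.RootElement;

            // Contract failure path: valid JSON, wrong shape.
            if (root.ValueKind != JsonValueKind.Object)
                return ParseResult.Invalid("Payload must be a JSON object.");

            if (!root.TryGetProperty("customerId", out var id)
                || id.ValueKind != JsonValueKind.String
                || string.IsNullOrWhiteSpace(id.GetString()))
                return ParseResult.Invalid("customerId must be a non-empty string.");

            if (!root.TryGetProperty("premium", out var premium)
                || premium.ValueKind != JsonValueKind.Number
                || !premium.TryGetDecimal(out var amount))
                return ParseResult.Invalid("premium must be a JSON number.");

            policy = new ParsedPolicy(id.GetString()!, amount);
            return ParseResult.Ok();
        }
        catch (JsonException ex)
        {
            // Syntax failure path: the body was not valid JSON at all.
            return ParseResult.Invalid($"Malformed JSON: {ex.Message}");
        }
    }
}
```

Both failure paths end in the same place, a result the caller can act on. Note that `TryGetProperty` is case-sensitive, so a casing drift such as `CustomerId` is reported as a validation failure here; whether that is the right call is a decision for your contract, not something the sketch settles.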
Choosing a mutation profile
PayloadGremlin uses profiles so you don't have to manually list every mutation type for basic scenarios. For most API work, I'd start with RealisticApiDrift. That profile is aimed at everyday contract drift. Fields become null. Numbers may be represented as strings. Date formats can change. Property casing can shift. This is the kind of thing that happens when systems evolve separately. If you want to be harder on strict clients, use StrictClientBreaker. That profile is useful when your DTOs, validators, or client models are making strong assumptions about the payload shape.
For older integrations, LegacySystemWeirdness is usually more interesting. Legacy systems often don't fail in clean modern ways. They send whitespace you didn't expect. They use odd boolean representations. They format decimal values differently. They produce text that technically parses but still needs careful handling.
DateAndMoneyChaos is the one I'd reach for when a payload contains financial values, policy dates, settlement dates, renewal dates, or anything else where parsing incorrectly is worse than rejecting the payload.
There's also Aggressive when you want broader coverage while keeping generated payloads valid JSON by default. The profile should match the risk.
A public API boundary probably needs different pressure than an internal event. A payment payload probably deserves more date and money testing than a search request. A legacy import deserves different treatment than a clean service-to-service contract.
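Dates are a good illustration of why parsing incorrectly can be worse than rejecting. The sketch below is independent of PayloadGremlin; `DateDrift` is a hypothetical helper, and it assumes the contract promises ISO 8601 calendar dates (`yyyy-MM-dd`) and that you're on .NET 6 or later for `DateOnly`.

```csharp
using System;
using System.Globalization;

public static class DateDrift
{
    // Accepts only the date format the contract promises.
    public static bool TryParseContractDate(string value, out DateOnly date) =>
        DateOnly.TryParseExact(
            value, "yyyy-MM-dd", CultureInfo.InvariantCulture, DateTimeStyles.None, out date);

    // Shows why guessing is risky: the same text can be read as two different dates.
    public static (DateOnly MonthFirst, DateOnly DayFirst) ReadBothWays(string value) =>
        (DateOnly.ParseExact(value, "MM/dd/yyyy", CultureInfo.InvariantCulture),
         DateOnly.ParseExact(value, "dd/MM/yyyy", CultureInfo.InvariantCulture));
}
```

`TryParseContractDate("01/01/2026", out _)` returns false, which is a clean rejection. `ReadBothWays("01/02/2026")` returns 2 January and 1 February for the same input, and a parser that quietly picks one of them has no way to know it picked wrong.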
Path configuration keeps the noise down
Payload chaos becomes useless when every generated case is nonsense. Some fields are critical. Some fields are optional. Some fields should never be removed in a useful test because every case would fail at the first validation rule. Other fields are safe to ignore because they're metadata. PayloadGremlin lets you target or protect paths.
var gremlin = global::PayloadGremlin.PayloadGremlin.Create(options =>
{
    options.WithSeed(12345);
    options.UseProfile(GremlinProfile.RealisticApiDrift);

    options.ForPath("$.customerId", path =>
    {
        path.DoNotRemove();
        path.AllowNull(false);
    });

    options.ExcludePath("$.metadata.traceId");
});
That makes the generated cases more useful. For example, if every mutation removes customerId, you may only prove that your required field validation works. That's fine once, but it doesn't tell you how the rest of the payload behaves. Protecting that path lets you test more interesting drift elsewhere.
On the other side, excluding a trace ID or correlation field can keep your cases focused. You probably don't need ten tests proving that your app ignores a diagnostic field. The goal is signal. You want generated cases that make the application uncomfortable, not cases that all fail for the same obvious reason.
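To see why protecting a path changes the mix of cases, here is a hand-rolled sketch of a single mutation type with a protected list. It is not how PayloadGremlin works internally. `NullDriftSketch` and the `NullValue` case label are hypothetical, and the sketch only handles top-level properties of a JSON object.

```csharp
using System.Collections.Generic;
using System.Text.Json.Nodes;

public static class NullDriftSketch
{
    // Yields one variant per top-level property, with that property set to null.
    // Properties named in `protectedNames` are never touched.
    public static IEnumerable<(string Name, string Payload)> NullEachProperty(
        string json, ISet<string> protectedNames)
    {
        var template = JsonNode.Parse(json)!.AsObject();

        // Snapshot the property names before generating variants.
        var names = new List<string>();
        foreach (var property in template)
        {
            names.Add(property.Key);
        }

        foreach (var name in names)
        {
            if (protectedNames.Contains(name)) continue;

            var copy = JsonNode.Parse(json)!.AsObject(); // fresh copy per variant
            copy[name] = null;
            yield return ($"NullValue @ $.{name}", copy.ToJsonString());
        }
    }
}
```

With `customerId` protected, a three-property fixture produces two variants instead of three, and neither of them fails at the required-field check. Every remaining case has to get past that first rule before it can show you anything.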
Mutation metadata makes failures understandable
A common problem with generated tests is diagnosis. When a test fails, you need to know what changed. PayloadGremlin cases include mutation metadata, so a failure isn't just "payload number seven broke". Each case can tell you the JSON path, mutation type, before and after values, severity, whether the output is valid JSON, and a human-readable description.
That means your test output can include useful context.
foreach (var mutation in testCase.Mutations)
{
    output.WriteLine($"{mutation.MutationType} @ {mutation.JsonPath}");
    output.WriteLine($"  {mutation.Description}");
}
A failing generated test with no explanation feels like noise. A failing generated test that says NumberToString @ $.premium is immediately actionable. You can look at the code and ask the right question.
Should this endpoint accept numeric strings for premium? If yes, the parser needs to support it deliberately. If no, the validation path needs to reject it cleanly. Either answer is fine. What's not fine is discovering that the value gets silently converted to something wrong.
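Silent conversion is easy to trigger in .NET once a number has become text. The sketch below is independent of PayloadGremlin; `PremiumText` is a hypothetical helper, and the input `"1200,50"` is an assumed example of a legacy export that uses a comma as the decimal separator.

```csharp
using System.Globalization;

public static class PremiumText
{
    // Lenient: NumberStyles.Number allows thousands separators, and the invariant
    // culture uses ',' for that, so "1200,50" parses as 120050.
    public static decimal ParseLoosely(string text) =>
        decimal.Parse(text, NumberStyles.Number, CultureInfo.InvariantCulture);

    // Strict: only an optional leading sign, digits, and a '.' decimal point.
    public static bool TryParseStrictly(string text, out decimal value) =>
        decimal.TryParse(
            text,
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture,
            out value);
}
```

`ParseLoosely("1200,50")` returns `120050` without complaint, a premium a hundred times too large. `TryParseStrictly("1200,50", out _)` returns false, which gives the validation path something to reject.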
Shrinking helps isolate the real problem
Some generated cases contain more than one mutation. That's useful because production payloads often have more than one oddity. A partner system might change a date format and add new fields in the same release. A legacy export might send whitespace, string booleans, and odd decimal formatting in the same message. But when a test fails, multiple mutations can make the cause harder to see.
PayloadGremlin includes shrinking for failing cases.
var smallerCases = gremlin.Shrink(failingCase);
The first version removes one mutation at a time. That helps you move from "this messy payload broke the endpoint" to "this specific mutation exposes the bug". That's the difference between a useful test and a frustrating one. Developers don't need more mystery. They need a fast route from failure to fix.
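The one-at-a-time idea can be sketched generically. This is not PayloadGremlin's `Shrink` implementation, whose internals aren't shown here. `ShrinkSketch` is a hypothetical helper, and the `stillFails` predicate is an assumption standing in for "apply these mutations and rerun the test".

```csharp
using System;
using System.Collections.Generic;

public static class ShrinkSketch
{
    // Returns every candidate that drops exactly one mutation from `mutations`
    // and for which `stillFails` returns true.
    public static List<IReadOnlyList<T>> RemoveOneAtATime<T>(
        IReadOnlyList<T> mutations,
        Func<IReadOnlyList<T>, bool> stillFails)
    {
        var smaller = new List<IReadOnlyList<T>>();

        for (var skip = 0; skip < mutations.Count; skip++)
        {
            // Build the candidate without the mutation at index `skip`.
            var candidate = new List<T>();
            for (var i = 0; i < mutations.Count; i++)
            {
                if (i != skip) candidate.Add(mutations[i]);
            }

            if (stillFails(candidate)) smaller.Add(candidate);
        }

        return smaller;
    }
}
```

If a three-mutation case only fails because of `NumberToString @ $.premium`, the two candidates that keep that mutation still fail and the one that drops it passes. That tells you which mutation matters without reading the payload by eye.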
Reports give you visibility
Once you start generating cases across a few important payloads, it helps to see what was actually tested. PayloadGremlin can generate a Markdown report.
var result = gremlin.Generate(json);
Console.WriteLine(result.Report.ToMarkdown());
The report summarises total cases, mutation types used, paths touched, paths not touched, invalid JSON count, and the reproduction seed. That can be useful in a PR because it makes the test coverage visible. If a contract changes, you can see whether the new fields are being touched. If a fixture grows, you can spot paths that never get mutated. If a test fails, the seed helps reproduce the exact case. This is also useful when introducing this style of testing to a team.
Generated tests can sound vague until people see the report. Then it becomes clearer. You're not just "doing fuzzing". You're exercising specific drift around specific payload paths.
Invalid JSON still has a place
By default, PayloadGremlin keeps generated payloads valid JSON. That's intentional. Most contract drift bugs I care about are not syntax problems. The JSON parses. The HTTP call succeeds. The message arrives. The shape is just not quite what the application expected. Still, invalid JSON is worth testing at the boundary.
options.AllowInvalidJson();
That enables cases like truncated JSON, trailing commas, and broken string quotes. I'd use this carefully.
Malformed JSON is good for testing the edge of the system. Does the API return a clean 400? Does the webhook handler avoid leaking an internal exception? Does the queue consumer dead-letter the message with a useful reason? Once you're deeper than the boundary, valid but weird JSON usually gives better feedback.
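A boundary check for malformed bodies can be very small. The sketch below uses only System.Text.Json; `BoundaryCheck` is a hypothetical helper, and what you do with a false result (a 400, a dead-letter reason) is up to your boundary.

```csharp
using System.Text.Json;

public static class BoundaryCheck
{
    // True when the body is syntactically valid JSON under the default parser rules,
    // which reject trailing commas and comments.
    public static bool IsWellFormedJson(string body)
    {
        try
        {
            using var document = JsonDocument.Parse(body);
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

Truncated JSON, a trailing comma, and an unterminated string all return false here. A body that returns true tells you nothing about its shape, which is the part the valid-but-weird cases are for.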
How this fits with the chaos proxy
The chaos proxy and PayloadGremlin solve different problems.
The proxy is useful when you want to test the communication path. It makes the downstream slow, unavailable, unreliable, or syntactically broken. It's a local tool for proving that your HTTP client isn't fragile.
PayloadGremlin is useful when you want to test the data shape. It makes a good payload drift in realistic ways, then lets your application prove that it can handle or reject that drift properly.
Used together, they cover a much wider slice of integration failure.
The proxy asks, "What happens when the road is unreliable?" PayloadGremlin asks, "What happens when the cargo is weird?" In real systems, both questions matter.
A practical CI setup
I'd keep the first CI version small. Choose one high-value fixture. Generate maybe twenty or thirty cases. Use a fixed seed. Assert that the endpoint or consumer never returns an uncontrolled failure. Print mutation metadata when something fails. That's enough to start getting value. You can always grow from there.
Once the first test catches a real issue, add another fixture. Once the team trusts the output, increase the cases slightly. Once you find repeated failures around dates or money, use a more focused profile for those payloads. Don't turn the test suite into a slot machine. If the tests are noisy, people will ignore them. If they're repeatable and explain themselves, they become part of the normal engineering feedback loop. A good payload chaos test should be boring to run and annoying only when it finds something real.
What you should assert
The best assertion depends on the layer you're testing. For an API endpoint, I usually care that the response is deliberate. A good mutated-payload test might allow success, 400 Bad Request, or 422 Unprocessable Entity, but reject 500 Internal Server Error. For a message consumer, I care about safe handling. A bad message should be rejected, dead-lettered, or recorded as a validation failure. It shouldn't retry forever. It shouldn't create partial state. It shouldn't disappear without enough diagnostic information.
For a typed client, I care about mapping and error boundaries. If the upstream response drifts, the client should return a useful failure to the calling code. It shouldn't throw a low-level JSON exception from somewhere deep inside the call stack unless that's an explicit part of the contract.
For a parser, I care about clear outcomes. Parse successfully or fail with validation details. Don't half-parse a model and leave the rest of the app to discover the damage later. The point isn't to make every bad payload pass. The point is to make sure every bad payload is handled.
Why I built it
I built PayloadGremlin because I wanted this style of testing to be easy in .NET. I could find ways to fuzz data. I could find property-based testing libraries. I could write custom generators by hand. I could keep adding one-off negative payloads to test projects. None of that felt like the small tool I wanted. I wanted something focused on JSON payload drift. Something that worked from a real payload. Something that produced deterministic cases. Something that told me what it changed. Something that could sit inside an xUnit test without becoming a project of its own.
That's the gap PayloadGremlin is trying to fill. It's not trying to replace contract testing. It's not trying to replace the chaos proxy. It's not trying to become a full test platform. It's a small tool for a specific problem. Good JSON lies all the time.
PayloadGremlin helps you catch your code believing it.
Where to use it first
Start where the contract is outside your full control. That might be a third-party API response. It might be a webhook body. It might be a queue message from another team. It might be an import file that eventually becomes JSON inside your system. Then look for fields where silent mistakes are expensive. Dates. Money. Identifiers. Status values. Anything that changes workflow state. Anything that affects billing, approvals, claims, payments, customer records, or audit history. Those are the fields where "it parsed" isn't good enough.
You need to know whether the application understood the payload correctly, rejected it clearly, or drifted into a dangerous middle ground. That's the value of PayloadGremlin. It turns one clean payload into many uncomfortable questions. And those questions are much cheaper to answer in a test suite than in production.





