Building .NET Applications with the Claude AI C# SDK

The Claude API is no longer something .NET teams need to wrap by hand. Anthropic now publishes an official C# SDK through the Anthropic NuGet package, and the SDK gives .NET applications a typed way to call the Messages API, stream responses, handle errors, configure retries, and integrate with Microsoft.Extensions.AI.

Most real .NET applications should not treat an AI model as a loose HTTP call hidden inside a controller. You want the same engineering shape you would expect around any external dependency: configuration, dependency injection, timeouts, retries, cancellation, logging, test seams, and clear application boundaries.

Below I'll walk through a practical .NET 10-style integration using the official Claude C# SDK. The examples focus on application code you could evolve into production code, not a console demo that hardcodes an API key and prints a response.

The SDK package

The current official package name is Anthropic.

dotnet add package Anthropic

The important naming detail is that package versions 10 and later are the official Anthropic C# SDK. Older Anthropic 3.x versions belonged to the previous community SDK lineage, which moved to tryAGI.Anthropic. If you see old blog posts or examples using different package names or older client APIs, treat them carefully.

The official SDK targets .NET Standard 2.0 and also ships framework-specific support for modern .NET versions. That makes it usable from older libraries, worker services, ASP.NET Core APIs, Azure Functions, and modern .NET 10 applications.

For local development, set your API key as an environment variable rather than putting it into appsettings.json.

export ANTHROPIC_API_KEY="your-api-key"

On Windows PowerShell, use this instead.

$env:ANTHROPIC_API_KEY="your-api-key"

The SDK reads ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN, and ANTHROPIC_BASE_URL from the environment when you create a default AnthropicClient.

The basic request flow

A typical .NET application should keep Claude behind an application service. Your endpoint should accept the HTTP request, validate it, hand work to a service, and let that service call Claude. That keeps model access away from controllers and makes it easier to add rate limiting, caching, auditing, and fallback behaviour later.

The simplest direct SDK call uses AnthropicClient, creates a MessageCreateParams object, and sends it to client.Messages.Create.

using Anthropic;
using Anthropic.Models.Messages;

AnthropicClient client = new();

MessageCreateParams parameters = new()
{
    Model = Model.ClaudeOpus4_7,
    MaxTokens = 512,
    Messages =
    [
        new()
        {
            Role = Role.User,
            Content = "Explain idempotency in distributed systems in plain English."
        }
    ]
};

var message = await client.Messages.Create(parameters);

Console.WriteLine(message);

That example is useful because it proves the SDK is working. It is not the shape I would keep inside a production ASP.NET Core endpoint. Once you put Claude behind a web API, you should take dependency injection, cancellation, failure handling, and observability seriously.

A minimal ASP.NET Core endpoint

Create a new .NET 10 API.

dotnet new webapi -n ClaudeDotNetDemo -f net10.0
cd ClaudeDotNetDemo
dotnet add package Anthropic
dotnet add package Microsoft.Extensions.AI

Then create a request contract.

public sealed record SummariseRequest(string Text);

public sealed record SummariseResponse(string Summary);

Now register the SDK client and an application service. The default client will read ANTHROPIC_API_KEY from the environment, which is exactly what you want locally and in deployed environments where the secret comes from a secure configuration source.

using Anthropic;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddSingleton(new AnthropicClient
{
    MaxRetries = 3,
    Timeout = TimeSpan.FromSeconds(90),
    ResponseValidation = true
});

builder.Services.AddScoped<ClaudeSummaryService>();

var app = builder.Build();

app.MapPost("/api/summaries", async (
    SummariseRequest request,
    ClaudeSummaryService summaryService,
    CancellationToken stopToken) =>
{
    if (string.IsNullOrWhiteSpace(request.Text))
    {
        return Results.BadRequest(new
        {
            Error = "Text is required."
        });
    }

    var summary = await summaryService.SummariseAsync(request.Text, stopToken);

    return Results.Ok(new SummariseResponse(summary));
});

app.Run();

The service owns the prompt and the model call.

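using Anthropic;
using Anthropic.Exceptions; // SDK exception types; adjust if your SDK version places them elsewhere
using Anthropic.Models.Messages;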

public sealed class ClaudeSummaryService(
    AnthropicClient client,
    ILogger<ClaudeSummaryService> logger)
{
    public async Task<string> SummariseAsync(
        string text,
        CancellationToken stopToken)
    {
        MessageCreateParams parameters = new()
        {
            Model = Model.ClaudeOpus4_7,
            MaxTokens = 800,
            Messages =
            [
                new()
                {
                    Role = Role.User,
                    Content = $$"""
                    Summarise the following text.

                    Keep the summary short, accurate, and practical.

                    Text:
                    {{text}}
                    """
                }
            ]
        };

        try
        {
            // Pass the token so a cancelled HTTP request also cancels the model call.
            var message = await client.Messages.Create(parameters, stopToken);

            return message.ToString();
        }
        catch (AnthropicRateLimitException ex)
        {
            logger.LogWarning(ex, "Claude rate limit hit while summarising text.");
            throw;
        }
        catch (AnthropicApiException ex)
        {
            logger.LogError(ex, "Claude API error while summarising text.");
            throw;
        }
    }
}

The example returns message.ToString() to avoid pretending every response in every SDK version has the same helper method for flattening content blocks. In a real application, write a small adapter that extracts the text content blocks you allow, validates that the model returned the shape you expected, and hides the SDK response object from the rest of your system.

That adapter is important. Claude can return more than one content block, especially once you use tools, citations, files, or structured outputs. Your domain code should not care about raw model response shapes.
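As a rough sketch of that adapter, the snippet below flattens a response into plain text and rejects responses with no usable text. `ModelBlock` and `SummaryTextAdapter` are hypothetical names of my own, and `ModelBlock` is only a stand-in for whatever content block type your SDK version exposes. Mapping the SDK's blocks into it belongs in your infrastructure layer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, SDK-neutral view of one content block in a model response.
public sealed record ModelBlock(string Kind, string? Text);

public static class SummaryTextAdapter
{
    // Joins the non-empty text blocks and throws when there are none,
    // so callers never have to inspect the raw response shape.
    public static string ExtractText(IReadOnlyList<ModelBlock> blocks)
    {
        var parts = blocks
            .Where(b => b.Kind == "text" && !string.IsNullOrWhiteSpace(b.Text))
            .Select(b => b.Text!.Trim())
            .ToList();

        if (parts.Count == 0)
        {
            throw new InvalidOperationException(
                "The model response contained no text content blocks.");
        }

        return string.Join("\n\n", parts);
    }
}
```

The design choice here is to fail loudly on an unexpected shape rather than return an empty string that quietly flows into the rest of the system.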

Use `IChatClient` when you want a .NET abstraction

The direct AnthropicClient is useful when you want full Claude-specific API access. The IChatClient integration is useful when you want Claude to sit behind the same .NET AI abstraction as other model providers.

This is a good fit when your application code should not care which model provider is behind the interface, when you want Microsoft.Extensions.AI middleware, or when you want function invocation, caching, telemetry, and other cross-cutting behaviours around the chat client.

A simple registration can expose Claude through IChatClient.

using Anthropic;
using Microsoft.Extensions.AI;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddSingleton(new AnthropicClient
{
    MaxRetries = 3,
    Timeout = TimeSpan.FromSeconds(90),
    ResponseValidation = true
});

builder.Services.AddChatClient(services =>
{
    var client = services.GetRequiredService<AnthropicClient>();

    return client
        .AsIChatClient("claude-opus-4-7")
        .AsBuilder()
        .Build(services);
});

builder.Services.AddScoped<ClaudeChatService>();

Your service can then depend on IChatClient instead of depending directly on Anthropic types.

using Microsoft.Extensions.AI;

public sealed class ClaudeChatService(IChatClient chatClient)
{
    public async Task<string> AskAsync(
        string prompt,
        CancellationToken stopToken)
    {
        ChatResponse response = await chatClient.GetResponseAsync(
            prompt,
            cancellationToken: stopToken);

        return response.Text;
    }
}

This is the cleaner seam for most business applications. It gives you a stable boundary for testing and it stops Anthropic SDK types from leaking through your own application layer. The trade-off is that some provider-specific features may be easier to access through AnthropicClient directly. That is normal. Use the abstraction where it helps, and drop down to the SDK when you need Claude-specific capabilities.

Streaming responses

For user-facing chat, streaming usually feels better than waiting for the entire response. Claude supports streaming through server-sent events, and the C# SDK exposes streaming methods as IAsyncEnumerable.

The direct SDK streaming shape looks like this.

using Anthropic;
using Anthropic.Models.Messages;

AnthropicClient client = new();

MessageCreateParams parameters = new()
{
    Model = Model.ClaudeOpus4_7,
    MaxTokens = 1024,
    Messages =
    [
        new()
        {
            Role = Role.User,
            Content = "Write a short explanation of CQRS for .NET developers."
        }
    ]
};

await foreach (var chunk in client.Messages.CreateStreaming(parameters))
{
    Console.WriteLine(chunk);
}

If you use IChatClient, streaming is also exposed as an async stream.

using Microsoft.Extensions.AI;

public sealed class StreamingChatService(IChatClient chatClient)
{
    public async IAsyncEnumerable<string> StreamAsync(
        string prompt,
        [System.Runtime.CompilerServices.EnumeratorCancellation]
        CancellationToken stopToken)
    {
        await foreach (var update in chatClient
            .GetStreamingResponseAsync(prompt, cancellationToken: stopToken))
        {
            yield return update.ToString();
        }
    }
}

For a browser client, you can expose server-sent events from ASP.NET Core. Keep the endpoint simple. The model stream should not mutate state directly. If you need to persist a conversation, persist user input before streaming and persist the final assistant response after the stream completes.

using System.Text.Json;
using Microsoft.Extensions.AI;

app.MapPost("/api/chat/stream", async (
    SummariseRequest request,
    IChatClient chatClient,
    HttpResponse response,
    CancellationToken stopToken) =>
{
    response.Headers.ContentType = "text/event-stream";

    await foreach (var update in chatClient
        .GetStreamingResponseAsync(request.Text, cancellationToken: stopToken))
    {
        var json = JsonSerializer.Serialize(update.ToString());

        await response.WriteAsync($"data: {json}\n\n", stopToken);
        await response.Body.FlushAsync(stopToken);
    }
});

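The flow is straightforward: the browser posts a prompt, the endpoint asks the chat client for a streaming response, and each update is written to the HTTP response as a server-sent event until the model finishes or the client disconnects.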

Streaming is not just a UI trick. It also helps long-running model calls stay alive because useful data keeps moving across the connection. You still need sensible server timeouts and client cancellation: pass the CancellationToken through every endpoint and abstraction that accepts one, and configure a request timeout on the SDK client so a stalled call cannot hang indefinitely.
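One way to combine caller cancellation with an application-level time budget is a linked token source. This is a minimal sketch using only the base class library. `TimeLimited` is a hypothetical helper name, and it only works if the wrapped operation actually honours the token it is given.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeLimited
{
    // Runs an operation with a token that is cancelled when either the caller
    // cancels or the time budget runs out, whichever happens first.
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan budget,
        CancellationToken callerToken)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        linked.CancelAfter(budget);

        return await operation(linked.Token);
    }
}
```

A call such as `TimeLimited.RunAsync(ct => summaryService.SummariseAsync(text, ct), TimeSpan.FromSeconds(30), stopToken)` then surfaces an OperationCanceledException when the budget is exceeded, which the endpoint can translate into a timeout response.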

Error handling and retries

The SDK has its own exception hierarchy. That is good because you can separate a rate limit from a bad request, an authentication failure, a server-side failure, or a network problem. A sensible application service should catch only the errors it can translate into application behaviour. Do not catch every exception and return a vague "AI failed" message. That makes support and operations harder.

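using Anthropic;
using Anthropic.Exceptions; // SDK exception types; adjust if your SDK version places them elsewhere
using Anthropic.Models.Messages;

public sealed class ClaudeGateway(
    AnthropicClient client,
    ILogger<ClaudeGateway> logger)
{
    public async Task<Message> SendAsync(
        MessageCreateParams parameters,
        CancellationToken stopToken)
    {
        try
        {
            // Pass the token so a cancelled request also cancels the model call.
            return await client.Messages.Create(parameters, stopToken);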
        }
        catch (AnthropicRateLimitException ex)
        {
            logger.LogWarning(ex, "Claude request was rate limited.");
            throw new TemporaryAiFailureException(
                "Claude is currently rate limiting requests.", ex);
        }
        catch (AnthropicUnauthorizedException ex)
        {
            logger.LogError(ex, "Claude authentication failed.");
            throw new MisconfiguredAiClientException(
                "Claude API authentication failed.", ex);
        }
        catch (Anthropic5xxException ex)
        {
            logger.LogWarning(ex, "Claude returned a server error.");
            throw new TemporaryAiFailureException(
                "Claude returned a temporary server error.", ex);
        }
        catch (AnthropicIOException ex)
        {
            logger.LogWarning(ex, "Network error while calling Claude.");
            throw new TemporaryAiFailureException(
                "Network failure while calling Claude.", ex);
        }
    }
}

public sealed class TemporaryAiFailureException(
    string message,
    Exception innerException) : Exception(message, innerException);

public sealed class MisconfiguredAiClientException(
    string message,
    Exception innerException) : Exception(message, innerException);

The SDK retries some transient failures by default. You can set MaxRetries on the client or per call with WithOptions.

AnthropicClient client = new()
{
    MaxRetries = 3,
    Timeout = TimeSpan.FromSeconds(90)
};

Per-call options are useful when one operation has a different tolerance from the rest of the application.

var response = await client
    .WithOptions(options => options with
    {
        MaxRetries = 1,
        Timeout = TimeSpan.FromSeconds(20)
    })
    .Messages.Create(parameters);

Do not rely on retries alone. If the operation triggers side effects through tools or downstream systems, you still need idempotency. Retrying a summarisation request is usually safe. Retrying an operation that sends an email, creates a ticket, or approves a payment is not safe unless you designed it to be safe.
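To make the idempotency point concrete, here is a minimal in-memory sketch. `IdempotencyGuard` is a hypothetical name, and a real system would back it with a database table or distributed cache rather than process memory. The idea is that a side effect runs at most once per key, however many times a retry or a repeated tool call asks for it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical in-memory guard, for illustration only.
public sealed class IdempotencyGuard
{
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _results = new();

    // Runs the side effect at most once per key. Repeat calls with the same
    // key receive the task from the first attempt instead of running it again.
    public Task<string> RunOnceAsync(string idempotencyKey, Func<Task<string>> sideEffect)
    {
        var lazy = _results.GetOrAdd(
            idempotencyKey,
            _ => new Lazy<Task<string>>(sideEffect));

        return lazy.Value;
    }
}
```

Note that this sketch also remembers a failed attempt, because the faulted task stays in the dictionary. Whether a failure should free the key for another try is a design decision that depends on whether the side effect may have partially happened.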

Tool calling with `Microsoft.Extensions.AI`

Tool calling is where .NET’s AI abstraction becomes more interesting. You can expose selected .NET methods as tools and let the model request them. The application remains responsible for executing the function and returning the result to the model. Claude should not get direct access to your database, payment provider, or admin operations. It should get carefully shaped application functions with narrow inputs, clear descriptions, validation, logging, and permission checks.

using System.ComponentModel;
using Microsoft.Extensions.AI;

public sealed class SupportAssistant(IChatClient chatClient)
{
    public async Task<string> AnswerAsync(
        string question,
        CancellationToken stopToken)
    {
        ChatOptions options = new()
        {
            Tools =
            [
                AIFunctionFactory.Create(GetRefundPolicy)
            ]
        };

        var response = await chatClient.GetResponseAsync(
            question,
            options,
            stopToken);

        return response.Text;
    }

    [Description("Gets the current refund policy for software subscriptions.")]
    private static string GetRefundPolicy()
    {
        return """
        Customers can request a refund within 14 days of the first payment
        if usage remains below the fair-use threshold. Renewals are reviewed
        case by case by support.
        """;
    }
}

To enable automatic function invocation, wrap the Anthropic chat client through the IChatClient builder.

builder.Services.AddChatClient(services =>
{
    var client = services.GetRequiredService<AnthropicClient>();

    return client
        .AsIChatClient("claude-opus-4-7")
        .AsBuilder()
        .UseFunctionInvocation()
        .Build(services);
});

This is a strong pattern for internal support assistants, documentation assistants, workflow helpers, and operational chat tools. The model can reason over the user request, ask for the data it needs, and produce the final answer. Your code still owns the boundary.

Keep tools boring. A good tool is deterministic, narrow, validated, observable, and easy to test. A bad tool is a vague method called DoAction that accepts arbitrary JSON and can mutate important production state.
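A boring tool might look something like the sketch below. `OrderTools`, the `ORD-` id format, and the dictionary-backed lookup are all hypothetical and exist only to show the shape: one narrow input, validation before any data access, and plain string results the model can relay.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Text.RegularExpressions;

public sealed class OrderTools(IReadOnlyDictionary<string, string> statusByOrderId)
{
    // Hypothetical order id format used only for this illustration.
    private static readonly Regex OrderIdPattern =
        new(@"^ORD-[0-9]{6}\z", RegexOptions.CultureInvariant);

    [Description("Gets the shipping status for a single order id such as ORD-123456.")]
    public string GetOrderStatus(
        [Description("The order id, in the form ORD- followed by six digits.")]
        string orderId)
    {
        // Validate the model-supplied argument before touching any data.
        if (orderId is null || !OrderIdPattern.IsMatch(orderId))
        {
            return "Invalid order id format.";
        }

        return statusByOrderId.TryGetValue(orderId, out var status)
            ? status
            : "Order not found.";
    }
}
```

An instance method like this can be handed to AIFunctionFactory.Create in the same way as the static GetRefundPolicy method earlier, and it is trivial to unit test without any model in the loop.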

Prompt ownership

A common mistake is to let prompts grow inside endpoint bodies. That works for a demo and becomes painful quickly. Treat important prompts as application assets. Put them behind a small service, version them, test them against representative inputs, and log the prompt version used for each call.

A simple pattern is to keep prompts as named builders.

public static class SummaryPrompts
{
    public const string Version = "summary-v1";

    public static string Build(string text)
    {
        return $$"""
        You are helping a software engineering team understand a technical document.

        Produce a concise summary with:
        1. The main point.
        2. The practical engineering impact.
        3. Any risks or assumptions.

        Do not invent facts. If the text does not provide enough detail, say so.

        Text:
        {{text}}
        """;
    }
}

Then use the prompt builder from your service.

MessageCreateParams parameters = new()
{
    Model = Model.ClaudeOpus4_7,
    MaxTokens = 800,
    Messages =
    [
        new()
        {
            Role = Role.User,
            Content = SummaryPrompts.Build(text)
        }
    ]
};

logger.LogInformation(
    "Calling Claude with prompt version {PromptVersion}.",
    SummaryPrompts.Version);

For high-value use cases, store the prompt version beside the AI output. This makes debugging much easier when someone later asks why the answer changed.
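As a small sketch of what storing the version beside the output can mean, the record below keeps the summary together with the metadata needed to explain it later. `StoredSummary` and `StoredSummaryFactory` are hypothetical names, and the exact fields are an assumption about what you would want when debugging.

```csharp
using System;

// Hypothetical persistence shape: the output travels with the metadata
// needed to explain it later.
public sealed record StoredSummary(
    string DocumentId,
    string Summary,
    string PromptVersion,
    string Model,
    DateTimeOffset CreatedAt);

public static class StoredSummaryFactory
{
    // Takes a TimeProvider so tests can control the timestamp.
    public static StoredSummary Create(
        string documentId,
        string summary,
        string promptVersion,
        string model,
        TimeProvider clock)
        => new(documentId, summary, promptVersion, model, clock.GetUtcNow());
}
```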

Configuration in real applications

For local development, environment variables are fine. For deployed systems, use your platform’s secret store. In Azure, that usually means Key Vault references, managed identity, and app settings. Do not commit API keys to source control, and do not put them into client-side applications.

A typical app settings shape might look like this.

{
  "Claude": {
    "Model": "claude-opus-4-7",
    "MaxTokens": 800,
    "TimeoutSeconds": 90,
    "MaxRetries": 3
  }
}

Then bind it to an options object.

public sealed class ClaudeOptions
{
    public required string Model { get; init; }

    public int MaxTokens { get; init; } = 800;

    public int TimeoutSeconds { get; init; } = 90;

    public int MaxRetries { get; init; } = 3;
}

// In Program.cs; IOptions<T> needs: using Microsoft.Extensions.Options;
builder.Services.Configure<ClaudeOptions>(
    builder.Configuration.GetSection("Claude"));

builder.Services.AddSingleton<AnthropicClient>(services =>
{
    var options = services
        .GetRequiredService<IOptions<ClaudeOptions>>()
        .Value;

    return new AnthropicClient
    {
        MaxRetries = options.MaxRetries,
        Timeout = TimeSpan.FromSeconds(options.TimeoutSeconds),
        ResponseValidation = true
    };
});

Model choice is a configuration decision, but do not make it completely arbitrary. Different models have different cost, latency, and capability profiles. Put allowed model names behind configuration, but keep a controlled list in your application or deployment process.
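One possible way to keep that controlled list in the application is an options validator. This is a sketch under assumptions: the `ClaudeOptions` class is trimmed to the one property the validator needs so the snippet stands alone, `ClaudeOptionsValidator` is a hypothetical name, and the second allowed entry is a placeholder rather than a real model id.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Options;

// Trimmed stand-in for the options class so this sketch stands alone.
public sealed class ClaudeOptions
{
    public required string Model { get; init; }
}

public sealed class ClaudeOptionsValidator : IValidateOptions<ClaudeOptions>
{
    // The controlled list. The second entry is a placeholder, not a real model id.
    private static readonly HashSet<string> AllowedModels = new(StringComparer.Ordinal)
    {
        "claude-opus-4-7",
        "your-second-approved-model-id"
    };

    public ValidateOptionsResult Validate(string? name, ClaudeOptions options)
    {
        return AllowedModels.Contains(options.Model)
            ? ValidateOptionsResult.Success
            : ValidateOptionsResult.Fail(
                $"Model '{options.Model}' is not in the approved list.");
    }
}
```

Registering the validator as a singleton `IValidateOptions<ClaudeOptions>` and calling `ValidateOnStart()` on the options builder should make a mistyped or unapproved model name fail at startup instead of on the first request.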

Observability

You need to know when Claude calls are slow, expensive, rate limited, malformed, or producing poor results. At minimum, log the operation name, prompt version, model, latency, outcome, and any application correlation id. Do not log raw prompts or raw model responses unless you have a clear data policy and a safe storage location.

A practical log shape looks like this.

var started = TimeProvider.System.GetTimestamp();

try
{
    var message = await client.Messages.Create(parameters);

    var elapsed = TimeProvider.System.GetElapsedTime(started);

    logger.LogInformation(
        "Claude call completed. Operation={Operation} Model={Model} PromptVersion={PromptVersion} ElapsedMs={ElapsedMs}",
        "SummariseDocument",
        parameters.Model,
        SummaryPrompts.Version,
        elapsed.TotalMilliseconds);

    return message;
}
catch (Exception ex)
{
    var elapsed = TimeProvider.System.GetElapsedTime(started);

    logger.LogError(
        ex,
        "Claude call failed. Operation={Operation} Model={Model} PromptVersion={PromptVersion} ElapsedMs={ElapsedMs}",
        "SummariseDocument",
        parameters.Model,
        SummaryPrompts.Version,
        elapsed.TotalMilliseconds);

    throw;
}

If you use IChatClient, Microsoft.Extensions.AI can be layered with telemetry middleware. That makes it easier to standardise tracing and metrics across providers rather than treating every model SDK differently.

Caching

Caching can help, but it is easy to do badly. Cache deterministic responses where the same input, model, prompt version, and options should produce an equivalent answer. Do not blindly cache free-form user chat, sensitive content, or anything where the answer depends on live permissions or rapidly changing data.

A safe cache key needs more than the user prompt.

public static string BuildCacheKey(
    string operation,
    string promptVersion,
    string model,
    string inputHash)
{
    return $"ai:{operation}:{promptVersion}:{model}:{inputHash}";
}

The important part is the input hash. Do not use raw document text as the cache key. Hash the canonical input and include the prompt version and model in the key; otherwise you will return stale results after changing the prompt or switching models.

using System.Security.Cryptography;
using System.Text;

public static string Sha256(string value)
{
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(value));
    return Convert.ToHexString(bytes).ToLowerInvariant();
}

Caching is most useful for expensive document summaries, repeated classification, stable internal knowledge answers, and background enrichment jobs. It is less useful for open-ended chat where every turn changes the conversation.
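Putting the key and the hash together, a cached summariser could be sketched as follows. `CachedSummariser` is a hypothetical name, the delegate stands in for the real model call, and the one-hour lifetime is an arbitrary value chosen for illustration. It assumes the Microsoft.Extensions.Caching.Memory package.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public sealed class CachedSummariser(
    IMemoryCache cache,
    Func<string, CancellationToken, Task<string>> summarise)
{
    public async Task<string> SummariseAsync(
        string text,
        string promptVersion,
        string model,
        CancellationToken stopToken)
    {
        // Hash the input so the key stays short and holds no raw document text.
        var hash = Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(text))).ToLowerInvariant();
        var key = $"ai:summary:{promptVersion}:{model}:{hash}";

        if (cache.TryGetValue(key, out string? cached) && cached is not null)
        {
            return cached;
        }

        var summary = await summarise(text, stopToken);

        // Arbitrary lifetime for illustration; tune it to how fresh answers must be.
        cache.Set(key, summary, TimeSpan.FromHours(1));
        return summary;
    }
}
```

Because the prompt version is part of the key, bumping the version naturally bypasses every old entry without any explicit invalidation.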

Background processing

Not every Claude call belongs in a request-response API. If the work is slow, expensive, or part of a larger workflow, put it behind a queue and process it in the background. That gives you better retry control, dead-letter handling, status tracking, and user experience.

This shape is better for document ingestion, long summarisation, classification, extraction, and batch enrichment. The user gets a job id immediately. The worker calls Claude with proper retries. The status endpoint reports current state from your database.

Do not hide long AI jobs behind a single HTTP request and hope the connection survives.
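A minimal in-process version of that shape can be sketched with a channel and a hosted worker. All the type names here are hypothetical, the delegate stands in for the real Claude call, and the in-memory dictionaries stand in for your database. A durable queue is the usual production choice so jobs survive restarts.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public sealed record SummaryJob(Guid JobId, string Text);

// In-process queue plus in-memory status, for illustration only.
public sealed class SummaryJobQueue
{
    private readonly Channel<SummaryJob> _channel =
        Channel.CreateBounded<SummaryJob>(capacity: 100);

    public ConcurrentDictionary<Guid, string> Status { get; } = new();
    public ConcurrentDictionary<Guid, string> Results { get; } = new();

    // Returns the job id immediately; the work happens in the worker.
    public async ValueTask<Guid> EnqueueAsync(string text, CancellationToken stopToken)
    {
        var job = new SummaryJob(Guid.NewGuid(), text);
        Status[job.JobId] = "Queued";
        await _channel.Writer.WriteAsync(job, stopToken);
        return job.JobId;
    }

    public IAsyncEnumerable<SummaryJob> ReadAllAsync(CancellationToken stopToken)
        => _channel.Reader.ReadAllAsync(stopToken);
}

public sealed class SummaryWorker(
    SummaryJobQueue queue,
    Func<string, CancellationToken, Task<string>> summarise) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var job in queue.ReadAllAsync(stoppingToken))
        {
            queue.Status[job.JobId] = "Running";
            try
            {
                queue.Results[job.JobId] = await summarise(job.Text, stoppingToken);
                queue.Status[job.JobId] = "Completed";
            }
            catch (Exception) when (!stoppingToken.IsCancellationRequested)
            {
                // A real worker would record the error and retry or dead-letter.
                queue.Status[job.JobId] = "Failed";
            }
        }
    }
}
```

The submit endpoint would call EnqueueAsync and return the job id, and the status endpoint would read the stored state for that id.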

Testing

Treat Claude as an external dependency. Most tests should not call the real API. Put the SDK behind an interface or use IChatClient, then test your application code against a fake implementation.

using Microsoft.Extensions.AI;

public sealed class FakeChatClient(string responseText) : IChatClient
{
    public Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        ChatResponse response = new(
            new ChatMessage(ChatRole.Assistant, responseText));

        return Task.FromResult(response);
    }

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [System.Runtime.CompilerServices.EnumeratorCancellation]
        CancellationToken cancellationToken = default)
    {
        yield return new ChatResponseUpdate(ChatRole.Assistant, responseText);
        await Task.CompletedTask;
    }

    public object? GetService(Type serviceType, object? serviceKey = null)
    {
        return null;
    }

    public void Dispose()
    {
    }
}

The exact constructor shape of ChatResponse and ChatResponseUpdate may change between Microsoft.Extensions.AI versions, so keep fake clients small and close to your tests. The design point is more important than the precise test helper. Your business tests should assert what your code does with a model answer, not whether Anthropic’s service is reachable.

For integration tests, run a small number of real Claude calls behind an explicit test category. Never run them accidentally in every CI build. They cost money, they can be rate limited, and they are slower than normal unit tests.
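If you use xUnit (an assumption; other frameworks have equivalent category attributes), a trait is one simple way to keep real calls out of the default run. The test body below is a placeholder that only checks the key is configured. A real integration test would call your Claude-backed service at that point.

```csharp
using System;
using Xunit;

public sealed class ClaudeIntegrationTests
{
    // Tagged so the default CI run can exclude it with:
    //   dotnet test --filter "Category!=Integration"
    // and a dedicated job can opt in with:
    //   dotnet test --filter "Category=Integration"
    [Fact]
    [Trait("Category", "Integration")]
    public void Api_key_is_configured_for_integration_runs()
    {
        var apiKey = Environment.GetEnvironmentVariable("ANTHROPIC_API_KEY");

        Assert.False(string.IsNullOrWhiteSpace(apiKey));
    }
}
```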

Security and safety

In production, never send data to Claude unless your product, customer agreement, and data policy allow it. That includes support tickets, personal data, contracts, financial records, logs, and source code.

Model output is untrusted. If Claude returns JSON, validate it. If Claude chooses a tool, check permissions before executing it. If Claude summarises a document, keep a link back to the source. If Claude suggests an action, decide whether a human must approve it.

For structured outputs, validate the response before storing or acting on it. A model can produce malformed JSON, partial data, or plausible but wrong values. Strong typing in the SDK helps with the API boundary, but it does not prove the model’s generated answer is correct.
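As an illustration of validating before acting, the sketch below parses a hypothetical classification result and rejects anything malformed or outside the allowed values. The `TicketClassification` shape, the category list, and the confidence range are all assumptions made for the example.

```csharp
using System;
using System.Text.Json;

// Hypothetical shape the prompt asks the model to return.
public sealed record TicketClassification(string Category, double Confidence);

public static class ClassificationParser
{
    private static readonly string[] AllowedCategories = ["billing", "bug", "question"];

    private static readonly JsonSerializerOptions JsonOptions =
        new() { PropertyNameCaseInsensitive = true };

    // Returns false instead of throwing so callers can retry or route to a human.
    public static bool TryParse(string modelOutput, out TicketClassification? result)
    {
        result = null;

        try
        {
            var parsed = JsonSerializer.Deserialize<TicketClassification>(
                modelOutput, JsonOptions);

            // Reject missing values, unknown categories, and out-of-range scores.
            if (parsed is null
                || Array.IndexOf(AllowedCategories, parsed.Category) < 0
                || parsed.Confidence is < 0 or > 1)
            {
                return false;
            }

            result = parsed;
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

This only checks that the answer is well formed and within bounds. Whether the chosen category is actually right is still a question for evaluation, sampling, or human review.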

When to use direct SDK access

Use AnthropicClient directly when you need Claude-specific features, raw response access, provider-specific request parameters, file APIs, message batches, or advanced response handling.

Use IChatClient when you want application code to stay provider-neutral, when you want function invocation through Microsoft.Extensions.AI, when you want middleware-style composition, or when you want tests to avoid Anthropic-specific types.

The mistake is picking one forever. A clean .NET codebase can support both. Keep the direct SDK in an infrastructure layer and expose the narrower application behaviour through services.

A sensible production structure

For a modular monolith or clean vertical slice approach, I would keep the Claude integration in infrastructure and expose use-case-specific services to features. Avoid a global AiService that does everything. It will become a dumping ground.

This keeps your feature code focused on the business task. The infrastructure code owns retries, model configuration, parsing, logging, and SDK details. When Anthropic changes the SDK, you update a small boundary rather than chasing SDK types across your entire application.
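In code, that boundary can be as small as one interface per use case. The sketch below uses hypothetical names, and the delegate is a stand-in for the real model call so the snippet does not depend on any particular SDK version.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Lives with the feature: names the business task and exposes no SDK types.
public interface IDocumentSummariser
{
    Task<string> SummariseAsync(string documentText, CancellationToken stopToken);
}

// Lives in infrastructure: the only place that would reference the Claude SDK.
// The delegate is a stand-in for the real model call.
public sealed class ClaudeDocumentSummariser(
    Func<string, CancellationToken, Task<string>> callModel) : IDocumentSummariser
{
    public Task<string> SummariseAsync(string documentText, CancellationToken stopToken)
        => callModel(documentText, stopToken);
}
```

Feature code depends on IDocumentSummariser only, so swapping the model, the SDK version, or the provider touches one class.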