Microsoft.Extensions.AI Explained

A lot of .NET AI code starts the same way. You install a provider SDK, create a client, pass in a prompt and get a response back. For a prototype, that's fine. For a production application, it can turn messy quicker than expected.

The issue is not that provider SDKs are bad. The issue is where they end up sitting in your application. If your application services depend directly on OpenAI, Azure OpenAI, Ollama or another provider, that provider starts to become part of your application design. Your tests know about it. Your streaming code knows about it. Your telemetry code knows about it. Your tool-calling code knows about it. Then somebody asks if you can swap provider, run locally, use Azure in production, or support a second model, and suddenly the simple code is not so simple.

That's the problem Microsoft.Extensions.AI is trying to solve. It gives .NET developers a common abstraction for AI features. It does not remove the need for providers. It does not replace good architecture. It does not magically make an AI feature production-ready. What it does is give you a cleaner boundary, so the rest of your application is not built around one provider SDK.

The better question is not whether to use OpenAI, Azure OpenAI, Ollama, Semantic Kernel or Agent Framework. The better question is where the provider-specific code should live. For a lot of normal .NET applications, Microsoft.Extensions.AI is a good answer.

The problem with using provider SDKs directly

Provider SDKs are usually the fastest way to get started. You create the client, call the model and return the answer. There is nothing wrong with that in a small spike.

The problem starts when that code becomes the foundation for the real application. Your service layer starts accepting provider-specific request types. Your controller returns provider-specific response types. Your tests need to fake a concrete SDK. Your retries, logging, caching and telemetry get written around one provider. Your streaming code gets tied to one response shape.

That creates friction. It also makes the architecture harder to explain. Is your application an order system with an AI feature, or is it an OpenAI wrapper with some business logic around it? That sounds like a small distinction, but it changes how you structure the code. The application should own the use case. The AI provider should be an implementation detail. Microsoft.Extensions.AI helps you keep that line cleaner.

https://www.youtube.com/watch?v=zrPtp00aUX0

What Microsoft.Extensions.AI actually is

Microsoft.Extensions.AI is a set of .NET libraries for working with AI services through common abstractions. The two main abstractions most developers will notice first are IChatClient and IEmbeddingGenerator<TInput, TEmbedding>.

IChatClient represents a chat client. It can send messages to an AI service and receive either a full response or streamed updates. It supports multi-modal content as well, so it is not limited to simple text-only prompts.

IEmbeddingGenerator<TInput, TEmbedding> represents an embedding generator. You use it when you need vector embeddings for search, similarity matching, RAG pipelines or other semantic features.

The package split is worth understanding. Microsoft.Extensions.AI.Abstractions contains the core exchange types and abstractions. This is the package library authors usually target when they do not want to force a specific provider on consumers. Microsoft.Extensions.AI builds on those abstractions and adds useful application features such as middleware-style pipelines, function invocation, caching, logging and telemetry.

That fits the way .NET developers already build applications. It feels closer to HttpClientFactory, dependency injection, logging and middleware than a separate AI framework bolted onto the side.

A simple chat example

The simplest useful example is an IChatClient backed by OpenAI.

dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.AI.OpenAI

Then you can create an IChatClient from the OpenAI chat client.

using Microsoft.Extensions.AI;

IChatClient client =
    new OpenAI.Chat.ChatClient(
        "gpt-4o-mini",
        Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
    .AsIChatClient();

var response = await client.GetResponseAsync("Explain dependency injection in one paragraph.");

Console.WriteLine(response.Text);

There is nothing dramatic there. That is the point.

The rest of your code can depend on IChatClient, not directly on OpenAI.Chat.ChatClient. That gives you a better seam.
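To make that seam concrete, here is a minimal sketch of code that only knows about IChatClient and carries a short multi-turn conversation. The class name, method name and prompts are made up for illustration. It assumes the Microsoft.Extensions.AI package is referenced and that some provider-backed IChatClient is passed in.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

public static class ConversationExample
{
    // Depends only on IChatClient, so any provider adapter can be passed in.
    public static async Task<string> AskFollowUpAsync(
        IChatClient client,
        CancellationToken cancellationToken = default)
    {
        // The conversation is a plain list of messages, each with a role.
        var history = new List<ChatMessage>
        {
            new ChatMessage(ChatRole.System, "You answer questions about .NET briefly."),
            new ChatMessage(ChatRole.User, "What is dependency injection?")
        };

        ChatResponse first = await client.GetResponseAsync(
            history, cancellationToken: cancellationToken);

        // Keep the assistant's reply in the history so the follow-up has context.
        history.AddRange(first.Messages);
        history.Add(new ChatMessage(ChatRole.User, "Show a one-line C# example."));

        ChatResponse second = await client.GetResponseAsync(
            history, cancellationToken: cancellationToken);

        return second.Text;
    }
}
```

Nothing in that method names OpenAI. Swapping the provider should only change the code that constructs the client.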

Use dependency injection properly

In a real ASP.NET Core app, you probably do not want random classes creating AI clients directly. Register the client once and inject the abstraction where you need it.

using Microsoft.Extensions.AI;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddChatClient(_ =>
    new OpenAI.Chat.ChatClient(
        "gpt-4o-mini",
        builder.Configuration["OPENAI_API_KEY"])
    .AsIChatClient());

var app = builder.Build();

app.MapPost("/summaries", async (
    SummaryRequest request,
    IChatClient chatClient,
    CancellationToken stopToken) =>
{
    var prompt = $"""
    Summarise the following text in plain English.

    Text:
    {request.Text}
    """;

    var response = await chatClient.GetResponseAsync(prompt, cancellationToken: stopToken);

    return Results.Ok(new SummaryResponse(response.Text));
});

app.Run();

public sealed record SummaryRequest(string Text);

public sealed record SummaryResponse(string Summary);

This is already a better shape than creating the provider client inside the endpoint. But I would usually go one step further. The endpoint should not know how the prompt is built. It should call an application service that owns the use case.

public sealed class SummaryService(IChatClient chatClient)
{
    public async Task<string> SummariseAsync(string text, CancellationToken stopToken)
    {
        var prompt = $"""
        You are helping summarise internal support notes.

        Return a short summary in plain English.
        Do not invent details.

        Notes:
        {text}
        """;

        var response = await chatClient.GetResponseAsync(prompt, cancellationToken: stopToken);

        return response.Text;
    }
}

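That keeps your endpoint boring, which is usually a good sign. Register SummaryService with the container first, for example with builder.Services.AddScoped<SummaryService>(), so the endpoint can resolve it as a parameter.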

app.MapPost("/summaries", async (
    SummaryRequest request,
    SummaryService summaryService,
    CancellationToken stopToken) =>
{
    var summary = await summaryService.SummariseAsync(request.Text, stopToken);

    return Results.Ok(new SummaryResponse(summary));
});

The API layer handles HTTP. The service owns the use case. The AI client is just a dependency. That is the cleaner boundary.

Streaming is part of the abstraction

AI responses are often streamed. You don't want every provider to push you into a completely different streaming model. IChatClient supports streaming through GetStreamingResponseAsync.

app.MapPost("/chat/stream", async (
    ChatRequest request,
    IChatClient chatClient,
    HttpResponse response,
    CancellationToken stopToken) =>
{
    response.Headers.ContentType = "text/event-stream";

    await foreach (var update in chatClient.GetStreamingResponseAsync(
        request.Message,
        cancellationToken: stopToken))
    {
        await response.WriteAsync($"data: {update.Text}\n\n", stopToken);
        await response.Body.FlushAsync(stopToken);
    }
});

public sealed record ChatRequest(string Message);

You still need to think about cancellation, client disconnects, rate limits and error handling. The abstraction does not remove those problems. But it does give your application a consistent streaming shape. That makes the code easier to move between providers and easier to test.
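One detail the endpoint above glosses over is framing. In the server-sent events format, a chunk that contains a line break needs each line sent as its own data field, otherwise the browser sees a broken event. A minimal sketch of a formatter follows. ServerSentEvents and FormatData are names I made up for illustration, and it uses nothing beyond the base class library.

```csharp
using System;
using System.Text;

public static class ServerSentEvents
{
    // Formats one text chunk as a single SSE event.
    // Each line of the chunk gets its own "data:" field, and a blank line ends the event.
    public static string FormatData(string? chunk)
    {
        // Normalise Windows and old Mac line endings to "\n" before splitting.
        var normalised = (chunk ?? string.Empty)
            .Replace("\r\n", "\n")
            .Replace('\r', '\n');

        var builder = new StringBuilder();
        foreach (var line in normalised.Split('\n'))
        {
            builder.Append("data: ").Append(line).Append('\n');
        }

        // The empty line tells the client the event is complete.
        builder.Append('\n');
        return builder.ToString();
    }
}
```

In the streaming endpoint you would then write ServerSentEvents.FormatData(update.Text) instead of building the string inline. The client joins the data fields back together with line breaks, so the text arrives as the model produced it.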

Tool calling without turning everything into an agent

Tool calling is where a lot of AI demos become messy. The model does not directly execute your code. The model asks your application to call a tool with certain arguments. Your application performs the operation, returns the result to the model, and the model uses that result to complete the answer. Microsoft.Extensions.AI gives you provider-agnostic tool-calling abstractions. You can expose .NET methods as AI functions and let the chat client handle the invocation pipeline.

using System.ComponentModel;
using Microsoft.Extensions.AI;

IChatClient openAiClient =
    new OpenAI.Chat.ChatClient(
        "gpt-4o-mini",
        Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
    .AsIChatClient();

IChatClient client = new ChatClientBuilder(openAiClient)
    .UseFunctionInvocation()
    .Build();

var options = new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetOrderStatus)]
};

var response = await client.GetResponseAsync(
    "What is the status of order ORD-123?",
    options);

Console.WriteLine(response.Text);

[Description("Gets the current status of an order.")]
static string GetOrderStatus(string orderNumber)
{
    return orderNumber switch
    {
        "ORD-123" => "The order is being packed.",
        _ => "The order was not found."
    };
}

This is useful, but it needs discipline. A tool is an application boundary, not a free-for-all. Do not expose dangerous operations just because the model can call functions. Do not let the model choose from methods that change money, permissions, customer data or security state without strong validation and explicit guardrails. For read-only internal lookups, tool calling can be a clean pattern. For write operations, approvals and deterministic business rules still need to sit in your application. The model can assist. It should not own the rule.
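As a rough sketch of what that discipline can look like for a read-only tool, the version below validates the argument before doing anything with it. The order-number format (ORD- followed by 3 to 10 digits) and the OrderTools class name are assumptions for illustration, not rules from any real system.

```csharp
using System.ComponentModel;
using System.Text.RegularExpressions;

public static class OrderTools
{
    // Hypothetical order-number format, assumed for illustration.
    // \z anchors to the very end of the input, so a trailing newline is rejected.
    private static readonly Regex OrderNumberPattern =
        new Regex(@"^ORD-[0-9]{3,10}\z", RegexOptions.CultureInvariant);

    [Description("Gets the current status of an order. Read-only.")]
    public static string GetOrderStatus(
        [Description("Order number in the form ORD-123.")] string orderNumber)
    {
        // Treat model-supplied arguments like any other untrusted input.
        if (string.IsNullOrWhiteSpace(orderNumber) || !OrderNumberPattern.IsMatch(orderNumber))
        {
            return "The order number is not valid.";
        }

        // Stand-in for a real lookup, matching the earlier example.
        return orderNumber switch
        {
            "ORD-123" => "The order is being packed.",
            _ => "The order was not found."
        };
    }
}
```

You would register it the same way as before, with Tools = [AIFunctionFactory.Create(OrderTools.GetOrderStatus)]. The check is deliberately boring. The model decides whether to call the tool. Your code still decides what counts as a valid call.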

Embeddings fit the same model

Chat gets most of the attention, but embeddings are just as important in real AI systems. If you are building search, semantic matching, document classification, duplicate detection or RAG, you usually need embeddings. Microsoft.Extensions.AI gives you IEmbeddingGenerator<TInput, TEmbedding> for that.

using Microsoft.Extensions.AI;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new OpenAI.Embeddings.EmbeddingClient(
        "text-embedding-3-small",
        Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
    .AsIEmbeddingGenerator();

// Passing a collection returns one embedding per input, in the same order.
var embeddings = await generator.GenerateAsync(["How do I reset my password?"]);

ReadOnlyMemory<float> vector = embeddings[0].Vector;

Again, the useful part is the boundary. Your search indexing code can depend on an embedding generator abstraction. It does not need to know whether the embedding comes from OpenAI, Azure OpenAI, Ollama or another implementation. That becomes useful when you start separating local development, test environments and production. You can keep the application shape consistent even when the underlying model changes.
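What you do with the vectors is also provider-neutral. Semantic matching usually comes down to comparing two vectors, and cosine similarity is a common choice. Here is a minimal sketch using only the base class library. VectorMath is a made-up helper name, and a real system would normally leave this to a vector store.

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Returns 0 when either vector has zero length, to avoid dividing by zero.
    public static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Vectors must have the same number of dimensions.");
        }

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            // Accumulate in double to limit rounding error on long vectors.
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
        {
            return 0;
        }

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

With two embeddings in hand, the call would look like VectorMath.CosineSimilarity(question.Vector.Span, document.Vector.Span). Values closer to 1 mean the texts point in a more similar direction. Only compare vectors produced by the same embedding model.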

Caching belongs in the pipeline

AI calls can be expensive and slow compared with normal application calls. Not every response should be cached, but some can be. Embeddings are a good example. If the same text needs the same embedding, caching can save both time and cost. Some prompt responses may also be cacheable, especially if they are deterministic, low-risk and based on stable input.

Microsoft.Extensions.AI supports caching through delegating implementations.

using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;

IDistributedCache cache = new MemoryDistributedCache(
    Options.Create(new MemoryDistributedCacheOptions()));

IChatClient openAiClient =
    new OpenAI.Chat.ChatClient(
        "gpt-4o-mini",
        Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
    .AsIChatClient();

IChatClient client = new ChatClientBuilder(openAiClient)
    .UseDistributedCache(cache)
    .Build();

In a real system, you would usually use a proper distributed cache, not memory cache, if the app runs across multiple instances. You also need to be careful about what you cache. Do not casually cache sensitive prompts or user-specific data without thinking about retention, privacy and tenant boundaries. Caching is not just a performance setting. It is part of the application design.
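To show the embedding case in the smallest possible form, here is a sketch of a cache keyed on a hash of the input text. It is not the library's caching implementation. HashedEmbeddingCache is a made-up name, the generate delegate stands in for a real embedding call, and the in-memory dictionary has no expiry or size limit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

public sealed class HashedEmbeddingCache
{
    private readonly ConcurrentDictionary<string, float[]> _entries = new();
    private readonly Func<string, Task<float[]>> _generate;

    // generate is a stand-in for whatever actually produces the embedding.
    public HashedEmbeddingCache(Func<string, Task<float[]>> generate)
    {
        _generate = generate;
    }

    public async Task<float[]> GetOrCreateAsync(string text)
    {
        // Key on a SHA-256 hash so the raw text is not stored as the cache key.
        var key = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));

        if (_entries.TryGetValue(key, out var cached))
        {
            return cached;
        }

        // Only successful results are stored, so a failed call is retried next time.
        // Two concurrent misses for the same text may both call generate.
        var vector = await _generate(text);
        _entries[key] = vector;
        return vector;
    }
}
```

If more than one embedding model is in play, the model name needs to be part of the key as well, because vectors from different models are not interchangeable. The same thinking applies to tenant boundaries for chat responses.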

Telemetry should not be an afterthought

AI features need observability. You need to know how often prompts run, which operations are slow, how often calls fail, whether tool calls are being invoked, and where token usage is going. You also need to avoid dumping sensitive prompt data into logs or traces without thinking about it. Microsoft.Extensions.AI supports logging and OpenTelemetry-style instrumentation in the chat client pipeline.

using Microsoft.Extensions.AI;
using OpenTelemetry.Trace;

var sourceName = "DotNetDigest.AI";

using var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

IChatClient openAiClient =
    new OpenAI.Chat.ChatClient(
        "gpt-4o-mini",
        Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
    .AsIChatClient();

IChatClient client = new ChatClientBuilder(openAiClient)
    .UseOpenTelemetry(sourceName: sourceName)
    .Build();

The useful idea is the pipeline. You can wrap the AI client with telemetry, caching, logging, rate limiting or your own custom middleware without scattering that code across every use case. That's closer to how mature .NET applications are normally built.
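For the custom middleware case, a small sketch helps. The library ships a DelegatingChatClient base class that forwards every call to an inner client, so you only override what you care about. TimingChatClient and its callback are names I made up for illustration, and this version only times non-streaming calls.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Hypothetical middleware that reports how long each non-streaming call takes.
public sealed class TimingChatClient : DelegatingChatClient
{
    private readonly Action<TimeSpan> _onCompleted;

    public TimingChatClient(IChatClient innerClient, Action<TimeSpan> onCompleted)
        : base(innerClient)
    {
        _onCompleted = onCompleted;
    }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            // The base implementation forwards the call to the inner client.
            return await base.GetResponseAsync(messages, options, cancellationToken);
        }
        finally
        {
            // Runs for both success and failure, so slow failures are measured too.
            _onCompleted(stopwatch.Elapsed);
        }
    }
}

// Wiring it into the pipeline, assuming an existing IChatClient named openAiClient:
//
// IChatClient client = new ChatClientBuilder(openAiClient)
//     .Use(inner => new TimingChatClient(inner, elapsed => Console.WriteLine(elapsed)))
//     .Build();
```

The application services never see this class. They still take an IChatClient, and the composition root decides what is wrapped around it.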

Where Semantic Kernel fits

Microsoft.Extensions.AI is not the same thing as Semantic Kernel. Semantic Kernel is a higher-level orchestration framework. It gives you more structure for plugins, planners, prompt templates, memory and more advanced AI workflows. If you are building complex orchestration around AI capabilities, Semantic Kernel may still make sense. Microsoft.Extensions.AI is lower level. It gives you common abstractions and pipeline pieces for AI clients. For many application features, that is enough. If your feature is summarise this text, classify this document, generate embeddings, call a small number of tools or stream a chat response, I would start with Microsoft.Extensions.AI.

If your feature is a larger AI workflow with planning, multiple steps, reusable semantic functions and more orchestration, then I would look at Semantic Kernel or Microsoft Agent Framework. The mistake is reaching for the bigger framework before you know you need it.

Where Agent Framework fits

Microsoft Agent Framework sits higher again.

It is aimed at agents and multi-agent workflows. If your application needs long-running agentic behaviour, multi-agent orchestration, graph-style workflows, or richer tool coordination, that is a different problem from calling a chat model behind an application service. Do not turn every AI feature into an agent.

Most business applications do not need that as the first step. They need a safe, testable, observable way to call a model from a normal application flow. That is where Microsoft.Extensions.AI fits nicely.

Where provider SDKs still fit

Provider SDKs still matter.

Microsoft.Extensions.AI does not remove the provider. It wraps or adapts the provider behind a common abstraction. You still need a concrete implementation somewhere. You still need to understand the provider’s model names, limits, authentication, pricing, regional availability and feature support.

There will also be cases where the provider-specific SDK exposes a feature that the common abstraction does not cover yet. That's fine. The key is to keep provider-specific code close to the edge. Put it in infrastructure, the composition root or a provider adapter. Do not let it leak through your application services unless there is a good reason.

Use the abstraction for the common path. Drop down to the provider SDK only when the feature genuinely needs it.

What I would use it for

I would use Microsoft.Extensions.AI for normal application AI features. Summarisation is a good fit. Classification is a good fit. Basic chat is a good fit. Streaming chat is a good fit. Embedding generation is a good fit. Tool calling for controlled read-only operations is a good fit. RAG pipeline components can also use it, especially when you want provider-neutral chat and embedding boundaries.

I would not assume it is enough for every AI system. If you are building a complex agent platform, you will probably need more. If you are relying on a provider-specific feature, you may need the provider SDK directly. If your problem is document search, you still need vector storage and retrieval. If your problem is orchestration, you still need workflow design. If your problem is safety, you still need guardrails.

Microsoft.Extensions.AI gives you the AI client boundary. It does not give you the whole architecture.

A better production shape

A good production shape is fairly simple. Your API endpoint should call an application service. The application service should express the use case. The AI dependency should be represented by IChatClient or IEmbeddingGenerator. Provider configuration should live in the composition root. Cross-cutting behaviour such as logging, caching, telemetry and function invocation should be configured in the client pipeline.

That gives you clean edges. The endpoint does not know about OpenAI. The service does not create clients. Tests can replace the AI client with a fake. Local development can use a local provider. Production can use Azure OpenAI. Telemetry can be added consistently. That is the kind of boring structure you want around an unpredictable dependency. The AI part is already nondeterministic enough. The application architecture should not add more chaos.

Testing becomes easier

Testing AI features is awkward when everything depends on concrete provider clients. With an abstraction, you can test your application logic without making real model calls.

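using Microsoft.Extensions.AI;

// Test double: always answers with the text it was constructed with.
public sealed class FakeChatClient(string responseText) : IChatClient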
{
    public Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        return Task.FromResult(new ChatResponse(
            new ChatMessage(ChatRole.Assistant, responseText)));
    }

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        yield return new ChatResponseUpdate(ChatRole.Assistant, responseText);
        await Task.CompletedTask;
    }

    public object? GetService(Type serviceType, object? serviceKey = null) => null;

    public void Dispose()
    {
    }
}

You do not need this exact fake in every project. The broader point is that your test can control the AI response without calling the real service. That lets you test what your application does with a model response. You can test validation, mapping, fallback behaviour, persistence and error handling. You still need separate evaluation for prompt quality and model behaviour. Unit tests do not prove the model is good. They prove your application handles the response path correctly. That distinction is easy to miss.
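As a usage sketch, this is roughly what a test of the response path looks like. It assumes the FakeChatClient above and the SummaryService from the dependency injection section are in the same project, and it uses a plain check instead of a test framework so it does not depend on one. The sample strings are made up.

```csharp
using System;
using System.Threading;

// Assumes the FakeChatClient and SummaryService classes shown earlier in this article.
const string canned = "Customer cannot sign in after a password reset.";

var service = new SummaryService(new FakeChatClient(canned));

// No network call happens here. The fake returns the canned text immediately.
var summary = await service.SummariseAsync("Raw support notes go here.", CancellationToken.None);

// The service should hand back the model's text unchanged.
if (summary != canned)
{
    throw new InvalidOperationException($"Expected the canned summary but got: {summary}");
}
```

In a real project this would live in an xUnit or NUnit test method, but the shape is the same: construct the fake, run the use case, assert on what the application did with the response.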

The real decision

So should you use Microsoft.Extensions.AI? For most new .NET AI features, yes, I would start there. It gives you the cleanest default boundary between your application and the AI provider. It supports the normal things you need first: chat, streaming, embeddings, tool calling, caching, logging and telemetry. It also fits the .NET hosting and dependency injection model instead of making the AI feature feel separate from the rest of the app. But I would not oversell it.

It is not a full agent framework. It is not a replacement for architecture. It is not a safety layer by itself. It is not a RAG system by itself. It will not decide your prompt strategy, your validation rules, your cost controls or your human review process. It is the right abstraction layer for a lot of application code. That's enough.

What should you actually use?

If you are building a simple .NET AI feature, start with Microsoft.Extensions.AI. Use IChatClient for chat and text generation. Use IEmbeddingGenerator for embeddings. Register them through dependency injection. Keep provider setup at the edge. Add telemetry and logging early. Use caching carefully. Treat tool calling as an application boundary. Keep provider-specific SDK usage contained.

If the feature grows into a bigger workflow, then look at Semantic Kernel or Microsoft Agent Framework. If you need a provider-only feature, drop down to the provider SDK in a controlled place.

The practical default is this: use Microsoft.Extensions.AI as the application-facing abstraction. Use provider SDKs as implementation details. Use bigger frameworks only when the workflow needs them. That is a solid shape for modern .NET AI applications.

https://learn.microsoft.com/en-us/dotnet/ai/microsoft-extensions-ai

https://learn.microsoft.com/en-us/dotnet/ai/ichatclient

https://learn.microsoft.com/en-us/dotnet/ai/conceptual/calling-tools

https://devblogs.microsoft.com/dotnet/dotnet-ai-essentials-the-core-building-blocks-explained/

https://devblogs.microsoft.com/dotnet/ai-vector-data-dotnet-extensions-ga/

https://www.nuget.org/packages/Microsoft.Extensions.AI/

https://www.nuget.org/packages/Microsoft.Extensions.AI.OpenAI