
Building AI Chat Applications in .NET with Microsoft.Extensions.AI
Author - Abdul Rahman (Bhai)
What we gonna do?
In this article, we'll explore how to build a console-based AI chat application in .NET that leverages Microsoft.Extensions.AI for unified AI abstractions and OllamaSharp for local LLM integration. Think of it as building your own ChatGPT-like experience, but running entirely on your local machine with full control over the model.
The beauty of Microsoft.Extensions.AI? It's a unified abstraction layer that lets you swap AI providers seamlessly. Start with Ollama for local development, then switch to Azure OpenAI for production - all without changing your core application logic.
Why we gonna do?
Let's address the elephant in the room: Python dominates AI development. But that doesn't mean .NET developers should jump ship. Here's why staying in the .NET ecosystem for AI makes perfect sense:
Leverage your existing .NET expertise
Why learn a new language and ecosystem when you can use the tools you already master? .NET offers robust dependency injection, background services, configuration management, and hosting infrastructure - all battle-tested in enterprise environments. These aren't afterthoughts in .NET; they're first-class citizens.
Production-ready from day one
.NET applications are built for production. With built-in support for structured logging, health checks, telemetry, and dependency injection, you're not cobbling together a production deployment - you're using proven patterns that power Fortune 500 companies.
Provider flexibility without the pain
Microsoft.Extensions.AI provides a game-changing abstraction. Write your code once against the IChatClient interface, then swap providers at runtime. Start local with Ollama, move to Azure OpenAI for production, or experiment with OpenAI directly - all with minimal code changes. This is the power of proper abstraction design.
Type safety and modern C# features
Python's dynamic typing can lead to runtime surprises. C# gives you compile-time safety, async/await patterns that are clean and efficient, LINQ for data manipulation, and nullable reference types that catch bugs before they hit production.
How we gonna do?
Step 1: Set up your project and dependencies
First, create a new console application and add the required NuGet packages:
dotnet new console -n AIChat
cd AIChat
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.Hosting
dotnet add package OllamaSharp
dotnet user-secrets init
The packages we're using:
- Microsoft.Extensions.AI - Unified AI abstractions for chat, embeddings, and more
- Microsoft.Extensions.Hosting - Generic Host for dependency injection and app lifecycle
- OllamaSharp - Client library for Ollama, a tool for running LLMs locally
Your AIChat.csproj should look like this:
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net10.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <UserSecretsId>58bd4602-408b-4e27-a45e-2f09cacaeb3c</UserSecretsId>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.Extensions.AI" Version="10.1.1" />
    <PackageReference Include="Microsoft.Extensions.Hosting" Version="10.0.2" />
    <PackageReference Include="OllamaSharp" Version="5.4.12" />
  </ItemGroup>

</Project>
Step 2: Configure user secrets for secure configuration
Never hardcode endpoints or API keys. Use user secrets for local development:
dotnet user-secrets set "Chat:Ollama:Endpoint" "http://localhost:11434"
This stores your configuration securely outside of source control. For production, use Azure Key Vault or environment variables.
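In production the same key can come from an environment variable with no code changes: the Generic Host registers environment variables as a configuration source by default, and double underscores map to the : separator in key names. A minimal sketch, using the builder we'll create in Step 3 (the environment variable name in the comment follows from the key we chose above):
// Set in your production environment ("__" maps to ":"):
//   Chat__Ollama__Endpoint=http://ollama:11434

// The same lookup then works unchanged, whether the value came from
// user secrets locally or an environment variable in production:
var endpoint = builder.Configuration["Chat:Ollama:Endpoint"]
    ?? throw new InvalidOperationException("Missing configuration: Endpoint.");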
Step 3: Set up the host and dependency injection
The Program.cs file is where we configure our application host, set up dependency injection, and register our services:
using AIChat;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using OllamaSharp;

var builder = Host.CreateApplicationBuilder(args);

// Configure the host to read user secrets
builder.Configuration.AddUserSecrets<Program>();

// Get the Ollama endpoint from configuration
var endpoint = builder.Configuration["Chat:Ollama:Endpoint"]
    ?? throw new InvalidOperationException(
        "Missing configuration: Endpoint. See the README for details.");

var model = "llama3.1";

// Create the Ollama client
var client = new OllamaApiClient(endpoint)
{
    SelectedModel = model
};

// Register the chat client with DI
IChatClient innerClient = client;
builder.Services.AddChatClient(innerClient).UseLogging();

// Register our background service
builder.Services.AddHostedService<ChatApp>();

var app = builder.Build();

// Run the app
await app.RunAsync();
Notice how we're using the Generic Host pattern. This gives us:
- Configuration - Load settings from user secrets, environment variables, or appsettings.json
- Dependency Injection - Register and resolve services cleanly
- Logging - Built-in structured logging with .UseLogging()
- Lifecycle Management - Graceful startup and shutdown
Step 4: Create the chat application as a background service
The BackgroundService base class provides lifecycle hooks for long-running operations. Here's our complete chat implementation:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Hosting;

namespace AIChat;

internal class ChatApp(
    IChatClient ai,
    IHostApplicationLifetime lifetime) : BackgroundService
{
    private bool exitRequested;
    private readonly List<ChatMessage> history = [];

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Handle Ctrl+C gracefully
        Console.CancelKeyPress += (sender, e) =>
        {
            Console.WriteLine("\nCtrl+C detected. Exiting gracefully...");
            e.Cancel = true;
            lifetime.StopApplication();
            exitRequested = true;
        };

        // Initialize conversation with system message
        ChatMessage systemMessage = new(
            ChatRole.System,
            "You are an AI assistant that tries to answer the user's query.");
        history.Add(systemMessage);

        // Get initial greeting
        ChatResponse response = await ai.GetResponseAsync(
            history,
            cancellationToken: stoppingToken);

        Console.WriteLine("AI: " + (string.IsNullOrWhiteSpace(response.Text)
            ? "How can I assist you today?"
            : response.Text));

        // Main conversation loop
        while (!stoppingToken.IsCancellationRequested)
        {
            Console.Write("Prompt > ");
            string? userMessage = Console.ReadLine();

            if (userMessage == null || exitRequested)
                break;

            // Add user message to history
            history.Add(new ChatMessage(ChatRole.User, userMessage));

            // Accumulate the streamed text into a single assistant message
            var responseText = new TextContent("");
            var currentResponseMessage = new ChatMessage(
                ChatRole.Assistant,
                [responseText]);

            // Stream the response for better UX
            await foreach (var chunk in ai.GetStreamingResponseAsync(
                history,
                cancellationToken: stoppingToken))
            {
                // Keep any non-text content from the updates in the history,
                // but fold the text into our accumulating assistant message
                history.AddMessages(
                    chunk,
                    filter: c => c is not TextContent);

                responseText.Text += chunk.Text;
                Console.Write(chunk.Text);
            }

            history.Add(currentResponseMessage);
            Console.WriteLine();
        }
    }
}
Let's break down what makes this implementation powerful:
Conversation history management
The history list maintains the entire conversation context. This is crucial for the AI to understand follow-up questions and maintain coherent conversations. Each message includes a role (System, User, or Assistant) and the content.
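Because the full history is re-sent on every turn, it grows without bound in long sessions, so you may want to cap the context. A minimal sketch of one approach, keeping the system message plus the most recent messages (the MaxTurns constant is an illustrative choice, not part of the article's code):
using Microsoft.Extensions.AI;

static void TrimHistory(List<ChatMessage> history)
{
    const int MaxTurns = 20; // illustrative; tune to your model's context window

    // Keep the system message at index 0 plus the last MaxTurns messages.
    if (history.Count > MaxTurns + 1)
        history.RemoveRange(1, history.Count - (MaxTurns + 1));
}
Calling this before each request keeps the prompt bounded while preserving the system prompt that anchors the conversation.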
Streaming responses for better UX
Instead of waiting for the complete response, we use GetStreamingResponseAsync to display tokens as they're generated. This creates a ChatGPT-like experience where users see the response appearing in real-time rather than waiting for the entire answer.
await foreach (var chunk in ai.GetStreamingResponseAsync(
    history,
    cancellationToken: stoppingToken))
{
    Console.Write(chunk.Text);
}
Graceful shutdown handling
The Console.CancelKeyPress event handler ensures that when users press Ctrl+C, the application shuts down gracefully without leaving resources hanging. This is production-grade error handling.
Step 5: Run your AI chat application
Before running the application, make sure you have Ollama installed and running locally. Download it from ollama.ai and pull the model:
ollama pull llama3.1
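Optionally, you can make the app fail fast when the server isn't reachable. OllamaSharp exposes an availability check; a small sketch you could place in Program.cs right after creating the client (the error message is illustrative):
// Fail fast if the Ollama server isn't reachable at the configured endpoint.
if (!await client.IsRunningAsync())
{
    throw new InvalidOperationException(
        $"Ollama is not reachable at {endpoint}. " +
        $"Start it and run 'ollama pull {model}' first.");
}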
Now run your .NET application:
dotnet run
You'll see a greeting from the AI, and you can start chatting:
AI: How can I assist you today?
Prompt > What is dependency injection?
AI: Dependency injection is a design pattern...
Prompt > Give me an example in C#
AI: Here's a practical example...
Step 6: The power of abstraction - swapping to Azure OpenAI
Here's where Microsoft.Extensions.AI truly shines. Want to use Azure OpenAI instead of Ollama? Just swap the client implementation:
// Requires the Azure.AI.OpenAI and Microsoft.Extensions.AI.OpenAI packages.

// Instead of OllamaSharp:
// var client = new OllamaApiClient(endpoint)
// {
//     SelectedModel = model
// };

// Use Azure OpenAI:
var client = new AzureOpenAIClient(
    new Uri(endpoint),
    new AzureKeyCredential(apikey));

IChatClient innerClient = client
    .GetChatClient("gpt-4o")
    .AsIChatClient();

builder.Services.AddChatClient(innerClient).UseLogging();
That's it. Your ChatApp class doesn't change at all. The IChatClient abstraction handles the differences between providers. This is the true power of interface-based design - write once, deploy anywhere.
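Taking this one step further, the choice can be driven by configuration, so a deployment setting rather than a code edit selects the provider. A minimal sketch, assuming a hypothetical Chat:Provider key (a real app would also keep separate endpoint and apikey values per provider):
// Hypothetical setting, e.g.: dotnet user-secrets set "Chat:Provider" "azure"
IChatClient innerClient = builder.Configuration["Chat:Provider"] switch
{
    "azure" => new AzureOpenAIClient(
            new Uri(endpoint),
            new AzureKeyCredential(apikey))
        .GetChatClient("gpt-4o")
        .AsIChatClient(),

    _ => new OllamaApiClient(endpoint) { SelectedModel = model },
};

builder.Services.AddChatClient(innerClient).UseLogging();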
Production considerations
For production deployments, consider these enhancements:
- Rate limiting - Add throttling to prevent API abuse
- Token counting - Monitor usage and costs with UsageDetails (sketched after this list)
- Error handling - Implement retry policies for transient failures
- Conversation persistence - Store history in a database for multi-session support
- Structured logging - Use ILogger for observability in production
- Configuration management - Use Azure App Configuration or Key Vault
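As an example of the token-counting point above, non-streaming calls surface a UsageDetails object directly on the response. A minimal sketch (where and how you record the numbers is up to you):
ChatResponse response = await ai.GetResponseAsync(history, cancellationToken: stoppingToken);

// Providers that report usage populate these counts; they may be null otherwise.
if (response.Usage is { } usage)
{
    Console.WriteLine(
        $"[usage] input: {usage.InputTokenCount}, " +
        $"output: {usage.OutputTokenCount}, total: {usage.TotalTokenCount}");
}
With streaming, usage (when the provider reports it) arrives inside the final updates as UsageContent items rather than on a single response object.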
Summary
In this article, we've shattered the myth that AI development requires Python. We built a production-ready AI chat application in .NET using Microsoft.Extensions.AI and OllamaSharp, leveraging familiar .NET patterns like Generic Host, Dependency Injection, Background Services, and User Secrets.
The key takeaway? Microsoft.Extensions.AI provides a unified abstraction that lets you write your AI logic once and swap providers effortlessly. Start local with Ollama for development, then move to Azure OpenAI for production without rewriting your application logic. This is enterprise-grade AI development with the type safety, tooling, and ecosystem you already know.
.NET developers no longer need to compromise. You can build sophisticated AI applications using the same robust patterns that power mission-critical enterprise systems - all while staying in the ecosystem you've mastered.