How to make use of the new TurboVec from .NET

TurboVec is interesting because it attacks one of the problems that appear after a RAG system starts to grow. Embeddings are easy to talk about when you have a few thousand chunks. They become much harder to ignore when you have millions of them, each with hundreds or thousands of dimensions, all sitting in memory waiting to be searched. The usual .NET answer is to put a vector database beside the application and call it over HTTP. That's a reasonable default: use PostgreSQL with pgvector, Azure AI Search, or whatever already fits. The application stays in C#, the vector store does vector search, and nobody has to explain to the team why a Rust crate has appeared in the middle of the API.

TurboVec changes the question slightly. It's a Rust vector index built on TurboQuant, a vector quantisation method from Google Research. It already has Python bindings, but for a .NET team the Rust crate is the interesting part. If you want to use it from .NET, the cleanest approach is to treat TurboVec as a small retrieval service written in Rust. Your .NET API calls that service over HTTP or gRPC. The Rust service owns the compressed vector index. The .NET application keeps ownership of authentication, authorisation, business rules, metadata, prompt orchestration and the LLM call. That gives you a sane boundary. You get the performance and memory benefits of a Rust vector index without forcing Rust, Python or native interop into the core of your .NET API.

The shape of the integration

I wouldn't start by trying to load TurboVec directly inside a .NET process. You could probably build a native library around the Rust crate and call it with P/Invoke, but that is a sharp tool. You now own platform-specific builds, memory ownership across the boundary, native crashes that take down the whole process, and a much more awkward debugging story. A separate Rust service is probably the better option. The .NET API sends an embedding to the retrieval service. The retrieval service searches TurboVec and returns document IDs with scores. The .NET API then loads the matching chunks from its normal data store, applies any final business rules, builds the prompt and calls the model.

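The flow looks like this:

User query → .NET API creates the embedding → Rust retrieval service searches TurboVec → vector IDs and scores return to .NET → .NET loads the chunks from its database, applies business rules and builds the prompt → LLM call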

The important design decision is that TurboVec should return IDs, not become your source of truth. Your database still owns everything. TurboVec owns fast similarity search over vectors. That separation will save you later on.
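A small illustration of that split, written as a Rust sketch rather than anything TurboVec provides: the search step hands back (vector ID, score) pairs, and the application joins them with rows from its own database. `ChunkRow` and `hydrate` are hypothetical names. The database can return rows in any order, so the join rebuilds score order, and any hit without a matching row (for example a chunk already deleted in SQL) is simply dropped.

```rust
use std::collections::HashMap;

/// A row as it might come back from the application's database.
/// Hypothetical shape: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkRow {
    pub vector_id: u64,
    pub content: String,
}

/// Joins search hits (vector ID + score) with rows loaded from the database.
/// Output follows the order of `hits` (best first), not the order of `rows`.
/// Hits with no matching row are dropped: the database stays the source of truth.
pub fn hydrate(hits: &[(u64, f32)], rows: Vec<ChunkRow>) -> Vec<(ChunkRow, f32)> {
    let mut by_id: HashMap<u64, ChunkRow> =
        rows.into_iter().map(|row| (row.vector_id, row)).collect();

    hits.iter()
        .filter_map(|&(id, score)| by_id.remove(&id).map(|row| (row, score)))
        .collect()
}
```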

Why Rust rather than Python?

TurboVec already has Python bindings, and Python is fine for experiments. If you are building a production .NET system though, I would favour Rust for the retrieval service. The reason is deployment shape. A Rust service can compile into a single small binary. It has predictable memory behaviour. It avoids a Python runtime in your production path. It keeps you close to the TurboVec crate itself. It also makes the service feel like infrastructure rather than a notebook that escaped into production. Your .NET team does not need to become a Rust team overnight. The Rust surface area can stay small. One service. Three endpoints. One index. A thin contract. That's manageable.

There is also a wider industry shift around systems code. Rust is being used more often where memory safety, predictable performance and low-level control are important. Microsoft has also been public about improving memory safety across its stack through safer C# work and increased Rust adoption in lower-level areas. For .NET people, the takeaway is practical rather than dramatic. C# remains the application language. Rust is becoming a sensible choice for small, specialised infrastructure components that sit beside the application, which is exactly the shape a TurboVec retrieval service takes.

The contract between .NET and Rust

Start with a simple HTTP contract. You can move to gRPC later if payload size, latency or throughput justify it. HTTP with JSON is easier to debug, easier to test with curl, and easier for most .NET teams to wire into an existing system. A practical first contract is an add endpoint, a search endpoint and a health endpoint. The vectors below are truncated to three values for readability; real requests carry the full embedding dimension.

Add request (POST /vectors):

{
  "id": 1001,
  "vector": [0.012, -0.031, 0.044]
}

Search request (POST /search):

{
  "vector": [0.012, -0.031, 0.044],
  "k": 10,
  "allowList": [1001, 1002, 1003]
}

Search response:

{
  "results": [
    { "id": 1001, "score": 0.91 },
    { "id": 1003, "score": 0.87 }
  ]
}

Use numeric IDs at the TurboVec boundary. TurboVec has an IdMapIndex type for stable external u64 IDs, which is the right fit for a backend system. If your document IDs are GUIDs or strings, keep a mapping in your database. Do not force the vector index to understand your whole domain model. For example, your SQL table might have a normal document chunk ID and a separate numeric vector ID.

CREATE TABLE DocumentChunks
(
    Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    TenantId UNIQUEIDENTIFIER NOT NULL,
    VectorId BIGINT NOT NULL UNIQUE,
    Content NVARCHAR(MAX) NOT NULL,
    SourceDocumentId UNIQUEIDENTIFIER NOT NULL,
    CreatedUtc DATETIME2 NOT NULL
);

The .NET API can use VectorId when talking to TurboVec, then use the normal Id when working inside the application.
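The rules that mapping needs to follow can be sketched in memory. This is a hypothetical `VectorIdMap`, not part of TurboVec or the service below: it allocates sequential numeric IDs, returns the existing ID when a chunk is re-indexed, and maps search hits back to the application's chunk IDs.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the DocumentChunks mapping:
/// application chunk IDs (GUIDs or strings) on one side, the numeric
/// VectorId used at the TurboVec boundary on the other.
#[derive(Default)]
pub struct VectorIdMap {
    next_id: u64,
    by_chunk: HashMap<String, u64>,
    by_vector: HashMap<u64, String>,
}

impl VectorIdMap {
    /// Returns the existing VectorId for a chunk, or allocates the next one.
    /// Allocation is idempotent, so re-indexing a chunk keeps its ID stable.
    pub fn get_or_allocate(&mut self, chunk_id: &str) -> u64 {
        if let Some(&id) = self.by_chunk.get(chunk_id) {
            return id;
        }
        self.next_id += 1; // IDs start at 1; 0 is left unused
        let id = self.next_id;
        self.by_chunk.insert(chunk_id.to_string(), id);
        self.by_vector.insert(id, chunk_id.to_string());
        id
    }

    /// Maps a search hit back to the application's chunk ID.
    pub fn chunk_for(&self, vector_id: u64) -> Option<&str> {
        self.by_vector.get(&vector_id).map(String::as_str)
    }
}
```

In production the database should do the allocation, for example with a BIGINT IDENTITY column or a sequence, so IDs survive restarts. Because BIGINT is a signed 64-bit type, keeping VectorId positive means the conversion to and from u64 is lossless.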

Building the Rust TurboVec service

The Rust service can be small. The exact crate versions will move, so the simplest setup is to let Cargo add the current versions.

cargo new turbovec-search
cd turbovec-search

cargo add turbovec
cargo add axum
cargo add tokio --features full
cargo add serde --features derive
cargo add serde_json
cargo add tracing
cargo add tracing-subscriber

The service needs to hold an index in memory. Search calls only need shared read access. Add and remove operations need write access. A simple first version can use Arc<RwLock<IdMapIndex>>.

This is enough to show the shape.

use axum::{
    extract::State,
    http::StatusCode,
    routing::{get, post},
    Json, Router,
};
use serde::{Deserialize, Serialize};
use std::{path::Path, sync::Arc};
use tokio::sync::RwLock;
use turbovec::IdMapIndex;

#[derive(Clone)]
struct AppState {
    index: Arc<RwLock<IdMapIndex>>,
    index_path: String,
    dim: usize,
}

#[derive(Deserialize)]
struct AddVectorRequest {
    id: u64,
    vector: Vec<f32>,
}

#[derive(Serialize)]
struct AddVectorResponse {
    accepted: bool,
}

#[derive(Deserialize)]
struct SearchRequest {
    vector: Vec<f32>,
    k: usize,
    #[serde(rename = "allowList")]
    allow_list: Option<Vec<u64>>,
}

#[derive(Serialize)]
struct SearchResult {
    id: u64,
    score: f32,
}

#[derive(Serialize)]
struct SearchResponse {
    results: Vec<SearchResult>,
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt::init();

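    let dim = 1536;
    let bit_width = 4;
    let index_path = "data/index.tvim".to_string();

    // Ensure the data directory exists; otherwise the first write fails when running locally.
    std::fs::create_dir_all("data").expect("failed to create data directory");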

    let index = if Path::new(&index_path).exists() {
        let loaded = IdMapIndex::load(&index_path)
            .expect("failed to load TurboVec index");

        loaded.prepare();
        loaded
    } else {
        IdMapIndex::new(dim, bit_width)
            .expect("failed to create TurboVec index")
    };

    let state = AppState {
        index: Arc::new(RwLock::new(index)),
        index_path,
        dim,
    };

    let app = Router::new()
        .route("/health", get(health))
        .route("/vectors", post(add_vector))
        .route("/search", post(search))
        .with_state(state);

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080")
        .await
        .expect("failed to bind listener");

    axum::serve(listener, app)
        .await
        .expect("server failed");
}

async fn health() -> StatusCode {
    StatusCode::OK
}

async fn add_vector(
    State(state): State<AppState>,
    Json(request): Json<AddVectorRequest>,
) -> Result<Json<AddVectorResponse>, StatusCode> {
    if request.vector.len() != state.dim {
        return Err(StatusCode::BAD_REQUEST);
    }

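    let mut index = state.index.write().await;

    index
        .add_with_ids(&request.vector, &[request.id])
        .map_err(|_| StatusCode::BAD_REQUEST)?;

    // Persisting on every add keeps the demo simple, but the write lock is held
    // while the file is written, so searches wait. See the note on persistence below.
    index
        .write(&state.index_path)
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;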

    Ok(Json(AddVectorResponse { accepted: true }))
}

async fn search(
    State(state): State<AppState>,
    Json(request): Json<SearchRequest>,
) -> Result<Json<SearchResponse>, StatusCode> {
    if request.vector.len() != state.dim || request.k == 0 {
        return Err(StatusCode::BAD_REQUEST);
    }

    let index = state.index.read().await;

    let results = match request.allow_list {
        Some(allow_list) => {
            // Drop IDs the index does not hold (for example, chunks not yet ingested).
            let filtered_allow_list = allow_list
                .into_iter()
                .filter(|id| index.contains(*id))
                .collect::<Vec<u64>>();

            if filtered_allow_list.is_empty() {
                return Ok(Json(SearchResponse { results: Vec::new() }));
            }

            let (scores, ids) =
                index.search_with_allowlist(&request.vector, request.k, Some(&filtered_allow_list));

            ids.into_iter()
                .zip(scores.into_iter())
                .map(|(id, score)| SearchResult { id, score })
                .collect()
        }
        None => {
            let (scores, ids) = index.search(&request.vector, request.k);

            ids.into_iter()
                .zip(scores.into_iter())
                .map(|(id, score)| SearchResult { id, score })
                .collect()
        }
    };

    Ok(Json(SearchResponse { results }))
}

This is deliberately small. It's enough to prove the integration and test the boundaries. It's not the final version I would ship under heavy load.

The first production change I would make is around persistence. Writing the index to disk on every add is easy to understand, but it is not a good strategy for high ingest, because each add holds the write lock while the index file is written and searches have to wait. In a real system, I'd persist source documents and embeddings in the database or object storage, append changes to a queue, update the in-memory index from a worker, and snapshot the TurboVec index on a controlled interval.

For a first internal RAG service though, this gets you moving.
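When you do move to that queue-and-worker shape, the core loop is small. Here's a sketch using only the Rust standard library, so it runs without TurboVec: a HashMap stands in for IdMapIndex, `IndexChange` and `spawn_index_worker` are hypothetical names, and the snapshot closure marks where a real service would write the index file (the service above calls `index.write` for that). Each change takes the write lock only briefly, and snapshots happen every N changes instead of on every add.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Receiver;
use std::sync::{Arc, RwLock};
use std::thread::{self, JoinHandle};

/// A change queued by the ingest side (hypothetical message shape).
pub enum IndexChange {
    Add { id: u64, vector: Vec<f32> },
    Remove { id: u64 },
}

/// Stand-in for the real index so the sketch runs without TurboVec.
pub type StandInIndex = HashMap<u64, Vec<f32>>;

/// Applies queued changes under a short write lock and calls `snapshot`
/// after every `snapshot_every` applied changes, plus once more for any
/// remainder when the queue closes. Returns how many snapshots were taken.
pub fn spawn_index_worker<F>(
    index: Arc<RwLock<StandInIndex>>,
    changes: Receiver<IndexChange>,
    snapshot_every: usize,
    mut snapshot: F,
) -> JoinHandle<usize>
where
    F: FnMut(&StandInIndex) + Send + 'static,
{
    thread::spawn(move || {
        let mut pending = 0;
        let mut snapshots = 0;
        // recv() blocks until a change arrives and returns Err once all senders are dropped.
        while let Ok(change) = changes.recv() {
            {
                let mut guard = index.write().unwrap();
                match change {
                    IndexChange::Add { id, vector } => {
                        guard.insert(id, vector);
                    }
                    IndexChange::Remove { id } => {
                        guard.remove(&id);
                    }
                }
            } // Write lock released here, so searches wait for one change at most.
            pending += 1;
            if pending >= snapshot_every {
                let guard = index.read().unwrap();
                snapshot(&*guard); // A real service would write the index file here.
                snapshots += 1;
                pending = 0;
            }
        }
        if pending > 0 {
            let guard = index.read().unwrap();
            snapshot(&*guard);
            snapshots += 1;
        }
        snapshots
    })
}
```

In the async service you would more likely use a tokio channel and task, and add a timer-based snapshot alongside the count-based one; the structure stays the same.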

Calling the Rust service from .NET

On the .NET side, hide TurboVec behind an interface. The rest of the application should not know whether the retrieval service is TurboVec, pgvector, Qdrant, Azure AI Search or something else.

public sealed record VectorSearchRequest(
    IReadOnlyList<float> Vector,
    int K,
    IReadOnlyList<ulong>? AllowList);

public sealed record VectorSearchResult(
    ulong Id,
    float Score);

public interface IVectorSearchClient
{
    Task<IReadOnlyList<VectorSearchResult>> SearchAsync(
        VectorSearchRequest request,
        CancellationToken stopToken);
}

Then create a typed HTTP client.

using System.Net.Http.Json;
using Microsoft.Extensions.Options;

public sealed class TurbovecOptions
{
    public required string BaseUrl { get; init; }
}

public sealed class TurbovecVectorSearchClient(
    HttpClient httpClient) : IVectorSearchClient
{
    public async Task<IReadOnlyList<VectorSearchResult>> SearchAsync(
        VectorSearchRequest request,
        CancellationToken stopToken)
    {
        using var response = await httpClient.PostAsJsonAsync(
            "/search",
            request,
            stopToken);

        response.EnsureSuccessStatusCode();

        var payload = await response.Content.ReadFromJsonAsync<SearchResponse>(
            cancellationToken: stopToken);

        return payload?.Results ?? [];
    }

    private sealed record SearchResponse(
        IReadOnlyList<VectorSearchResult> Results);
}

builder.Services.Configure<TurbovecOptions>(
    builder.Configuration.GetSection("Turbovec"));

builder.Services.AddHttpClient<IVectorSearchClient, TurbovecVectorSearchClient>(
    (services, client) =>
    {
        var options = services
            .GetRequiredService<IOptions<TurbovecOptions>>()
            .Value;

        client.BaseAddress = new Uri(options.BaseUrl);
    });

Your configuration stays simple.

{
  "Turbovec": {
    "BaseUrl": "http://turbovec-search:8080"
  }
}

Now the rest of the application talks to IVectorSearchClient. That is the part worth protecting. Once you have that boundary, TurboVec is just one adapter.

Where the allow list should come from

The allow list is the part that makes this feel like a proper backend design rather than a vector search demo. In most business systems, the user should not search every document in the index. They should search the documents their tenant, role, case, claim, account or workspace allows them to see. That filtering should usually come from your existing database. The .NET API already understands the current user and the current tenant. It can ask the database for the allowed vector IDs, then pass those IDs to the Rust service.

app.MapPost("/rag/search", async (
    RagSearchRequest request,
    IEmbeddingClient embeddingClient,
    IDocumentPermissionRepository permissions,
    IDocumentChunkRepository chunks,
    IVectorSearchClient vectorSearch,
    IUserContext userContext,
    CancellationToken stopToken) =>
{
    var embedding = await embeddingClient.CreateEmbeddingAsync(
        request.Query,
        stopToken);

    var allowedVectorIds = await permissions.GetAllowedVectorIdsAsync(
        userContext.UserId,
        userContext.TenantId,
        stopToken);

    var matches = await vectorSearch.SearchAsync(
        new VectorSearchRequest(
            Vector: embedding,
            K: 10,
            AllowList: allowedVectorIds),
        stopToken);

    var vectorIds = matches
        .Select(match => match.Id)
        .ToArray();

    // The database may return rows in any order; re-sort by score if ranking matters.
    var matchedChunks = await chunks.GetByVectorIdsAsync(
        vectorIds,
        stopToken);

    return Results.Ok(matchedChunks);
});

This is a good split of responsibility. SQL handles structured permissions. TurboVec handles similarity search. The .NET API composes the result. You do not want permission logic hidden inside the vector index. You also do not want to retrieve the top 100 results and then throw away 95 of them because the user cannot access them. Passing an allow list into the search step is cleaner and more predictable.
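The difference between filtering before ranking and filtering after is easy to see with a brute-force sketch. This is exact search over plain f32 vectors, purely to show the semantics; TurboVec searches its compressed representation instead, and `search_allowed` and `search_then_filter` are hypothetical helpers. Scores are dot products, which match cosine similarity only when vectors are L2-normalised.

```rust
use std::collections::HashSet;

/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Exact top-k restricted to an allow list: candidates are filtered
/// *before* ranking, so up to k permitted results come back.
pub fn search_allowed(
    vectors: &[(u64, Vec<f32>)],
    query: &[f32],
    k: usize,
    allow: &HashSet<u64>,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        .filter(|(id, _)| allow.contains(id))
        .map(|(id, v)| (*id, dot(v, query)))
        .collect();
    // Highest score first; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

/// The pattern to avoid: rank everything, keep the top k, then drop what
/// the user cannot see. Permitted documents ranked below k are lost.
pub fn search_then_filter(
    vectors: &[(u64, Vec<f32>)],
    query: &[f32],
    k: usize,
    allow: &HashSet<u64>,
) -> Vec<(u64, f32)> {
    let everyone: HashSet<u64> = vectors.iter().map(|(id, _)| *id).collect();
    let mut top = search_allowed(vectors, query, k, &everyone);
    top.retain(|(id, _)| allow.contains(id));
    top
}
```

With k = 1 and a user who can only see document 3, the pre-filtered search returns document 3, while search-then-filter returns nothing, because the single slot went to a document the user cannot access.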

Adding vectors from .NET

Search is only half the story. You also need to index documents.

public sealed record AddVectorRequest(
    ulong Id,
    IReadOnlyList<float> Vector);

public interface IVectorIndexClient
{
    Task AddAsync(
        AddVectorRequest request,
        CancellationToken stopToken);
}

public sealed class TurbovecVectorIndexClient(
    HttpClient httpClient) : IVectorIndexClient
{
    public async Task AddAsync(
        AddVectorRequest request,
        CancellationToken stopToken)
    {
        using var response = await httpClient.PostAsJsonAsync(
            "/vectors",
            request,
            stopToken);

        response.EnsureSuccessStatusCode();
    }
}

I'd normally call this from an indexing worker, not directly from the user-facing request path. Uploading a document, extracting text, chunking it, embedding each chunk and updating the vector index can be slow. Push that work behind a queue.

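A simple flow is usually enough:

Upload request → store the document and enqueue an indexing job → worker extracts the text and chunks it → worker embeds each chunk → worker adds each vector through IVectorIndexClient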

This keeps the upload request fast. It also gives you somewhere to retry if embedding generation fails or the Rust retrieval service is temporarily unavailable.
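For the retry part, a simple exponential backoff around the call to the retrieval service is often enough. This is a standard-library Rust sketch with a hypothetical `retry_with_backoff` helper; the closure receives the attempt number so you can log it. It uses a blocking sleep, which suits a dedicated worker thread; inside an async runtime you would use the runtime's sleep instead. Retries also mean the same add can arrive more than once, so check how IdMapIndex treats a duplicate ID before assuming a retried add is harmless.

```rust
use std::thread;
use std::time::Duration;

/// Retries `operation` up to `max_attempts` times, doubling the delay after
/// each failure. Returns the first success, or the last error.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    mut operation: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match operation(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller (e.g. dead-letter the job).
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```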

Dockerising the Rust service

You can containerise the Rust service and run it beside the .NET API. A basic Dockerfile can use a Rust build image and a small Debian runtime image.

FROM rust:1-bookworm AS build
WORKDIR /app

COPY Cargo.toml Cargo.lock ./
COPY src ./src

RUN cargo build --release

FROM debian:bookworm-slim AS runtime
WORKDIR /app

RUN mkdir -p /app/data

COPY --from=build /app/target/release/turbovec-search /app/turbovec-search

EXPOSE 8080

ENTRYPOINT ["/app/turbovec-search"]

In Docker Compose, the .NET API can reach the Rust service by service name.

services:
  api:
    build:
      context: ./src/MyApp.Api
    environment:
      Turbovec__BaseUrl: http://turbovec-search:8080
    depends_on:
      - turbovec-search

  turbovec-search:
    build:
      context: ./src/turbovec-search
    # Publishing the port is for local testing; the api service reaches it by name.
    ports:
      - "8080:8080"
    volumes:
      - turbovec-data:/app/data

volumes:
  turbovec-data:

For Azure Container Apps, Kubernetes or another platform, the same idea applies. Deploy the Rust service as a private internal service. Do not expose it publicly. The .NET API should be the public boundary.

HTTP first, gRPC later

It is tempting to jump straight to gRPC because vector payloads can be large and binary protocols are efficient. That may be the right final answer, especially if you're sending big batches of vectors or running high query volume. I'd still start with HTTP unless you already know you need gRPC. HTTP gives you simpler debugging, easier curl tests, easier local development and fewer moving parts. The payload for one 1536-dimensional embedding is not tiny, but it's usually acceptable for a first version. Once the shape is proven, you can move the contract to gRPC and send the vector as a protobuf repeated float field. The architecture does not change; only the transport does. That's another reason the .NET side should depend on IVectorSearchClient. The application should not care whether the adapter uses HTTP, gRPC or something else.
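To put a number on "not tiny", here is a rough sizing sketch. Raw f32 values take 4 bytes each, so a 1536-dimensional embedding is 1536 × 4 = 6,144 bytes before any framing, which is roughly what a binary encoding such as a protobuf repeated float field carries. As JSON text, each value needs several characters plus a comma, so the same vector is typically a few times larger. The helpers below are hypothetical and use Rust's default float formatting; the .NET serializer's output will differ slightly.

```rust
/// Raw size of an f32 vector: 4 bytes per dimension, ignoring framing.
pub fn binary_payload_bytes(dim: usize) -> usize {
    dim * std::mem::size_of::<f32>()
}

/// Size of the same vector written as a JSON array, using Rust's default
/// float formatting (shortest text that round-trips to the same f32).
pub fn json_array_bytes(vector: &[f32]) -> usize {
    let items: Vec<String> = vector.iter().map(|x| x.to_string()).collect();
    format!("[{}]", items.join(",")).len()
}
```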

Where this fits in a clean architecture solution

In a clean architecture or ports and adapters style .NET solution, TurboVec belongs outside the application core. The application layer defines the port.

public interface IVectorSearchClient
{
    Task<IReadOnlyList<VectorSearchResult>> SearchAsync(
        VectorSearchRequest request,
        CancellationToken stopToken);
}

The infrastructure layer implements the adapter.

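public sealed class TurbovecVectorSearchClient : IVectorSearchClient
{
    // SearchAsync posts to the Rust service's /search endpoint,
    // as in the typed HTTP client shown earlier.
}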

The Rust service sits outside the .NET solution boundary as a separate deployable component. Your domain model should know nothing about TurboVec. Your use case or application service can ask for semantic matches through an interface. Your infrastructure project can decide how that happens. That keeps the design flexible. If TurboVec works well, keep it. If your retrieval needs move towards hybrid ranking, distributed indexing, managed search or advanced metadata queries, swap the adapter.

Handling deletes and rebuilds

Deletes need more care than adds. TurboVec provides stable external IDs through IdMapIndex, which is the type you should use if documents can be removed. The Rust service can expose a delete endpoint later.

#[derive(Deserialize)]
struct DeleteVectorRequest {
    id: u64,
}

The implementation is straightforward, but the lifecycle needs a proper decision. Do you remove vectors immediately when a document is deleted? Do you soft-delete in SQL first and rebuild the index later? Do you maintain separate indexes per tenant? Do you need an audit trail of what was searchable at a point in time?

For most business systems, I'd make SQL the authority. If SQL says a document is deleted, the .NET API should not pass that ID in the allow list anyway. The index can then be updated asynchronously. That gives you safety even if the vector index briefly lags behind. You should also have a rebuild path from source data. Any search index can become corrupt, stale or out of sync. Keep enough information in durable storage to rebuild the TurboVec index from scratch. The compressed index file is an optimisation. It should not be the only copy of your retrieval data.
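One way to make "SQL is the authority" operational is a periodic reconciliation pass that compares the vector IDs SQL considers live with the IDs the index holds. This Rust sketch only computes the plan; `reconcile` and `ReconcilePlan` are hypothetical, and how you obtain the index's ID set is an assumption (you might record it alongside each snapshot rather than ask the index). The `to_add` side is exactly why embeddings belong in durable storage: without them, missing vectors can only be restored by re-embedding.

```rust
use std::collections::BTreeSet;

/// The work needed to bring the vector index back in line with SQL.
#[derive(Debug, PartialEq)]
pub struct ReconcilePlan {
    /// In the index but not live in SQL (deleted, or never committed).
    pub to_remove: Vec<u64>,
    /// Live in SQL but missing from the index.
    pub to_add: Vec<u64>,
}

/// Treats SQL as the authority. BTreeSet keeps both lists sorted and deterministic.
pub fn reconcile(live_in_sql: &BTreeSet<u64>, in_index: &BTreeSet<u64>) -> ReconcilePlan {
    ReconcilePlan {
        to_remove: in_index.difference(live_in_sql).copied().collect(),
        to_add: live_in_sql.difference(in_index).copied().collect(),
    }
}
```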

What to measure before trusting it

TurboVec makes strong claims around compression, online ingest and search speed. Those claims are interesting, but you should test your own workload before committing to it. Measure memory usage with your embedding model and your chunk count. Measure recall against a baseline you trust. Measure p50 and p95 latency under concurrent search. Measure ingest speed while search traffic is running. Measure how long it takes to load or rebuild the index. Lastly, measure what happens when the allow list is small, large or empty.
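For the recall measurement, a common approach is recall@k: run the same queries through an exact brute-force search over the original vectors and through TurboVec, then measure how much of the exact top k the index also returned. A minimal Rust sketch, with hypothetical `recall_at_k` and `mean_recall_at_k` helpers:

```rust
use std::collections::HashSet;

/// Recall@k: the fraction of the baseline's top-k IDs (from a search you
/// trust, e.g. brute force over the original f32 vectors) that the index
/// under test also returned in its own top-k.
pub fn recall_at_k(baseline: &[u64], candidate: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = baseline.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 1.0; // nothing to find; treat as perfect recall
    }
    let found: HashSet<u64> = candidate.iter().take(k).copied().collect();
    truth.intersection(&found).count() as f64 / truth.len() as f64
}

/// Mean recall@k over a set of queries, each given as (baseline, candidate).
pub fn mean_recall_at_k(queries: &[(Vec<u64>, Vec<u64>)], k: usize) -> f64 {
    if queries.is_empty() {
        return 0.0;
    }
    let total: f64 = queries
        .iter()
        .map(|(baseline, candidate)| recall_at_k(baseline, candidate, k))
        .sum();
    total / queries.len() as f64
}
```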

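The key comparison is not just TurboVec versus another vector index in isolation. The real comparison is the whole retrieval path: embedding the query, looking up the allow list, searching the index, loading the chunks from the database and building the prompt.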
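Measuring that path usually means timing each request end to end and summarising the samples as percentiles rather than an average. Below is a small Rust sketch of the nearest-rank percentile, which is one of several common percentile definitions; the `percentile` function is hypothetical, and collecting the Duration samples (for example with std::time::Instant around each request in your load test) is left to you.

```rust
use std::time::Duration;

/// Nearest-rank percentile (p in 0..=100) over latency samples:
/// the smallest sample with at least p% of samples at or below it.
/// Returns None for an empty sample set.
pub fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    // Clamp so p = 0 maps to the smallest sample and p = 100 to the largest.
    let index = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[index])
}
```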

Use case for this approach

I would use the Rust service approach for a private RAG system where memory pressure, latency or data control are becoming a real concern. Internal document search is a good fit. A claims system could be a good fit, as could a support knowledge base. Any system where documents remain inside your own environment and retrieval needs to be fast enough to sit in the user path is worth testing.

I would be more cautious if the team needs a full vector database today. TurboVec gives you a local compressed index. It does not give you a managed search platform with clustering, dashboards, backups, and operational support. You can build around it, but you need to be honest about what you are choosing to own.

The real value for .NET developers

The useful way to think about TurboVec from .NET is simple. Do not try to make it feel like a C# library. Treat it as a specialised retrieval engine. Let .NET handle the application. Let Rust handle the compressed vector index. Let your database remain the source of truth. Keep the boundary small.

That gives you a practical way to make use of TurboVec without turning your .NET system into a mixed-language mess. The service can start small, run locally, sit behind an interface, and prove whether the memory and speed claims help your actual workload. If it performs well, you have a serious retrieval component. If it doesn't, your application architecture survives the experiment. That's the right kind of integration. You get the upside of a new vector index without betting the whole system on it.

TurboVec GitHub repository

TurboVec Rust crate documentation

TurboVec API reference

TurboQuant paper

Google Research TurboQuant overview

Microsoft IHttpClientFactory documentation

Microsoft gRPC with .NET documentation

Microsoft improving C# memory safety