BackendRouter

Table of contents

  1. How routing works
  2. Registering backends
    1. Add parameters
  3. Basic example - chat + embedding
  4. Multi-model example - several chat backends
  5. Disposal
  6. Using with OpenAIBackend

BackendRouter composes multiple ILLMBackend instances into a single backend and routes each call to the right one. The most common use case is pairing a large chat model with a small specialised embedding model, but any number of named backends can be registered.


How routing works

Call type                      Backend used
RespondAsync(model: "name")    The backend registered under "name"
RespondAsync(model: null)      The default backend
RespondStreamingAsync(...)     Same rules as RespondAsync
EmbedAsync(...)                The embedding backend; falls back to the default if none is designated
EmbedBatchAsync(...)           Same as EmbedAsync
PingAsync()                    Pings all registered backends; returns true only when every one succeeds
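
For example, with a router configured as in the Basic example below, the rules above play out as follows (a sketch; lm is assumed to be a BackendRouter with a default chat backend and a designated embedding backend):

// Goes to the default backend (model is null)
var a = await lm.RespondAsync("Hello", model: null);

// Goes to the backend registered under "chat"
var b = await lm.RespondAsync("Hello", model: "chat");

// Goes to the embedding backend (or the default if none is designated)
var v = await lm.EmbedAsync("Hello");

// Pings every registered backend; true only when all of them succeed
bool healthy = await lm.PingAsync();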

Registering backends

Use the fluent Add method to build the router. Calls can be chained.

var lm = new BackendRouter()
    .Add("qwen-9b",    chatBackend,  isDefault: true)
    .Add("embed-300m", embedBackend, isEmbedding: true);

Add parameters

Parameter     Default       Description
name          (required)    Routing key. Pass as the model argument to RespondAsync / RespondStreamingAsync to target this backend explicitly
backend       (required)    The ILLMBackend to register
isDefault     false         This backend handles requests where model is null. The first non-embedding backend is auto-designated default; pass isDefault: true to override (see the example after this table)
isEmbedding   false         This backend handles EmbedAsync / EmbedBatchAsync
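
For example, when no isDefault flag is passed, the first non-embedding backend registered wins (a sketch; fastBackend and embedBackend are placeholder ILLMBackend instances):

// "fast" is the first non-embedding backend, so it is auto-designated default
var lm = new BackendRouter()
    .Add("fast",  fastBackend)
    .Add("embed", embedBackend, isEmbedding: true);

// Both calls are handled by fastBackend
await lm.RespondAsync("Hello", model: null);
await lm.RespondAsync("Hello", model: "fast");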

Basic example - chat + embedding

The most common pattern: a large model for reasoning, a small model for vector embeddings.

using Agentic;
using Agentic.Runtime.Core;
using Mantle = Agentic.Runtime.Mantle;

var chatOptions = new Mantle.LmSessionOptions
{
    ModelPath     = @"/models/qwen3.5-9b-q4.gguf",
    ContextTokens = 8192,
    MaxToolRounds = 32,
    DefaultRequest = new Mantle.ResponseRequest
    {
        MaxOutputTokens = 1024,
        EnableThinking  = false,
    },
};

var embedOptions = new Mantle.LmSessionOptions
{
    ModelPath     = @"/models/embeddinggemma-300m-qat-q4.gguf",
    ContextTokens = 2048,
    BatchTokens   = 512,
};

await using var chatBackend  = new NativeBackend(chatOptions,  LlamaBackend.Cuda);
await using var embedBackend = new NativeBackend(embedOptions, LlamaBackend.Cuda);

await using var lm = new BackendRouter()
    .Add("chat",  chatBackend,  isDefault: true)
    .Add("embed", embedBackend, isEmbedding: true);

// Chat calls go to the default chat backend
var agent = new Agent(lm, new AgentOptions { SystemPrompt = "You are a helpful assistant." });
await agent.ChatStreamAsync("Summarise this document.");

// Embedding calls go to the embedding backend
var vector = await lm.EmbedAsync("What is the warranty period?");

// You can also route chat explicitly by name when multiple chat backends are registered
var response = await lm.RespondAsync("Explain this code.", model: "chat");

Multi-model example - several chat backends

Register as many backends as needed and select between them at call time.

await using var lm = new BackendRouter()
    .Add("fast",      fastBackend,      isDefault: true)   // quick answers, small model
    .Add("smart",     smartBackend)                        // complex reasoning
    .Add("ocr",       ocrBackend)                          // vision-heavy tasks
    .Add("embed-300m", embedBackend, isEmbedding: true);

// Default - routes to "fast"
await agent.ChatStreamAsync("What time is it?");

// Explicit routing via the model parameter
await agent.ChatStreamAsync("Refactor this entire module.", model: "smart");
await agent.ChatStreamAsync("Extract text from this receipt.", model: "ocr");

// Embeddings always go to "embed-300m"
var vector = await lm.EmbedAsync("search query");

The model string is matched case-insensitively against registered names. Passing an unrecognised name throws an InvalidOperationException whose message lists the valid names.
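
If model names come from configuration or user input, the exception can be caught to surface a friendlier error (a sketch; the exact message text is the router's own):

try
{
    await lm.RespondAsync("Hello", model: "does-not-exist");
}
catch (InvalidOperationException ex)
{
    // ex.Message lists the registered names, e.g. fast, smart, ocr, embed-300m
    Console.WriteLine(ex.Message);
}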


Disposal

BackendRouter disposes all registered backends when it is disposed. If backends are also wrapped in await using at the call site, they end up being disposed twice; this is safe because NativeBackend disposal is idempotent.

// Both patterns are safe
await using var chatBackend  = new NativeBackend(chatOptions,  LlamaBackend.Cuda);
await using var embedBackend = new NativeBackend(embedOptions, LlamaBackend.Cuda);
await using var lm = new BackendRouter()
    .Add("chat",  chatBackend,  isDefault: true)
    .Add("embed", embedBackend, isEmbedding: true);

Using with OpenAIBackend

BackendRouter works with any ILLMBackend - mix local and remote backends freely.

var remoteBackend = new OpenAIBackend(new LMConfig
{
    Endpoint  = "https://api.openai.com",
    ModelName = "gpt-4o",
    ApiKey    = "sk-...",
});

await using var localEmbed = new NativeBackend(embedOptions, LlamaBackend.Cpu);

await using var lm = new BackendRouter()
    .Add("gpt-4o", remoteBackend,  isDefault: true)
    .Add("embed",  localEmbed, isEmbedding: true);