Context Compaction

Table of contents

  1. Setup
  2. ConversationCompactionOptions Reference
  3. Strategies
  4. Levels
  5. Custom Compactor
  6. Example: Long Research Session

Long conversations eventually exceed the model’s context window. Context compaction automatically trims or summarises older turns so the session can continue seamlessly without hitting the token limit.

Context compaction is configured via LmSessionOptions.Compaction and applies to sessions created through NativeBackend. See NativeBackend for setup details.

Setup

Pass a ConversationCompactionOptions record when constructing LmSessionOptions:

using Agentic.Runtime.Mantle;

var sessionOptions = new LmSessionOptions
{
    ModelPath    = @"/path/to/model.gguf",
    ToolRegistry = new ToolRegistry(),
    Compaction   = new ConversationCompactionOptions(
        MaxInputTokens:        4096,
        ReservedForGeneration: 256),
};

ConversationCompactionOptions Reference

| Parameter | Default | Description |
|---|---|---|
| MaxInputTokens | (required) | Total token budget for the prompt window |
| ReservedForGeneration | 512 | Tokens reserved for the model's output; deducted from MaxInputTokens to get the effective prompt budget |
| Strategy | PinnedSystemFifo | How older messages are discarded (see Strategies) |
| Level | Balanced | Aggressiveness of compaction (see Levels) |
| AlwaysKeepSystem | true | Prevents the system prompt from being evicted |
| HotTrailMessages | 4 | Minimum number of recent messages always kept verbatim |

The effective prompt budget is MaxInputTokens - ReservedForGeneration.
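As a quick sanity check, this arithmetic can be sketched as a standalone helper. EffectiveBudget is a hypothetical name used here for illustration, not part of the Agentic API:

```csharp
using System;

// Hypothetical helper (not part of the Agentic API) illustrating the
// budget arithmetic: the prompt budget is what remains after reserving
// room for the model's output.
static int EffectiveBudget(int maxInputTokens, int reservedForGeneration = 512)
{
    if (reservedForGeneration >= maxInputTokens)
        throw new ArgumentException("Reservation must leave room for the prompt.");
    return maxInputTokens - reservedForGeneration;
}

// With the values from the Setup example above:
Console.WriteLine(EffectiveBudget(4096, 256)); // 3840
```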

Strategies

| ContextCompactionStrategy | Description |
|---|---|
| FifoSlidingWindow | Drop the oldest messages first (pure FIFO) |
| PinnedSystemFifo | Keep the system prompt pinned; drop oldest non-system messages |
| MiddleOutElision | Keep the beginning and end of the conversation; elide the middle |
| HeuristicPruning | Drop low-signal messages first (tool call results, short assistant turns) |
| RollingSummarization | Requires a custom IConversationCompactor implementation |
| VectorAugmentedRecall | Requires a custom IConversationCompactor implementation |
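To make the eviction order concrete, here is a rough sketch of the PinnedSystemFifo idea. It uses (Role, Text) tuples in place of the library's ChatMessage type and character counts in place of real token counts; both are simplifying assumptions for illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of PinnedSystemFifo: the system prompt is pinned, and the oldest
// non-system messages are evicted until the (toy, character-based) budget fits.
static List<(string Role, string Text)> PinnedSystemFifo(
    List<(string Role, string Text)> history, int budget)
{
    var system = history.Where(m => m.Role == "system").ToList();
    var rest   = history.Where(m => m.Role != "system").ToList();

    int Cost(IEnumerable<(string Role, string Text)> ms) => ms.Sum(m => m.Text.Length);

    // Evict the oldest non-system messages first.
    while (rest.Count > 0 && Cost(system) + Cost(rest) > budget)
        rest.RemoveAt(0);

    return system.Concat(rest).ToList();
}
```

FifoSlidingWindow would be the same loop without the system/non-system split, and a HeuristicPruning variant would order eviction candidates by an importance score rather than by age.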

Levels

| ConversationCompactionLevel | What is preserved |
|---|---|
| Light | Minimal compaction; most turns are kept verbatim (fast, least processing overhead) |
| Balanced | Balanced retention of context and important turns |
| Aggressive | Maximum compression; retains only the most essential turns |

Custom Compactor

For RollingSummarization or VectorAugmentedRecall, supply your own IConversationCompactor:

public class MySummarizingCompactor : IConversationCompactor
{
    public IReadOnlyList<ChatMessage> Compact(
        IReadOnlyList<ChatMessage> messages,
        ConversationCompactionContext context)
    {
        // your summarization logic here
        return messages;
    }
}

var sessionOptions = new LmSessionOptions
{
    ModelPath            = @"/path/to/model.gguf",
    ToolRegistry         = new ToolRegistry(),
    Compaction           = new ConversationCompactionOptions(MaxInputTokens: 8192),
    ConversationCompactor = new MySummarizingCompactor(),
};
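As a sketch of what the RollingSummarization strategy's Compact logic might do, the fragment below keeps the system prompt and the most recent turns verbatim and folds everything in between into one synthetic summary message. The (Role, Text) tuples stand in loosely for ChatMessage, and the summarize delegate is a placeholder; a real compactor would ask a model for the summary:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of rolling summarization: keep the leading system prompt and the
// last `hotTail` messages verbatim, and collapse the middle of the
// conversation into a single synthetic summary turn.
static List<(string Role, string Text)> RollingSummarize(
    List<(string Role, string Text)> messages,
    int hotTail,
    Func<IEnumerable<(string Role, string Text)>, string> summarize)
{
    var system = messages.TakeWhile(m => m.Role == "system").ToList();
    var body   = messages.Skip(system.Count).ToList();
    if (body.Count <= hotTail) return messages;   // nothing old enough to fold

    var old  = body.Take(body.Count - hotTail);
    var tail = body.Skip(body.Count - hotTail);

    var summary = ("assistant", "[Summary of earlier turns] " + summarize(old));
    return system.Append(summary).Concat(tail).ToList();
}
```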

Example: Long Research Session

using Agentic;
using Agentic.Runtime.Core;
using Agentic.Runtime.Mantle;

var sessionOptions = new LmSessionOptions
{
    ModelPath    = @"/path/to/model.gguf",
    ToolRegistry = new ToolRegistry(),
    Compaction   = new ConversationCompactionOptions(
        MaxInputTokens:        8192,
        ReservedForGeneration: 512,
        Strategy:              ContextCompactionStrategy.PinnedSystemFifo,
        Level:                 ConversationCompactionLevel.Balanced,
        AlwaysKeepSystem:      true,
        HotTrailMessages:      6),
};

await using var lm = new NativeBackend(
    sessionOptions,
    backend:         LlamaBackend.Cuda,
    cudaVersion:     "12.4",
    installProgress: new Progress<(string msg, double pct)>(p => Console.Write($"\r[{p.pct:F0}%] {p.msg}")));

var agent = new Agent(lm, new AgentOptions
{
    SystemPrompt = "You are a research assistant.",
    OnEvent = e =>
    {
        if (e.Kind == AgentEventKind.TextDelta)
            Console.Write(e.Text);
    },
});

// This can run for many turns without hitting the context limit
while (true)
{
    Console.Write("\nYou: ");
    var input = Console.ReadLine();
    if (string.IsNullOrEmpty(input)) break;
    await agent.ChatStreamAsync(input);
    Console.WriteLine();
}