Image Input (Vision)

Table of contents

  1. Sending Images with an Agent
  2. Direct Vision Call via Backend
  3. Supported Image Formats
  4. Requirements
  5. Example: Invoice OCR
  6. Example: Multi-page Document Analysis

Agentic supports vision-capable models by letting you include images in any agent turn. Images can be provided as a URL, a local file path, or a base64 data URL; the library detects which form you passed and prepares it for the model.

Sending Images with an Agent

Pass images as part of a chat turn using the images: parameter:

// From a URL
await agent.ChatStreamAsync(
    "What is in this image?",
    images: ["https://example.com/photo.jpg"]);

// From a local file
await agent.ChatStreamAsync(
    "Describe this diagram.",
    images: ["/home/user/documents/diagram.png"]);

// As a base64 data URL
await agent.ChatStreamAsync(
    "What text appears in this screenshot?",
    images: [$"data:image/png;base64,{Convert.ToBase64String(imageBytes)}"]);

// Multiple images
await agent.ChatStreamAsync(
    "Compare these two charts.",
    images: ["https://example.com/chart1.png", "https://example.com/chart2.png"]);

Direct Vision Call via Backend

You can also call the backend directly with image inputs:

var result = await lm.RespondAsync(
    [ResponseInput.User(
        "What brand is shown in this logo?",
        ["https://example.com/logo.png"])]);

Console.WriteLine(result.OutputText);

Supported Image Formats

Format            Example
HTTPS URL         https://example.com/photo.jpg
HTTP URL          http://internal-server/image.png
Local file path   /home/user/image.png or C:\images\photo.jpg
Base64 data URL   data:image/jpeg;base64,/9j/4AAQ...

The library detects the format automatically and converts local files to base64 data URLs before sending them to the model.
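To make the detection step concrete, here is a minimal sketch of how such a conversion could work. It is not Agentic's actual implementation: the ImageSourceSketch class, the ToModelUrl method, and the extension-to-MIME mapping are all hypothetical names and choices made for illustration, using only the .NET base class library.

```csharp
using System;
using System.IO;

public static class ImageSourceSketch
{
    // Hypothetical helper, not part of Agentic's public API.
    public static string ToModelUrl(string source)
    {
        // Data URLs and HTTP(S) URLs are passed through unchanged.
        if (source.StartsWith("data:", StringComparison.OrdinalIgnoreCase) ||
            source.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            source.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            return source;
        }

        // Anything else is treated as a local file path (assumption).
        // The MIME type is guessed from the file extension.
        string mime = Path.GetExtension(source).ToLowerInvariant() switch
        {
            ".png" => "image/png",
            ".jpg" or ".jpeg" => "image/jpeg",
            ".gif" => "image/gif",
            ".webp" => "image/webp",
            _ => "application/octet-stream",
        };

        // Read the whole file and embed it as a base64 data URL.
        byte[] bytes = File.ReadAllBytes(source);
        return $"data:{mime};base64,{Convert.ToBase64String(bytes)}";
    }
}
```

You would only need something like this if you wanted to build data URLs yourself, for example to cache the encoded image across several turns; in normal use, passing the path or URL directly is enough.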

Requirements

  • Your model must support vision (e.g. GPT-4o, Claude 3, LLaVA, or Qwen-VL).
  • The EmbeddingModel field is not required for vision; only ModelName is needed.
  • Local images are read entirely into memory and base64-encoded before being sent, so large files raise both memory use and request size (base64 adds roughly a third).
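Because local images are loaded fully into memory before being sent, it can help to check a file's size before passing it to the agent. The sketch below estimates the base64-encoded size from the file length. ImageSizeGuard, Base64Length, FitsBudget, and the maxEncodedBytes limit are hypothetical names introduced for illustration; Agentic does not define any such limit in this document.

```csharp
using System;
using System.IO;

public static class ImageSizeGuard
{
    // Base64 emits 4 characters for every 3 input bytes, padding the last group.
    public static long Base64Length(long byteCount) => 4 * ((byteCount + 2) / 3);

    // maxEncodedBytes is an illustrative budget chosen by the caller,
    // not a value defined by Agentic or by any particular model.
    public static bool FitsBudget(string path, long maxEncodedBytes)
        => Base64Length(new FileInfo(path).Length) <= maxEncodedBytes;
}
```

A caller could run FitsBudget on each path and downscale or skip any image that exceeds its chosen budget before including it in the images: argument.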

Example: Invoice OCR

var lm = new OpenAIBackend(new LMConfig
{
    Endpoint  = "http://localhost:1234",
    ModelName = "llava-v1.6-34b",
});

var agent = new Agent(lm, new AgentOptions
{
    SystemPrompt =
        "You are a document extraction assistant. " +
        "Extract structured data from images accurately.",
    OnEvent = e =>
    {
        if (e.Kind == AgentEventKind.TextDelta)
            Console.Write(e.Text);
    },
});

await agent.ChatStreamAsync(
    "Extract the invoice number, date, vendor, line items, and total from this invoice.",
    images: ["/tmp/invoice.pdf.png"]);

Example: Multi-page Document Analysis

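// Collect the page images in order. Sorting is lexicographic, so zero-pad the
// file names (page-01.png, page-02.png, ...) to keep page 10 after page 2.
var pageImages = Directory
    .GetFiles("/tmp/pages", "*.png")
    .OrderBy(f => f, StringComparer.Ordinal)
    .ToArray();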

await agent.ChatStreamAsync(
    $"I'm sending you {pageImages.Length} pages of a report. " +
    "Please summarise the key findings.",
    images: pageImages);