Image Input (Vision)
Table of contents
- Sending Images with an Agent
- Direct Vision Call via Backend
- Supported Image Formats
- Requirements
- Example: Invoice OCR
- Example: Multi-page Document Analysis
Agentic supports vision-capable models by letting you include images in any agent turn. Images can be provided as a URL, a local file path, or a base64 data URL — the library handles the rest.
Sending Images with an Agent
Pass images as part of a chat turn using the images: parameter:
// From a URL
await agent.ChatStreamAsync(
"What is in this image?",
images: ["https://example.com/photo.jpg"]);
// From a local file
await agent.ChatStreamAsync(
"Describe this diagram.",
images: ["/home/user/documents/diagram.png"]);
// As a base64 data URL
await agent.ChatStreamAsync(
"What text appears in this screenshot?",
images: [$"data:image/png;base64,{Convert.ToBase64String(imageBytes)}"]);
// Multiple images
await agent.ChatStreamAsync(
"Compare these two charts.",
images: ["https://example.com/chart1.png", "https://example.com/chart2.png"]);
Direct Vision Call via Backend
You can also call the backend directly with image inputs:
var result = await lm.RespondAsync(
[ResponseInput.User(
"What brand is shown in this logo?",
["https://example.com/logo.png"])]);
Console.WriteLine(result.OutputText);
Supported Image Formats
| Format | Example |
|---|---|
| HTTPS URL | https://example.com/photo.jpg |
| HTTP URL | http://internal-server/image.png |
| Local file path | /home/user/image.png or C:\images\photo.jpg |
| Base64 data URL | data:image/jpeg;base64,/9j/4AAQ... |
The library detects the format automatically and converts local files to base64 data URLs before sending them to the model.
Requirements
- Your model must support vision (e.g. GPT-4o, Claude 3, LLaVA, Qwen-VL, etc.)
- The
EmbeddingModelfield is not required for vision — onlyModelNameis needed - Large local images are loaded entirely into memory before being sent
Example: Invoice OCR
var lm = new OpenAIBackend(new LMConfig
{
Endpoint = "http://localhost:1234",
ModelName = "llava-v1.6-34b",
});
var agent = new Agent(lm, new AgentOptions
{
SystemPrompt =
"You are a document extraction assistant. " +
"Extract structured data from images accurately.",
OnEvent = e =>
{
if (e.Kind == AgentEventKind.TextDelta)
Console.Write(e.Text);
},
});
await agent.ChatStreamAsync(
"Extract the invoice number, date, vendor, line items, and total from this invoice.",
images: ["/tmp/invoice.pdf.png"]);
Example: Multi-page Document Analysis
var pageImages = Directory
.GetFiles("/tmp/pages", "*.png")
.OrderBy(f => f)
.ToArray();
await agent.ChatStreamAsync(
$"I'm sending you {pageImages.Length} pages of a report. " +
"Please summarise the key findings.",
images: pageImages);