The Chat API provides a text-based interface to the same agent reasoning loop that powers the voice pipeline. Send a message, get a response — with optional session continuity, tool execution, and intelligent model routing.

Endpoint

POST /api/chat
Base URL: http://<host>:4102/api
The chat endpoint is served on the health/API server (port 4102), not the voice server (port 3000). The full URL is therefore http://<host>:4102/api/chat.

Request

{
  "text": "What's the weather like today?",
  "sessionId": "session-abc123",
  "platform": "web"
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | The user’s message. Must be non-empty, max 10,000 characters. |
| sessionId | string | No | Session ID for conversation continuity. Omit for stateless. |
| platform | string | No | Client platform identifier (defaults to web). |
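The request above can be assembled from a script as well. The following is a minimal sketch using only the Python standard library; the helper names `build_chat_request` and `send_chat` are hypothetical, and the default base URL assumes a local deployment on port 4102.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    text: str,
    session_id: Optional[str] = None,
    platform: Optional[str] = None,
    base_url: str = "http://localhost:4102/api",  # assumption: local deployment
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/chat request."""
    body = {"text": text}
    # Optional fields are left out entirely when not provided.
    if session_id is not None:
        body["sessionId"] = session_id
    if platform is not None:
        body["platform"] = platform
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send_chat(build_chat_request("Hello"))` against a running server should return a dictionary shaped like the response documented below.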

Response

{
  "text": "Based on current conditions, it's sunny and 72 degrees in your area.",
  "sessionId": "session-abc123",
  "platform": "web",
  "metadata": {
    "stepCount": 2,
    "toolCallCount": 1,
    "durationMs": 1450
  }
}
The metadata object provides transparency into what happened during processing:
  • stepCount: How many reasoning steps the agent took (including tool calls)
  • toolCallCount: How many tools were invoked
  • durationMs: Total processing time in milliseconds

Streaming Endpoint

For real-time streaming responses with thinking/reasoning blocks and tool call visibility, use the SSE streaming endpoint:
POST /api/chat/stream

Request

{
  "text": "Explain how neural networks learn",
  "sessionId": "session-abc123",
  "files": [
    {
      "content": "base64-encoded-image-data",
      "mimeType": "image/png",
      "encoding": "base64"
    }
  ]
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | The user’s message. Must be non-empty, max 10,000 characters. |
| sessionId | string | No | Session ID for conversation continuity. |
| platform | string | No | Client platform identifier (defaults to web). |
| files | array | No | Images to include (base64 content, mimeType, encoding). |

SSE Events

The endpoint returns a text/event-stream response with the following event types:
| Event Type | Fields | Description |
| --- | --- | --- |
| content | content | Streamed text token(s) |
| thinking | content | Thinking/reasoning block from the model |
| tool_start | name, id, input | Tool call initiated |
| tool_result | id, result, error, durationMs | Tool call completed |
| step | stepNumber | Agent reasoning step boundary |
| error | code, message | Error during processing |
| done | metadata | Stream complete with final metadata |
Done event metadata
{
  "stepCount": 3,
  "toolCallCount": 1,
  "durationMs": 2450,
  "sessionId": "session-abc123",
  "usage": { "inputTokens": 1200, "outputTokens": 450 }
}
The dashboard’s Chat page uses this endpoint. Connect with fetch() + ReadableStream — native EventSource does not support POST requests.
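For non-browser clients, the stream can be consumed by parsing the SSE framing by hand. The sketch below is a minimal parser, not the dashboard's implementation. It assumes each event carries a JSON object in its `data:` field, and that the event type arrives either in an SSE `event:` field or as a `type` key inside the JSON; the exact wire format is not specified here, so verify it against a real response. The function name `iter_sse_events` is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_type, payload) pairs from decoded SSE lines."""
    event_name: Optional[str] = None
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                payload = json.loads("\n".join(data_lines))
                # Assumption: the type is in `event:` or in a JSON "type" key.
                yield (event_name or payload.get("type", "message"), payload)
            event_name, data_lines = None, []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE framing strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    # An unterminated trailing event is dropped, as SSE framing requires.
```

With `urllib.request.urlopen`, the response object can be iterated line by line as bytes, so each line needs `.decode("utf-8")` before being passed in.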

Image & Vision Support

The chat endpoints accept images via the files array. Images are formatted per-provider automatically:
  • Claude: image content blocks with base64 source
  • OpenAI: image_url content parts with data URIs
  • Ollama: images array with raw base64
  • OpenRouter: Follows the upstream provider format
Supported formats: PNG, JPEG, GIF, WebP. The dashboard Chat page supports paste and drag-and-drop image upload.
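An entry for the files array can be built from raw image bytes as follows. This is a small sketch: the field names (content, mimeType, encoding) come from the request example above, while the helper name `make_image_attachment` and the client-side MIME type check are assumptions added for illustration.

```python
import base64
from typing import Dict

# MIME types for the supported formats listed above (PNG, JPEG, GIF, WebP).
SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def make_image_attachment(data: bytes, mime_type: str) -> Dict[str, str]:
    """Return one entry for the `files` array of a chat request."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image type: {mime_type}")
    return {
        "content": base64.b64encode(data).decode("ascii"),
        "mimeType": mime_type,
        "encoding": "base64",
    }
```

Keep in mind that base64 inflates the payload by roughly a third, which matters given the 1 MB request body cap described under Input Validation.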

Thinking & Reasoning

When using models with thinking/reasoning capabilities, the streaming endpoint surfaces internal reasoning:
  • Claude: Extended thinking blocks (adaptive mode)
  • OpenAI o-series: Reasoning summary text via Responses API
  • Ollama: think field (Qwen3, DeepSeek-R1, Gemma 4)
  • OpenRouter: message.reasoning content
Thinking blocks are streamed as thinking events. Multi-turn reasoning continuity is preserved — the agent stores thinking tokens and replays them on subsequent turns so the model can build on its prior reasoning.
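On the client side, thinking events are usually kept separate from the visible answer. The sketch below folds a sequence of already-parsed events into one result object. It assumes events arrive as (event_type, payload) pairs with the fields listed in the SSE Events table; `StreamResult` and `collect_stream` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class StreamResult:
    text: str = ""       # concatenated `content` events
    thinking: str = ""   # concatenated `thinking` events
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


def collect_stream(events: Iterable[Tuple[str, Dict[str, Any]]]) -> StreamResult:
    """Fold parsed stream events into a single result."""
    result = StreamResult()
    for event_type, payload in events:
        if event_type == "content":
            result.text += payload.get("content", "")
        elif event_type == "thinking":
            result.thinking += payload.get("content", "")
        elif event_type == "tool_start":
            result.tool_calls.append(
                {"id": payload.get("id"), "name": payload.get("name")}
            )
        elif event_type == "error":
            raise RuntimeError(f"{payload.get('code')}: {payload.get('message')}")
        elif event_type == "done":
            result.metadata = payload.get("metadata", {})
        # `step` and `tool_result` events are ignored in this sketch.
    return result
```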

Model Router

The chat endpoint routes requests through a slot-based model routing system. Each named slot maps to a configured provider and model.
  1. Intent Classification: A lightweight local model (Ollama) classifies the user’s intent into categories like general_knowledge, smart_home, scheduling, code, etc. This classification takes 10-50ms.
  2. Slot Resolution: The intent maps to a capability slot (chat, reasoning, coding, tool_calling, creative). Each slot is configured with a provider+model pair.
  3. Fallback Chain: If the primary slot provider is unavailable (down, billing exhausted), the system follows the slot’s fallback chain to an alternate provider.
User Input → Intent Classifier (Ollama)
                 │
                 ├─ general → chat slot → Claude / OpenAI / Ollama
                 ├─ complex → reasoning slot → Claude Opus
                 ├─ code    → coding slot → Ollama (Qwen2.5-Coder)
                 └─ tools   → tool_calling slot → Claude Sonnet
                                    │
                                    └─ Fallback chain on failure
If Ollama is unavailable for intent classification, all requests use the chat slot directly. The routing system is designed for graceful degradation, not hard dependencies. See Model Router for configuration.
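The routing flow can be pictured with a short sketch. This illustrates the idea only and is not the router's actual code. The intent labels follow the diagram above; the provider/model strings, the rule that unknown intents fall back to the chat slot, and the names `resolve_slot` and `pick_provider` are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Provider = Tuple[str, str]  # (provider, model); values below are placeholders

# Intent labels taken from the routing diagram.
INTENT_TO_SLOT: Dict[str, str] = {
    "general": "chat",
    "complex": "reasoning",
    "code": "coding",
    "tools": "tool_calling",
}


def resolve_slot(intent: Optional[str]) -> str:
    """Map an intent to a slot; None means the classifier was unavailable."""
    if intent is None:
        return "chat"  # documented degradation: use the chat slot directly
    return INTENT_TO_SLOT.get(intent, "chat")  # assumption for unknown intents


def pick_provider(
    slot: str,
    chains: Dict[str, List[Provider]],
    is_available: Callable[[Provider], bool],
) -> Provider:
    """Walk the slot's chain (primary first) and return the first live entry."""
    for provider in chains[slot]:
        if is_available(provider):
            return provider
    raise RuntimeError(f"no available provider for slot {slot!r}")
```

For example, with a chain of `[("claude", "primary-model"), ("ollama", "local-model")]` for the chat slot and the first entry reported as down, `pick_provider` returns the second entry.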

Agent Reasoning Loop

When a request requires tool use, the chat endpoint runs the full agent reasoning loop:
  1. The LLM receives the user’s message along with available tools
  2. If the LLM decides to call a tool, the tool is executed and the result is fed back
  3. This loop continues until the LLM produces a final text response
  4. The complete response (including tool results) is returned to the caller
The agent loop is provider-agnostic — it works with both Claude and Ollama through the CommandProtocol interface.
A request like “Schedule a morning briefing for 7 AM every day” might involve:
  1. LLM recognizes the scheduling intent and calls schedule.create
  2. The scheduler tool creates a cron task and returns the task ID
  3. LLM formats the confirmation response with the task details
{
  "text": "I've created a recurring morning briefing scheduled for 7:00 AM daily. Task ID: task-abc123.",
  "metadata": {
    "stepCount": 3,
    "toolCallCount": 1,
    "durationMs": 2100
  }
}
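The loop described above can be sketched with a stubbed model and a stubbed tool. This is a simplified illustration, not the CommandProtocol implementation: the message and reply shapes are invented for the example, and counting one step per model turn and one per tool execution is an assumption modelled on the scheduling walkthrough.

```python
from typing import Any, Callable, Dict, List


def run_agent_loop(
    llm: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
    user_text: str,
    max_steps: int = 10,
) -> Dict[str, Any]:
    """Call the model until it returns text instead of a tool call."""
    messages: List[Dict[str, Any]] = [{"role": "user", "content": user_text}]
    step_count = 0
    tool_call_count = 0
    while step_count < max_steps:
        reply = llm(messages)
        step_count += 1
        if reply.get("tool") is None:
            # Final answer: stop looping and report what happened.
            return {
                "text": reply["text"],
                "metadata": {
                    "stepCount": step_count,
                    "toolCallCount": tool_call_count,
                },
            }
        # Execute the requested tool and feed its result back to the model.
        result = tools[reply["tool"]](**reply.get("input", {}))
        step_count += 1
        tool_call_count += 1
        messages.append({"role": "tool", "name": reply["tool"], "content": result})
    raise RuntimeError("agent loop exceeded max_steps")


# Stub model: request the scheduler once, then confirm using the tool result.
def fake_llm(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    if messages[-1]["role"] == "user":
        return {"tool": "schedule.create", "input": {"cron": "0 7 * * *"}}
    return {"text": f"Created task {messages[-1]['content']}"}


result = run_agent_loop(
    fake_llm,
    {"schedule.create": lambda cron: "task-abc123"},
    "Schedule a morning briefing for 7 AM every day",
)
```

The `max_steps` guard is a design choice of this sketch; it prevents a misbehaving model from looping on tool calls forever.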

Session Management

Passing a sessionId enables conversation continuity. The session system provides:
  • Working memory: Recent conversation turns are maintained in the LLM’s context
  • Episodic memory: Past conversations are summarized and available for recall
  • Semantic memory: Long-term facts and preferences are retrieved via vector search
When no sessionId is provided, the request is stateless — there is no conversation history.
# First message — start a new session
curl -X POST http://localhost:4102/api/chat \
  -H "Content-Type: application/json" \
  -d '{"text": "My name is Alex", "sessionId": "session-001"}'

# Follow-up — same session, agent remembers context
curl -X POST http://localhost:4102/api/chat \
  -H "Content-Type: application/json" \
  -d '{"text": "What is my name?", "sessionId": "session-001"}'
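The same pattern can be wrapped in a small client object that reuses one sessionId across calls. This sketch takes the transport as a parameter so it stays independent of any HTTP library. `ChatSession` is a hypothetical name, and generating the ID on the client assumes the server accepts arbitrary client-chosen strings, as the curl examples suggest.

```python
import uuid
from typing import Any, Callable, Dict, Optional


class ChatSession:
    """Reuse one sessionId across calls so the agent keeps context."""

    def __init__(
        self,
        send: Callable[[Dict[str, Any]], Dict[str, Any]],
        session_id: Optional[str] = None,
    ) -> None:
        # `send` posts a JSON body to /api/chat and returns the decoded reply.
        self._send = send
        # Assumption: any client-chosen string is a valid session ID.
        self.session_id = session_id or f"session-{uuid.uuid4().hex[:12]}"

    def ask(self, text: str) -> str:
        reply = self._send({"text": text, "sessionId": self.session_id})
        return reply["text"]
```

Creating a new `ChatSession` starts a fresh conversation, while passing an existing `session_id` resumes one.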

Input Validation

All requests are validated using Zod schemas before processing:
  • text must be a non-empty string (max 10,000 characters)
  • Request bodies are capped at 1 MB
  • Invalid JSON returns 400
  • Missing or empty text field returns 400 with validation details
400 Bad Request
{
  "error": "Validation failed",
  "issues": ["Text is required"]
}
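A client can mirror these rules before sending, which avoids a round trip for obviously invalid input. The check below is a rough sketch and the server-side schema remains authoritative. Treating 1 MB as 1,048,576 bytes is an assumption, as are the message wording and the name `precheck_chat_body`.

```python
import json
from typing import Any, Dict, List

MAX_TEXT_CHARS = 10_000
MAX_BODY_BYTES = 1024 * 1024  # assumption: "1 MB" interpreted as 1 MiB


def precheck_chat_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    issues: List[str] = []
    text = body.get("text")
    if not isinstance(text, str) or text == "":
        issues.append("text must be a non-empty string")
    elif len(text) > MAX_TEXT_CHARS:
        issues.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    # Measure the encoded size, since that is what travels over the wire.
    if len(json.dumps(body).encode("utf-8")) > MAX_BODY_BYTES:
        issues.append("request body exceeds 1 MB")
    return issues
```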

Rate Limiting

The chat endpoint has a stricter rate limit than general API endpoints:
| Scope | Default Limit | Environment Variable |
| --- | --- | --- |
| General API | 100 req/min | API_RATE_LIMIT |
| Chat + Tasks | 20 req/min | CHAT_RATE_LIMIT |
Rate limit information is included in response headers:
X-RateLimit-Remaining: 18
When the limit is exceeded:
429 Too Many Requests
{
  "error": "Too many requests",
  "retryAfter": 60
}
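A client can honor retryAfter by waiting before the next attempt. The sketch below assumes retryAfter is expressed in seconds, which is not stated explicitly here, and takes the transport and the sleep function as parameters so it can be exercised without a server. The name `send_with_rate_limit_retry` is hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]  # (HTTP status, decoded JSON body)


def send_with_rate_limit_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    default_wait: float = 60.0,
) -> Response:
    """Retry on 429, waiting for the server-suggested interval each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            break
        # Assumption: `retryAfter` is a number of seconds.
        sleep(float(body.get("retryAfter", default_wait)))
    return status, body
```

If every attempt is rate limited, the final 429 response is returned to the caller rather than raised, so the caller decides how to surface it.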

Authentication

When AGTOS_API_KEY is set, the chat endpoint requires a Bearer token:
curl -X POST http://localhost:4102/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"text": "Hello"}'
Authentication is opt-in. If AGTOS_API_KEY is not set, the endpoint is open. See Authentication for details on configuring API access.
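Because authentication is optional, a client helper can add the Authorization header only when a key is configured. A minimal sketch, with `build_headers` as a hypothetical name:

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, adding a Bearer token only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```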

Error Responses

| Status | Meaning | Common Cause |
| --- | --- | --- |
| 400 | Bad Request | Missing/empty text, invalid JSON, validation failure |
| 401 | Unauthorized | Missing or invalid API key (when auth is enabled) |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Model provider timeout, agent loop failure |
| 503 | Service Unavailable | Voice pipeline not initialized |
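Client code often turns these statuses into a single exception type that records whether a retry is worthwhile. The sketch below treats 429, 500, and 503 as retryable and 400 and 401 as permanent; that split is a judgment call made for this example, not documented behavior. `ChatApiError` and `check_response` are hypothetical names.

```python
from typing import Any, Dict

# Assumption: transient conditions worth retrying; 400/401 need a client fix.
RETRYABLE_STATUSES = {429, 500, 503}


class ChatApiError(Exception):
    """Raised for any 4xx/5xx response from the chat endpoints."""

    def __init__(self, status: int, body: Dict[str, Any]) -> None:
        self.status = status
        self.body = body
        self.retryable = status in RETRYABLE_STATUSES
        super().__init__(f"HTTP {status}: {body.get('error', 'unknown error')}")


def check_response(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the body on success, raise ChatApiError otherwise."""
    if status >= 400:
        raise ChatApiError(status, body)
    return body
```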

CLI Integration

The npx agtos chat command uses this same endpoint under the hood. It opens an interactive prompt with session persistence:
npx agtos chat
In-chat commands: /quit, /exit, /help, /session (show session ID), /new (start new session). See CLI Reference for details.

Conversation History

Retrieve past conversation messages for a session:
GET /api/chat/history/:sessionId
{
  "sessionId": "session-abc123",
  "sessionExpired": false,
  "messages": [
    {
      "role": "user",
      "content": "What is my name?",
      "timestamp": 1711612800000
    },
    {
      "role": "assistant",
      "content": "Your name is Alex.",
      "timestamp": 1711612802000
    }
  ],
  "count": 2
}
When the session has expired or doesn’t exist, sessionExpired is true and messages is empty. The dashboard Conversations page uses this endpoint to let users browse and resume past sessions.
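A history response can be rendered as a readable transcript with a few lines of code. This sketch assumes the timestamp values are Unix epoch milliseconds, which their magnitude suggests but which is not stated explicitly; `format_history` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def format_history(payload: Dict[str, Any]) -> List[str]:
    """Render a history response as one line per message."""
    if payload.get("sessionExpired"):
        return []  # expired or unknown session: nothing to show
    lines: List[str] = []
    for message in payload.get("messages", []):
        # Assumption: timestamps are milliseconds since the Unix epoch.
        sent_at = datetime.fromtimestamp(message["timestamp"] / 1000, tz=timezone.utc)
        lines.append(f"[{sent_at.isoformat()}] {message['role']}: {message['content']}")
    return lines
```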

Background Tasks

For longer-running agent tasks, use the tasks endpoint instead:
POST /api/tasks
{
  "topic": "Summarize the latest news about AI safety"
}
Tasks run through the same agent reasoning loop but are designed for workloads that may take several seconds. The response includes a generated taskId for tracking.

What’s next

  • Memory System: How sessions build context with three-tier memory.
  • HTTP Endpoints: Complete REST API reference with all request/response schemas.