The Chat API provides a text-based interface to the same agent reasoning loop that powers the voice pipeline. Send a message, get a response — with optional session continuity, tool execution, and intelligent model routing.Documentation Index
Fetch the complete documentation index at: https://docs.agtos.ai/llms.txt
Use this file to discover all available pages before exploring further.
Endpoint
http://<host>:4102/api
The chat endpoint is served on the health/API server (port 4102), not the voice server (port 3000).
Request
| Field | Type | Required | Description |
|---|---|---|---|
text | string | Yes | The user’s message. Must be non-empty, max 10,000 characters. |
sessionId | string | No | Session ID for conversation continuity. Omit for stateless. |
platform | string | No | Client platform identifier (defaults to web). |
Response
metadata object provides transparency into what happened during processing:
- stepCount: How many reasoning steps the agent took (including tool calls)
- toolCallCount: How many tools were invoked
- durationMs: Total processing time in milliseconds
Streaming Endpoint
For real-time streaming responses with thinking/reasoning blocks and tool call visibility, use the SSE streaming endpoint:Request
| Field | Type | Required | Description |
|---|---|---|---|
text | string | Yes | The user’s message. Must be non-empty, max 10,000 characters. |
sessionId | string | No | Session ID for conversation continuity. |
platform | string | No | Client platform identifier (defaults to web). |
files | array | No | Images to include (base64 content, mimeType, encoding). |
SSE Events
The endpoint returns atext/event-stream response with the following event types:
| Event Type | Fields | Description |
|---|---|---|
content | content | Streamed text token(s) |
thinking | content | Thinking/reasoning block from the model |
tool_start | name, id, input | Tool call initiated |
tool_result | id, result, error, durationMs | Tool call completed |
step | stepNumber | Agent reasoning step boundary |
error | code, message | Error during processing |
done | metadata | Stream complete with final metadata |
Done event metadata
Image & Vision Support
The chat endpoints accept images via thefiles array. Images are formatted per-provider automatically:
- Claude:
imagecontent blocks with base64source - OpenAI:
image_urlcontent parts with data URIs - Ollama:
imagesarray with raw base64 - OpenRouter: Follows the upstream provider format
Thinking & Reasoning
When using models with thinking/reasoning capabilities, the streaming endpoint surfaces internal reasoning:- Claude: Extended thinking blocks (adaptive mode)
- OpenAI o-series: Reasoning summary text via Responses API
- Ollama:
thinkfield (Qwen3, DeepSeek-R1, Gemma 4) - OpenRouter:
message.reasoningcontent
thinking events. Multi-turn reasoning continuity is preserved — the agent stores thinking tokens and replays them on subsequent turns so the model can build on its prior reasoning.
Model Router
The chat endpoint routes requests through a slot-based model routing system. Each named slot maps to a configured provider and model.Intent Classification
A lightweight local model (Ollama) classifies the user’s intent into categories like
general_knowledge, smart_home, scheduling, code, etc. This classification takes 10-50ms.Slot Resolution
The intent maps to a capability slot (
chat, reasoning, coding, tool_calling, creative). Each slot is configured with a provider+model pair.If Ollama is unavailable for intent classification, all requests use the
chat slot directly. The routing system is designed for graceful degradation, not hard dependencies. See Model Router for configuration.Agent Reasoning Loop
When a request requires tool use, the chat endpoint runs the full agent reasoning loop:- The LLM receives the user’s message along with available tools
- If the LLM decides to call a tool, the tool is executed and the result is fed back
- This loop continues until the LLM produces a final text response
- The complete response (including tool results) is returned to the caller
CommandProtocol interface.
Example: Multi-step tool execution
Example: Multi-step tool execution
A request like “Schedule a morning briefing for 7 AM every day” might involve:
- Step 1: LLM recognizes the scheduling intent and calls
schedule.create - Step 2: The scheduler tool creates a cron task and returns the task ID
- Step 3: LLM formats the confirmation response with the task details
Session Management
Passing asessionId enables conversation continuity. The session system provides:
- Working memory: Recent conversation turns are maintained in the LLM’s context
- Episodic memory: Past conversations are summarized and available for recall
- Semantic memory: Long-term facts and preferences are retrieved via vector search
sessionId is provided, the request is stateless — there is no conversation history.
Input Validation
All requests are validated using Zod schemas before processing:textmust be a non-empty string (max 10,000 characters)- Request bodies are capped at 1 MB
- Invalid JSON returns
400 - Missing or empty
textfield returns400with validation details
400 Bad Request
Rate Limiting
The chat endpoint has a stricter rate limit than general API endpoints:| Scope | Default Limit | Environment Variable |
|---|---|---|
| General API | 100 req/min | API_RATE_LIMIT |
| Chat + Tasks | 20 req/min | CHAT_RATE_LIMIT |
429 Too Many Requests
Authentication
WhenAGTOS_API_KEY is set, the chat endpoint requires a Bearer token:
Authentication is opt-in. If
AGTOS_API_KEY is not set, the endpoint is open. See Authentication for details on configuring API access.Error Responses
| Status | Meaning | Common Cause |
|---|---|---|
400 | Bad Request | Missing/empty text, invalid JSON, validation failure |
401 | Unauthorized | Missing or invalid API key (when auth is enabled) |
429 | Too Many Requests | Rate limit exceeded |
500 | Internal Server Error | Model provider timeout, agent loop failure |
503 | Service Unavailable | Voice pipeline not initialized |
CLI Integration
Thenpx agtos chat command uses this same endpoint under the hood. It opens an interactive prompt with session persistence:
/quit, /exit, /help, /session (show session ID), /new (start new session). See CLI Reference for details.
Conversation History
Retrieve past conversation messages for a session:sessionExpired is true and messages is empty. The dashboard Conversations page uses this endpoint to let users browse and resume past sessions.
Background Tasks
For longer-running agent tasks, use the tasks endpoint instead:taskId for tracking.
What’s next
Memory System
How sessions build context with three-tier memory.
HTTP Endpoints
Complete REST API reference with all request/response schemas.