The Chat API provides a text-based interface to the same agent reasoning loop that powers the voice pipeline. Send a message, get a response — with optional session continuity, tool execution, and intelligent model routing.

Endpoint

POST /api/chat
Base URL: http://<host>:4102/api

The chat endpoint is served on the health/API server (port 4102), not the voice server (port 3000).

Request

{
  "text": "What's the weather like today?",
  "sessionId": "session-abc123",
  "platform": "web"
}
Field      Type    Required  Description
text       string  Yes       The user’s message. Must be non-empty, max 10,000 characters.
sessionId  string  No        Session ID for conversation continuity. Omit for stateless requests.
platform   string  No        Client platform identifier (defaults to web).

Response

{
  "text": "Based on current conditions, it's sunny and 72 degrees in your area.",
  "sessionId": "session-abc123",
  "platform": "web",
  "metadata": {
    "stepCount": 2,
    "toolCallCount": 1,
    "durationMs": 1450
  }
}
The metadata object provides transparency into what happened during processing:
  • stepCount: How many reasoning steps the agent took (including tool calls)
  • toolCallCount: How many tools were invoked
  • durationMs: Total processing time in milliseconds
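The response shape above can be expressed as a TypeScript interface. This is an illustrative type derived from the example fields, not an official SDK definition:

```typescript
// Illustrative types for the chat response (field names taken from the
// example above; not an official SDK type).
interface ChatMetadata {
  stepCount: number;     // reasoning steps, including tool calls
  toolCallCount: number; // tools invoked during processing
  durationMs: number;    // total processing time in milliseconds
}

interface ChatResponse {
  text: string;
  sessionId?: string;
  platform?: string;
  metadata: ChatMetadata;
}

// Parsing a raw response body and reading the metadata:
function parseChatResponse(body: string): ChatResponse {
  return JSON.parse(body) as ChatResponse;
}
```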

Model Router

The chat endpoint routes each request through a three-tier model router that balances speed, cost, and capability by matching each request to the right model.
1. Intent Classification

A lightweight local model (Ollama) classifies the user’s intent into categories like general_knowledge, smart_home, scheduling, code, etc. This classification takes 10-50ms.

2. Local Model Attempt

For simple queries (greetings, basic facts, quick lookups), the local Ollama model handles the request directly. No cloud API call needed.

3. Cloud Fallback

For complex queries requiring reasoning, tool use, or multi-step planning, the request is routed to Claude. The intent classification helps select the right Claude model (Haiku for simple tasks, Sonnet/Opus for complex ones).
User Input → Intent Classifier (Ollama)
                 ├─ Simple → Local Model (Ollama) → Response
                 └─ Complex → Claude (Haiku/Sonnet) → Response
                                    └─ May call tools via Agent Loop
If Ollama is unavailable, all requests fall through directly to Claude. The routing system is designed for graceful degradation, not hard dependencies.
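The three-tier decision can be sketched in code. Everything below is illustrative: classifyIntent is a keyword-matching stub standing in for the real Ollama classifier, and the model names and simplicity heuristic are assumptions, not the actual routing rules.

```typescript
// Sketch of the three-tier routing decision described above.
// classifyIntent is a stub; the real system uses a local Ollama model.
type Intent = "general_knowledge" | "smart_home" | "scheduling" | "code";

interface Route {
  provider: "ollama" | "claude";
  model: string;
}

// Tier 1: intent classification (keyword stub for illustration).
function classifyIntent(text: string): Intent {
  if (/schedule|remind/i.test(text)) return "scheduling";
  if (/function|bug|compile/i.test(text)) return "code";
  if (/light|thermostat/i.test(text)) return "smart_home";
  return "general_knowledge";
}

// Tiers 2 and 3: simple queries stay local; complex ones go to Claude.
function route(text: string, ollamaAvailable: boolean): Route {
  const intent = classifyIntent(text);
  const simple = intent === "general_knowledge" && text.length < 80;
  if (simple && ollamaAvailable) {
    return { provider: "ollama", model: "local" };
  }
  // Graceful degradation: if Ollama is down, everything falls through to Claude.
  const model = intent === "code" ? "claude-sonnet" : "claude-haiku";
  return { provider: "claude", model };
}
```

Note how the `ollamaAvailable` flag implements the graceful-degradation behavior: a local outage changes where requests go, not whether they succeed.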

Agent Reasoning Loop

When a request requires tool use, the chat endpoint runs the full agent reasoning loop:
  1. The LLM receives the user’s message along with available tools
  2. If the LLM decides to call a tool, the tool is executed and the result is fed back
  3. This loop continues until the LLM produces a final text response
  4. The complete response (including tool results) is returned to the caller
The agent loop is provider-agnostic — it works with both Claude and Ollama through the CommandProtocol interface.
A request like “Schedule a morning briefing for 7 AM every day” might involve:
  1. Step 1: LLM recognizes the scheduling intent and calls schedule.create
  2. Step 2: The scheduler tool creates a cron task and returns the task ID
  3. Step 3: LLM formats the confirmation response with the task details
{
  "text": "I've created a recurring morning briefing scheduled for 7:00 AM daily. Task ID: task-abc123.",
  "metadata": {
    "stepCount": 3,
    "toolCallCount": 1,
    "durationMs": 2100
  }
}
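The four-step loop above can be sketched as a small function. The `Provider` and `LlmStep` shapes here are illustrative stand-ins; the real CommandProtocol interface is not reproduced:

```typescript
// Sketch of the agent reasoning loop: call the LLM, execute any tool it
// requests, feed the result back, and stop on a final text response.
type LlmStep =
  | { kind: "tool_call"; tool: string; args: unknown }
  | { kind: "text"; text: string };

interface Provider {
  // Given the transcript so far, produce the next step (stand-in for
  // the provider-agnostic CommandProtocol interface).
  next(transcript: string[]): LlmStep;
}

function runAgentLoop(
  provider: Provider,
  tools: Record<string, (args: unknown) => string>,
  userMessage: string,
  maxSteps = 10,
): { text: string; stepCount: number; toolCallCount: number } {
  const transcript = [userMessage];
  let toolCallCount = 0;
  for (let step = 1; step <= maxSteps; step++) {
    const next = provider.next(transcript);
    if (next.kind === "text") {
      // Final text response ends the loop; counts feed the metadata object.
      return { text: next.text, stepCount: step, toolCallCount };
    }
    // Execute the requested tool and feed the result back.
    toolCallCount++;
    transcript.push(tools[next.tool](next.args));
  }
  throw new Error("agent loop exceeded maxSteps");
}
```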

Session Management

Passing a sessionId enables conversation continuity. The session system provides:
  • Working memory: Recent conversation turns are maintained in the LLM’s context
  • Episodic memory: Past conversations are summarized and available for recall
  • Semantic memory: Long-term facts and preferences are retrieved via vector search
When no sessionId is provided, the request is stateless — there is no conversation history.
# First message — start a new session
curl -X POST http://localhost:4102/api/chat \
  -H "Content-Type: application/json" \
  -d '{"text": "My name is Alex", "sessionId": "session-001"}'

# Follow-up — same session, agent remembers context
curl -X POST http://localhost:4102/api/chat \
  -H "Content-Type: application/json" \
  -d '{"text": "What is my name?", "sessionId": "session-001"}'
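Working memory, the first of the three layers above, can be sketched as a per-session turn buffer. This is an illustrative model only; it covers neither episodic summarization nor semantic vector recall, and the cap of 20 turns is an assumption:

```typescript
// Sketch of working memory: recent turns per session, kept in a Map.
// Episodic and semantic memory are out of scope for this sketch.
interface Turn {
  role: "user" | "assistant";
  text: string;
}

class WorkingMemory {
  private sessions = new Map<string, Turn[]>();

  append(sessionId: string, turn: Turn, maxTurns = 20): void {
    const turns = this.sessions.get(sessionId) ?? [];
    turns.push(turn);
    // Keep only the most recent turns in the LLM's context window.
    this.sessions.set(sessionId, turns.slice(-maxTurns));
  }

  context(sessionId: string | undefined): Turn[] {
    // No sessionId → stateless request with empty history.
    if (!sessionId) return [];
    return this.sessions.get(sessionId) ?? [];
  }
}
```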

Input Validation

All requests are validated using Zod schemas before processing:
  • text must be a non-empty string (max 10,000 characters)
  • Request bodies are capped at 1 MB
  • Invalid JSON returns 400
  • Missing or empty text field returns 400 with validation details
// 400 Bad Request
{
  "error": "Validation failed",
  "issues": ["Text is required"]
}
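The validation rules above can be restated as an executable function. The real server uses Zod schemas; this plain-TypeScript version is just an equivalent expression of the same constraints:

```typescript
// The chat endpoint's validation rules, restated as a plain function
// (the actual implementation uses Zod; this mirrors its behavior).
interface ValidationResult {
  ok: boolean;
  status?: number;
  issues?: string[];
}

const MAX_TEXT_CHARS = 10_000;
const MAX_BODY_BYTES = 1_000_000; // 1 MB cap on the raw request body

function validateChatBody(rawBody: string): ValidationResult {
  if (new TextEncoder().encode(rawBody).length > MAX_BODY_BYTES) {
    return { ok: false, status: 400, issues: ["Body too large"] };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { ok: false, status: 400, issues: ["Invalid JSON"] };
  }
  const text = (parsed as { text?: unknown }).text;
  if (typeof text !== "string" || text.length === 0) {
    return { ok: false, status: 400, issues: ["Text is required"] };
  }
  if (text.length > MAX_TEXT_CHARS) {
    return { ok: false, status: 400, issues: ["Text too long"] };
  }
  return { ok: true };
}
```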

Rate Limiting

The chat endpoint has a stricter rate limit than general API endpoints:
Scope         Default Limit  Environment Variable
General API   100 req/min    AGTOS_API_RATE_LIMIT
Chat + Tasks  20 req/min     AGTOS_CHAT_RATE_LIMIT
Rate limit information is included in response headers:
X-RateLimit-Remaining: 18
When the limit is exceeded:
// 429 Too Many Requests
{
  "error": "Too many requests",
  "retryAfter": 60
}
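A limiter with this behavior can be sketched as follows. The fixed-window bookkeeping is an illustration of the 20 req/min default, not the server's actual algorithm:

```typescript
// Sketch of a fixed-window rate limiter matching the 20 req/min chat default.
// The windowing strategy here is illustrative only.
class FixedWindowLimiter {
  private count = 0;
  private windowStart = 0;

  constructor(private limit: number, private windowMs: number) {}

  // Returns the remaining quota (usable for X-RateLimit-Remaining),
  // or null when the request should be rejected with 429.
  allow(nowMs: number): number | null {
    if (nowMs - this.windowStart >= this.windowMs) {
      // New window: reset the counter.
      this.windowStart = nowMs;
      this.count = 0;
    }
    if (this.count >= this.limit) return null; // caller sends 429 + retryAfter
    this.count++;
    return this.limit - this.count;
  }
}
```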

Authentication

When AGTOS_API_KEY is set, the chat endpoint requires a Bearer token:
curl -X POST http://localhost:4102/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"text": "Hello"}'
Authentication is opt-in. If AGTOS_API_KEY is not set, the endpoint is open. See Authentication for details on configuring API access.

Error Responses

Status  Meaning                Common Cause
400     Bad Request            Missing/empty text, invalid JSON, validation failure
401     Unauthorized           Missing or invalid API key (when auth is enabled)
429     Too Many Requests      Rate limit exceeded
500     Internal Server Error  Model provider timeout, agent loop failure
503     Service Unavailable    Voice pipeline not initialized

CLI Integration

The npx agtos chat command uses this same endpoint under the hood:
# Interactive chat session
npx agtos chat

# Single message
npx agtos chat "What time is it?"

# With session persistence
npx agtos chat --session my-session "Remember this for later"

Background Tasks

For longer-running agent tasks, use the tasks endpoint instead:
POST /api/tasks
{
  "topic": "Summarize the latest news about AI safety"
}
Tasks run through the same agent reasoning loop but are designed for workloads that may take several seconds. The response includes a generated taskId for tracking.