Endpoint
http://<host>:4102/api
The chat endpoint is served on the health/API server (port 4102), not the voice server (port 3000).
Request
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The user’s message. Must be non-empty, max 10,000 characters. |
| sessionId | string | No | Session ID for conversation continuity. Omit for stateless requests. |
| platform | string | No | Client platform identifier (defaults to web). |
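A request body built from the fields above might look like the following sketch. The message text and session ID are hypothetical values for illustration; the exact endpoint path beyond `/api` is not shown here.

```python
import json

# Example chat request body (field names from the table above).
# "sessionId" and "platform" are optional; omit "sessionId" for a stateless call.
payload = {
    "text": "What's on my schedule today?",
    "sessionId": "session-123",   # hypothetical ID for illustration
    "platform": "web",            # defaults to "web" if omitted
}

body = json.dumps(payload)
print(body)
```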
Response
The metadata object provides transparency into what happened during processing:
- stepCount: How many reasoning steps the agent took (including tool calls)
- toolCallCount: How many tools were invoked
- durationMs: Total processing time in milliseconds
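A response containing these metadata fields could be consumed like this. The surrounding response shape and values are assumptions for illustration; only the three field names come from the list above.

```python
# Hypothetical response illustrating the documented metadata fields.
response = {
    "text": "Done — I created the task.",
    "metadata": {
        "stepCount": 3,       # reasoning steps, including tool calls
        "toolCallCount": 1,   # tools invoked
        "durationMs": 842,    # total processing time in milliseconds
    },
}

meta = response["metadata"]
summary = f"{meta['stepCount']} steps, {meta['toolCallCount']} tool calls in {meta['durationMs']}ms"
print(summary)
```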
Model Router
The chat endpoint routes requests through a three-tier model routing system. This system balances speed, cost, and capability by using the right model for each request.
Intent Classification
A lightweight local model (Ollama) classifies the user’s intent into categories like general_knowledge, smart_home, scheduling, code, etc. This classification takes 10-50ms.
Local Model Attempt
For simple queries (greetings, basic facts, quick lookups), the local Ollama model handles the request directly. No cloud API call needed.
If Ollama is unavailable, all requests fall through directly to Claude. The routing system is designed for graceful degradation, not hard dependencies.
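The routing decision described above can be sketched as a small function. The intent categories and function names are illustrative, not the real implementation; only the fallback behavior (Ollama unavailable means everything goes to Claude) is taken from the text.

```python
# Sketch of the three-tier routing decision; names are illustrative.
SIMPLE_INTENTS = {"greeting", "general_knowledge"}  # assumed "simple query" categories

def route(intent: str, ollama_available: bool) -> str:
    """Pick a model tier for an already-classified intent."""
    if not ollama_available:
        # Graceful degradation: no hard dependency on the local model.
        return "claude"
    if intent in SIMPLE_INTENTS:
        return "ollama"   # simple queries stay local, no cloud API call
    return "claude"       # complex or tool-using requests go to the cloud model

print(route("greeting", ollama_available=True))   # handled locally
print(route("greeting", ollama_available=False))  # falls through to Claude
```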
Agent Reasoning Loop
When a request requires tool use, the chat endpoint runs the full agent reasoning loop:
- The LLM receives the user’s message along with available tools
- If the LLM decides to call a tool, the tool is executed and the result is fed back
- This loop continues until the LLM produces a final text response
- The complete response (including tool results) is returned to the caller
Tools are exposed to the agent through the CommandProtocol interface.
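The loop above can be sketched with a mocked LLM and a single tool. All names here (call_llm, TOOLS, schedule_create) are stand-ins for illustration, not the real CommandProtocol API.

```python
# Minimal sketch of the agent reasoning loop; the LLM is mocked.

def schedule_create(args):
    return {"taskId": "task-1"}  # pretend the scheduler created a cron task

TOOLS = {"schedule.create": schedule_create}

def call_llm(messages):
    # Mock: first turn requests a tool, the next turn produces final text.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "schedule.create", "args": {"cron": "0 7 * * *"}}
    return {"text": "Scheduled your 7 AM briefing (task-1)."}

def agent_loop(user_text):
    messages = [{"role": "user", "content": user_text}]
    while True:
        reply = call_llm(messages)
        if "tool" in reply:
            result = TOOLS[reply["tool"]](reply["args"])
            messages.append({"role": "tool", "content": result})  # feed result back
        else:
            return reply["text"]  # loop ends on a final text response

print(agent_loop("Schedule a morning briefing for 7 AM every day"))
```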
Example: Multi-step tool execution
A request like “Schedule a morning briefing for 7 AM every day” might involve:
- Step 1: LLM recognizes the scheduling intent and calls schedule.create
- Step 2: The scheduler tool creates a cron task and returns the task ID
- Step 3: LLM formats the confirmation response with the task details
Session Management
Passing a sessionId enables conversation continuity. The session system provides:
- Working memory: Recent conversation turns are maintained in the LLM’s context
- Episodic memory: Past conversations are summarized and available for recall
- Semantic memory: Long-term facts and preferences are retrieved via vector search
If no sessionId is provided, the request is stateless and carries no conversation history.
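The working-memory tier described above can be sketched as a bounded window of recent turns. The class name and window size are assumptions for illustration; the real context-management policy is not specified here.

```python
from collections import deque

# Sketch of working memory: keep only the most recent turns in the LLM context.
class WorkingMemory:
    def __init__(self, max_turns=5):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off automatically

    def add(self, role, text):
        self.turns.append({"role": role, "text": text})

    def context(self):
        return list(self.turns)

mem = WorkingMemory(max_turns=2)
mem.add("user", "Hi")
mem.add("assistant", "Hello!")
mem.add("user", "What's my schedule?")  # evicts the oldest turn
print(len(mem.context()))
```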
Input Validation
All requests are validated using Zod schemas before processing:
- text must be a non-empty string (max 10,000 characters)
- Request bodies are capped at 1 MB
- Invalid JSON returns 400
- Missing or empty text field returns 400 with validation details
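The rules above can be mirrored in plain Python as a sketch (the server itself uses Zod; this function and its error strings are illustrative).

```python
# Sketch of the documented validation rules; the real server uses Zod schemas.
MAX_TEXT_CHARS = 10_000
MAX_BODY_BYTES = 1_000_000  # 1 MB cap on request bodies

def validate(body: dict, raw_size: int):
    """Return (ok, error) mirroring the documented 400 conditions."""
    if raw_size > MAX_BODY_BYTES:
        return False, "request body too large"
    text = body.get("text")
    if not isinstance(text, str) or not text:
        return False, "text must be a non-empty string"
    if len(text) > MAX_TEXT_CHARS:
        return False, "text exceeds 10,000 characters"
    return True, None

print(validate({"text": "hello"}, raw_size=20))
print(validate({"text": ""}, raw_size=20))
```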
Rate Limiting
The chat endpoint has a stricter rate limit than general API endpoints:
| Scope | Default Limit | Environment Variable |
|---|---|---|
| General API | 100 req/min | AGTOS_API_RATE_LIMIT |
| Chat + Tasks | 20 req/min | AGTOS_CHAT_RATE_LIMIT |
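A fixed-window limiter matching the 20 req/min chat limit might look like the sketch below. The table only specifies the limits, not the algorithm, so the windowing strategy here is an assumption.

```python
import time

# Fixed-window rate limiter sketch for the 20 req/min chat limit (algorithm assumed).
class RateLimiter:
    def __init__(self, limit=20, window_s=60):
        self.limit, self.window_s = limit, window_s
        self.window_start, self.count = time.monotonic(), 0

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window_s:
            self.window_start, self.count = now, 0  # start a fresh window
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # caller should respond with HTTP 429

limiter = RateLimiter(limit=20, window_s=60)
results = [limiter.allow() for _ in range(21)]
print(results.count(True))  # first 20 allowed, 21st rejected
```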
Authentication
When AGTOS_API_KEY is set, the chat endpoint requires a Bearer token:
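A client might attach the token like this minimal sketch, reading the key from the same environment variable. Only the variable name and the Bearer scheme come from this document.

```python
import os

# Attach the Bearer token only when AGTOS_API_KEY is configured (auth is opt-in).
api_key = os.environ.get("AGTOS_API_KEY")

headers = {"Content-Type": "application/json"}
if api_key:
    headers["Authorization"] = f"Bearer {api_key}"  # required when the key is set

print(headers)
```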
Authentication is opt-in. If AGTOS_API_KEY is not set, the endpoint is open. See Authentication for details on configuring API access.
Error Responses
| Status | Meaning | Common Cause |
|---|---|---|
| 400 | Bad Request | Missing/empty text, invalid JSON, validation failure |
| 401 | Unauthorized | Missing or invalid API key (when auth is enabled) |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Model provider timeout, agent loop failure |
| 503 | Service Unavailable | Voice pipeline not initialized |
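On the client side, the statuses above suggest different recovery actions; the sketch below is one illustrative mapping, not part of the API itself.

```python
# Illustrative client-side handling for the documented status codes.
def handle_status(status: int) -> str:
    if status == 400:
        return "fix the request (missing/empty text or invalid JSON)"
    if status == 401:
        return "supply a valid Bearer token"
    if status == 429:
        return "back off and retry after the rate-limit window"
    if status in (500, 503):
        return "retry later; the server or pipeline is unavailable"
    return "ok"

print(handle_status(429))
```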
CLI Integration
The npx agtos chat command uses this same endpoint under the hood.
Background Tasks
For longer-running agent tasks, use the tasks endpoint instead; it returns a taskId for tracking.