agtOS implements both an MCP server (exposing agtOS capabilities to external AI clients) and an MCP client (connecting to external MCP servers to expand the agent’s tool set). Dynamic tool selection ensures only relevant tools are loaded into the LLM’s context window.

MCP Server

The agtOS MCP server exposes voice pipeline capabilities via the Model Context Protocol. External AI clients (Claude Desktop, Cursor, custom agents) can connect and call agtOS tools.
  • Transport: Streamable HTTP (the current MCP specification standard)
  • Port: 4100 (configurable via MCP_PORT)
  • Endpoint: POST /mcp

Available Tools

The server exposes 9 tools organized into three categories:

Voice Tools

  • voice.speak — Synthesize and play speech
  • voice.listen — Capture and transcribe audio

System Tools

  • system.health — Service health status
  • session.status — Active session info

Orchestration Tools

  • workflow.run — Execute a workflow
  • workflow.list — List workflows
  • schedule.create — Create a scheduled task
  • schedule.list — List scheduled tasks
  • schedule.cancel — Cancel a scheduled task
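Over the wire, a tool invocation is a standard JSON-RPC 2.0 tools/call request sent to POST /mcp. The input shape below (a single text field) is illustrative; consult each tool's published schema for the actual arguments:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "voice.speak",
    "arguments": { "text": "Hello from an external client" }
  }
}
```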

Connecting from Claude Desktop

Add agtOS to your Claude Desktop MCP configuration:
{
  "mcpServers": {
    "agtos": {
      "url": "http://localhost:4100/mcp"
    }
  }
}
Once connected, Claude Desktop can call agtOS tools directly. For example, asking Claude to “speak the weather forecast” will trigger voice.speak on your agtOS instance.
The MCP server runs in stateless mode — each request creates a fresh transport. No sticky sessions or persistent connections are required, making it compatible with load balancers and proxies.

Architecture

The server uses the MCP SDK’s StreamableHTTPServerTransport:
  • Each incoming POST /mcp creates a fresh transport instance (stateless mode)
  • A shared McpServer instance holds all tool registrations
  • GET /mcp handles SSE streaming for server-initiated notifications
  • DELETE /mcp handles session termination (returns 405 in stateless mode)
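As a rough sketch, the stateless pattern looks like the following. The Express wiring and handler details are illustrative, not the agtOS source; it assumes the @modelcontextprotocol/sdk package:

```typescript
import express from "express";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

// One shared McpServer instance holds all tool registrations.
const server = new McpServer({ name: "agtos", version: "1.0.0" });
// ...tool registrations happen once, on the shared instance...

const app = express();
app.use(express.json());

app.post("/mcp", async (req, res) => {
  // Stateless mode: a fresh transport per request, no session id generated.
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined,
  });
  res.on("close", () => transport.close());
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(4100);
```

Because no per-client state survives between requests, any replica behind a load balancer can serve any POST.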
agtOS uses MCP SDK v1.27.x with Streamable HTTP transport. The older SSE transport is deprecated and not supported. See ADR-005 for the migration rationale.

MCP Client

The MCP client connects agtOS to external MCP servers, discovering their tools and making them available to the agent reasoning loop. This is how agtOS integrates with smart home servers, knowledge bases, file systems, web search, and other MCP-enabled services.

How It Works

1. Configure Servers: Define external MCP servers in your configuration with their endpoint URLs, an optional tool prefix, and reconnection settings.
2. Auto-Discovery: On startup, the client connects to each server and calls tools/list to discover the available tools.
3. Tool Registration: Discovered tools are registered in the shared ToolRegistry under their prefixed names (e.g., home.set_temperature for a tool from the "home" server).
4. Transparent Routing: When the agent loop invokes a tool, the client routes the call to the correct server automatically. The agent does not need to know which server hosts which tool.
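The prefix-and-route steps above can be sketched as follows. The class and callback names are illustrative, not agtOS source; the real client forwards calls over Streamable HTTP rather than invoking a local function:

```typescript
// Each external server's tools are registered under "<prefix>.<toolName>";
// the router strips the prefix and forwards to the owning server.
type ToolCaller = (tool: string, args: Record<string, unknown>) => string;

class McpRouter {
  private servers = new Map<string, ToolCaller>();

  register(prefix: string, caller: ToolCaller): void {
    this.servers.set(prefix, caller);
  }

  call(prefixedName: string, args: Record<string, unknown>): string {
    const dot = prefixedName.indexOf(".");
    if (dot < 0) throw new Error(`no prefix in tool name: ${prefixedName}`);
    const prefix = prefixedName.slice(0, dot);
    const tool = prefixedName.slice(dot + 1);
    const caller = this.servers.get(prefix);
    if (!caller) throw new Error(`no MCP server for prefix: ${prefix}`);
    return caller(tool, args);
  }
}

const router = new McpRouter();
router.register("home", (tool) => `home server ran ${tool}`);
console.log(router.call("home.set_temperature", { value: 21 }));
// → "home server ran set_temperature"
```

The agent loop only ever sees the prefixed name; the routing table stays an implementation detail of the client.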

Configuration

# In .env.local or environment
AGTOS_MCP_SERVERS='[
  {
    "name": "home",
    "url": "http://localhost:5000/mcp",
    "prefix": "home",
    "autoReconnect": true,
    "reconnectInterval": 30000
  },
  {
    "name": "files",
    "url": "http://localhost:5001/mcp",
    "prefix": "fs"
  }
]'
Or in the orchestrator configuration:
const orchestrator = new VoicePipelineOrchestrator({
  mcp: {
    servers: [
      {
        name: 'home',
        url: 'http://localhost:5000/mcp',
        prefix: 'home',
        autoReconnect: true,
        reconnectInterval: 30000,
      },
    ],
  },
});

Reconnection

When autoReconnect is enabled, the client automatically re-attempts connections at the configured interval when a server drops. Tool registrations are refreshed on reconnection, so newly added tools on the external server become available automatically.
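The retry behavior can be sketched as a simple loop. Everything here is illustrative (agtOS internals may differ); connect() stands in for reopening the Streamable HTTP connection and re-running tools/list, and onTools for refreshing the ToolRegistry:

```typescript
async function reconnectLoop(
  connect: () => Promise<string[]>,   // resolves with the server's tool list
  onTools: (tools: string[]) => void, // refresh the registry on success
  intervalMs: number,
  maxAttempts: number,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      onTools(await connect());
      return attempt; // connected; report how many attempts were needed
    } catch {
      // Server still down: wait the configured interval, then retry.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error("server unreachable after max attempts");
}
```

Because onTools runs on every successful (re)connection, tools added to the external server while it was down appear in the registry as soon as the link is restored.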

Metrics

Every tool call through the MCP client records latency and errors via the global metrics collector. Health endpoints surface MCP client performance alongside other services.

Dynamic Tool Selection

As the number of available tools grows (especially with multiple external MCP servers), loading all tool schemas into the LLM’s context becomes a significant problem. Each tool definition consumes 550-1,400 tokens, and a deployment with 50+ tools can burn 25-50% of the context window on tool definitions alone. agtOS solves this with dynamic tool selection, which loads only the most relevant tools for each request.

The Problem

Tool Count    Token Cost          Context Usage (200K window)
10 tools      ~10,000 tokens      5%
50 tools      ~50,000 tokens      25%
100 tools     ~100,000 tokens     50%
200 tools     ~200,000 tokens     100% (impossible)

The Solution

Before each LLM call, a lightweight routing step selects the relevant tools:
1. Embed the Query: Generate a dense vector embedding for the current user input.
2. Similarity Search: Find the top-K tools (default: 8) whose description embeddings are most similar to the query embedding, using cosine similarity.
3. Threshold Filter: Only include tools above a minimum similarity threshold (default: 0.7).
4. Category Boost: If recent conversation context mentions specific categories, boost tools in those categories.
5. Schema Loading: Only the selected tools' JSON schemas are included in the LLM request. All other tools remain in the registry but are invisible to the model.
This reduces tool token usage from potentially 100,000+ tokens to approximately 8,000 tokens (8 tools x ~1,000 tokens each) — an 80-90% reduction.
A small set of core tools (like discover_tools) are always included regardless of similarity score. If the agent needs a tool that was not selected, it can call discover_tools to search the registry and request specific tools for the next turn.
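A minimal sketch of the similarity, threshold, and top-K steps (omitting the category boost). Names and the toy 2-d embeddings are illustrative, not agtOS source:

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface IndexedTool { name: string; embedding: number[]; core?: boolean; }

function selectTools(
  query: number[],
  tools: IndexedTool[],
  topK = 8,
  threshold = 0.7,
): string[] {
  const scored = tools
    .filter((t) => !t.core) // core tools skip scoring entirely
    .map((t) => ({ name: t.name, score: cosine(query, t.embedding) }))
    .filter((t) => t.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((t) => t.name);
  // Core tools (e.g. discover_tools) are always included.
  const core = tools.filter((t) => t.core).map((t) => t.name);
  return [...core, ...scored];
}

const catalog: IndexedTool[] = [
  { name: "discover_tools", embedding: [0, 0], core: true },
  { name: "home.lights_off", embedding: [1, 0] },
  { name: "fs.read_file", embedding: [0, 1] },
];
console.log(selectTools([1, 0.1], catalog));
// → ["discover_tools", "home.lights_off"]
```

Only the returned names have their full JSON schemas loaded into the prompt; everything else stays dark until discover_tools pulls it in.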

Tool Registry

The tool registry is an in-memory catalog of all available tools across the MCP server, MCP clients, and built-in capabilities. Each entry contains:
  • Tool name: Unique identifier (e.g., home.set_temperature)
  • Description: Natural language description for the LLM
  • Category tags: Semantic categories for routing (e.g., smart_home, climate)
  • Embedding: Dense vector for similarity search
  • Full schema: Complete JSON schema, stored but only loaded when selected
  • Source: Which MCP server provides this tool
Embeddings are generated at startup and refreshed when MCP servers report tool changes. The embedding + similarity search step adds approximately 10-20ms per request.
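The entry fields above map naturally onto a record type. This shape is a sketch mirroring the list, not the actual agtOS type definition:

```typescript
interface ToolRegistryEntry {
  name: string;         // unique identifier, e.g. "home.set_temperature"
  description: string;  // natural-language description for the LLM
  categories: string[]; // semantic tags for routing, e.g. ["smart_home"]
  embedding: number[];  // dense vector for similarity search
  schema: object;       // full JSON schema, loaded only when selected
  source: string;       // which MCP server provides the tool
}

const entry: ToolRegistryEntry = {
  name: "home.set_temperature",
  description: "Set the thermostat target temperature",
  categories: ["smart_home", "climate"],
  embedding: [0.12, -0.08, 0.33],
  schema: { type: "object", properties: { value: { type: "number" } } },
  source: "home",
};
```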

Integration Example

Here is a complete example showing agtOS as both an MCP server (receiving calls from Claude Desktop) and an MCP client (connecting to a smart home server):
Claude Desktop                    agtOS                     Smart Home MCP Server
     │                              │                              │
     │  POST /mcp                   │                              │
     │  tool: voice.speak           │                              │
     │  input: "Turn off lights"    │                              │
     │ ─────────────────────────▶   │                              │
     │                              │  Agent loop recognizes       │
     │                              │  smart home intent           │
     │                              │                              │
     │                              │  POST /mcp                   │
     │                              │  tool: home.lights_off       │
     │                              │ ─────────────────────────▶   │
     │                              │                              │
     │                              │  ◀───── { success: true }    │
     │                              │                              │
     │                              │  TTS: "The lights are off"   │
     │  ◀──── { audio: ... }        │                              │

Configuration Reference

# MCP Server
MCP_PORT=4100                      # Server port (default: 4100)

# MCP Client - external server connections
AGTOS_MCP_SERVERS='[...]'          # JSON array of server configs

# Dynamic tool selection
AGTOS_TOOL_SELECTION_TOP_K=8       # Max tools per request (default: 8)
AGTOS_TOOL_SELECTION_THRESHOLD=0.7 # Min similarity score (default: 0.7)

# Embedding provider (shared with semantic memory)
AGTOS_EMBEDDING_PROVIDER=ollama
AGTOS_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_URL=http://localhost:11434