Why ADRs?
agtOS went dormant from August 2025 to March 2026 — seven months during which the AI landscape shifted dramatically. MCP moved to the Linux Foundation, Piper TTS was archived, new voice architectures emerged, and local models became viable for agentic tool calling. ADRs ensure that:
- Future sessions have context — when returning after any gap, ADRs explain why things are the way they are
- Decisions are traceable — every choice links to specific technical context and trade-offs
- Alternatives are documented — knowing what was not chosen (and why) is as valuable as knowing what was
- Onboarding is faster — new contributors can read the ADR index to understand the system’s evolution
ADR Index
| # | Title | Status | Date | Summary |
|---|---|---|---|---|
| 001 | Protocol-Agnostic Orchestration Gateway | Accepted | 2026-03-22 | Abstracts MCP, A2A, and future protocols behind a unified gateway interface. MCP is primary; others added via adapters. |
| 002 | TTS Provider Migration | Accepted | 2026-03-22 | Migrated from Piper (archived project) to speaches server with OpenAI-compatible API and Kokoro ONNX models. |
| 003 | Claude Dual-SDK Integration | Accepted | 2026-03-22 | Uses Client SDK for real-time voice streaming and Agent SDK for background autonomous tasks. Both share MCP infrastructure. |
| 004 | Hybrid Model Routing | Accepted | 2026-03-22 | Three-tier routing: intent classifier (local) -> Ollama (local) -> Claude (cloud). Optimizes cost, latency, and privacy. |
| 005 | MCP Transport Migration | Accepted | 2026-03-22 | Migrated from Server-Sent Events (SSE) to Streamable HTTP for MCP transport, aligning with MCP spec evolution. |
| 006 | Redis Client Selection | Accepted | 2026-03-22 | Chose node-redis over ioredis for the Redis client. Official Redis client with better TypeScript support. |
| 007 | Agent Memory Architecture | Accepted | 2026-03-22 | Protocol-based, vector-backed memory: working (session), episodic (Redis), semantic (Redis Vector Search + Ollama embeddings). |
| 008 | Native Audio Protocol Support | Accepted | 2026-03-22 | Supports three voice modes: CASCADE (STT->LLM->TTS), HALF_CASCADE (Ultravox), NATIVE (Gemini/GPT-4o Realtime). |
| 009 | Dynamic Toolset Loading | Accepted | 2026-03-22 | Intent-to-category mapping with top-N tool selection. Fixes MCP’s context window problem (72% consumed by tool schemas). |
| 010 | STT Provider Architecture | Accepted | 2026-03-22 | speaches server for STT via OpenAI-compatible API. Batch + streaming transcription with Faster Whisper models. |
| 011 | BYOK Credential Management | Accepted | 2026-03-22 | AES-256-GCM encrypted credential storage. Per-provider validation. Dual auth (API key + Max subscription). |
| 012 | WebSocket Audio Transport | Accepted | 2026-03-24 | WebSocket for MVP audio transport. Simpler than WebRTC, sufficient for same-network and Tailscale/VPN usage. |
| 013 | Web Dashboard Framework | Accepted | 2026-03-28 | React 19 + Vite 6 for the management UI. Accessibility-first (WCAG AA), responsive, keyboard navigable. |
| 014 | API Security | Accepted | 2026-03-28 | Opt-in Bearer token auth, token bucket rate limiting, Zod input validation on all POST endpoints. |
| 015 | Platform-Aware Adapter Routing | Accepted | 2026-03-29 | Platform-specific adapter overrides in the gateway. Tools can be restricted to specific platforms. Backward compatible. |
| 016 | Desktop Client Framework | Accepted | 2026-03-29 | Tauri 2 for native desktop. System tray, global PTT hotkey, health monitor. Node SEA sidecar for backend. |
The first 10 ADRs (001-010) were created together on 2026-03-22 to document decisions made upon resuming development after the 7-month hiatus. ADRs 011-016 were created individually as decisions arose during active development.
Key Architectural Principles
These principles emerge from the ADR collection and guide ongoing development.

Protocol-First Design
Every integration is protocol-defined, not provider-specific. Protocols define interfaces; implementations are swappable. This applies to voice providers (ADR-002, ADR-010), LLM providers (ADR-003, ADR-004), tool integration (ADR-001, ADR-009), and audio architectures (ADR-008).

Local-First Where Possible
The model router (ADR-004) routes simple queries to local Ollama models, reducing cost and latency while enabling offline operation. Privacy-sensitive requests never leave the local network. Cloud is reserved for complex reasoning that exceeds local capabilities.

Infrastructure/Orchestration Separation
The dual-layer architecture ensures the orchestration layer does not know or care which voice pipeline mode is active (ADR-008). Whether using cascade STT->LLM->TTS, half-cascade audio LLMs, or native end-to-end models, the orchestration logic remains identical.

Backward-Compatible Extension
New capabilities are added as optional extensions to existing interfaces. Platform-aware routing (ADR-015) adds a secondary lookup map without changing default behavior. Dynamic tool selection (ADR-009) filters tools before the LLM sees them without changing tool definitions.

Security by Default
BYOK credential management (ADR-011) encrypts keys at rest. API security (ADR-014) uses timing-safe comparison and token bucket rate limiting. Device authentication uses per-device SHA-256 tokens. These are not bolt-on features — they were designed into the architecture from the start.

ADR Deep Dives
ADR-001: Protocol-Agnostic Gateway
Problem: MCP was the sole integration protocol, but the protocol landscape shifted. MCP joined the Linux Foundation alongside Google’s A2A. MCP has a context window problem (tool schemas consume 72% of 200K context). MCP does not address agent-to-agent coordination or frontend streaming.

Decision: Build a gateway abstraction (OrchestratorGateway interface) with protocol-specific adapters. MCP adapter is first and primary. A2A and AG-UI adapters can be added without modifying orchestration logic.

Trade-off: Adds an abstraction layer that may be premature until a second protocol is needed. But the cost of the abstraction is low, and the cost of restructuring later is high.
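The gateway shape described above might look like the following sketch. Only the OrchestratorGateway name appears in the ADR; the method signatures and the McpGateway stub are illustrative assumptions, not the actual implementation.

```typescript
// Hypothetical sketch of the gateway abstraction. Method shapes and the
// McpGateway stub are assumptions; only the interface name is from ADR-001.

interface ToolDescriptor {
  name: string;
  description: string;
}

interface OrchestratorGateway {
  protocol: "mcp" | "a2a" | "ag-ui";
  listTools(): Promise<ToolDescriptor[]>;
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
}

// MCP adapter: first and primary. A2A/AG-UI adapters would implement the
// same interface, so orchestration code never changes when one is added.
class McpGateway implements OrchestratorGateway {
  protocol = "mcp" as const;

  async listTools(): Promise<ToolDescriptor[]> {
    // A real implementation would query the connected MCP servers.
    return [{ name: "weather.lookup", description: "Get a forecast" }];
  }

  async callTool(name: string, args: Record<string, unknown>): Promise<unknown> {
    // A real implementation would forward to the MCP server owning `name`.
    return { tool: name, args };
  }
}
```

Because orchestration code depends only on the interface, swapping or adding a protocol is an adapter-level change.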
ADR-003: Claude Dual-SDK Integration
Problem: The voice pipeline has two fundamentally different interaction patterns — real-time conversation (streaming, low latency) and background tasks (multi-step, autonomous).

Decision: Use both Anthropic SDKs. Client SDK (@anthropic-ai/sdk) for the voice path with streaming. Agent SDK (@anthropic-ai/claude-agent-sdk) for background tasks with agentic loops. Both connect to the same MCP servers.

Trade-off: Two SDK integrations to maintain, two authentication flows. But it optimizes each path for its requirements — voice gets minimum latency, background gets full agentic capability.
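The two paths could be dispatched along these lines. This is a hedged sketch: the voice/background split mirrors ADR-003, but every type and function name below is an assumption; real handlers would wrap the two Anthropic SDKs.

```typescript
// Hypothetical dispatcher for the two interaction patterns. The handler
// stubs stand in for the Client SDK (streaming) and Agent SDK (agentic loop).

type Interaction =
  | { kind: "voice"; utterance: string }   // real-time, streamed
  | { kind: "background"; task: string };  // autonomous, multi-step

async function dispatch(
  i: Interaction,
  paths: {
    streamReply: (utterance: string, onChunk: (c: string) => void) => Promise<void>;
    runAgentTask: (task: string) => Promise<string>;
  }
): Promise<string> {
  if (i.kind === "voice") {
    let reply = "";
    // Voice path: chunks are forwarded (e.g. to TTS) as they arrive,
    // keeping latency to a minimum.
    await paths.streamReply(i.utterance, (c) => { reply += c; });
    return reply;
  }
  // Background path: the agentic loop runs to completion autonomously.
  return paths.runAgentTask(i.task);
}
```

Both handlers would connect to the same MCP infrastructure, which is what keeps the dual-SDK cost manageable.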
ADR-004: Hybrid Model Routing
Problem: Cloud API calls cost money, add latency, and send data off-device. Local models (Qwen3.5 27B) now score competitively on function calling benchmarks. Most voice interactions are simple enough for local models.

Decision: Three-tier routing. Tier 1: intent classification via a micro-model (< 50ms). Tier 2: local Ollama for simple requests. Tier 3: Claude for complex reasoning. Automatic fallback between tiers.

Trade-off: Routing complexity is higher than a single API call. Misclassification degrades experience. But cost savings are substantial for high-volume voice interactions.
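The three-tier flow might be sketched as follows, with stub handlers standing in for Ollama and Claude. The keyword heuristic, names, and signatures are assumptions; the real Tier 1 classifier is a micro-model, not a string match.

```typescript
// Illustrative sketch of ADR-004's three-tier routing with fallback.

type Tier = "local" | "cloud";

// Tier 1 stand-in: the real classifier is a <50ms micro-model.
function classifyComplexity(utterance: string): Tier {
  const complexMarkers = ["plan", "analyze", "compare", "research"];
  return complexMarkers.some((m) => utterance.toLowerCase().includes(m))
    ? "cloud"
    : "local";
}

async function route(
  utterance: string,
  handlers: {
    local: (u: string) => Promise<string>;  // Tier 2: local Ollama
    cloud: (u: string) => Promise<string>;  // Tier 3: Claude
  }
): Promise<string> {
  const tier = classifyComplexity(utterance);
  try {
    return await handlers[tier](utterance);
  } catch (err) {
    // Automatic fallback: if the local tier fails, escalate to cloud.
    if (tier === "local") return handlers.cloud(utterance);
    throw err;
  }
}
```

The fallback branch is what makes misclassification (and local-model outages) degrade gracefully rather than fail.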
ADR-007: Agent Memory Architecture
Problem: True AI agents need memory that persists across sessions. Working memory (in-context) is insufficient for long-term recall.

Decision: Three-tier memory architecture. Working memory: per-session conversation history with automatic LLM summarization. Episodic memory: cross-session recall via Redis with heuristic save decisions. Semantic memory: embedding-based vector search using Redis Vector Search and Ollama embeddings.

Trade-off: Redis Vector Search is less capable than dedicated vector databases (Pinecone, Weaviate). But it avoids adding another infrastructure dependency — Redis is already required for sessions and scheduling.
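The first two tiers might be shaped as in this sketch. The tier names come from ADR-007; the class shapes are assumptions, and in-memory structures stand in for Redis (the semantic tier, omitted here, would add vector search over Ollama embeddings).

```typescript
// Hypothetical sketch of the working and episodic memory tiers.
// In-memory stubs replace Redis; names and shapes are assumptions.

interface MemoryEntry {
  text: string;
  timestamp: number;
}

// Working memory: per-session history, compacted when it grows too large.
class WorkingMemory {
  private turns: MemoryEntry[] = [];
  constructor(private maxTurns = 20) {}

  add(text: string): void {
    this.turns.push({ text, timestamp: Date.now() });
    if (this.turns.length > this.maxTurns) {
      // Real implementation: LLM summarization; here we just truncate.
      this.turns = this.turns.slice(-this.maxTurns);
    }
  }

  history(): string[] {
    return this.turns.map((t) => t.text);
  }
}

// Episodic memory: cross-session store keyed by session (Redis in practice,
// with heuristics deciding what is worth saving).
class EpisodicMemory {
  private store = new Map<string, MemoryEntry[]>();

  save(sessionId: string, entry: MemoryEntry): void {
    const list = this.store.get(sessionId) ?? [];
    list.push(entry);
    this.store.set(sessionId, list);
  }

  recall(sessionId: string): MemoryEntry[] {
    return this.store.get(sessionId) ?? [];
  }
}
```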
ADR-008: Native Audio Protocol Support
Problem: Voice AI has evolved from cascade-only (STT->LLM->TTS) to three architectures with different cost/latency/quality profiles. Locking into cascade limits future options.

Decision: Support all three modes through the infrastructure layer: CASCADE (STT->LLM->TTS, the default, ~500ms, ~1.50/min), HALF_CASCADE (Ultravox), and NATIVE (Gemini/GPT-4o Realtime). The orchestration layer does not change.

Trade-off: Three variants multiplied by multiple providers creates a large test matrix. But this validates the dual-layer architecture and positions agtOS for the native audio future.
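The mode-to-pipeline mapping could look like this sketch. Mode names come from ADR-008; the pipeline descriptors and function are illustrative assumptions.

```typescript
// Hypothetical mapping from voice mode to pipeline stages. The
// infrastructure layer picks the pipeline; orchestration never sees it.

type VoiceMode = "CASCADE" | "HALF_CASCADE" | "NATIVE";

const PIPELINES: Record<VoiceMode, string[]> = {
  CASCADE: ["stt", "llm", "tts"],       // e.g. Faster Whisper -> LLM -> Kokoro
  HALF_CASCADE: ["audio-llm", "tts"],   // e.g. Ultravox
  NATIVE: ["realtime-audio-llm"],       // e.g. Gemini / GPT-4o Realtime
};

function pipelineFor(mode: VoiceMode): string[] {
  return PIPELINES[mode];
}
```

Keeping the mapping in the infrastructure layer is what lets the orchestration logic stay identical across all three modes.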
ADR-009: Dynamic Toolset Loading
Problem: MCP tool definitions consume 550-1,400 tokens each. With 20+ tools, 72% of a 200K context window is consumed before any conversation. This is MCP’s structural context window problem.

Decision: Intent-to-category mapping with top-N tool selection. The intent classifier determines which tool categories are relevant, and only those tools are loaded into context. Achieves 80-90% context reduction.

Trade-off: If the classifier picks the wrong category, the needed tool is not available. Mitigation: always include a “general” fallback category.
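The selection step might be sketched as follows. The category names, the mapping table, and the topN default are assumptions for illustration; only the intent-to-category idea and the "general" fallback come from ADR-009.

```typescript
// Illustrative sketch of intent-to-category tool filtering with top-N
// selection. Mapping contents and topN are assumptions.

interface Tool {
  name: string;
  category: string;
}

const INTENT_TO_CATEGORIES: Record<string, string[]> = {
  weather: ["weather"],
  scheduling: ["calendar", "reminders"],
};

function selectTools(intent: string, allTools: Tool[], topN = 5): Tool[] {
  // Always include the "general" fallback category so a misclassified
  // intent still has basic tools available.
  const categories = new Set([
    ...(INTENT_TO_CATEGORIES[intent] ?? []),
    "general",
  ]);
  return allTools.filter((t) => categories.has(t.category)).slice(0, topN);
}
```

Only the selected tools' schemas are then loaded into the LLM context, which is where the 80-90% reduction comes from.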
Proposing a New ADR
1. Create a file: copy the template into docs/adr/NNN-short-description.md using the next sequential number.
2. Fill all sections: Context, Decision, and Consequences (positive, negative, risks). Include version numbers, benchmarks, and references.
ADR Template
Status Lifecycle
| Status | Meaning |
|---|---|
| Proposed | Under discussion, not yet adopted |
| Accepted | Active decision, reflects current architecture |
| Deprecated | No longer relevant (technology abandoned, feature removed) |
| Superseded by ADR-NNN | Replaced by a newer decision |
Full ADR text is available in the GitHub repository. Each ADR is self-contained — read only the ones relevant to your work.