Architecture Decision Records (ADRs) capture significant architectural decisions made during the development of agtOS. Each ADR describes a single decision, its context, the rationale behind it, and the consequences of adopting it.

Why ADRs?

agtOS went dormant from August 2025 to March 2026 — seven months during which the AI landscape shifted dramatically. MCP moved to the Linux Foundation, Piper TTS was archived, new voice architectures emerged, and local models became viable for agentic tool calling. ADRs ensure that:
  • Future sessions have context — when returning after any gap, ADRs explain why things are the way they are
  • Decisions are traceable — every choice links to specific technical context and trade-offs
  • Alternatives are documented — knowing what was not chosen (and why) is as valuable as knowing what was
  • Onboarding is faster — new contributors can read the ADR index to understand the system’s evolution
ADRs are immutable once accepted. If a decision is reversed or significantly modified, the original ADR is marked as deprecated and a new ADR references it. This preserves the historical reasoning behind every decision.

ADR Index

| # | Title | Status | Date | Summary |
|---|-------|--------|------|---------|
| 001 | Protocol-Agnostic Orchestration Gateway | Accepted | 2026-03-22 | Abstracts MCP, A2A, and future protocols behind a unified gateway interface. MCP is primary; others added via adapters. |
| 002 | TTS Provider Migration | Accepted | 2026-03-22 | Migrated from Piper (archived project) to speaches server with OpenAI-compatible API and Kokoro ONNX models. |
| 003 | Claude Dual-SDK Integration | Accepted | 2026-03-22 | Uses Client SDK for real-time voice streaming and Agent SDK for background autonomous tasks. Both share MCP infrastructure. |
| 004 | Hybrid Model Routing | Accepted | 2026-03-22 | Three-tier routing: intent classifier (local) -> Ollama (local) -> Claude (cloud). Optimizes cost, latency, and privacy. |
| 005 | MCP Transport Migration | Accepted | 2026-03-22 | Migrated from Server-Sent Events (SSE) to Streamable HTTP for MCP transport, aligning with MCP spec evolution. |
| 006 | Redis Client Selection | Accepted | 2026-03-22 | Chose node-redis over ioredis for the Redis client. Official Redis client with better TypeScript support. |
| 007 | Agent Memory Architecture | Accepted | 2026-03-22 | Protocol-based, vector-backed memory: working (session), episodic (Redis), semantic (Redis Vector Search + Ollama embeddings). |
| 008 | Native Audio Protocol Support | Accepted | 2026-03-22 | Supports three voice modes: CASCADE (STT->LLM->TTS), HALF_CASCADE (Ultravox), NATIVE (Gemini/GPT-4o Realtime). |
| 009 | Dynamic Toolset Loading | Accepted | 2026-03-22 | Intent-to-category mapping with top-N tool selection. Fixes MCP’s context window problem (72% consumed by tool schemas). |
| 010 | STT Provider Architecture | Accepted | 2026-03-22 | speaches server for STT via OpenAI-compatible API. Batch + streaming transcription with Faster Whisper models. |
| 011 | BYOK Credential Management | Accepted | 2026-03-22 | AES-256-GCM encrypted credential storage. Per-provider validation. Dual auth (API key + Max subscription). |
| 012 | WebSocket Audio Transport | Accepted | 2026-03-24 | WebSocket for MVP audio transport. Simpler than WebRTC, sufficient for same-network and Tailscale/VPN usage. |
| 013 | Web Dashboard Framework | Accepted | 2026-03-28 | React 19 + Vite 6 for the management UI. Accessibility-first (WCAG AA), responsive, keyboard navigable. |
| 014 | API Security | Accepted | 2026-03-28 | Opt-in Bearer token auth, token bucket rate limiting, Zod input validation on all POST endpoints. |
| 015 | Platform-Aware Adapter Routing | Accepted | 2026-03-29 | Platform-specific adapter overrides in the gateway. Tools can be restricted to specific platforms. Backward compatible. |
| 016 | Desktop Client Framework | Accepted | 2026-03-29 | Tauri 2 for native desktop. System tray, global PTT hotkey, health monitor. Node SEA sidecar for backend. |

The first 10 ADRs (001-010) were created together on 2026-03-22 to document decisions made upon resuming development after the 7-month hiatus. ADRs 011-016 were created individually as decisions arose during active development.

Key Architectural Principles

These principles emerge from the ADR collection and guide ongoing development:

Protocol-First Design

Every integration is protocol-defined, not provider-specific. Protocols define interfaces; implementations are swappable. This applies to voice providers (ADR-002, ADR-010), LLM providers (ADR-003, ADR-004), tool integration (ADR-001, ADR-009), and audio architectures (ADR-008).
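As a minimal sketch of this principle (all names here are illustrative, not the actual agtOS interfaces), a TTS provider might be defined as a protocol that any implementation can satisfy, so swapping Piper for speaches (ADR-002) never touches calling code:

```typescript
// Hypothetical sketch: providers implement a protocol; callers depend only on it.
interface TtsProvider {
  name: string;
  synthesize(text: string): Promise<Uint8Array>;
}

// A stub provider satisfies the same protocol as a real speaches-backed one would.
class StubTts implements TtsProvider {
  name = "stub";
  async synthesize(text: string): Promise<Uint8Array> {
    return new TextEncoder().encode(text); // placeholder "audio" bytes
  }
}

// Orchestration code is written against TtsProvider, never a concrete class,
// so implementations remain swappable.
async function speak(provider: TtsProvider, text: string): Promise<number> {
  const audio = await provider.synthesize(text);
  return audio.byteLength;
}
```

Because callers only see the interface, a provider migration is a one-line change at the composition root.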

Local-First Where Possible

The model router (ADR-004) routes simple queries to local Ollama models, reducing cost and latency while enabling offline operation. Privacy-sensitive requests never leave the local network. Cloud is reserved for complex reasoning that exceeds local capabilities.

Infrastructure/Orchestration Separation

The dual-layer architecture ensures the orchestration layer does not know or care which voice pipeline mode is active (ADR-008). Whether using cascade STT->LLM->TTS, half-cascade audio LLMs, or native end-to-end models, the orchestration logic remains identical.
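A hedged sketch of that separation (names are illustrative): the orchestrator depends on a single pipeline interface and never branches on which voice mode is active.

```typescript
// Hypothetical sketch of the dual-layer split (ADR-008): the orchestration
// layer sees one interface regardless of the active voice pipeline mode.
type VoiceMode = "CASCADE" | "HALF_CASCADE" | "NATIVE";

interface VoicePipeline {
  mode: VoiceMode;
  // Takes user audio, returns assistant audio; internals differ per mode.
  converse(audioIn: Uint8Array): Promise<Uint8Array>;
}

// The orchestration layer never inspects `mode` -- it just calls converse().
async function handleTurn(pipeline: VoicePipeline, audioIn: Uint8Array): Promise<Uint8Array> {
  return pipeline.converse(audioIn);
}

// A trivial echo pipeline standing in for any of the three real modes.
const echoPipeline: VoicePipeline = {
  mode: "CASCADE",
  converse: async (a) => a,
};
```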

Backward-Compatible Extension

New capabilities are added as optional extensions to existing interfaces. Platform-aware routing (ADR-015) adds a secondary lookup map without changing default behavior. Dynamic tool selection (ADR-009) filters tools before the LLM sees them without changing tool definitions.
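The secondary-lookup pattern from ADR-015 can be sketched as follows (map contents and names are hypothetical): overrides are consulted first, and when none exists the original default map answers, so existing behavior is untouched.

```typescript
// Hypothetical sketch of platform-aware adapter routing (ADR-015):
// a secondary override map layered on top of the unchanged default map.
type Platform = "macos" | "linux" | "windows";

const defaultAdapters = new Map<string, string>([["notify", "generic-notify"]]);
const platformOverrides = new Map<string, string>([["macos:notify", "osascript-notify"]]);

function resolveAdapter(tool: string, platform: Platform): string | undefined {
  // Override wins if present; otherwise fall back to the default adapter.
  return platformOverrides.get(`${platform}:${tool}`) ?? defaultAdapters.get(tool);
}
```

Removing every override restores the original behavior exactly, which is what makes the extension backward compatible.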

Security by Default

BYOK credential management (ADR-011) encrypts keys at rest. API security (ADR-014) uses timing-safe comparison and token bucket rate limiting. Device authentication uses per-device SHA-256 tokens. These are not bolt-on features — they were designed into the architecture from the start.
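A minimal sketch of the timing-safe token check (the function names are illustrative, not agtOS's actual API): the presented token is hashed and compared with `crypto.timingSafeEqual`, so comparison time does not leak how many bytes matched.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Helper: hex-encoded SHA-256 digest of a token.
function digestHex(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

// Hedged sketch of a per-device token check: constant-time comparison of
// digests rather than a short-circuiting string equality.
function tokenMatches(presented: string, storedDigestHex: string): boolean {
  const digest = createHash("sha256").update(presented).digest();
  const stored = Buffer.from(storedDigestHex, "hex");
  if (digest.length !== stored.length) return false;
  return timingSafeEqual(digest, stored);
}
```

Storing only the digest also means a leaked store does not reveal the tokens themselves.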

ADR Deep Dives

ADR-001: Protocol-Agnostic Orchestration Gateway

Problem: MCP was the sole integration protocol, but the protocol landscape shifted: MCP joined the Linux Foundation alongside Google’s A2A, MCP has a context window problem (tool schemas consume 72% of 200K context), and MCP does not address agent-to-agent coordination or frontend streaming.

Decision: Build a gateway abstraction (the OrchestratorGateway interface) with protocol-specific adapters. The MCP adapter is first and primary. A2A and AG-UI adapters can be added without modifying orchestration logic.

Trade-off: Adds an abstraction layer that may be premature until a second protocol is needed. But the cost of the abstraction is low, and the cost of restructuring later is high.
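The gateway idea can be sketched like this (interface shape and names are assumptions for illustration, not the real OrchestratorGateway signature): adapters register behind one interface, and orchestration code never imports a protocol library directly.

```typescript
// Hypothetical sketch of a protocol-agnostic gateway: each protocol
// contributes an adapter; orchestration only ever talks to the gateway.
interface ProtocolAdapter {
  protocol: string;
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
}

class Gateway {
  private adapters = new Map<string, ProtocolAdapter>();

  register(adapter: ProtocolAdapter): void {
    this.adapters.set(adapter.protocol, adapter);
  }

  // MCP is primary today; adding A2A later is just another register() call.
  call(protocol: string, tool: string, args: Record<string, unknown>): Promise<unknown> {
    const adapter = this.adapters.get(protocol);
    if (!adapter) throw new Error(`no adapter for ${protocol}`);
    return adapter.callTool(tool, args);
  }
}

const gateway = new Gateway();
gateway.register({ protocol: "mcp", callTool: async (name) => `ran ${name}` });
```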
ADR-003: Claude Dual-SDK Integration

Problem: The voice pipeline has two fundamentally different interaction patterns — real-time conversation (streaming, low latency) and background tasks (multi-step, autonomous).

Decision: Use both Anthropic SDKs: the Client SDK (@anthropic-ai/sdk) for the voice path with streaming, and the Agent SDK (@anthropic-ai/claude-agent-sdk) for background tasks with agentic loops. Both connect to the same MCP servers.

Trade-off: Two SDK integrations to maintain, two authentication flows. But this optimizes each path for its requirements — voice gets minimum latency, background gets full agentic capability.
ADR-004: Hybrid Model Routing

Problem: Cloud API calls cost money, add latency, and send data off-device. Local models (Qwen3.5 27B) now score competitively on function calling benchmarks, and most voice interactions are simple enough for local models.

Decision: Three-tier routing. Tier 1: intent classification via a micro-model (< 50ms). Tier 2: local Ollama for simple requests. Tier 3: Claude for complex reasoning. Automatic fallback between tiers.

Trade-off: Routing complexity is higher than a single API call, and misclassification degrades the experience. But cost savings are substantial for high-volume voice interactions.
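A hedged sketch of the tiering logic (the word-count classifier below is a stand-in for the real intent micro-model, and tier names are illustrative): simple queries go local when the local model is healthy, everything else falls through to the cloud.

```typescript
// Hypothetical sketch of ADR-004's routing: classify, then pick a tier,
// with automatic fallback to cloud when the local tier is unavailable.
type Tier = "local" | "cloud";

function classify(query: string): "simple" | "complex" {
  // Placeholder heuristic standing in for the <50ms intent classifier.
  return query.trim().split(/\s+/).length <= 8 ? "simple" : "complex";
}

function route(query: string, localHealthy: boolean): Tier {
  if (classify(query) === "simple" && localHealthy) return "local"; // Ollama
  return "cloud"; // Claude handles complex reasoning and local fallback
}
```

The fallback branch is what keeps misrouting a latency/cost issue rather than an availability issue.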
ADR-007: Agent Memory Architecture

Problem: True AI agents need memory that persists across sessions. Working memory (in-context) is insufficient for long-term recall.

Decision: Three-tier memory architecture. Working memory: per-session conversation history with automatic LLM summarization. Episodic memory: cross-session recall via Redis with heuristic save decisions. Semantic memory: embedding-based vector search using Redis Vector Search and Ollama embeddings.

Trade-off: Redis Vector Search is less capable than dedicated vector databases (Pinecone, Weaviate). But it avoids adding another infrastructure dependency — Redis is already required for sessions and scheduling.
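The "protocol-based" part of this decision can be sketched as one interface that all three tiers implement (names are illustrative; real episodic and semantic tiers would be Redis-backed):

```typescript
// Hypothetical sketch: every memory tier satisfies the same protocol, so the
// agent can address working, episodic, and semantic memory uniformly.
interface MemoryTier {
  save(key: string, value: string): Promise<void>;
  recall(key: string): Promise<string | undefined>;
}

// Working memory: in-process and per-session; a Redis-backed tier would
// implement the same two methods against the server instead of a Map.
class WorkingMemory implements MemoryTier {
  private store = new Map<string, string>();
  async save(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
  async recall(key: string): Promise<string | undefined> {
    return this.store.get(key);
  }
}
```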
ADR-008: Native Audio Protocol Support

Problem: Voice AI has evolved from cascade-only (STT->LLM->TTS) to three architectures with different cost/latency/quality profiles. Locking into cascade limits future options.

Decision: Support all three modes through the infrastructure layer: CASCADE (default, ~500ms, ~$0.15/min), HALF_CASCADE (Ultravox, ~300ms), and NATIVE (Gemini Live / OpenAI Realtime, ~200ms, ~$1.50/min). The orchestration layer does not change.

Trade-off: Three variants multiplied by multiple providers creates a large test matrix. But this validates the dual-layer architecture and positions agtOS for the native audio future.
ADR-009: Dynamic Toolset Loading

Problem: MCP tool definitions consume 550-1,400 tokens each. With 20+ tools, 72% of a 200K context window is consumed before any conversation. This is MCP’s structural context window problem.

Decision: Intent-to-category mapping with top-N tool selection. The intent classifier determines which tool categories are relevant, and only those tools are loaded into context. Achieves 80-90% context reduction.

Trade-off: If the classifier picks the wrong category, the needed tool is not available. Mitigation: always include a “general” fallback category.
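A minimal sketch of the selection step, including the "general" fallback (category names, intents, and tool names here are invented for illustration):

```typescript
// Hypothetical sketch of ADR-009: map intent -> categories, always add the
// "general" fallback, then keep at most top-N matching tools in context.
interface Tool {
  name: string;
  category: string;
}

const intentCategories: Record<string, string[]> = {
  play_music: ["media"],
  set_timer: ["time"],
};

const allTools: Tool[] = [
  { name: "spotify_play", category: "media" },
  { name: "timer_start", category: "time" },
  { name: "help", category: "general" },
];

function selectTools(intent: string, tools: Tool[], topN: number): Tool[] {
  // Unknown intents still get the "general" category, so something is loaded.
  const cats = new Set([...(intentCategories[intent] ?? []), "general"]);
  return tools.filter((t) => cats.has(t.category)).slice(0, topN);
}
```

Only the filtered list's schemas are sent to the LLM, which is where the 80-90% context reduction comes from.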

Proposing a New ADR

1. **Create a file**: Copy the template into docs/adr/NNN-short-description.md using the next sequential number.
2. **Fill all sections**: Context, Decision, and Consequences (positive, negative, risks). Include version numbers, benchmarks, and references.
3. **Set status to Proposed**: Keep this status until the ADR is reviewed and accepted.
4. **Add to the index**: Update the table in docs/adr/README.md.
5. **Link to a GitHub issue**: Every ADR should relate to a tracked issue.

ADR Template

# ADR-NNN: Title

**Date**: YYYY-MM-DD
**Status**: Proposed | Accepted | Deprecated | Superseded by ADR-NNN
**Relates to**: [GitHub Issue #NN] | [ADR-NNN]

## Context
What technical context motivates this decision?
Include version numbers, benchmarks, ecosystem changes.

## Decision
What are we choosing, and what are we explicitly not choosing?

## Consequences

### Positive
- What becomes easier or better?

### Negative
- What becomes harder or worse?

### Risks
- What could go wrong? What assumptions might not hold?

Status Lifecycle

| Status | Meaning |
|--------|---------|
| Proposed | Under discussion, not yet adopted |
| Accepted | Active decision, reflects current architecture |
| Deprecated | No longer relevant (technology abandoned, feature removed) |
| Superseded by ADR-NNN | Replaced by a newer decision |

Full ADR text is available in the GitHub repository. Each ADR is self-contained — read only the ones relevant to your work.