Architecture Decision Records (ADRs) capture significant architectural decisions made during the development of agtOS. Each ADR describes a single decision, its context, the rationale behind it, and the consequences of adopting it.
Why ADRs?
agtOS went dormant from August 2025 to March 2026, seven months during which the AI landscape shifted dramatically. MCP moved to the Linux Foundation, Piper TTS was archived, new voice architectures emerged, and local models became viable for agentic tool calling. ADRs ensure that:
- Future sessions have context: when returning after any gap, ADRs explain why things are the way they are
- Decisions are traceable: every choice links to specific technical context and trade-offs
- Alternatives are documented: knowing what was not chosen (and why) is as valuable as knowing what was
- Onboarding is faster: new contributors can read the ADR index to understand the system’s evolution
ADR Index
| # | Title | Status | Date | Summary |
|---|---|---|---|---|
| 001 | Protocol-Agnostic Orchestration Gateway | Accepted | 2026-03-22 | Abstracts MCP, A2A, and future protocols behind a unified gateway interface. MCP is primary; others added via adapters. |
| 002 | TTS Provider Migration | Accepted | 2026-03-22 | Migrated from Piper (archived project) to speaches server with OpenAI-compatible API and Kokoro ONNX models. |
| 003 | Claude Dual-SDK Integration | Accepted | 2026-03-22 | Uses Client SDK for real-time voice streaming and Agent SDK for background autonomous tasks. Both share MCP infrastructure. |
| 004 | Hybrid Model Routing | Accepted | 2026-03-22 | Three-tier routing: intent classifier (local) -> Ollama (local) -> Claude (cloud). Optimizes cost, latency, and privacy. |
| 005 | MCP Transport Migration | Accepted | 2026-03-22 | Migrated from Server-Sent Events (SSE) to Streamable HTTP for MCP transport, aligning with MCP spec evolution. |
| 006 | Redis Client Selection | Accepted | 2026-03-22 | Chose node-redis over ioredis for the Redis client. Official Redis client with better TypeScript support. |
| 007 | Agent Memory Architecture | Accepted | 2026-03-22 | Protocol-based, vector-backed memory: working (session), episodic (Redis), semantic (Redis Vector Search + Ollama embeddings). |
| 008 | Native Audio Protocol Support | Accepted | 2026-03-22 | Supports three voice modes: CASCADE (STT->LLM->TTS), HALF_CASCADE (Ultravox), NATIVE (Gemini/GPT-4o Realtime). |
| 009 | Dynamic Toolset Loading | Accepted | 2026-03-22 | Intent-to-category mapping with top-N tool selection. Fixes MCP’s context window problem (72% consumed by tool schemas). |
| 010 | STT Provider Architecture | Accepted | 2026-03-22 | speaches server for STT via OpenAI-compatible API. Batch + streaming transcription with Faster Whisper models. |
| 011 | BYOK Credential Management | Accepted | 2026-03-22 | AES-256-GCM encrypted credential storage with scrypt KDF and AAD binding. Per-provider validation. Setup token endpoint auth. Prometheus metrics. |
| 012 | WebSocket Audio Transport | Accepted | 2026-03-24 | WebSocket for MVP audio transport. Simpler than WebRTC, sufficient for same-network and Tailscale/VPN usage. |
| 013 | Web Dashboard Framework | Accepted | 2026-03-28 | React 19 + Vite 6 for the management UI. Accessibility-first (WCAG AA), responsive, keyboard navigable. |
| 014 | API Security | Accepted | 2026-03-28 | Opt-in Bearer token auth, token bucket rate limiting, Zod input validation on all POST endpoints. |
| 015 | Platform-Aware Adapter Routing | Accepted | 2026-03-29 | Platform-specific adapter overrides in the gateway. Tools can be restricted to specific platforms. Backward compatible. |
| 016 | Desktop Client Framework | Accepted | 2026-03-29 | Tauri 2 for native desktop. System tray, global PTT hotkey, health monitor. Node SEA sidecar for backend. |
| 017 | sherpa-onnx In-Process Speech Engine | Accepted | 2026-03-31 | Replace speaches Python sidecar with sherpa-onnx-node for in-process STT, TTS, and VAD via ONNX Runtime. |
| 018 | Cognitive Task Provider Architecture | Accepted | 2026-04-02 | Independent provider selection for embedding, classification, reasoning, consolidation, and summarization tasks. |
| 019 | OpenAI as Alternative Cloud Provider | Accepted | 2026-04-04 | OpenAI as drop-in cloud tier alternative to Claude. Configurable per slot via Model Slot Registry. Streaming, tool calls, session management via OpenAI SDK v6. |
| 020 | Model Slot Registry | Accepted | 2026-04-04 | Replaces two-tier LOCAL/CLOUD routing with named capability slots (chat, reasoning, coding, tool_calling, creative). Per-slot provider+model config with fallback chains. |
| 021 | Memory Enhancement — Resource-Aware Background Work, Maintenance Mode, Query-as-Ingest | Accepted | 2026-04-06 | Wires the Dreamer consolidation engine (was dead code), adds ResourceGuard to gate background LLM calls by active sessions / CPU load / Ollama VRAM, introduces a periodic maintenance sweep (“memory lint”), and persists high-quality synthesized responses via RESPONSE_INGEST episodes. |
| 022 | Upgrade to Zod 4 | Accepted | 2026-04-07 | Zod 4 baseline. Removes all as never casts and @ts-ignore TS2589 suppressions from MCP tool registrations. z.record() now requires explicit key and value schemas. |
| 023 | Timezone-Aware Scheduler | Accepted | 2026-04-06 | CronSchedule.timezone (IANA) with croner@^10.0.1 for next-run computation. Replaces process-local time semantics. AGTOS_MAINTENANCE_TIMEZONE exposes the timezone to operators. |
| 024 | Atomic Profile Updates via Redis Pool + WATCH/MULTI/EXEC | Accepted | 2026-04-07 | UserProfileManager uses a node-redis v5 connection pool with optimistic locking (withOptimisticLock<T>()). All five mutating methods retry up to 3 times on WatchError. Closes audit M3 race window. |
| 025 | Multi-Tenant-Ready by Discipline | Accepted | 2026-04-08 | Single-user today at the operational layer, multi-tenant-ready at the data layer. Seven code-review rules: no hardcoded 'default', tenant-first Redis keys, userId server-resolved, no premature auth middleware. |
| 026 | ProviderCatalog Interface + OpenRouter First-Class | Accepted | 2026-04-08 | Cross-provider model discovery via listModels() / getAccountInfo() / validateModel(). OpenRouter promoted to a first-class provider with its own credential scope, ranking headers, and /api/v1/models catalog. Adds a maintenance task slot. |
| 027 | Memory Maintenance V2 — NLI Hybrid Contradiction Pipeline | Accepted | 2026-04-08 | 3-stage contradiction detection: candidate selection (cosine over embeddings) → local NLI cross-encoder (onnxruntime-node) + pair cache → batched LLM judge using the maintenance task slot. Supersedes the single-LLM detector from ADR-021 when dependencies are wired. |
| 028 | Personal AI Gateway Thesis | Accepted | 2026-04-10 | Formalizes agtOS as the identity + memory + routing layer between a person and every AI they touch. Prioritizes capture, entity-centric memory, and multi-surface support. |
| 029 | PACT Capture Protocol | Accepted | 2026-04-10 | Open protocol for streaming multimodal captures with presence signals, per-modality consent envelopes, and jurisdiction-aware metadata. Local-first-only v1. |
| 030 | Entity-Centric Memory | Accepted | 2026-04-10 | Redis JSON property graph with NER-extracted entities, relationships, and wiki UX on top of the existing 3-tier memory system. |
| 031 | Speaker Intelligence | Accepted | 2026-04-10 | sherpa-onnx speaker embedding extraction, diarization, and Redis persistence for multi-speaker attribution and active-participant verification. |
| 032 | Legal Compliance Framework | Accepted | 2026-04-10 | 7 design principles for capture legality under federal wiretap law, state two-party consent, EU AI Act, and neurorights bills. |
| 033 | Consumer Desktop Experience | Accepted | 2026-04-11 | Redesigns onboarding from 7-step developer wizard to 3-step consumer flow with just-in-time feature installation and hardware-aware model recommendations. |
| 034 | Three-Tier Health & Redis Hot-Connect | Accepted | 2026-04-12 | Health check prioritization (critical/important/optional) and hot-connect API to reconnect Redis services without restart. |
| 035 | Runtime Billing Exhaustion | Accepted | 2026-04-12 | Detects provider billing exhaustion at runtime with provider-specific error mapping and user-configured fallback strategies (cloud-backup/ollama-local/none). |
| 036 | Desktop Chat UI | Accepted | 2026-04-12 | SSE streaming text chat (POST /api/chat/stream) with rAF token batching, deferred markdown rendering, and provider-agnostic tool call display. |
| 037 | Provider-Agnostic Thinking & Vision | Accepted | 2026-04-12 | Unified thinking/reasoning output and image/vision input across Claude, OpenAI, Ollama, and OpenRouter with continuity token preservation. |
Key Architectural Principles
These principles emerge from the ADR collection and guide ongoing development:
Protocol-First Design
Every integration is protocol-defined, not provider-specific. Protocols define interfaces; implementations are swappable. This applies to voice providers (ADR-002, ADR-010), LLM providers (ADR-003, ADR-004), tool integration (ADR-001, ADR-009), and audio architectures (ADR-008).
Local-First Where Possible
The model router (ADR-004) routes simple queries to local Ollama models, reducing cost and latency while enabling offline operation. Privacy-sensitive requests never leave the local network. Cloud is reserved for complex reasoning that exceeds local capabilities.
Infrastructure/Orchestration Separation
The dual-layer architecture ensures the orchestration layer does not know or care which voice pipeline mode is active (ADR-008). Whether using cascade STT->LLM->TTS, half-cascade audio LLMs, or native end-to-end models, the orchestration logic remains identical.
Backward-Compatible Extension
New capabilities are added as optional extensions to existing interfaces. Platform-aware routing (ADR-015) adds a secondary lookup map without changing default behavior. Dynamic tool selection (ADR-009) filters tools before the LLM sees them without changing tool definitions.
Security by Default
BYOK credential management (ADR-011) encrypts keys at rest with AES-256-GCM, scrypt key derivation, and AAD-bound ciphertext. Credential operations are instrumented with Prometheus metrics and structured correlation IDs. API security (ADR-014) uses timing-safe comparison and token bucket rate limiting. Device authentication uses per-device SHA-256 tokens. These are not bolt-on features; they were designed into the architecture from the start.
ADR Deep Dives
ADR-001: Protocol-Agnostic Gateway
OrchestratorGateway interface) with protocol-specific adapters. MCP adapter is first and primary. A2A and AG-UI adapters can be added without modifying orchestration logic.
Trade-off: Adds an abstraction layer that may be premature until a second protocol is needed. But the cost of the abstraction is low, and the cost of restructuring later is high.
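The adapter arrangement described for ADR-001 can be sketched roughly as follows. Only the name OrchestratorGateway comes from the ADR; ProtocolAdapter, ToolCall, ToolResult, SimpleGateway, and every method name are hypothetical, and the calls are synchronous purely for brevity (a real gateway would most likely be asynchronous).

```typescript
// Hypothetical shapes: only the name OrchestratorGateway appears in the ADR.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

interface ToolResult {
  ok: boolean;
  output: string;
}

// Each protocol (MCP, A2A, AG-UI, ...) is wrapped in one adapter.
interface ProtocolAdapter {
  readonly protocol: string;
  listTools(): string[];
  invoke(call: ToolCall): ToolResult;
}

// The orchestration layer talks only to this interface.
interface OrchestratorGateway {
  register(adapter: ProtocolAdapter): void;
  invoke(call: ToolCall): ToolResult;
}

class SimpleGateway implements OrchestratorGateway {
  private readonly adapters: ProtocolAdapter[] = [];

  register(adapter: ProtocolAdapter): void {
    this.adapters.push(adapter);
  }

  // Route to the first registered adapter that exposes the tool,
  // so earlier registrations (e.g. MCP) take priority.
  invoke(call: ToolCall): ToolResult {
    for (const adapter of this.adapters) {
      if (adapter.listTools().includes(call.tool)) {
        return adapter.invoke(call);
      }
    }
    return { ok: false, output: `no adapter exposes tool "${call.tool}"` };
  }
}

// Stand-in adapter for illustration only; a real MCP adapter would
// forward the call to an MCP server.
const fakeMcpAdapter: ProtocolAdapter = {
  protocol: "mcp",
  listTools: () => ["search"],
  invoke: (call) => ({ ok: true, output: `mcp:${call.tool}` }),
};

const gateway = new SimpleGateway();
gateway.register(fakeMcpAdapter);
const gatewayResult = gateway.invoke({ tool: "search", args: { q: "adr" } });
```

Adding a second protocol in this sketch means registering another ProtocolAdapter; SimpleGateway.invoke and its callers stay unchanged, which is the property the ADR is after.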
ADR-003: Claude Dual-SDK Integration
@anthropic-ai/sdk) for voice path with streaming. Agent SDK (@anthropic-ai/claude-agent-sdk) for background tasks with agentic loops. Both connect to the same MCP servers.
Trade-off: Two SDK integrations to maintain, two authentication flows. But optimizes each path for its requirements: voice gets minimum latency, background gets full agentic capability.
ADR-004: Hybrid Model Routing
ADR-007: Agent Memory Architecture
ADR-008: Native Audio Protocol Support
ADR-009: Dynamic Toolset Loading
ADR-015: Platform-Aware Adapter Routing
ADR-017: sherpa-onnx In-Process Speech Engine
ADR-018: Cognitive Task Provider Architecture
ollama for local inference, claude for reasoning, openrouter for fallback). This is configured via environment variables and can be changed at runtime via PUT /api/settings.
Trade-off: More configuration complexity and multiple API keys to manage. But enables fine-grained optimization, e.g., local Ollama for embeddings (cost-free) while routing complex reasoning to Claude.
ADR-020: Model Slot Registry
AGTOS_CLOUD_PROVIDER env var was a global switch that couldn’t express per-capability preferences.
Decision: Replace the two-tier model with a Model Slot Registry. Each slot is a named capability position (chat, reasoning, coding, tool_calling, creative) that maps to a configured provider + model pair. The intent classifier routes requests to slot names, and the registry resolves slots to live provider instances. Task slots (embedding, classifier, summarization, consolidation, dialectic, maintenance) handle background cognitive tasks. Configuration lives in ~/.agtos/config.json under a slots key. Each slot supports fallback chains to handle provider failures.
Trade-off: More configuration complexity, since users must understand slots to customize routing. But the default agtos setup wizard configures the chat slot (required) and optional reasoning slot, which covers the common case. Power users get fine-grained control over which provider handles which kind of request.
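As an illustration of slot resolution with fallback chains (ADR-020), here is a minimal sketch. The slot names come from the ADR; the SlotConfig shape, the fallbacks field, the isHealthy callback, and the rule that an unconfigured slot falls back to chat are all assumptions, and the model names are placeholders.

```typescript
type SlotName = "chat" | "reasoning" | "coding" | "tool_calling" | "creative";

interface ProviderModel {
  provider: string;
  model: string;
}

// Hypothetical config shape: a primary target plus an ordered fallback chain.
interface SlotConfig extends ProviderModel {
  fallbacks?: ProviderModel[];
}

type SlotTable = Partial<Record<SlotName, SlotConfig>>;

// Resolve a slot to the first healthy provider+model in its chain.
// Assumption: an unconfigured slot falls back to the required "chat" slot.
function resolveSlot(
  slots: SlotTable,
  name: SlotName,
  isHealthy: (provider: string) => boolean,
): ProviderModel | undefined {
  const config = slots[name] ?? slots.chat;
  if (!config) return undefined;
  const chain: ProviderModel[] = [
    { provider: config.provider, model: config.model },
    ...(config.fallbacks ?? []),
  ];
  return chain.find((target) => isHealthy(target.provider));
}

// Placeholder values, not recommended models.
const slots: SlotTable = {
  chat: { provider: "ollama", model: "local-chat-model" },
  reasoning: {
    provider: "claude",
    model: "cloud-reasoning-model",
    fallbacks: [{ provider: "openrouter", model: "backup-reasoning-model" }],
  },
};

// Simulate an outage of the primary reasoning provider.
const claudeDown = (provider: string): boolean => provider !== "claude";
const resolvedReasoning = resolveSlot(slots, "reasoning", claudeDown);
const resolvedCoding = resolveSlot(slots, "coding", claudeDown);
```

With the primary provider marked unhealthy, the reasoning slot resolves to its first fallback, and the unconfigured coding slot resolves through chat.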
ADR-021: Memory Enhancement — Resource-Aware Background Work, Maintenance Mode, Query-as-Ingest
MemoryDreamer consolidation engine was implemented with 60+ tests but triggerConsolidation() was never called from anywhere in the codebase, so user profiles were always empty. Even when wired, the Dreamer would compete with active voice sessions for local Ollama GPU/VRAM (Ollama queues requests when VRAM is full). And there was no mechanism to age out stale conclusions, detect contradictions, or prune redundant facts as the knowledge base grew.
Decision: Five complementary changes that preserve the three-tier architecture (ADR-007):
- Wire the Dreamer: call triggerConsolidation() in endVoiceSession() plus a server-level sessionEnded event listener as defense-in-depth.
- ResourceGuard: gate every background LLM call via a deterministic decision tree: policy override → cloud/remote short-circuit → active sessions → session cooldown → system load → Ollama VRAM probe (GET /api/ps). Configurable policy (auto/always/idle-only). Windows caveat: os.loadavg() returns zeros, so VRAM probe is the only strong signal.
- qmd MCP configuration + tool category inference: AGTOS_MCP_SERVERS JSON env var for external MCP servers. inferCategory() maps tool name/description to intent categories so search tools auto-participate in dynamic tool selection (ADR-009).
- Query-as-Ingest: a new RESPONSE_INGEST episode type. Heuristic scoring (tool calls, multi-step reasoning, length, synthesis patterns) decides when to persist a high-quality agent response so synthesis compounds across sessions.
- Maintenance Mode: Dreamer.maintain() runs a six-step sweep (stale detection, confidence decay, redundancy merge, orphan flagging, LLM contradiction check, low-confidence prune). Auto-registered cron task (default 0 3 * * *), POST /api/memory/maintain for on-demand runs, memory-maintenance health check flags runs older than 48 hours.
MemoryCoordinator, and maintenance LLM calls cost tokens on cloud providers. Mitigated by per-sweep caps, the cron schedule, and ResourceGuard’s skip-on-load. The contradiction detection portion of Step 5 is superseded by ADR-027’s NLI hybrid pipeline when its dependencies are wired; ADR-021 still owns ResourceGuard, the cron schedule, the kill switch, dangling-source detection, and the maintenance health check.
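The ResourceGuard decision tree from ADR-021 can be sketched as a pure function over already-collected signals. The order of checks follows the ADR; the GuardInput shape, the thresholds, and the exact meaning of each policy value are assumptions (here, always bypasses every check, and the remote short-circuit applies only under auto).

```typescript
type Policy = "auto" | "always" | "idle-only";

// Hypothetical snapshot of the signals the guard would collect.
interface GuardInput {
  policy: Policy;
  providerIsRemote: boolean; // cloud/remote providers do not compete for local VRAM
  activeSessions: number;
  msSinceLastSession: number;
  loadPerCpu: number; // always 0 on Windows, where os.loadavg() returns zeros
  ollamaVramBusy: boolean; // e.g. derived from Ollama's GET /api/ps
}

interface GuardDecision {
  allow: boolean;
  reason: string;
}

// Illustrative thresholds, not values from the ADR.
const SESSION_COOLDOWN_MS = 30000;
const MAX_LOAD_PER_CPU = 0.7;

function decide(input: GuardInput): GuardDecision {
  // 1. Policy override.
  if (input.policy === "always") return { allow: true, reason: "policy-always" };
  // 2. Cloud/remote short-circuit (assumed to apply only under "auto").
  if (input.policy === "auto" && input.providerIsRemote) {
    return { allow: true, reason: "remote-provider" };
  }
  // 3. Active sessions.
  if (input.activeSessions > 0) return { allow: false, reason: "active-session" };
  // 4. Session cooldown.
  if (input.msSinceLastSession < SESSION_COOLDOWN_MS) {
    return { allow: false, reason: "session-cooldown" };
  }
  // 5. System load.
  if (input.loadPerCpu > MAX_LOAD_PER_CPU) return { allow: false, reason: "system-load" };
  // 6. Ollama VRAM probe.
  if (input.ollamaVramBusy) return { allow: false, reason: "ollama-vram" };
  return { allow: true, reason: "idle" };
}

const idleInput: GuardInput = {
  policy: "auto",
  providerIsRemote: false,
  activeSessions: 0,
  msSinceLastSession: 120000,
  loadPerCpu: 0.1,
  ollamaVramBusy: false,
};

// On Windows the load signal is always zero, so only the VRAM probe can veto.
const windowsBusy = decide({ ...idleInput, loadPerCpu: 0, ollamaVramBusy: true });
```

Keeping the decision a pure function of a snapshot makes the tree deterministic and easy to test; collecting the signals (session counts, load, the VRAM probe) would live elsewhere.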
ADR-025: Multi-Tenant-Ready by Discipline
'default' literal and migrating Redis keys, which is painful and bug-prone. Industry postmortems (Kestra 2024, WorkOS guidance) converge on: “the data layer must be multi-tenant-aware from day one, even if the operational layer is not.”
Decision: Adopt seven code-review rules enforced by discipline rather than infrastructure:
- Never hardcode 'default' as a userId; use resolveUserId(config) from @/core/types.
- Tenant-first Redis keys: agtos:{subsystem}:{userId}:{entity} for all user-scoped data.
- No optional userId in client-facing API request schemas; the server derives it from profileManager.getDefaultUserId(). Schemas are .strict().
- No premature auth middleware: the single global AGTOS_API_KEY is correct for single-user. The only acceptable infrastructure is type UserId = string and resolveUserId(config).
- Document the contract in any new ADR that introduces persistence.
- Every vector search must accept a userId parameter.
- Background jobs carry userId at task creation time, not at execution time.
resolveUserId() and the credentials store change. All other code stays identical because the data layer is already correct.
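A few of the ADR-025 rules lend themselves to a short sketch. The names UserId and resolveUserId and the key layout agtos:{subsystem}:{userId}:{entity} come from the ADR; the body of resolveUserId, the AgtosConfig shape, and the userKey and createJob helpers are hypothetical stand-ins.

```typescript
type UserId = string;

// Hypothetical config shape; the real config type is not shown in the ADR.
interface AgtosConfig {
  defaultUserId: UserId;
}

// Stand-in for the resolveUserId(config) helper named in the ADR. The point is
// that every caller goes through one function instead of writing 'default'.
function resolveUserId(config: AgtosConfig): UserId {
  return config.defaultUserId;
}

// Tenant-first key builder: agtos:{subsystem}:{userId}:{entity}.
// Rejecting ':' in the first two segments is an assumption that keeps
// the userId position unambiguous.
function userKey(subsystem: string, userId: UserId, entity: string): string {
  for (const segment of [subsystem, userId]) {
    if (segment.length === 0 || segment.includes(":")) {
      throw new Error(`invalid key segment: "${segment}"`);
    }
  }
  return `agtos:${subsystem}:${userId}:${entity}`;
}

// Background jobs capture the userId when they are created, not when they run.
interface BackgroundJob {
  kind: string;
  userId: UserId;
}

function createJob(config: AgtosConfig, kind: string): BackgroundJob {
  return { kind, userId: resolveUserId(config) };
}

const config: AgtosConfig = { defaultUserId: "alice" };
const profileKey = userKey("memory", resolveUserId(config), "profile");
const job = createJob(config, "consolidation");
```

If multi-tenancy arrives later, only resolveUserId would need a new body in this sketch; the key builder and job shape already carry the tenant.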
ADR-026: ProviderCatalog Interface + OpenRouter First-Class
agtos setup wizard, and the CLI all need to discover what models are available from each provider, but every provider SDK has a different shape. Anthropic exposes rich ModelInfo with capability flags; OpenAI exposes only id and owned_by; Ollama needs list + show fan-out; OpenRouter has its own /api/v1/models endpoint with string-encoded pricing. Meanwhile, OpenRouter was being treated as “OpenAI with a different baseURL”, which caused credential scope confusion, lost access to rich catalog metadata, and mis-layered ranking headers (HTTP-Referer, X-Title).
Decision: Define a ProviderCatalog interface at src/core/providers/catalog/types.ts with listModels() (returns Result<ModelInfo[]>), optional getAccountInfo(), and optional validateModel(). ModelInfo carries a 13-entry finite capability union (including 'contradiction' for KB maintenance), context length, pricing per 1M tokens, and provider-specific fields (sizeBytes for Ollama, upstreamProvider for OpenRouter). Four implementations ship: OpenRouterCatalog, OllamaCatalog, ClaudeCatalog, OpenAICatalog, all with 1-hour TTL caching. OpenRouter is promoted to a first-class provider under src/providers/openrouter/ with its own credential scope (provider-openrouter), a client-provider (OpenAI SDK with baseURL override plus attribution headers), and a new maintenance task slot (6th entry in TASK_SLOTS) for the Stage 3 LLM judge in the NLI hybrid pipeline.
Trade-off: Four catalog implementations to maintain plus a capability map for OpenAI (whose API doesn’t expose capabilities). But the dashboard, setup wizard, and NLI hybrid pipeline (ADR-027) all get a single provider-agnostic query path, and OpenRouter finally gets its own credential scope and pricing data.
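A minimal sketch of the ProviderCatalog idea from ADR-026 with 1-hour TTL caching. The names ProviderCatalog, ModelInfo, Result, listModels, and validateModel come from the ADR; the field layout, the CachedCatalog wrapper, the injected clock, and the synchronous signatures are assumptions made for brevity (the real methods are probably asynchronous, and the real capability type is a finite union rather than string).

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Simplified; the ADR's ModelInfo also carries pricing and provider-specific fields.
interface ModelInfo {
  id: string;
  contextLength?: number;
  capabilities: string[];
}

interface ProviderCatalog {
  listModels(): Result<ModelInfo[]>;
  validateModel?(id: string): boolean;
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Hypothetical wrapper that adds TTL caching to any catalog.
class CachedCatalog implements ProviderCatalog {
  private readonly inner: ProviderCatalog;
  private readonly now: () => number;
  private readonly ttlMs: number;
  private cached?: { at: number; value: Result<ModelInfo[]> };

  constructor(inner: ProviderCatalog, now: () => number = Date.now, ttlMs: number = ONE_HOUR_MS) {
    this.inner = inner;
    this.now = now;
    this.ttlMs = ttlMs;
  }

  listModels(): Result<ModelInfo[]> {
    const hit = this.cached;
    if (hit && this.now() - hit.at < this.ttlMs) return hit.value;
    const fresh = this.inner.listModels();
    // Cache only successes so a transient failure is retried on the next call.
    if (fresh.ok) this.cached = { at: this.now(), value: fresh };
    return fresh;
  }

  validateModel(id: string): boolean {
    const result = this.listModels();
    return result.ok && result.value.some((model) => model.id === id);
  }
}
```

Injecting the clock keeps the expiry logic testable without waiting an hour; callers that do not care can rely on the Date.now default.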
ADR-027: Memory Maintenance V2 — NLI Hybrid Contradiction Pipeline
Dreamer.maintain() Step 5 when its dependencies are wired:
- Stage 1 (candidate selection): CandidateSelector computes an in-memory cosine over conclusion embeddings, caches by id + textHash, selects top-K nearest plus an “interesting pair” priority heuristic, deduplicates, and truncates to 500 pairs.
- Stage 2 (NLI cross-encoder + pair cache): NliClassifier runs onnxruntime-node@^1.24.3 on a CPU session with a quantized DeBERTa-v3-base MNLI model (223 MB, SHA-256 pinned, atomic-rename download). PairCache uses Redis HEXPIRE on 7.4+ with STRING+EX fallback, content-addressed pair keys, and a secondary index for O(k) invalidation.
- Stage 3 (batched LLM judge): LlmJudge sends 10 pairs per call with a structured-JSON Zod-validated prompt. Per-batch isolation means a single failure logs a warning and contributes zero without aborting the sweep. Uses the hot-swappable maintenance task slot from ADR-026 so users can pin a cheaper / faster model for the judge.
AGTOS_NLI_ENABLED=false; legacy callers fall through to the V1 single-LLM path. A new memory.contradiction.detected event fires once per confirmation. MaintenanceReport.summary.contradictionPipeline carries per-stage counters and latencies.
Trade-off: 223 MB model file to ship (prebuild via npm run prebuild:nli) and additional CPU work on sweep days. But recall goes up, token cost goes down, and each stage can be upgraded independently. Backwards compatibility is preserved: the existing 77 dreamer tests run unchanged against the V1 fall-through path.
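Stage 1 of the ADR-027 pipeline (cosine similarity, top-K nearest, deduplication, truncation) can be sketched as below. This is not the CandidateSelector implementation: the Conclusion and CandidatePair shapes and the function names are hypothetical, and the sketch omits the id + textHash cache and the “interesting pair” heuristic.

```typescript
interface Conclusion {
  id: string;
  embedding: number[];
}

interface CandidatePair {
  a: string;
  b: string;
  similarity: number;
}

// Cosine similarity; assumes both vectors have the same length.
function cosine(x: number[], y: number[]): number {
  let dot = 0;
  let normX = 0;
  let normY = 0;
  x.forEach((xi, i) => {
    const yi = y[i] ?? 0;
    dot += xi * yi;
    normX += xi * xi;
    normY += yi * yi;
  });
  const denom = Math.sqrt(normX * normY);
  return denom === 0 ? 0 : dot / denom;
}

// For each conclusion keep its top-K nearest neighbours, deduplicate
// unordered pairs, then keep the most similar pairs up to maxPairs.
function selectCandidates(items: Conclusion[], topK: number, maxPairs = 500): CandidatePair[] {
  const unique = new Map<string, CandidatePair>();
  for (const item of items) {
    const nearest = items
      .filter((other) => other.id !== item.id)
      .map((other) => ({ other, similarity: cosine(item.embedding, other.embedding) }))
      .sort((p, q) => q.similarity - p.similarity)
      .slice(0, topK);
    for (const { other, similarity } of nearest) {
      // Order the ids so (a, b) and (b, a) collapse to one key.
      const a = item.id < other.id ? item.id : other.id;
      const b = item.id < other.id ? other.id : item.id;
      unique.set(`${a}|${b}`, { a, b, similarity });
    }
  }
  return Array.from(unique.values())
    .sort((p, q) => q.similarity - p.similarity)
    .slice(0, maxPairs);
}
```

The all-pairs loop is quadratic in the number of conclusions, which is acceptable only because the output is capped before the more expensive NLI and LLM stages see it.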
ADR-028: Personal AI Gateway Thesis
ADR-029: PACT Capture Protocol
ADR-030: Entity-Centric Memory
@huggingface/transformers (bert-base-NER), entity deduplication via alias matching and embedding similarity (0.85 threshold), and a wiki UX for browsing/editing entities and relationships.
Trade-off: Storage overhead (~2 MB at 1000 entities) and NER accuracy ceiling (general-purpose model will miss domain-specific entities) versus structured entity queries and relationship traversal.
ADR-031: Speaker Intelligence
ADR-032: Legal Compliance Framework
ADR-033: Consumer Desktop Experience
ADR-034: Three-Tier Health & Redis Hot-Connect
priority field to health checks (critical: ollama/sherpa/provider; important: cloud-providers/redis/mcp; optional: capture/memory/nli), return 200 only when critical services are healthy. Add POST /api/system/reconnect-redis to lazily create and connect Redis-dependent services on demand without restart.
Trade-off: Fresh install shows green status even with optional capture unavailable versus slightly more complex ServerContext with mutable state indirection.
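The three-tier aggregation in ADR-034 can be illustrated with a small sketch. The priority values come from the ADR; the HealthCheck and HealthSummary shapes, the 503 status for a critical failure, and the "degraded" label are assumptions.

```typescript
type Priority = "critical" | "important" | "optional";

interface HealthCheck {
  name: string;
  priority: Priority;
  healthy: boolean;
}

interface HealthSummary {
  httpStatus: 200 | 503;
  status: "healthy" | "degraded" | "unhealthy";
  failing: string[];
}

// Only critical failures change the HTTP status; important failures are
// surfaced as "degraded", and optional failures are merely listed.
function summarize(checks: HealthCheck[]): HealthSummary {
  const failing = checks.filter((check) => !check.healthy);
  const criticalDown = failing.some((check) => check.priority === "critical");
  const importantDown = failing.some((check) => check.priority === "important");
  return {
    httpStatus: criticalDown ? 503 : 200,
    status: criticalDown ? "unhealthy" : importantDown ? "degraded" : "healthy",
    failing: failing.map((check) => check.name),
  };
}

// A fresh install: capture is not set up yet, everything critical is fine.
const freshInstall = summarize([
  { name: "ollama", priority: "critical", healthy: true },
  { name: "redis", priority: "important", healthy: true },
  { name: "capture", priority: "optional", healthy: false },
]);
```

In this sketch the fresh install still reports 200 and "healthy" while listing capture under failing, which matches the trade-off the ADR describes.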
ADR-035: Runtime Billing Exhaustion
billing_error, OpenAI insufficient_quota, OpenRouter 402).
Decision: Create provider-specific mapApiError() to classify billing errors into PROVIDER_BILLING_EXHAUSTED, centralize detection in BillingDetector that marks all slots for that provider unhealthy simultaneously, and enable user-configured fallback strategy (cloud-backup/ollama-local/none).
Trade-off: OpenAI SDK wastes 2 retries before detection versus proactive balance monitoring and cross-provider fallback chains.
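A rough sketch of the classification and centralized detection described in ADR-035. The names mapApiError, BillingDetector, and PROVIDER_BILLING_EXHAUSTED come from the ADR. The normalized ApiErrorLike shape, the attribution of billing_error to the Claude provider (the source text is cut off at that point), the single mapApiError function instead of one per provider, and the nextProvider helper are assumptions.

```typescript
type Provider = "claude" | "openai" | "openrouter";
type FallbackStrategy = "cloud-backup" | "ollama-local" | "none";
type ErrorKind = "PROVIDER_BILLING_EXHAUSTED" | "UNKNOWN";

// Hypothetical normalized view of an SDK error.
interface ApiErrorLike {
  status?: number;
  type?: string;
  code?: string;
}

// Each provider signals billing exhaustion differently.
function mapApiError(provider: Provider, err: ApiErrorLike): ErrorKind {
  const exhausted =
    (provider === "claude" && err.type === "billing_error") || // assumed attribution
    (provider === "openai" && err.code === "insufficient_quota") ||
    (provider === "openrouter" && err.status === 402);
  return exhausted ? "PROVIDER_BILLING_EXHAUSTED" : "UNKNOWN";
}

// Central detector: once a provider is exhausted, every slot that uses it
// can consult isExhausted() and be treated as unhealthy.
class BillingDetector {
  private readonly exhausted = new Set<Provider>();

  report(provider: Provider, err: ApiErrorLike): boolean {
    if (mapApiError(provider, err) !== "PROVIDER_BILLING_EXHAUSTED") return false;
    this.exhausted.add(provider);
    return true;
  }

  isExhausted(provider: Provider): boolean {
    return this.exhausted.has(provider);
  }
}

// One possible reading of the three fallback strategies.
function nextProvider(
  failed: Provider,
  strategy: FallbackStrategy,
  backup: Provider,
): Provider | "ollama" | undefined {
  if (strategy === "cloud-backup") return backup !== failed ? backup : undefined;
  if (strategy === "ollama-local") return "ollama";
  return undefined;
}
```

Keeping detection in one place means a single exhausted-billing response can take a provider out of rotation for every slot at once, rather than each slot rediscovering the failure.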
ADR-036: Desktop Chat UI
POST /api/chat/stream SSE endpoint with fetch + ReadableStream, accumulate token deltas in rAF buffer before flushing to React state, and defer markdown AST construction until streaming completes. Includes thinking/reasoning block display and tool call visualization.
Trade-off: SSE via POST requires manual stream parsing (no native EventSource support for POST) versus real-time streaming feedback in the dashboard.
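The manual stream parsing and token batching mentioned in ADR-036 might look roughly like this. SseParser and TokenBatcher are hypothetical names, the sketch handles only data: fields, and the payload format of POST /api/chat/stream is not specified here, so payloads are treated as opaque strings.

```typescript
// Incremental parser for SSE frames that arrive in arbitrary chunks,
// e.g. decoded pieces read from a fetch() response body.
class SseParser {
  private buffer = "";

  // Feed one decoded text chunk; returns the data payload of every
  // event completed by this chunk.
  push(chunk: string): string[] {
    // Normalize after concatenation so a CRLF split across chunks is handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events: string[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const rawEvent = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      const data = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data.length > 0) events.push(data);
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}

// Coalesces token deltas and flushes them once per scheduled tick.
// In a browser the scheduler would wrap requestAnimationFrame.
class TokenBatcher {
  private pending = "";
  private scheduled = false;
  private readonly flush: (text: string) => void;
  private readonly schedule: (callback: () => void) => void;

  constructor(flush: (text: string) => void, schedule: (callback: () => void) => void) {
    this.flush = flush;
    this.schedule = schedule;
  }

  add(token: string): void {
    this.pending += token;
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => {
      const text = this.pending;
      this.pending = "";
      this.scheduled = false;
      this.flush(text);
    });
  }
}
```

Passing the scheduler in keeps the batcher independent of the DOM: a dashboard could supply a function that calls requestAnimationFrame, while tests can drive it with a manual queue. Batching this way means many token deltas arriving within one frame cause a single state update.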
ADR-037: Provider-Agnostic Thinking & Vision
onThinking callback to AgentStreamCallbacks with per-provider extraction (Claude stream.on, Ollama think field, OpenRouter message.reasoning, OpenAI Responses API reasoning_summary_text). Preserve continuity tokens in Message type. Base64-encode images in request body with per-provider formatting.
Trade-off: Per-provider extraction logic for incompatible thinking formats versus unified user experience with thinking visibility and multi-turn reasoning continuity.
Proposing a New ADR
Create a file
docs/adr/NNN-short-description.md using the next sequential number.
Fill all sections
ADR Template
Status Lifecycle
| Status | Meaning |
|---|---|
| Proposed | Under discussion, not yet adopted |
| Accepted | Active decision, reflects current architecture |
| Deprecated | No longer relevant (technology abandoned, feature removed) |
| Superseded by ADR-NNN | Replaced by a newer decision |