Architecture Decision Records

Architecture Decision Records (ADRs) capture significant architectural decisions made during the development of agtOS. Each ADR describes a single decision, its context, the rationale behind it, and the consequences of adopting it.

Why ADRs?

agtOS went dormant from August 2025 to March 2026 — seven months during which the AI landscape shifted dramatically. MCP moved to the Linux Foundation, Piper TTS was archived, new voice architectures emerged, and local models became viable for agentic tool calling. ADRs ensure that:

Future sessions have context — when returning after any gap, ADRs explain why things are the way they are
Decisions are traceable — every choice links to specific technical context and trade-offs
Alternatives are documented — knowing what was not chosen (and why) is as valuable as knowing what was
Onboarding is faster — new contributors can read the ADR index to understand the system’s evolution

ADRs are immutable once accepted. If a decision is reversed or significantly modified, the original ADR is marked as deprecated and a new ADR references it. This preserves the historical reasoning behind every decision.

ADR Index

#	Title	Status	Date	Summary
001	Protocol-Agnostic Orchestration Gateway	Accepted	2026-03-22	Abstracts MCP, A2A, and future protocols behind a unified gateway interface. MCP is primary; others added via adapters.
002	TTS Provider Migration	Accepted	2026-03-22	Migrated from Piper (archived project) to speaches server with OpenAI-compatible API and Kokoro ONNX models.
003	Claude Dual-SDK Integration	Accepted	2026-03-22	Uses Client SDK for real-time voice streaming and Agent SDK for background autonomous tasks. Both share MCP infrastructure.
004	Hybrid Model Routing	Accepted	2026-03-22	Three-tier routing: intent classifier (local) -> Ollama (local) -> Claude (cloud). Optimizes cost, latency, and privacy.
005	MCP Transport Migration	Accepted	2026-03-22	Migrated from Server-Sent Events (SSE) to Streamable HTTP for MCP transport, aligning with MCP spec evolution.
006	Redis Client Selection	Accepted	2026-03-22	Chose node-redis over ioredis for the Redis client. Official Redis client with better TypeScript support.
007	Agent Memory Architecture	Accepted	2026-03-22	Protocol-based, vector-backed memory: working (session), episodic (Redis), semantic (Redis Vector Search + Ollama embeddings).
008	Native Audio Protocol Support	Accepted	2026-03-22	Supports three voice modes: CASCADE (STT->LLM->TTS), HALF_CASCADE (Ultravox), NATIVE (Gemini/GPT-4o Realtime).
009	Dynamic Toolset Loading	Accepted	2026-03-22	Intent-to-category mapping with top-N tool selection. Fixes MCP’s context window problem (72% consumed by tool schemas).
010	STT Provider Architecture	Accepted	2026-03-22	speaches server for STT via OpenAI-compatible API. Batch + streaming transcription with Faster Whisper models.
011	BYOK Credential Management	Accepted	2026-03-22	AES-256-GCM encrypted credential storage with scrypt KDF and AAD binding. Per-provider validation. Setup token endpoint auth. Prometheus metrics.
012	WebSocket Audio Transport	Accepted	2026-03-24	WebSocket for MVP audio transport. Simpler than WebRTC, sufficient for same-network and Tailscale/VPN usage.
013	Web Dashboard Framework	Accepted	2026-03-28	React 19 + Vite 6 for the management UI. Accessibility-first (WCAG AA), responsive, keyboard navigable.
014	API Security	Accepted	2026-03-28	Opt-in Bearer token auth, token bucket rate limiting, Zod input validation on all POST endpoints.
015	Platform-Aware Adapter Routing	Accepted	2026-03-29	Platform-specific adapter overrides in the gateway. Tools can be restricted to specific platforms. Backward compatible.
016	Desktop Client Framework	Accepted	2026-03-29	Tauri 2 for native desktop. System tray, global PTT hotkey, health monitor. Node SEA sidecar for backend.
017	sherpa-onnx In-Process Speech Engine	Accepted	2026-03-31	Replace speaches Python sidecar with sherpa-onnx-node for in-process STT, TTS, and VAD via ONNX Runtime.
018	Cognitive Task Provider Architecture	Accepted	2026-04-02	Independent provider selection for embedding, classification, reasoning, consolidation, and summarization tasks.
019	OpenAI as Alternative Cloud Provider	Accepted	2026-04-04	OpenAI as drop-in cloud tier alternative to Claude. Configurable per slot via Model Slot Registry. Streaming, tool calls, session management via OpenAI SDK v6.
020	Model Slot Registry	Accepted	2026-04-04	Replaces two-tier LOCAL/CLOUD routing with named capability slots (`chat`, `reasoning`, `coding`, `tool_calling`, `creative`). Per-slot provider+model config with fallback chains.
021	Memory Enhancement — Resource-Aware Background Work, Maintenance Mode, Query-as-Ingest	Accepted	2026-04-06	Wires the Dreamer consolidation engine (was dead code), adds ResourceGuard to gate background LLM calls by active sessions / CPU load / Ollama VRAM, introduces a periodic maintenance sweep (“memory lint”), and persists high-quality synthesized responses via `RESPONSE_INGEST` episodes.
022	Upgrade to Zod 4	Accepted	2026-04-07	Zod 4 baseline. Removes all `as never` casts and `@ts-ignore TS2589` suppressions from MCP tool registrations. `z.record()` now requires explicit key and value schemas.
023	Timezone-Aware Scheduler	Accepted	2026-04-06	`CronSchedule.timezone` (IANA) with `croner@^10.0.1` for next-run computation. Replaces process-local time semantics. `AGTOS_MAINTENANCE_TIMEZONE` exposes the timezone to operators.
024	Atomic Profile Updates via Redis Pool + WATCH/MULTI/EXEC	Accepted	2026-04-07	`UserProfileManager` uses a node-redis v5 connection pool with optimistic locking (`withOptimisticLock<T>()`). All five mutating methods retry up to 3 times on `WatchError`. Closes audit M3 race window.
025	Multi-Tenant-Ready by Discipline	Accepted	2026-04-08	Single-user today at the operational layer, multi-tenant-ready at the data layer. Seven code-review rules: no hardcoded `'default'`, tenant-first Redis keys, `userId` server-resolved, no premature auth middleware.
026	ProviderCatalog Interface + OpenRouter First-Class	Accepted	2026-04-08	Cross-provider model discovery via `listModels()` / `getAccountInfo()` / `validateModel()`. OpenRouter promoted to a first-class provider with its own credential scope, ranking headers, and `/api/v1/models` catalog. Adds a `maintenance` task slot.
027	Memory Maintenance V2 — NLI Hybrid Contradiction Pipeline	Accepted	2026-04-08	3-stage contradiction detection: candidate selection (cosine over embeddings) → local NLI cross-encoder (`onnxruntime-node`) + pair cache → batched LLM judge using the `maintenance` task slot. Supersedes the single-LLM detector from ADR-021 when dependencies are wired.
028	Personal AI Gateway Thesis	Accepted	2026-04-10	Formalizes agtOS as the identity + memory + routing layer between a person and every AI they touch. Prioritizes capture, entity-centric memory, and multi-surface support.
029	PACT Capture Protocol	Accepted	2026-04-10	Open protocol for streaming multimodal captures with presence signals, per-modality consent envelopes, and jurisdiction-aware metadata. Local-first-only v1.
030	Entity-Centric Memory	Accepted	2026-04-10	Redis JSON property graph with NER-extracted entities, relationships, and wiki UX on top of the existing 3-tier memory system.
031	Speaker Intelligence	Accepted	2026-04-10	sherpa-onnx speaker embedding extraction, diarization, and Redis persistence for multi-speaker attribution and active-participant verification.
032	Legal Compliance Framework	Accepted	2026-04-10	7 design principles for capture legality under federal wiretap law, state two-party consent, EU AI Act, and neurorights bills.
033	Consumer Desktop Experience	Accepted	2026-04-11	Redesigns onboarding from 7-step developer wizard to 3-step consumer flow with just-in-time feature installation and hardware-aware model recommendations.
034	Three-Tier Health & Redis Hot-Connect	Accepted	2026-04-12	Health check prioritization (critical/important/optional) and hot-connect API to reconnect Redis services without restart.
035	Runtime Billing Exhaustion	Accepted	2026-04-12	Detects provider billing exhaustion at runtime with provider-specific error mapping and user-configured fallback strategies (cloud-backup/ollama-local/none).
036	Desktop Chat UI	Accepted	2026-04-12	SSE streaming text chat (`POST /api/chat/stream`) with rAF token batching, deferred markdown rendering, and provider-agnostic tool call display.
037	Provider-Agnostic Thinking & Vision	Accepted	2026-04-12	Unified thinking/reasoning output and image/vision input across Claude, OpenAI, Ollama, and OpenRouter with continuity token preservation.

The first 10 ADRs (001-010) were created together on 2026-03-22 to document decisions made upon resuming development after the 7-month hiatus. ADRs 011-020 were created individually during active development. ADRs 021-027 (2026-04-06 through 2026-04-08) cover the Memory Enhancement, Zod 4, timezone-aware scheduling, atomic profile updates, multi-tenant readiness, provider catalog, and NLI hybrid contradiction pipeline work. ADRs 028-037 (2026-04-10 through 2026-04-12) cover the Personal AI Gateway thesis, PACT capture protocol, entity-centric memory, speaker intelligence, legal compliance, consumer desktop experience, health tiers, billing exhaustion, desktop chat UI, and provider-agnostic thinking/vision.

Key Architectural Principles

These principles emerge from the ADR collection and guide ongoing development:

Protocol-First Design

Every integration is protocol-defined, not provider-specific. Protocols define interfaces; implementations are swappable. This applies to voice providers (ADR-002, ADR-010), LLM providers (ADR-003, ADR-004), tool integration (ADR-001, ADR-009), and audio architectures (ADR-008).

Local-First Where Possible

The model router (ADR-004) routes simple queries to local Ollama models, reducing cost and latency while enabling offline operation. Privacy-sensitive requests never leave the local network. Cloud is reserved for complex reasoning that exceeds local capabilities.

Infrastructure/Orchestration Separation

The dual-layer architecture ensures the orchestration layer does not know or care which voice pipeline mode is active (ADR-008). Whether using cascade STT->LLM->TTS, half-cascade audio LLMs, or native end-to-end models, the orchestration logic remains identical.

Backward-Compatible Extension

New capabilities are added as optional extensions to existing interfaces. Platform-aware routing (ADR-015) adds a secondary lookup map without changing default behavior. Dynamic tool selection (ADR-009) filters tools before the LLM sees them without changing tool definitions.

Security by Default

BYOK credential management (ADR-011) encrypts keys at rest with AES-256-GCM, scrypt key derivation, and AAD-bound ciphertext. Credential operations are instrumented with Prometheus metrics and structured correlation IDs. API security (ADR-014) uses timing-safe comparison and token bucket rate limiting. Device authentication uses per-device SHA-256 tokens. These are not bolt-on features — they were designed into the architecture from the start.

ADR Deep Dives

ADR-001: Protocol-Agnostic Gateway

Problem: MCP was the sole integration protocol, but the protocol landscape shifted. MCP joined the Linux Foundation alongside Google’s A2A. MCP has a context window problem (tool schemas consume 72% of 200K context). MCP does not address agent-to-agent coordination or frontend streaming.

Decision: Build a gateway abstraction (OrchestratorGateway interface) with protocol-specific adapters. MCP adapter is first and primary. A2A and AG-UI adapters can be added without modifying orchestration logic.

Trade-off: Adds an abstraction layer that may be premature until a second protocol is needed. But the cost of the abstraction is low, and the cost of restructuring later is high.
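
The adapter idea can be sketched as follows. This is a minimal illustration, not the agtOS implementation: only the name OrchestratorGateway comes from the ADR, while ToolCall, ProtocolAdapter, AdapterGateway, and the echoing MCP adapter are hypothetical stand-ins.

```typescript
// Hypothetical shapes: only the name OrchestratorGateway comes from the ADR.
interface ToolCall {
  protocol: string;
  tool: string;
  args: Record<string, unknown>;
}

interface ProtocolAdapter {
  readonly protocol: string;
  invoke(tool: string, args: Record<string, unknown>): Promise<unknown>;
}

interface OrchestratorGateway {
  register(adapter: ProtocolAdapter): void;
  protocols(): string[];
  invoke(call: ToolCall): Promise<unknown>;
}

class AdapterGateway implements OrchestratorGateway {
  private readonly adapters = new Map<string, ProtocolAdapter>();

  register(adapter: ProtocolAdapter): void {
    this.adapters.set(adapter.protocol, adapter);
  }

  protocols(): string[] {
    return Array.from(this.adapters.keys());
  }

  // Orchestration code calls this without knowing which protocol handles the tool.
  async invoke(call: ToolCall): Promise<unknown> {
    const adapter = this.adapters.get(call.protocol);
    if (!adapter) {
      throw new Error(`No adapter registered for protocol "${call.protocol}"`);
    }
    return adapter.invoke(call.tool, call.args);
  }
}

// Stand-in MCP adapter that echoes the call instead of contacting a real server.
const mcpAdapter: ProtocolAdapter = {
  protocol: "mcp",
  async invoke(tool, args) {
    return { via: "mcp", tool, args };
  },
};

const gateway = new AdapterGateway();
gateway.register(mcpAdapter);
```

Under this shape, adding A2A or AG-UI support would mean registering another adapter; the invoke path stays untouched.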

ADR-003: Claude Dual-SDK Integration

Problem: The voice pipeline has two fundamentally different interaction patterns: real-time conversation (streaming, low latency) and background tasks (multi-step, autonomous).

Decision: Use both Anthropic SDKs. Client SDK (@anthropic-ai/sdk) for voice path with streaming. Agent SDK (@anthropic-ai/claude-agent-sdk) for background tasks with agentic loops. Both connect to the same MCP servers.

Trade-off: Two SDK integrations to maintain, two authentication flows. But optimizes each path for its requirements: voice gets minimum latency, background gets full agentic capability.

ADR-004: Hybrid Model Routing

Problem: Cloud API calls cost money, add latency, and send data off-device. Local models (Qwen3.5 27B) now score competitively on function calling benchmarks. Most voice interactions are simple enough for local models.

Decision: Three-tier routing. Tier 1: intent classification via a micro-model (under 50ms). Tier 2: local Ollama for simple requests. Tier 3: Claude for complex reasoning. Automatic fallback between tiers.

Trade-off: Routing complexity is higher than a single API call. Misclassification degrades experience. But cost savings are substantial for high-volume voice interactions.
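
A rough sketch of the routing flow, under stated assumptions: the keyword heuristic below stands in for the micro-model classifier, and the RoutingTier, RoutedIntent, and TierProvider names are invented for illustration. It shows only the shape of the decision (classify, pick a preferred tier, fall back to the other tier if the preferred one is unavailable).

```typescript
type RoutingTier = "local" | "cloud";
type RoutedIntent = "simple" | "complex";

// Tier 1 stand-in: a keyword heuristic in place of the micro-model classifier.
function classifyIntent(utterance: string): RoutedIntent {
  const complexHints = ["explain", "compare", "summarize"];
  const text = utterance.toLowerCase();
  return complexHints.some((hint) => text.indexOf(hint) !== -1) ? "complex" : "simple";
}

interface TierProvider {
  tier: RoutingTier;
  available: boolean;
}

// Pick the preferred tier for the intent, then fall back to the other tier.
function routeRequest(utterance: string, providers: TierProvider[]): RoutingTier | null {
  const preferred: RoutingTier = classifyIntent(utterance) === "simple" ? "local" : "cloud";
  const order: RoutingTier[] = preferred === "local" ? ["local", "cloud"] : ["cloud", "local"];
  for (const tier of order) {
    if (providers.some((p) => p.tier === tier && p.available)) {
      return tier;
    }
  }
  return null; // nothing is reachable
}
```

The fallback runs in both directions here, so a cloud outage degrades to the local model instead of failing; whether agtOS falls back in both directions is not stated in the ADR summary.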

ADR-007: Agent Memory Architecture

Problem: True AI agents need memory that persists across sessions. Working memory (in-context) is insufficient for long-term recall.

Decision: Three-tier memory architecture. Working memory: per-session conversation history with automatic LLM summarization. Episodic memory: cross-session recall via Redis with heuristic save decisions. Semantic memory: embedding-based vector search using Redis Vector Search and Ollama embeddings.

Trade-off: Redis Vector Search is less capable than dedicated vector databases (Pinecone, Weaviate). But it avoids adding another infrastructure dependency, since Redis is already required for sessions and scheduling.

ADR-008: Native Audio Protocol Support

Problem: Voice AI has evolved from cascade-only (STT->LLM->TTS) to three architectures with different cost/latency/quality profiles. Locking into cascade limits future options.

Decision: Support all three modes through the infrastructure layer. CASCADE (default, ~500ms, ~$0.15/min), HALF_CASCADE (Ultravox, ~300ms), NATIVE (Gemini Live / OpenAI Realtime, ~200ms, ~$1.50/min). The orchestration layer does not change.

Trade-off: Three variants multiplied by multiple providers creates a large test matrix. But this validates the dual-layer architecture and positions agtOS for the native audio future.

ADR-009: Dynamic Toolset Loading

Problem: MCP tool definitions consume 550-1,400 tokens each. With 20+ tools, 72% of a 200K context window is consumed before any conversation. This is MCP’s structural context window problem.

Decision: Intent-to-category mapping with top-N tool selection. The intent classifier determines which tool categories are relevant, and only those tools are loaded into context. Achieves 80-90% context reduction.

Trade-off: If the classifier picks the wrong category, the needed tool is not available. Mitigation: always include a “general” fallback category.
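
The selection step might look like the sketch below. The category table, the ToolDef shape, and the numeric score (a hypothetical relevance ranking) are assumptions made for illustration; the ADR summary only states that categories are chosen by intent, the top N tools are kept, and a “general” category is always included.

```typescript
interface ToolDef {
  name: string;
  category: string;
  score: number; // hypothetical relevance score used for top-N ranking
}

// Hypothetical intent-to-category table.
const INTENT_CATEGORIES: Record<string, string[]> = {
  weather: ["weather"],
  calendar: ["calendar", "reminders"],
};

function selectTools(intent: string, tools: ToolDef[], topN: number): ToolDef[] {
  // "general" is always included so a misclassified intent still has fallback tools.
  const categories = new Set<string>([...(INTENT_CATEGORIES[intent] ?? []), "general"]);
  return tools
    .filter((tool) => categories.has(tool.category))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

const allTools: ToolDef[] = [
  { name: "get_forecast", category: "weather", score: 0.9 },
  { name: "create_event", category: "calendar", score: 0.8 },
  { name: "web_search", category: "general", score: 0.5 },
  { name: "get_alerts", category: "weather", score: 0.7 },
];
```

With this data, a weather intent and topN of 3 keeps get_forecast, get_alerts, and web_search, while an unrecognized intent keeps only web_search.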

ADR-015: Platform-Aware Adapter Routing

Problem: agtOS runs across multiple platforms (Node.js server, desktop, ESP32), but some tools are only available on specific platforms. There was no way to restrict tool availability per platform.

Decision: Add platform-specific adapter overrides in the gateway. Tools can declare which platforms they support via metadata. The gateway filters unavailable tools before the LLM sees them. Backward compatible: tools without platform metadata are available on all platforms.

Trade-off: Adds a secondary lookup map and increases tool metadata complexity. But keeps the orchestration layer platform-agnostic while enabling platform-specific optimizations.
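
A small sketch of the backward-compatible filter, assuming a hypothetical optional platforms field on each tool (the actual metadata key is not given in this summary):

```typescript
type TargetPlatform = "server" | "desktop" | "esp32";

interface PlatformTool {
  name: string;
  platforms?: TargetPlatform[]; // hypothetical metadata field; omitted means "all platforms"
}

function toolsForPlatform(platform: TargetPlatform, tools: PlatformTool[]): PlatformTool[] {
  return tools.filter(
    (tool) => tool.platforms === undefined || tool.platforms.indexOf(platform) !== -1,
  );
}

const toolRegistry: PlatformTool[] = [
  { name: "get_time" }, // no metadata: available everywhere
  { name: "open_app", platforms: ["desktop"] },
  { name: "read_sensor", platforms: ["esp32"] },
];
```

Tools that predate the metadata keep working unchanged, which is the backward-compatibility property the decision calls out.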

ADR-017: sherpa-onnx In-Process Speech Engine

Problem: agtOS depended on speaches, a Python-based STT/TTS server with a single maintainer who disappears for months. Critical bugs (transcription hallucination, silence crash) sat unmerged for 39-67+ days. The v0.9.0 release was stuck in RC for 6+ months. Every STT/TTS call required a cross-process HTTP round-trip to the Python sidecar.

Decision: Replace speaches with sherpa-onnx-node (v1.12.34), a pre-compiled N-API native addon that runs STT, TTS, and VAD directly in the Node.js process via ONNX Runtime. This brings 17+ STT models (Whisper, Moonshine, SenseVoice, Zipformer, Paraformer), 7 TTS families (Kokoro, Piper, Matcha, ZipVoice), and Silero VAD into a single process. GPU acceleration is available via CUDA (Linux), CoreML (macOS), and DirectML (Windows). speaches is preserved as a config-switchable fallback.

Trade-off: Whisper uses greedy decoding only (no beam search), model files require ~460MB local storage, and TypeScript type declarations (~300 lines) must be maintained manually. But the elimination of the Python sidecar, HTTP overhead, and single-maintainer dependency risk far outweighs these costs.

ADR-018: Cognitive Task Provider Architecture

Problem: agtOS has multiple specialized AI tasks (embedding, classification, reasoning, consolidation, summarization) that each have different cost/latency/quality trade-offs. Using a single provider for all tasks was inefficient.

Decision: Allow independent provider selection for each cognitive task. Each task can route to its optimal provider (ollama for local inference, claude for reasoning, openrouter for fallback). This is configured via environment variables and can be changed at runtime via PUT /api/settings.

Trade-off: More configuration complexity and multiple API keys to manage. But enables fine-grained optimization, e.g., local Ollama for embeddings (cost-free) while routing complex reasoning to Claude.

ADR-020: Model Slot Registry

Problem: The two-tier LOCAL/CLOUD routing model forced a single cloud provider for all requests. With OpenAI added as an alternative (ADR-019), there was no way to use different providers for different request types, e.g., Claude for reasoning and OpenAI for tool calling. The AGTOS_CLOUD_PROVIDER env var was a global switch that couldn’t express per-capability preferences.

Decision: Replace the two-tier model with a Model Slot Registry. Each slot is a named capability position (chat, reasoning, coding, tool_calling, creative) that maps to a configured provider + model pair. The intent classifier routes requests to slot names, and the registry resolves slots to live provider instances. Task slots (embedding, classifier, summarization, consolidation, dialectic, maintenance) handle background cognitive tasks. Configuration lives in ~/.agtos/config.json under a slots key. Each slot supports fallback chains to handle provider failures.

Trade-off: More configuration complexity, since users must understand slots to customize routing. But the default agtos setup wizard configures the chat slot (required) and optional reasoning slot, which covers the common case. Power users get fine-grained control over which provider handles which kind of request.
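
Slot resolution with a fallback chain could be sketched like this. The SlotConfig shape, the fallbacks field name, the isHealthy callback, and the placeholder model names are all assumptions; the real schema under the slots key in ~/.agtos/config.json is not shown in this summary.

```typescript
interface SlotTarget {
  provider: string;
  model: string;
}

// Assumed config shape: a primary target plus an ordered fallback chain.
interface SlotConfig extends SlotTarget {
  fallbacks?: SlotTarget[];
}

function resolveSlot(
  slots: Record<string, SlotConfig>,
  slotName: string,
  isHealthy: (provider: string) => boolean,
): SlotTarget | null {
  const slot = slots[slotName];
  if (!slot) return null; // slot not configured

  const chain: SlotTarget[] = [
    { provider: slot.provider, model: slot.model },
    ...(slot.fallbacks ?? []),
  ];
  // Walk the chain in order and return the first target whose provider is healthy.
  for (const target of chain) {
    if (isHealthy(target.provider)) return target;
  }
  return null;
}

// Placeholder model names, not real identifiers.
const exampleSlots: Record<string, SlotConfig> = {
  chat: {
    provider: "ollama",
    model: "local-model",
    fallbacks: [{ provider: "claude", model: "cloud-model" }],
  },
};
```

If the ollama provider is marked unhealthy, resolving the chat slot in this example yields the claude target instead.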

ADR-021: Memory Enhancement — Resource-Aware Background Work, Maintenance Mode, Query-as-Ingest

Problem: The MemoryDreamer consolidation engine was implemented with 60+ tests but triggerConsolidation() was never called from anywhere in the codebase, so user profiles were always empty. Even when wired, the Dreamer would compete with active voice sessions for local Ollama GPU/VRAM (Ollama queues requests when VRAM is full). And there was no mechanism to age out stale conclusions, detect contradictions, or prune redundant facts as the knowledge base grew.

Decision: Five complementary changes that preserve the three-tier architecture (ADR-007):

Wire the Dreamer — call triggerConsolidation() in endVoiceSession() plus a server-level sessionEnded event listener as defense-in-depth.
ResourceGuard — gate every background LLM call via a deterministic decision tree: policy override → cloud/remote short-circuit → active sessions → session cooldown → system load → Ollama VRAM probe (GET /api/ps). Configurable policy (auto / always / idle-only). Windows caveat: os.loadavg() returns zeros, so VRAM probe is the only strong signal.
qmd MCP configuration + tool category inference — AGTOS_MCP_SERVERS JSON env var for external MCP servers. inferCategory() maps tool name/description to intent categories so search tools auto-participate in dynamic tool selection (ADR-009).
Query-as-Ingest — a new RESPONSE_INGEST episode type. Heuristic scoring (tool calls, multi-step reasoning, length, synthesis patterns) decides when to persist a high-quality agent response so synthesis compounds across sessions.
Maintenance Mode — Dreamer.maintain() runs a six-step sweep (stale detection, confidence decay, redundancy merge, orphan flagging, LLM contradiction check, low-confidence prune). Auto-registered cron task (default 0 3 * * *), POST /api/memory/maintain for on-demand runs, memory-maintenance health check flags runs older than 48 hours.

Trade-off: More moving parts in the MemoryCoordinator, and maintenance LLM calls cost tokens on cloud providers. Mitigated by per-sweep caps, the cron schedule, and ResourceGuard’s skip-on-load. The contradiction detection portion of Step 5 is superseded by ADR-027’s NLI hybrid pipeline when its dependencies are wired — ADR-021 still owns ResourceGuard, the cron schedule, the kill switch, dangling-source detection, and the maintenance health check.
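
The ResourceGuard decision tree described above can be sketched as a pure function. The check order follows the ADR; the field names, the 0.7 load threshold, the reason strings, and the reading of idle-only as "skip the remote short-circuit" are assumptions for illustration. In the real system the vramBusy flag would come from the Ollama probe (GET /api/ps).

```typescript
type GuardPolicy = "auto" | "always" | "idle-only";

interface GuardInput {
  policy: GuardPolicy;
  providerIsRemote: boolean;
  activeSessions: number;
  msSinceLastSession: number;
  cooldownMs: number;
  loadAvg1m: number; // os.loadavg()[0]; always 0 on Windows
  cpuCount: number;
  vramBusy: boolean; // assumed to be derived from the Ollama VRAM probe
}

interface GuardDecision {
  allow: boolean;
  reason: string;
}

function decideBackgroundWork(input: GuardInput): GuardDecision {
  // 1. Policy override.
  if (input.policy === "always") return { allow: true, reason: "policy-always" };
  // 2. Remote providers do not compete for local GPU (assumed to apply to "auto" only).
  if (input.policy === "auto" && input.providerIsRemote) {
    return { allow: true, reason: "remote-provider" };
  }
  // 3. Active sessions, then 4. session cooldown.
  if (input.activeSessions > 0) return { allow: false, reason: "active-sessions" };
  if (input.msSinceLastSession < input.cooldownMs) {
    return { allow: false, reason: "session-cooldown" };
  }
  // 5. System load, normalized per core (threshold is an assumption).
  if (input.cpuCount > 0 && input.loadAvg1m / input.cpuCount > 0.7) {
    return { allow: false, reason: "system-load" };
  }
  // 6. VRAM probe: the only strong signal on Windows, where loadAvg1m is 0.
  if (input.vramBusy) return { allow: false, reason: "vram-busy" };
  return { allow: true, reason: "idle" };
}
```

Keeping the decision deterministic and side-effect free makes each branch easy to unit test; the probes themselves would be gathered before calling it.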

ADR-025: Multi-Tenant-Ready by Discipline

Problem: agtOS ships single-user today (one credential file, one global API key, one Tauri sidecar per machine), but the long-term roadmap includes shared deployments for business teams and family households. Without discipline, adding real multi-user support later means hunting down every hardcoded 'default' literal and migrating Redis keys, which is painful and bug-prone. Industry postmortems (Kestra 2024, WorkOS guidance) converge on: “the data layer must be multi-tenant-aware from day one, even if the operational layer is not.”

Decision: Adopt seven code-review rules enforced by discipline rather than infrastructure:

Never hardcode 'default' as a userId — use resolveUserId(config) from @/core/types.
Tenant-first Redis keys: agtos:{subsystem}:{userId}:{entity} for all user-scoped data.
No optional userId in client-facing API request schemas; the server derives it from profileManager.getDefaultUserId(). Schemas are .strict().
No premature auth middleware — the single global AGTOS_API_KEY is correct for single-user. The only acceptable infrastructure is type UserId = string and resolveUserId(config).
Document the contract in any new ADR that introduces persistence.
Every vector search must accept a userId parameter.
Background jobs carry userId at task creation time, not at execution time.

Trade-off: Requires code-review vigilance to enforce (no linter catches all violations). But the migration path when real multi-user ships is small: only resolveUserId() and the credentials store change. All other code stays identical because the data layer is already correct.
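
The first two rules can be illustrated with a short sketch. The key pattern agtos:{subsystem}:{userId}:{entity} and the UserId alias come from the ADR; the AppConfig shape, the body of resolveUserId, and the tenantKey helper are hypothetical stand-ins, not the code in @/core/types.

```typescript
type UserId = string;

// Hypothetical config shape for illustration only.
interface AppConfig {
  defaultUserId?: UserId;
}

// Stand-in for resolveUserId(config): a single place that decides who the user is.
function resolveUserId(config: AppConfig): UserId {
  const id = (config.defaultUserId ?? "").trim();
  if (id.length === 0) throw new Error("No user id configured");
  return id;
}

// Builds a tenant-first key: agtos:{subsystem}:{userId}:{entity}.
function tenantKey(subsystem: string, userId: UserId, entity: string): string {
  for (const part of [subsystem, userId, entity]) {
    // Reject empty segments and embedded separators so keys cannot collide across tenants.
    if (part.length === 0 || part.indexOf(":") !== -1) {
      throw new Error(`Invalid key segment: "${part}"`);
    }
  }
  return `agtos:${subsystem}:${userId}:${entity}`;
}
```

Because every caller goes through one resolver and one key builder, switching to real multi-user resolution later would touch only these functions.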

ADR-026: ProviderCatalog Interface + OpenRouter First-Class

Problem: The dashboard’s slot configuration UI, the agtos setup wizard, and the CLI all need to discover what models are available from each provider, but every provider SDK has a different shape. Anthropic exposes rich ModelInfo with capability flags; OpenAI exposes only id and owned_by; Ollama needs list + show fan-out; OpenRouter has its own /api/v1/models endpoint with string-encoded pricing. Meanwhile, OpenRouter was being treated as “OpenAI with a different baseURL”, which caused credential scope confusion, lost access to rich catalog metadata, and mis-layered ranking headers (HTTP-Referer, X-Title).

Decision: Define a ProviderCatalog interface at src/core/providers/catalog/types.ts with listModels() (returns Result<ModelInfo[]>), optional getAccountInfo(), and optional validateModel(). ModelInfo carries a 13-entry finite capability union (including 'contradiction' for KB maintenance), context length, pricing per 1M tokens, and provider-specific fields (sizeBytes for Ollama, upstreamProvider for OpenRouter). Four implementations ship: OpenRouterCatalog, OllamaCatalog, ClaudeCatalog, OpenAICatalog, all with 1-hour TTL caching. OpenRouter is promoted to a first-class provider under src/providers/openrouter/ with its own credential scope (provider-openrouter), a client-provider (OpenAI SDK with baseURL override plus attribution headers), and a new maintenance task slot (6th entry in TASK_SLOTS) for the Stage 3 LLM judge in the NLI hybrid pipeline.

Trade-off: Four catalog implementations to maintain plus a capability map for OpenAI (whose API doesn’t expose capabilities). But the dashboard, setup wizard, and NLI hybrid pipeline (ADR-027) all get a single provider-agnostic query path, and OpenRouter finally gets its own credential scope and pricing data.

ADR-027: Memory Maintenance V2 — NLI Hybrid Contradiction Pipeline

Problem: The Memory Maintenance V1 design (ADR-021) sent every conclusion in the profile to a single LLM call to audit for contradictions. At ~500 conclusions per profile, modern frontier models drop pairs from the middle of the list (“attention dilution”) and hallucinate conclusion IDs. Published benchmarks show ~71% recall for this flat-list approach vs. ~87.8% for ContraGen + cross-encoder hybrids. The single-stage path is also expensive (paying frontier rates to sift through pairs that a cheaper pre-filter could rule out), and there’s no principled way to upgrade only the contradiction step.

Decision: Ship a 3-stage hybrid contradiction detection pipeline that replaces the single-LLM detector inside Dreamer.maintain() Step 5 when its dependencies are wired:

Stage 1 — Candidate selection: CandidateSelector computes an in-memory cosine over conclusion embeddings, caches by id + textHash, selects top-K nearest plus an “interesting pair” priority heuristic, deduplicates, and truncates to 500 pairs.
Stage 2 — NLI cross-encoder + pair cache: NliClassifier runs onnxruntime-node@^1.24.3 on a CPU session with a quantized DeBERTa-v3-base MNLI model (223 MB, SHA-256 pinned, atomic-rename download). PairCache uses Redis HEXPIRE on 7.4+ with STRING+EX fallback, content-addressed pair keys, and a secondary index for O(k) invalidation.
Stage 3 — Batched LLM judge: LlmJudge sends 10 pairs per call with a structured-JSON Zod-validated prompt. Per-batch isolation means a single failure logs a warning and contributes zero without aborting the sweep. Uses the hot-swappable maintenance task slot from ADR-026 so users can pin a cheaper / faster model for the judge.

Opt-out via AGTOS_NLI_ENABLED=false; legacy callers fall through to the V1 single-LLM path. A new memory.contradiction.detected event fires once per confirmation. MaintenanceReport.summary.contradictionPipeline carries per-stage counters and latencies.

Trade-off: 223 MB model file to ship (prebuild via npm run prebuild:nli) and additional CPU work on sweep days. But recall goes up, token cost goes down, and each stage can be upgraded independently. Backwards compatibility is preserved: the existing 77 dreamer tests run unchanged against the V1 fall-through path.
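
Stage 1 (candidate selection) can be approximated with the sketch below: cosine similarity over embeddings, top-K neighbours per conclusion, deduplication of unordered pairs, and truncation. It omits the id + textHash cache and the “interesting pair” priority heuristic, and the Conclusion and CandidatePair shapes are assumptions rather than the CandidateSelector API.

```typescript
interface Conclusion {
  id: string;
  embedding: number[];
}

interface CandidatePair {
  a: string;
  b: string;
  similarity: number;
}

function cosine(x: number[], y: number[]): number {
  let dot = 0;
  let normX = 0;
  let normY = 0;
  x.forEach((xi, i) => {
    const yi = y[i] ?? 0;
    dot += xi * yi;
    normX += xi * xi;
  });
  y.forEach((yi) => {
    normY += yi * yi;
  });
  return normX === 0 || normY === 0 ? 0 : dot / Math.sqrt(normX * normY);
}

function selectCandidates(items: Conclusion[], topK: number, maxPairs: number): CandidatePair[] {
  const seen = new Set<string>();
  const pairs: CandidatePair[] = [];
  for (const item of items) {
    // Rank every other conclusion by similarity and keep the K nearest.
    const neighbours = items
      .filter((other) => other.id !== item.id)
      .map((other) => ({ other, similarity: cosine(item.embedding, other.embedding) }))
      .sort((p, q) => q.similarity - p.similarity)
      .slice(0, topK);
    for (const { other, similarity } of neighbours) {
      // Order the ids so (a, b) and (b, a) deduplicate to the same key.
      const a = item.id < other.id ? item.id : other.id;
      const b = item.id < other.id ? other.id : item.id;
      const key = `${a}|${b}`;
      if (seen.has(key)) continue;
      seen.add(key);
      pairs.push({ a, b, similarity });
    }
  }
  // Most similar pairs first, truncated to the pair budget (500 in the ADR).
  return pairs.sort((p, q) => q.similarity - p.similarity).slice(0, maxPairs);
}
```

This brute-force version is quadratic in the number of conclusions, which is acceptable at the ~500-conclusion scale mentioned above; only the surviving pairs would go on to the NLI stage.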

ADR-028: Personal AI Gateway Thesis

Problem: agtOS evolved into a sophisticated orchestration system but lacked a unifying strategic thesis for feature prioritization, resulting in ad hoc decisions around voice hardening, memory improvements, and dashboard polish.

Decision: Adopt the “Personal AI Gateway” framing as the core positioning: the identity + memory + routing layer between a person and every AI they touch. Prioritizes capture infrastructure, entity-centric memory, open protocols, and multi-surface support over single-surface optimization, guided by a 12-18 month market window.

Trade-off: Scope expansion risk (capture, entities, protocols, multi-surface support each require significant effort) versus differentiation opportunity (no competitor occupies this exact position combining protocol-agnostic capture + NLI contradiction detection + local-first architecture).

ADR-029: PACT Capture Protocol

Problem: Existing protocols (MCP, MCAP, LSL, openEHR) cover portions of capture but none address the “first mile” from device sensors to structured memory with provenance, presence attestation, and consent as first-class requirements.

Decision: Design PACT (Presence Attestation & Capture Telemetry) with 11 core concepts: presence signals (10 mechanisms with trust grades), per-modality consent envelopes (audio/video/neural), jurisdiction-aware validation (defaulting to strictest two-party consent), witness delegation, local-first-only v1 architecture, and extensible mechanism/modality registries.

Trade-off: Local-first-only v1 limits cloud sync/multi-device reach versus ensuring legal defensibility.

ADR-030: Entity-Centric Memory

Problem: The three-tier memory system stores everything as free-form text, unable to answer structured queries like “what do I know about Alice?” or “who does Alice work with?” because there is no entity-level indexing or relationship graph.

Decision: Implement EntityManager and RelationshipManager using Redis JSON + RediSearch, with automatic NER extraction via @huggingface/transformers (bert-base-NER), entity deduplication via alias matching and embedding similarity (0.85 threshold), and a wiki UX for browsing/editing entities and relationships.

Trade-off: Storage overhead (~2 MB at 1000 entities) and NER accuracy ceiling (general-purpose model will miss domain-specific entities) versus structured entity queries and relationship traversal.

ADR-031: Speaker Intelligence

Problem: Without speaker identification, all voice input is attributed to “the user” and multi-person conversations become unusable for entity-aware memory; active-participant verification for PACT legal compliance requires proof the device owner’s voice is present.

Decision: Wrap sherpa-onnx SpeakerEmbeddingExtractor with Redis JSON persistence, enrollment lifecycle (5-10s voice sample → embedding → re-enrollment via EMA), active-participant verification, and post-session diarization with per-speaker transcript sections.

Trade-off: Enrollment friction and cold-start latency versus multi-speaker attribution in episodic memory and legal compliance for ambient capture.

ADR-032: Legal Compliance Framework

Problem: PACT handles audio, video, and future neural data with significant legal implications across multiple jurisdictions: federal one-party consent, 12 US two-party states, EU AI Act biometric classification, Colorado/California neurorights, and COPPA age-detection paradox.

Decision: Adopt 7 principles: (1) default active-participant mode only, (2) mandatory consent metadata in every envelope, (3) per-modality consent (audio ≠ video ≠ neural), (4) jurisdiction-aware validation (default to two-party), (5) explicit no-age-detection rule, (6) tool-not-service architecture (local-only, user control), (7) neural data requires explicit opt-in.

Trade-off: User configuration burden and feature limitations (no age detection, no cloud sync) versus strong legal defensibility and future-proofing for neural data.

ADR-033: Consumer Desktop Experience

Problem: agtOS shipped with a 7-step setup wizard assuming Docker, terminal access, and API key familiarity; competitors deliver “just works” experiences where users double-click and start chatting.

Decision: Simplify to a 3-step flow (Mode Selection → Redis → Done), implement just-in-time downloads (voice models on first mic click), SSE progress for all downloads, hardware-aware model recommendations via SystemCapabilities detection, and slot auto-configuration for cloud/local/hybrid modes.

Trade-off: Redis still requires Docker for now (SQLite adapter deferred) versus reducing setup time from 30+ minutes to ~2 minutes and eliminating terminal commands.

ADR-034: Three-Tier Health & Redis Hot-Connect

Problem: Health aggregator treated all failures equally (any unhealthy = degraded); 6 capture subsystems always unhealthy on fresh install; Redis reconnection required full server restart.

Decision: Add priority field to health checks (critical: ollama/sherpa/provider; important: cloud-providers/redis/mcp; optional: capture/memory/nli), return 200 only when critical services are healthy. Add POST /api/system/reconnect-redis to lazily create and connect Redis-dependent services on demand without restart.

Trade-off: Fresh install shows green status even with optional capture unavailable versus slightly more complex ServerContext with mutable state indirection.
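
A sketch of priority-aware aggregation follows. The three priority levels come from the ADR; the status labels, the choice of 503 for the non-200 case, and treating a failed important check as "degraded" are assumptions made for illustration.

```typescript
type CheckPriority = "critical" | "important" | "optional";

interface HealthCheck {
  name: string;
  priority: CheckPriority;
  healthy: boolean;
}

interface HealthSummary {
  status: "healthy" | "degraded" | "unhealthy";
  httpStatus: 200 | 503; // 503 is an assumed choice for the non-200 case
}

function aggregateHealth(checks: HealthCheck[]): HealthSummary {
  const anyFailing = (priority: CheckPriority): boolean =>
    checks.some((check) => check.priority === priority && !check.healthy);

  // Only a critical failure changes the HTTP status.
  if (anyFailing("critical")) return { status: "unhealthy", httpStatus: 503 };
  // Important failures are surfaced but still return 200.
  if (anyFailing("important")) return { status: "degraded", httpStatus: 200 };
  // Optional failures (e.g. capture on a fresh install) do not affect the summary.
  return { status: "healthy", httpStatus: 200 };
}
```

With this rule a fresh install that has no capture subsystems configured still reports healthy, which matches the behavior described in the trade-off.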

ADR-035: Runtime Billing Exhaustion

Problem: When cloud provider credits run out, users see cryptic errors indistinguishable from transient failures; each provider signals exhaustion differently (Anthropic billing_error, OpenAI insufficient_quota, OpenRouter 402).

Decision: Create provider-specific mapApiError() to classify billing errors into PROVIDER_BILLING_EXHAUSTED, centralize detection in BillingDetector that marks all slots for that provider unhealthy simultaneously, and enable user-configured fallback strategy (cloud-backup/ollama-local/none).

Trade-off: OpenAI SDK wastes 2 retries before detection versus proactive balance monitoring and cross-provider fallback chains.
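As a rough illustration of the classification and slot-marking described in this ADR, the sketch below maps the three signals named above onto a single code. The normalized `ProviderError` shape, the `Slot` type, the `UNCLASSIFIED` fallback, and the `report()` method are assumptions for this example; the real `mapApiError()` and `BillingDetector` signatures are not shown in this summary.

```typescript
type Provider = "anthropic" | "openai" | "openrouter";
type ErrorCode = "PROVIDER_BILLING_EXHAUSTED" | "UNCLASSIFIED";

// Hypothetical normalized error; real SDK error objects are shaped differently.
interface ProviderError {
  status?: number; // HTTP status, when available
  type?: string;   // e.g. "billing_error" (Anthropic)
  code?: string;   // e.g. "insufficient_quota" (OpenAI)
}

function mapApiError(provider: Provider, err: ProviderError): ErrorCode {
  const exhausted =
    (provider === "anthropic" && err.type === "billing_error") ||
    (provider === "openai" && err.code === "insufficient_quota") ||
    (provider === "openrouter" && err.status === 402);
  return exhausted ? "PROVIDER_BILLING_EXHAUSTED" : "UNCLASSIFIED";
}

interface Slot {
  name: string;
  provider: Provider;
  healthy: boolean;
}

class BillingDetector {
  private readonly slots: Slot[];

  constructor(slots: Slot[]) {
    this.slots = slots;
  }

  // Returns true if the error was billing exhaustion and slots were marked.
  report(provider: Provider, err: ProviderError): boolean {
    if (mapApiError(provider, err) !== "PROVIDER_BILLING_EXHAUSTED") return false;
    // Exhausted credits affect every slot backed by the same provider account.
    for (const slot of this.slots) {
      if (slot.provider === provider) slot.healthy = false;
    }
    return true;
  }
}

const slots: Slot[] = [
  { name: "chat", provider: "openai", healthy: true },
  { name: "summarize", provider: "openai", healthy: true },
  { name: "voice", provider: "anthropic", healthy: true },
];
const detector = new BillingDetector(slots);
detector.report("openai", { status: 429, code: "insufficient_quota" });
console.log(slots.map((s) => `${s.name}:${s.healthy}`).join(" "));
```

Marking every slot for the provider at once avoids each slot rediscovering the same exhausted account through its own failed request.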

ADR-036: Desktop Chat UI

Problem: Desktop app had no text chat interface; users without microphones or in noisy environments could only interact via CLI.

Decision: Create POST /api/chat/stream SSE endpoint with fetch + ReadableStream, accumulate token deltas in rAF buffer before flushing to React state, and defer markdown AST construction until streaming completes. Includes thinking/reasoning block display and tool call visualization.

Trade-off: SSE via POST requires manual stream parsing (no native EventSource support for POST) versus real-time streaming feedback in the dashboard.
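The manual stream parsing mentioned in the trade-off can be sketched as below. This is a minimal illustration, not the dashboard's actual client: `createSseParser` and `streamChat` are hypothetical names, the request body shape is left to the caller, and the parser only handles `data:` fields with `\n` line endings.

```typescript
// Incremental SSE parser: feed decoded text chunks, get back the `data:`
// payload of every event completed so far. Partial events stay buffered.
function createSseParser(): (chunk: string) => string[] {
  let buffer = "";
  return (chunk: string): string[] => {
    buffer += chunk;
    const payloads: string[] = [];
    let sep: number;
    // Events are terminated by a blank line.
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const rawEvent = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      const data = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data.length > 0) payloads.push(data);
    }
    return payloads;
  };
}

// POST a request and forward each SSE payload to `onData` as it arrives.
async function streamChat(
  url: string,
  body: unknown,
  onData: (payload: string) => void,
): Promise<void> {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify(body),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Chat stream failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const feed = createSseParser();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // `stream: true` keeps multi-byte characters split across chunks intact.
    for (const payload of feed(decoder.decode(value, { stream: true }))) {
      onData(payload);
    }
  }
}
```

A caller would pass something like `streamChat("/api/chat/stream", request, onDelta)`; in the design described above, `onDelta` would append to a buffer that is flushed to React state once per animation frame rather than on every token.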

ADR-037: Provider-Agnostic Thinking & Vision

Problem: Desktop chat lacked thinking/reasoning display and image/vision input; Claude, OpenAI (o-series), Ollama (DeepSeek-R1, Qwen3), and OpenRouter all support thinking in different formats.

Decision: Add onThinking callback to AgentStreamCallbacks with per-provider extraction (Claude stream.on, Ollama think field, OpenRouter message.reasoning, OpenAI Responses API reasoning_summary_text). Preserve continuity tokens in Message type. Base64-encode images in request body with per-provider formatting.

Trade-off: Per-provider extraction logic for incompatible thinking formats versus unified user experience with thinking visibility and multi-turn reasoning continuity.
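The per-provider extraction idea can be sketched for two of the providers as follows. The chunk interfaces are simplified stand-ins based on the field names mentioned above, not the providers' actual wire formats, and `onToken`, `NormalizedDelta`, `extractors`, and `dispatch` are hypothetical names; only `AgentStreamCallbacks` and `onThinking` come from the ADR.

```typescript
interface AgentStreamCallbacks {
  onToken(text: string): void;      // hypothetical: receives visible answer text
  onThinking?(text: string): void;  // receives reasoning text, if the UI wants it
}

// Simplified stand-ins for provider chunks (assumed shapes).
interface OllamaChunk {
  message: { content?: string; think?: string };
}
interface OpenRouterChunk {
  message: { content?: string; reasoning?: string };
}

// Provider-neutral form that the rest of the pipeline consumes.
interface NormalizedDelta {
  content?: string;
  thinking?: string;
}

// One small extractor per provider hides the format differences.
const extractors = {
  ollama: (chunk: OllamaChunk): NormalizedDelta => ({
    content: chunk.message.content,
    thinking: chunk.message.think,
  }),
  openrouter: (chunk: OpenRouterChunk): NormalizedDelta => ({
    content: chunk.message.content,
    thinking: chunk.message.reasoning,
  }),
};

// Route a normalized delta to the callbacks; thinking is optional.
function dispatch(delta: NormalizedDelta, callbacks: AgentStreamCallbacks): void {
  if (delta.thinking) callbacks.onThinking?.(delta.thinking);
  if (delta.content) callbacks.onToken(delta.content);
}

const thinkingLog: string[] = [];
const tokenLog: string[] = [];
const callbacks: AgentStreamCallbacks = {
  onToken: (text) => tokenLog.push(text),
  onThinking: (text) => thinkingLog.push(text),
};

dispatch(extractors.ollama({ message: { think: "compare both options" } }), callbacks);
dispatch(extractors.openrouter({ message: { reasoning: "check units", content: "42" } }), callbacks);
console.log(thinkingLog, tokenLog);
```

Keeping the UI dependent only on `NormalizedDelta` is what lets a new provider be added by writing one more extractor.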

Proposing a New ADR

1. Create a file. Copy the template into docs/adr/NNN-short-description.md using the next sequential number.

2. Fill all sections. Complete Context, Decision, and Consequences (positive, negative, risks). Include version numbers, benchmarks, and references.

3. Set status to Proposed. Keep that status until the ADR has been reviewed and accepted.

4. Add to the index. Update the table in docs/adr/README.md.

5. Link to a GitHub issue. Every ADR should relate to a tracked issue.

ADR Template

# ADR-NNN: Title

**Date**: YYYY-MM-DD
**Status**: Proposed | Accepted | Deprecated | Superseded by ADR-NNN
**Relates to**: [GitHub Issue #NN] | [ADR-NNN]

## Context
What technical context motivates this decision?
Include version numbers, benchmarks, ecosystem changes.

## Decision
What are we choosing, and what are we explicitly not choosing?

## Consequences

### Positive
- What becomes easier or better?

### Negative
- What becomes harder or worse?

### Risks
- What could go wrong? What assumptions might not hold?

Status Lifecycle

Status	Meaning
Proposed	Under discussion, not yet adopted
Accepted	Active decision, reflects current architecture
Deprecated	No longer relevant (technology abandoned, feature removed)
Superseded by ADR-NNN	Replaced by a newer decision

Full ADR text is available in the GitHub repository. Each ADR is self-contained — read only the ones relevant to your work.
