Changelog - agtOS

Unreleased

Added

Memory Maintenance (Memory Lint) — Dreamer.maintain() runs a periodic knowledge-base health sweep: stale conclusion confidence decay, Jaccard redundancy merge, orphan episode flagging, contradiction detection, and low-confidence pruning. Auto-registered cron task (default 0 3 * * * in AGTOS_MAINTENANCE_TIMEZONE, default UTC). On-demand via POST /api/memory/maintain or agtos memory maintain. memory-maintenance health check flags runs older than 48 hours. Killable with AGTOS_MAINTENANCE_ENABLED=false. (ADR-021)
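For readers unfamiliar with the redundancy-merge step, the following is a minimal sketch of Jaccard similarity over word tokens. The tokenizer, the `isRedundant` helper, and the 0.8 threshold are illustrative assumptions; the changelog does not state how Dreamer.maintain() tokenizes conclusions or which cutoff it uses.

```typescript
// Assumed, simplistic tokenizer: lowercase alphanumeric word tokens.
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  a.forEach((token) => {
    if (b.has(token)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

// Hypothetical threshold; the real merge cutoff is not given in this changelog.
const REDUNDANCY_THRESHOLD = 0.8;

function isRedundant(x: string, y: string): boolean {
  return jaccard(tokenSet(x), tokenSet(y)) >= REDUNDANCY_THRESHOLD;
}
```

Two conclusions that differ only in punctuation or casing score 1.0 under this tokenizer and would be merge candidates.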
NLI hybrid contradiction pipeline — Mandatory 3-stage pipeline for contradiction detection in Dreamer.maintain(). Stage 1 selects candidate pairs via cosine similarity over conclusion embeddings. Stage 2 runs a quantized DeBERTa-v3-base MNLI cross-encoder via onnxruntime-node with a Redis PairCache. Stage 3 sends the survivors to a batched LLM judge via the maintenance task slot. When AGTOS_NLI_ENABLED=false, Stage 2 is skipped and Stage 3 receives the full candidate list from Stage 1 (the legacy single-LLM detector has been removed). Prebuild with npm run prebuild:nli. New memory.contradiction.detected event. (ADR-027)
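The staging described above can be sketched roughly as follows. This is not the shipped implementation: the `Conclusion` shape, the function names, and the similarity threshold parameter are assumptions, and the Stage 2 cross-encoder is represented only by a caller-supplied filter function.

```typescript
interface Conclusion {
  id: string;
  embedding: number[];
}

type Pair = [string, string];

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 1: keep pairs similar enough to plausibly concern the same subject.
function selectCandidates(items: Conclusion[], minSimilarity: number): Pair[] {
  const pairs: Pair[] = [];
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (cosine(items[i].embedding, items[j].embedding) >= minSimilarity) {
        pairs.push([items[i].id, items[j].id]);
      }
    }
  }
  return pairs;
}

// Stage 2 is optional: with NLI disabled, Stage 3 receives the full Stage 1 list.
function pairsForJudge(
  candidates: Pair[],
  nliEnabled: boolean,
  nliFilter: (pairs: Pair[]) => Pair[],
): Pair[] {
  return nliEnabled ? nliFilter(candidates) : candidates;
}
```

The all-pairs loop is quadratic in the number of conclusions; a production system would more likely query a vector index for nearest neighbours.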
ResourceGuard — Gates every background LLM call (consolidation and maintenance) through a deterministic decision tree: policy override → cloud/remote short-circuit → active sessions → session cooldown → system load → Ollama VRAM probe (GET /api/ps). Configurable policy via AGTOS_BACKGROUND_WORK_POLICY (auto / always / idle-only). Retry-with-backoff for consolidation, skip-and-wait for maintenance. agtos_background_work_safe gauge and agtos_resource_guard_defer_count_total{reason} counter. (ADR-021)
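One way to picture the decision order is the sketch below. Only the order of the checks and the three policy names come from this entry; the input fields, the reason labels, the thresholds, and the interpretation that `idle-only` disables the cloud/remote short-circuit are assumptions made for illustration.

```typescript
type Policy = "auto" | "always" | "idle-only";
type Decision = { safe: true } | { safe: false; reason: string };

// Hypothetical snapshot of everything the guard inspects.
interface GuardInputs {
  policy: Policy;
  providerIsRemote: boolean;
  activeSessions: number;
  msSinceLastSession: number;
  cooldownMs: number;
  loadAverage: number;
  maxLoad: number;
  vramFreeBytes: number;
  vramNeededBytes: number;
}

function decide(i: GuardInputs): Decision {
  // 1. Policy override.
  if (i.policy === "always") return { safe: true };
  // 2. Cloud/remote short-circuit (assumed to apply only under "auto").
  if (i.policy === "auto" && i.providerIsRemote) return { safe: true };
  // 3. Active sessions.
  if (i.activeSessions > 0) return { safe: false, reason: "active_sessions" };
  // 4. Session cooldown.
  if (i.msSinceLastSession < i.cooldownMs) return { safe: false, reason: "session_cooldown" };
  // 5. System load.
  if (i.loadAverage > i.maxLoad) return { safe: false, reason: "system_load" };
  // 6. VRAM probe result (in agtOS this comes from Ollama's GET /api/ps).
  if (i.vramFreeBytes < i.vramNeededBytes) return { safe: false, reason: "vram" };
  return { safe: true };
}
```

Because the function is pure and checks run in a fixed order, the same inputs always yield the same decision and the first failing check determines the `reason` label.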
Dreamer activation — triggerConsolidation() is now wired into endVoiceSession() with a server-level sessionEnded event listener as defense-in-depth. The 60+ test consolidation engine moves from dead code to production runtime — user profiles finally populate after sessions end.
Query-as-Ingest — New RESPONSE_INGEST episode type persists high-quality agent responses (heuristic scoring on tool calls, multi-step reasoning, length, and synthesis patterns) so synthesis compounds across sessions. Per-session rate limit and 5-second dedup window.
agtos memory maintain CLI command — Triggers an on-demand memory lint sweep. Exit codes: 0 (ok), 1 (request failed), 2 (RESOURCES_BUSY — retry later), 3 (PROFILE_DISCONNECTED — operator action). Supports --user <id> and --verbose.
maintenance task slot — 6th entry in TASK_SLOTS for the Stage 3 LLM judge in the NLI hybrid pipeline. Defaults to fallback: 'consolidation' so existing single-provider setups keep working unchanged. Dreamer resolves the maintenance provider at call time via resolveMaintenanceProvider(), so hot-swapping the slot takes effect on the next sweep without a restart.
ProviderCatalog interface — Cross-provider model discovery via listModels() / getAccountInfo() / validateModel(). Four implementations ship: OpenRouterCatalog, OllamaCatalog, ClaudeCatalog, OpenAICatalog. ModelInfo carries context length, max output tokens, per-1M-token pricing, and a 13-entry capability union including 'contradiction'. One-hour TTL caching. (ADR-026)
OpenRouter first-class provider — Promoted from “OpenAI with a different baseURL” to a full provider under src/providers/openrouter/ with its own credential scope (provider-openrouter), attribution headers (HTTP-Referer, X-Title), and rich /api/v1/models catalog with string-encoded per-token pricing. Can now be configured for any slot (conversation or task). (ADR-026)
Provider lifecycle events — Four canonical topics at src/core/providers/events.ts: provider.initialized, provider.failed, provider.catalog.refreshed (fires only on successful network fetch, not cache hits), and provider.credentials.updated (fires on create/rotate/delete in CredentialManager).
Memory maintenance history API — GET /api/memory/maintain/history lists recent reports (30-day TTL, 200-entry sorted-set index) and GET /api/memory/maintain/history/:timestamp fetches one by timestamp. Powers the dashboard Memory Browser’s maintenance widget.
Timezone-aware scheduler — CronSchedule.timezone (IANA) with croner@^10.0.1 (zero runtime dependencies) for next-run computation. Replaces process-local time semantics. AGTOS_MAINTENANCE_TIMEZONE exposes the timezone to operators. (ADR-023)
Atomic profile updates — UserProfileManager uses a node-redis v5 connection pool with WATCH/MULTI/EXEC optimistic locking via withOptimisticLock<T>(). All five mutating methods retry up to 3 times on WatchError, closing the audit M3 race window where concurrent maintenance + consolidation could lose conclusions. (ADR-024)
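The retry pattern can be illustrated with an in-memory stand-in for a watched Redis key. This is a sketch under stated assumptions: `VersionedStore` is hypothetical, the helper is synchronous for brevity (real node-redis calls are asynchronous), and the actual signature of withOptimisticLock<T>() in agtOS is not shown in this changelog.

```typescript
class WatchError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, WatchError.prototype);
  }
}

// Hypothetical stand-in for a watched key: a value plus a version counter.
class VersionedStore<T> {
  private value: T;
  private version = 0;

  constructor(initial: T) {
    this.value = initial;
  }

  read(): { value: T; version: number } {
    return { value: this.value, version: this.version };
  }

  // Commit only if nobody wrote since expectedVersion (mimics EXEC aborting after WATCH).
  commit(next: T, expectedVersion: number): void {
    if (expectedVersion !== this.version) throw new WatchError("key changed since read");
    this.value = next;
    this.version++;
  }
}

// Read, compute, commit; on WatchError re-read and retry, up to maxRetries retries.
function withOptimisticLock<T>(
  store: VersionedStore<T>,
  update: (current: T) => T,
  maxRetries = 3,
): T {
  for (let attempt = 0; ; attempt++) {
    const snapshot = store.read();
    const next = update(snapshot.value);
    try {
      store.commit(next, snapshot.version);
      return next;
    } catch (err) {
      if (!(err instanceof WatchError) || attempt >= maxRetries) throw err;
    }
  }
}
```

The key property is that the update function is re-run against a fresh read after each conflict, so a concurrent writer's conclusions are merged rather than overwritten.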
Multi-tenant-ready data layer — Every Redis key includes {userId} in tenant-first position, every vector search accepts a userId filter, and every business-logic call site goes through resolveUserId(). No operational multi-user today, but a clean migration path. (ADR-025)
qmd MCP integration — AGTOS_MCP_SERVERS JSON env var for external MCP servers. McpClientManager.discoverTools() now runs inferCategory() on each discovered tool so search tools auto-participate in intent-based tool selection (ADR-009).
memory-semantic health check — Probes the RediSearch vector index document count and size via FT.INFO with a 1-second timeout. Non-throwing on Redis failures.
Memory lifecycle events — memory.conclusion.restored, memory.consolidation.deferred, and memory.maintenance.failed complete the symmetry with the existing decay / prune / completed family so dashboards can wire alerts without parsing logs.
New Prometheus metrics — agtos_background_work_safe gauge, agtos_memory_consolidation_deferred_total{reason}, agtos_resource_guard_defer_count_total{reason}, agtos_pair_cache_lookups_total{result}, agtos_contradiction_pipeline_stage_duration_seconds{stage}, agtos_nli_inferences_total{result} (verdicts: contradiction/neutral/entailment), agtos_nli_inference_duration_seconds (latency summary with p50/p95/p99), agtos_provider_catalog_fetch_total{provider,status} (catalog refresh attempts per provider), and agtos_provider_catalog_models_count{provider} (current model count gauge per provider).
Model Slot Registry — named capability slots (chat, reasoning, coding, tool_calling, creative) with per-slot provider+model config and fallback chains. Replaces the global cloud provider env var. Configured in ~/.agtos/config.json. (ADR-020)
Encrypted credential storage — API keys encrypted with AES-256-GCM at ~/.agtos/credentials.json. scrypt key derivation (N=16384), AAD-bound ciphertext per provider, auto-generated machine secret at ~/.agtos/.secret. 149 credential-specific tests.
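As a rough illustration of the scheme, the sketch below derives a key with scrypt and binds the ciphertext to a provider name through GCM additional authenticated data, using Node's built-in `crypto` module. The `Sealed` record, the function names, the salt and IV sizes, and the use of the provider name as AAD are assumptions; the r=8 and p=1 values are taken from the key-derivation entry under Changed. The on-disk format of credentials.json is not described here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Hypothetical in-memory record; not the real credentials.json layout.
interface Sealed {
  salt: Buffer;
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

// 32-byte key for AES-256, derived with scrypt (N=16384, r=8, p=1).
function deriveKey(secret: string, salt: Buffer): Buffer {
  return scryptSync(secret, salt, 32, { N: 16384, r: 8, p: 1 });
}

function seal(apiKey: string, provider: string, secret: string): Sealed {
  const salt = randomBytes(16);
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", deriveKey(secret, salt), iv);
  cipher.setAAD(Buffer.from(provider, "utf8")); // bind ciphertext to this provider
  const ciphertext = Buffer.concat([cipher.update(apiKey, "utf8"), cipher.final()]);
  return { salt, iv, tag: cipher.getAuthTag(), ciphertext };
}

function unseal(sealed: Sealed, provider: string, secret: string): string {
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(secret, sealed.salt), sealed.iv);
  decipher.setAAD(Buffer.from(provider, "utf8"));
  decipher.setAuthTag(sealed.tag);
  // final() throws if the key, AAD, or ciphertext does not match the tag.
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}
```

Binding the provider name as AAD means a ciphertext copied from one provider's entry to another fails authentication on decrypt, even though the same machine secret protects both.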
OpenAI cloud provider — GPT-4o and GPT-4o Mini as drop-in alternatives to Claude. Full streaming, tool calling, session management, and health checks. Configurable per slot. (ADR-019)
CLI API key validation — agtos setup validates API keys against the actual provider API before saving
First-run detection — agtos start guides you to agtos setup when no configuration exists
Structured startup progress — agtos start shows step-by-step service initialization with status icons
Doctor credential validation — agtos doctor checks credential file health, permissions, API key functionality, network connectivity, and reports feature degradation
Setup token auth — 30-min TTL token for credential storage during onboarding (X-Setup-Token header)
Onboarding mic test — real-time audio level visualization during desktop app setup
Settings credential management — update API keys inline with validation from the Settings page
Credential health check — per-provider source tracking exposed via /health endpoint with Prometheus metrics
ProviderLifecycleManager — Runtime provider hot-swap without server restart. Credential rotation triggers provider.credentials.updated event, and the lifecycle manager atomically swaps the client provider instance. In-flight requests complete on the old client; new requests use the new credentials. Per-provider health checks (provider-claude, provider-openai, provider-ollama, provider-openrouter) report credential status, catalog freshness, and staleness.
PUT /api/slots model validation — PUT /api/slots now validates each slot’s model against the ProviderCatalog. Unknown models produce a warning but are allowed (private/unlisted models). Models with past deprecation dates are blocked with HTTP 400. Models with future deprecation dates produce a warning. Catalog fetch failures are non-blocking. Response includes a warnings array when applicable.
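The validation rules above can be summarized as a small pure function. This is a sketch, not the route handler: the `CatalogModel` shape, the `deprecationDate` field name, and the warning strings are hypothetical, and a `null` catalog stands in for a failed catalog fetch.

```typescript
// Hypothetical subset of catalog data needed for validation.
interface CatalogModel {
  id: string;
  deprecationDate?: string; // ISO 8601 date, assumed field name
}

interface Validation {
  status: 200 | 400;
  warnings: string[];
}

function validateSlotModel(modelId: string, catalog: CatalogModel[] | null, now: Date): Validation {
  // Catalog fetch failed: non-blocking, accept the model as-is.
  if (catalog === null) return { status: 200, warnings: [] };

  const model = catalog.find((m) => m.id === modelId);
  // Unknown models may be private or unlisted: warn but allow.
  if (!model) return { status: 200, warnings: [`model "${modelId}" not found in catalog`] };

  if (model.deprecationDate !== undefined) {
    const deprecatedAt = new Date(model.deprecationDate).getTime();
    // Already deprecated: block.
    if (deprecatedAt <= now.getTime()) return { status: 400, warnings: [] };
    // Deprecation scheduled: warn but allow.
    return { status: 200, warnings: [`model "${modelId}" is deprecated on ${model.deprecationDate}`] };
  }
  return { status: 200, warnings: [] };
}
```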
6 Memory V2 / Provider Catalog config keys formalized — nliModelSize (AGTOS_NLI_MODEL_SIZE), nliContradictionThreshold (AGTOS_NLI_CONTRADICTION_THRESHOLD), pairCacheTtlSeconds (AGTOS_PAIR_CACHE_TTL_SECONDS), providerCatalogCacheTtlSeconds (AGTOS_PROVIDER_CATALOG_CACHE_TTL_SECONDS), and providerCatalogAutoRefresh (AGTOS_PROVIDER_CATALOG_AUTO_REFRESH) are now registered in CONFIG_KEY_META with typed schemas, validation ranges, and reload-type annotations. All are discoverable via the dashboard Settings page and GET /api/settings.
Tauri ORT sidecar bundling — Desktop builds bundle ONNX Runtime binaries via scripts/copy-ort-binaries.mjs. The Rust sidecar sets AGTOS_ORT_RUNTIME_DIR so the NLI pipeline resolves onnxruntime-node bindings from the packaged app. NLI works out-of-box in desktop builds when AGTOS_NLI_ENABLED=true.
OpenRouter embedding provider — First-class OpenRouterEmbeddingProvider with its own credential scope (provider-openrouter), deferred credential failure (constructor doesn’t throw), and request-level retry with exponential backoff (3 attempts, 250ms initial, 2× multiplier). Retries on HTTP 429 and 5xx; fails fast on 400/401/403/404.
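The retry policy in the OpenRouter embedding entry above (3 attempts, 250ms initial delay, 2× multiplier, retry on 429 and 5xx) might look roughly like this. The function names, the minimal response shape, and the injected `sleep` parameter are assumptions for illustration; the real OpenRouterEmbeddingProvider may add jitter or honor Retry-After, which this changelog does not say.

```typescript
// Retry only on rate limiting and server errors; 400/401/403/404 fail fast.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Waits between attempts: attempts - 1 delays, growing geometrically.
function backoffDelays(attempts = 3, initialMs = 250, multiplier = 2): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts - 1; i++) {
    delays.push(initialMs * Math.pow(multiplier, i));
  }
  return delays;
}

// send() performs one request; sleep() is injected so callers control timing.
async function withRetry(
  send: () => Promise<{ status: number }>,
  sleep: (ms: number) => Promise<void>,
): Promise<{ status: number }> {
  const delays = backoffDelays();
  for (let attempt = 0; ; attempt++) {
    const response = await send();
    const outOfAttempts = attempt >= delays.length;
    if (!isRetryable(response.status) || outOfAttempts) return response;
    await sleep(delays[attempt]);
  }
}
```

With the defaults, a request that keeps returning 503 is sent three times in total, waiting 250ms and then 500ms between attempts.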

Changed

NLI contradiction pipeline mandatory — ContradictionPipelineDeps is now a required constructor argument on MemoryDreamer. The legacy single-LLM detectContradictionsLlm() fall-through and its buildMaintenancePrompt helper have been deleted. When AGTOS_NLI_ENABLED=false, Stage 2 (cross-encoder) is skipped and Stage 3 (LLM judge) receives the full candidate list from Stage 1. MaintenanceReport.summary.contradictionPipeline is now a required field (short-circuit paths produce zero-valued metrics).
triggerMaintenance() return shape — now returns a discriminated MaintenanceTriggerResult ({ok:true, report} or {ok:false, errorCode: 'PROFILE_DISCONNECTED' | 'RESOURCES_BUSY', message, reason?}) instead of MaintenanceReport | null. POST /api/memory/maintain 503 responses now carry errorCode so clients can distinguish transient (retry) from persistent (operator action) failures without parsing message strings.
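A caller can branch on the discriminated result without inspecting message strings, as in the sketch below. The union follows the shape given in this entry; `MaintenanceReport` is reduced to a placeholder, and `isTransient` and `toExitCode` are hypothetical helpers. The exit-code mapping mirrors the codes documented for agtos memory maintain, but whether the CLI derives them this way is an assumption.

```typescript
// Placeholder; the real MaintenanceReport has more fields.
interface MaintenanceReport {
  summary: Record<string, unknown>;
}

type MaintenanceTriggerResult =
  | { ok: true; report: MaintenanceReport }
  | {
      ok: false;
      errorCode: "PROFILE_DISCONNECTED" | "RESOURCES_BUSY";
      message: string;
      reason?: string;
    };

// Transient failures are worth retrying; persistent ones need an operator.
function isTransient(result: MaintenanceTriggerResult): boolean {
  return !result.ok && result.errorCode === "RESOURCES_BUSY";
}

// Hypothetical mapping onto the documented CLI exit codes (0, 2, 3).
function toExitCode(result: MaintenanceTriggerResult): number {
  if (result.ok) return 0;
  return result.errorCode === "RESOURCES_BUSY" ? 2 : 3;
}
```

Checking `result.ok` first narrows the type, so `errorCode` is only accessible on the failure branch and `report` only on the success branch.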
Zod 4 upgrade — zod@4 is the baseline. Removed all as never casts and @ts-ignore TS2589 suppressions from MCP tool registrations. z.record() now requires both a key and a value schema (e.g., z.record(z.string(), z.unknown())). No user-facing behavior change. (ADR-022)
Scheduler library swap — replaced node-cron@4 + the custom cron-parser implementation with croner@^10.0.1 (zero runtime dependencies, native IANA timezone and DST handling). Cron expressions that previously relied on the process-local timezone now fire in UTC by default; set AGTOS_MAINTENANCE_TIMEZONE to opt in to a different IANA zone. (ADR-023)
Consolidation / maintenance providers — now resolve via the consolidation and maintenance task slots in ~/.agtos/config.json. Legacy AGTOS_CONSOLIDATION_PROVIDER / AGTOS_CONSOLIDATION_MODEL env vars still work as fallbacks.
Model router v2.0 — replaced two-tier LOCAL/CLOUD dispatch with slot-based routing via Model Slot Registry. AGTOS_CLOUD_PROVIDER and AGTOS_CHAT_PROVIDER env vars removed in favor of per-slot config.
Auth overhaul — removed OAuth support. Authentication is now API key only for all providers. ANTHROPIC_AUTH_TOKEN and CLAUDE_CODE_OAUTH_TOKEN env vars are no longer recognized.
Setup wizard — now writes encrypted credentials to ~/.agtos/credentials.json and slot configuration to ~/.agtos/config.json. Migrates plaintext keys from .env.local.
Key derivation — migrated from PBKDF2-SHA256 to scrypt (N=16384, r=8, p=1). Auto-migrates existing PBKDF2 credential files on first access.
CLI transport restructured — opt-in via AGTOS_CLAUDE_TRANSPORT=cli. Passes ANTHROPIC_API_KEY to the subprocess instead of OAuth tokens.
Default speech engine — changed from speaches to sherpa-onnx (backward-compatible env var fallback)
/api/credentials — removed from auth-exempt paths. Now requires API key or setup token.
Desktop Chat UI — SSE streaming text chat (POST /api/chat/stream) with rAF token batching, deferred markdown rendering via react-markdown + remark-gfm, and tool call visualization inline. (ADR-036)
Provider-agnostic thinking/reasoning — Unified thinking/reasoning output across Claude (extended thinking), OpenAI (reasoning summary via Responses API), Ollama (Qwen3/DeepSeek-R1/Gemma 4 think tags), and OpenRouter (message.reasoning). Multi-turn reasoning continuity preserved via stored thinking tokens. (ADR-037)
Image/vision support in chat — Paste or drag-and-drop images into the chat. Per-provider formatting (Claude image blocks, OpenAI image_url, Ollama images array). Supported on all providers with vision capability. (ADR-037)
Syntax highlighting — Code blocks in chat responses rendered with react-shiki for accurate language-specific highlighting.
Conversation history persistence — GET /api/chat/history/:sessionId retrieves past messages. Dashboard Conversations page provides a browser with session resume capability.
Unified voice + chat session — App-level activeSessionId shared between voice and chat interfaces. Voice and text interactions contribute to the same session context.
OpenAI Responses API migration — OpenAI provider migrated from Chat Completions to Responses API with reasoning support (o-series models), EasyInputMessage format, and replay of reasoning items for multi-turn continuity. (ADR-037)
Ollama thinking support — Thinking/reasoning output from Qwen3, DeepSeek-R1, and Gemma 4 models is streamed via the think field and automatically stripped for models without thinking capability.
Entity-centric memory (Knowledge Wiki) — Redis JSON property graph with NER-extracted entities, relationships, alias deduplication, and 9 API endpoints for CRUD, merge, and graph operations. Dashboard Knowledge page for browsing and editing. (ADR-030)
PACT Capture Protocol — Multimodal capture protocol with presence signals, per-modality consent envelopes, jurisdiction-aware validation, and 5 API endpoints. Local-first-only v1. (ADR-029)
Speaker Intelligence — sherpa-onnx speaker embedding extraction, diarization, and Redis persistence for multi-speaker attribution. (ADR-031)
Billing-aware model router — Runtime detection of billing exhaustion (Anthropic billing_error, OpenAI insufficient_quota, OpenRouter 402) with auto-fallback via BillingDetector and user-configured fallback strategies (cloud-backup/ollama-local/none). Billing dashboard UX, GET /api/billing/status, POST /api/billing/retry/:providerId. (ADR-035)
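The three billing signals named above could be classified along these lines. This is only a sketch: the `ProviderError` shape and where each signal appears in a real error payload are assumptions, and BillingDetector's actual interface is not shown in this changelog.

```typescript
type BillingProvider = "anthropic" | "openai" | "openrouter";

// Hypothetical normalized error; real provider payloads differ in structure.
interface ProviderError {
  provider: BillingProvider;
  httpStatus?: number;
  errorType?: string;
}

// True when the error indicates exhausted billing rather than a transient fault.
function isBillingExhausted(err: ProviderError): boolean {
  switch (err.provider) {
    case "anthropic":
      return err.errorType === "billing_error";
    case "openai":
      return err.errorType === "insufficient_quota";
    case "openrouter":
      return err.httpStatus === 402;
    default:
      return false;
  }
}
```

Keeping detection separate from the fallback decision lets the router apply the user's configured strategy (cloud-backup, ollama-local, or none) only after an error has been classified as a billing event.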
Three-tier health system — Health checks prioritized as critical/important/optional. Fresh installs show green when critical services are healthy. POST /api/system/reconnect-redis for hot-connecting Redis services without restart. (ADR-034)
Consumer onboarding — 3-step setup wizard (Mode → Redis → Done) replacing the 7-step developer flow. Just-in-time model downloads, hardware-aware recommendations, slot auto-configuration. (ADR-033)
App Management — Settings section with re-run setup (POST /api/system/reset-onboarding) and reset slots to defaults (POST /api/slots/reset).
Chat keyboard shortcuts + accessibility — Enter to send, Shift+Enter for newline, Escape to cancel streaming.
Fallback strategy configuration — POST /api/slots/auto-configure accepts a fallbackStrategy parameter for cloud-backup, ollama-local, or none.
Billing exhaustion Prometheus counter — agtos_provider_billing_exhaustion_total{provider} for monitoring billing events across providers.
Auto-connect to running Redis — Server probes for an existing Redis instance at boot before attempting to start a managed one.

Changed

Onboarding wizard — Simplified from 7 steps to 3 steps (Mode Selection → Redis → Done). See ADR-033.
OpenAI provider — Migrated from Chat Completions API to Responses API for reasoning support and multi-turn continuity.

Fixed

Onboarding wizard not shown in Tauri desktop app on first launch
Uptime display showing incorrect values (was using stale module-level timestamp)
503 error during first few minutes after launch (/api/health now bypasses initialization gate)
OpenAI tool calls silently dropped during streaming
OpenAI stream cancellation for voice barge-in
Router hardcoded provider:'claude' for all cloud decisions — now uses per-slot config
Various OAuth UI remnants removed from Settings page
Claude multi-turn thinking — stores full content blocks, rebuilds in buildMessages
OpenAI multi-turn reasoning — EasyInputMessage, replay reasoning items, handle incomplete
Ollama thinking capability check for non-thinking models
Vision images sent on all agent steps, not just first
History fetch merge for conversation browser
Voice session propagation with unified activeSessionId
Auth guards and input validation on history/session endpoints
Billing tool errors with fallback status display
Virtualized chat messages for long conversations
Concurrent send guard preventing duplicate messages

Removed

AGTOS_CLOUD_PROVIDER and AGTOS_CHAT_PROVIDER env vars — replaced by per-slot provider config in Model Slot Registry
forceCloudPatterns and forceLocalPatterns routing config — replaced by forceSlotPatterns
Two-tier LOCAL/CLOUD routing terminology
