Unreleased
Added
- Memory Maintenance (Memory Lint) — `Dreamer.maintain()` runs a periodic knowledge-base health sweep: stale conclusion confidence decay, Jaccard redundancy merge, orphan episode flagging, contradiction detection, and low-confidence pruning. Auto-registered cron task (default `0 3 * * *` in `AGTOS_MAINTENANCE_TIMEZONE`, default `UTC`). On-demand via `POST /api/memory/maintain` or `agtos memory maintain`. `memory-maintenance` health check flags runs older than 48 hours. Killable with `AGTOS_MAINTENANCE_ENABLED=false`. (ADR-021)
- NLI hybrid contradiction pipeline — Mandatory 3-stage pipeline for contradiction detection in `Dreamer.maintain()`. Stage 1 selects candidate pairs via cosine similarity over conclusion embeddings. Stage 2 runs a quantized DeBERTa-v3-base MNLI cross-encoder via `onnxruntime-node` with a Redis `PairCache`. Stage 3 sends the survivors to a batched LLM judge via the `maintenance` task slot. When `AGTOS_NLI_ENABLED=false`, Stage 2 is skipped and Stage 3 receives the full candidate list from Stage 1 (the legacy single-LLM detector has been removed). Prebuild with `npm run prebuild:nli`. New `memory.contradiction.detected` event. (ADR-027)
- ResourceGuard — Gates every background LLM call (consolidation and maintenance) through a deterministic decision tree: policy override → cloud/remote short-circuit → active sessions → session cooldown → system load → Ollama VRAM probe (`GET /api/ps`). Configurable policy via `AGTOS_BACKGROUND_WORK_POLICY` (`auto`/`always`/`idle-only`). Retry-with-backoff for consolidation, skip-and-wait for maintenance. `agtos_background_work_safe` gauge and `agtos_resource_guard_defer_count_total{reason}` counter. (ADR-021)
- Dreamer activation — `triggerConsolidation()` is now wired into `endVoiceSession()` with a server-level `sessionEnded` event listener as defense-in-depth. The 60+ test consolidation engine moves from dead code to production runtime — user profiles finally populate after sessions end.
- Query-as-Ingest — New `RESPONSE_INGEST` episode type persists high-quality agent responses (heuristic scoring on tool calls, multi-step reasoning, length, and synthesis patterns) so synthesis compounds across sessions. Per-session rate limit and 5-second dedup window.
- `agtos memory maintain` CLI command — Triggers an on-demand memory lint sweep. Exit codes: `0` (ok), `1` (request failed), `2` (`RESOURCES_BUSY` — retry later), `3` (`PROFILE_DISCONNECTED` — operator action). Supports `--user <id>` and `--verbose`.
- `maintenance` task slot — 6th entry in `TASK_SLOTS` for the Stage 3 LLM judge in the NLI hybrid pipeline. Defaults to `fallback: 'consolidation'` so existing single-provider setups keep working unchanged. Dreamer resolves the maintenance provider at call time via `resolveMaintenanceProvider()`, so hot-swapping the slot takes effect on the next sweep without a restart.
- ProviderCatalog interface — Cross-provider model discovery via `listModels()` / `getAccountInfo()` / `validateModel()`. Four implementations ship: `OpenRouterCatalog`, `OllamaCatalog`, `ClaudeCatalog`, `OpenAICatalog`. `ModelInfo` carries context length, max output tokens, per-1M-token pricing, and a 13-entry capability union including `'contradiction'`. One-hour TTL caching. (ADR-026)
- OpenRouter first-class provider — Promoted from “OpenAI with a different baseURL” to a full provider under `src/providers/openrouter/` with its own credential scope (`provider-openrouter`), attribution headers (`HTTP-Referer`, `X-Title`), and rich `/api/v1/models` catalog with string-encoded per-token pricing. Can now be configured for any slot (conversation or task). (ADR-026)
- Provider lifecycle events — Four canonical topics at `src/core/providers/events.ts`: `provider.initialized`, `provider.failed`, `provider.catalog.refreshed` (fires only on successful network fetch, not cache hits), and `provider.credentials.updated` (fires on create/rotate/delete in `CredentialManager`).
- Memory maintenance history API — `GET /api/memory/maintain/history` lists recent reports (30-day TTL, 200-entry sorted-set index) and `GET /api/memory/maintain/history/:timestamp` fetches one by timestamp. Powers the dashboard Memory Browser’s maintenance widget.
- Timezone-aware scheduler — `CronSchedule.timezone` (IANA) with `croner@^10.0.1` (zero runtime dependencies) for next-run computation. Replaces process-local time semantics. `AGTOS_MAINTENANCE_TIMEZONE` exposes the timezone to operators. (ADR-023)
- Atomic profile updates — `UserProfileManager` uses a node-redis v5 connection pool with `WATCH`/`MULTI`/`EXEC` optimistic locking via `withOptimisticLock<T>()`. All five mutating methods retry up to 3 times on `WatchError`, closing the audit M3 race window where concurrent maintenance + consolidation could lose conclusions. (ADR-024)
- Multi-tenant-ready data layer — Every Redis key includes `{userId}` in tenant-first position, every vector search accepts a `userId` filter, and every business-logic call site goes through `resolveUserId()`. No operational multi-user today, but a clean migration path. (ADR-025)
- qmd MCP integration — `AGTOS_MCP_SERVERS` JSON env var for external MCP servers. `McpClientManager.discoverTools()` now runs `inferCategory()` on each discovered tool so search tools auto-participate in intent-based tool selection (ADR-009).
- `memory-semantic` health check — Probes the RediSearch vector index document count and size via `FT.INFO` with a 1-second timeout. Non-throwing on Redis failures.
- Memory lifecycle events — `memory.conclusion.restored`, `memory.consolidation.deferred`, and `memory.maintenance.failed` complete the symmetry with the existing decay / prune / completed family so dashboards can wire alerts without parsing logs.
- New Prometheus metrics — `agtos_background_work_safe` gauge, `agtos_memory_consolidation_deferred_total{reason}`, `agtos_resource_guard_defer_count_total{reason}`, `agtos_pair_cache_lookups_total{result}`, `agtos_contradiction_pipeline_stage_duration_seconds{stage}`, `agtos_nli_inferences_total{result}` (verdicts: contradiction/neutral/entailment), `agtos_nli_inference_duration_seconds` (latency summary with p50/p95/p99), `agtos_provider_catalog_fetch_total{provider,status}` (catalog refresh attempts per provider), and `agtos_provider_catalog_models_count{provider}` (current model count gauge per provider).
- Model Slot Registry — named capability slots (`chat`, `reasoning`, `coding`, `tool_calling`, `creative`) with per-slot provider+model config and fallback chains. Replaces the global cloud provider env var. Configured in `~/.agtos/config.json`. (ADR-020)
- Encrypted credential storage — API keys encrypted with AES-256-GCM at `~/.agtos/credentials.json`. scrypt key derivation (N=16384), AAD-bound ciphertext per provider, auto-generated machine secret at `~/.agtos/.secret`. 149 credential-specific tests.
- OpenAI cloud provider — GPT-4o and GPT-4o Mini as drop-in alternatives to Claude. Full streaming, tool calling, session management, and health checks. Configurable per slot. (ADR-019)
- CLI API key validation — `agtos setup` validates API keys against the actual provider API before saving
- First-run detection — `agtos start` guides you to `agtos setup` when no configuration exists
- Structured startup progress — `agtos start` shows step-by-step service initialization with status icons
- Doctor credential validation — `agtos doctor` checks credential file health, permissions, API key functionality, network connectivity, and reports feature degradation
- Setup token auth — 30-min TTL token for credential storage during onboarding (`X-Setup-Token` header)
- Onboarding mic test — real-time audio level visualization during desktop app setup
- Settings credential management — update API keys inline with validation from the Settings page
- Credential health check — per-provider source tracking exposed via `/health` endpoint with Prometheus metrics
- ProviderLifecycleManager — Runtime provider hot-swap without server restart. Credential rotation triggers `provider.credentials.updated` event, and the lifecycle manager atomically swaps the client provider instance. In-flight requests complete on the old client; new requests use the new credentials. Per-provider health checks (`provider-claude`, `provider-openai`, `provider-ollama`, `provider-openrouter`) report credential status, catalog freshness, and staleness.
- PUT /api/slots model validation — `PUT /api/slots` now validates each slot’s model against the `ProviderCatalog`. Unknown models produce a warning but are allowed (private/unlisted models). Models with past deprecation dates are blocked with HTTP 400. Models with future deprecation dates produce a warning. Catalog fetch failures are non-blocking. Response includes a `warnings` array when applicable.
- 6 Memory V2 / Provider Catalog config keys formalized — `nliModelSize` (`AGTOS_NLI_MODEL_SIZE`), `nliContradictionThreshold` (`AGTOS_NLI_CONTRADICTION_THRESHOLD`), `pairCacheTtlSeconds` (`AGTOS_PAIR_CACHE_TTL_SECONDS`), `providerCatalogCacheTtlSeconds` (`AGTOS_PROVIDER_CATALOG_CACHE_TTL_SECONDS`), and `providerCatalogAutoRefresh` (`AGTOS_PROVIDER_CATALOG_AUTO_REFRESH`) are now registered in `CONFIG_KEY_META` with typed schemas, validation ranges, and reload-type annotations. All are discoverable via the dashboard Settings page and `GET /api/settings`.
- Tauri ORT sidecar bundling — Desktop builds bundle ONNX Runtime binaries via `scripts/copy-ort-binaries.mjs`. The Rust sidecar sets `AGTOS_ORT_RUNTIME_DIR` so the NLI pipeline resolves onnxruntime-node bindings from the packaged app. NLI works out-of-box in desktop builds when `AGTOS_NLI_ENABLED=true`.
- OpenRouter embedding provider — First-class `OpenRouterEmbeddingProvider` with its own credential scope (`provider-openrouter`), deferred credential failure (constructor doesn’t throw), and request-level retry with exponential backoff (3 attempts, 250ms initial, 2× multiplier). Retries on HTTP 429 and 5xx; fails fast on 400/401/403/404.
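
The retry policy listed for the OpenRouter embedding provider (3 attempts, 250 ms initial delay, 2x multiplier, retry on HTTP 429 and 5xx, fail fast on 400/401/403/404) could look roughly like the sketch below. This is an illustration under assumptions, not the shipped code: `statusOf`, `isRetryable`, `backoffDelayMs`, and `withRetry` are hypothetical names, and treating every status other than 429 and 5xx as non-retryable is a guess that goes beyond the four codes the entry names.

```typescript
// Sketch only: names below are hypothetical, not the real OpenRouterEmbeddingProvider API.

// Pull a numeric HTTP status off an unknown error, if it carries one.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Retry on 429 and 5xx; 400/401/403/404 (and, by assumption, anything else) fail fast.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Wait before retry number `retry` (1-based): 250 ms, then 500 ms with the defaults.
function backoffDelayMs(retry: number, initialMs = 250, multiplier = 2): number {
  return initialMs * Math.pow(multiplier, retry - 1);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run `fn` up to `attempts` times, waiting between retryable failures.
// `wait` is injectable so callers (and tests) can avoid real timers.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = statusOf(err);
      const retryable = status !== undefined && isRetryable(status);
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (!retryable || attempt >= attempts) throw err;
      await wait(backoffDelayMs(attempt));
    }
  }
}
```

With the defaults, a request that keeps returning 429 is tried three times with waits of 250 ms and 500 ms in between, while a 401 is rethrown immediately.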
Changed
- NLI contradiction pipeline mandatory — `ContradictionPipelineDeps` is now a required constructor argument on `MemoryDreamer`. The legacy single-LLM `detectContradictionsLlm()` fall-through and its `buildMaintenancePrompt` helper have been deleted. When `AGTOS_NLI_ENABLED=false`, Stage 2 (cross-encoder) is skipped and Stage 3 (LLM judge) receives the full candidate list from Stage 1. `MaintenanceReport.summary.contradictionPipeline` is now a required field (short-circuit paths produce zero-valued metrics).
- `triggerMaintenance()` return shape — now returns a discriminated `MaintenanceTriggerResult` (`{ok:true, report}` or `{ok:false, errorCode: 'PROFILE_DISCONNECTED' | 'RESOURCES_BUSY', message, reason?}`) instead of `MaintenanceReport | null`. `POST /api/memory/maintain` 503 responses now carry `errorCode` so clients can distinguish transient (retry) from persistent (operator action) failures without parsing message strings.
- Zod 4 upgrade — Zod 4 is the baseline. Removed all `as never` casts and `@ts-ignore TS2589` suppressions from MCP tool registrations. `z.record()` now requires both a key and a value schema (e.g., `z.record(z.string(), z.unknown())`). No user-facing behavior change. (ADR-022)
- Scheduler library swap — replaced `node-cron@4` + the custom `cron-parser` implementation with `croner@^10.0.1` (zero runtime dependencies, native IANA timezone and DST handling). Cron expressions that previously relied on the process-local timezone now fire in `UTC` by default; set `AGTOS_MAINTENANCE_TIMEZONE` to opt in to a different IANA zone. (ADR-023)
- Consolidation / maintenance providers — now resolve via the `consolidation` and `maintenance` task slots in `~/.agtos/config.json`. Legacy `AGTOS_CONSOLIDATION_PROVIDER` / `AGTOS_CONSOLIDATION_MODEL` env vars still work as fallbacks.
- Model router v2.0 — replaced two-tier LOCAL/CLOUD dispatch with slot-based routing via Model Slot Registry. `AGTOS_CLOUD_PROVIDER` and `AGTOS_CHAT_PROVIDER` env vars removed in favor of per-slot config.
- Auth overhaul — removed OAuth support. Authentication is now API key only for all providers. `ANTHROPIC_AUTH_TOKEN` and `CLAUDE_CODE_OAUTH_TOKEN` env vars are no longer recognized.
- Setup wizard — now writes encrypted credentials to `~/.agtos/credentials.json` and slot configuration to `~/.agtos/config.json`. Migrates plaintext keys from `.env.local`.
- Key derivation — migrated from PBKDF2-SHA256 to scrypt (N=16384, r=8, p=1). Auto-migrates existing PBKDF2 credential files on first access.
- CLI transport restructured — opt-in via `AGTOS_CLAUDE_TRANSPORT=cli`. Passes `ANTHROPIC_API_KEY` to the subprocess instead of OAuth tokens.
- Default speech engine — changed from speaches to sherpa-onnx (backward-compatible env var fallback)
- `/api/credentials` — removed from auth-exempt paths. Now requires API key or setup token.
- Desktop Chat UI — SSE streaming text chat (`POST /api/chat/stream`) with rAF token batching, deferred markdown rendering via react-markdown + remark-gfm, and tool call visualization inline. (ADR-036)
- Provider-agnostic thinking/reasoning — Unified thinking/reasoning output across Claude (extended thinking), OpenAI (reasoning summary via Responses API), Ollama (Qwen3/DeepSeek-R1/Gemma 4 think tags), and OpenRouter (message.reasoning). Multi-turn reasoning continuity preserved via stored thinking tokens. (ADR-037)
- Image/vision support in chat — Paste or drag-and-drop images into the chat. Per-provider formatting (Claude image blocks, OpenAI image_url, Ollama images array). Supported on all providers with vision capability. (ADR-037)
- Syntax highlighting — Code blocks in chat responses rendered with react-shiki for accurate language-specific highlighting.
- Conversation history persistence — `GET /api/chat/history/:sessionId` retrieves past messages. Dashboard Conversations page provides a browser with session resume capability.
- Unified voice + chat session — App-level `activeSessionId` shared between voice and chat interfaces. Voice and text interactions contribute to the same session context.
- OpenAI Responses API migration — OpenAI provider migrated from Chat Completions to Responses API with reasoning support (o-series models), `EasyInputMessage` format, and replay of reasoning items for multi-turn continuity. (ADR-037)
- Ollama thinking support — Thinking/reasoning output from Qwen3, DeepSeek-R1, and Gemma 4 models streamed via the `think` field, with automatic strip for non-thinking-capable models.
- Entity-centric memory (Knowledge Wiki) — Redis JSON property graph with NER-extracted entities, relationships, alias deduplication, and 9 API endpoints for CRUD, merge, and graph operations. Dashboard Knowledge page for browsing and editing. (ADR-030)
- PACT Capture Protocol — Multimodal capture protocol with presence signals, per-modality consent envelopes, jurisdiction-aware validation, and 5 API endpoints. Local-first-only v1. (ADR-029)
- Speaker Intelligence — sherpa-onnx speaker embedding extraction, diarization, and Redis persistence for multi-speaker attribution. (ADR-031)
- Billing-aware model router — Runtime detection of billing exhaustion (Anthropic billing_error, OpenAI insufficient_quota, OpenRouter 402) with auto-fallback via BillingDetector and user-configured fallback strategies (`cloud-backup`/`ollama-local`/`none`). Billing dashboard UX, `GET /api/billing/status`, `POST /api/billing/retry/:providerId`. (ADR-035)
- Three-tier health system — Health checks prioritized as critical/important/optional. Fresh installs show green when critical services are healthy. `POST /api/system/reconnect-redis` for hot-connecting Redis services without restart. (ADR-034)
- Consumer onboarding — 3-step setup wizard (Mode → Redis → Done) replacing the 7-step developer flow. Just-in-time model downloads, hardware-aware recommendations, slot auto-configuration. (ADR-033)
- App Management — Settings section with re-run setup (`POST /api/system/reset-onboarding`) and reset slots to defaults (`POST /api/slots/reset`).
- Chat keyboard shortcuts + accessibility — Enter to send, Shift+Enter for newline, Escape to cancel streaming.
- Fallback strategy configuration — `POST /api/slots/auto-configure` accepts a `fallbackStrategy` parameter for cloud-backup, ollama-local, or none.
- Billing exhaustion Prometheus counter — `agtos_provider_billing_exhaustion_total{provider}` for monitoring billing events across providers.
- Auto-connect to running Redis — Server probes for an existing Redis instance at boot before attempting to start a managed one.
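
The billing-aware routing entries above name three exhaustion signals (Anthropic `billing_error`, OpenAI `insufficient_quota`, OpenRouter HTTP 402) and three fallback strategies (`cloud-backup`, `ollama-local`, `none`). A minimal sketch of how such a check might be wired is shown below. Everything in it is an assumption made for illustration: the `ProviderError` shape, the `isBillingExhausted` and `chooseFallback` helpers, and the reading of `cloud-backup` as "switch to a configured backup cloud provider" are hypothetical and do not describe the actual `BillingDetector`.

```typescript
// Sketch only: types and helpers are hypothetical, not the real BillingDetector.

type Provider = "claude" | "openai" | "openrouter" | "ollama";
type FallbackStrategy = "cloud-backup" | "ollama-local" | "none";

// Assumed normalized view of a failed provider call.
interface ProviderError {
  provider: Provider;
  httpStatus?: number; // HTTP status code, if known
  code?: string;       // provider-specific error type or code, if known
}

// True when the error matches one of the billing signals named in the changelog.
function isBillingExhausted(err: ProviderError): boolean {
  switch (err.provider) {
    case "claude":
      return err.code === "billing_error";
    case "openai":
      return err.code === "insufficient_quota";
    case "openrouter":
      return err.httpStatus === 402;
    default:
      return false; // assumption: local Ollama never reports billing exhaustion
  }
}

// Pick the provider to route to after exhaustion, or null to surface the error.
function chooseFallback(
  strategy: FallbackStrategy,
  backupCloud: Provider | undefined,
): Provider | null {
  switch (strategy) {
    case "cloud-backup":
      return backupCloud ?? null; // no backup configured: nothing to fall back to
    case "ollama-local":
      return "ollama";
    case "none":
      return null;
  }
}
```

Keeping detection separate from the strategy decision means a rate-limit error (for example an OpenAI 429 with a different code) is never mistaken for exhaustion, and the strategy can be changed without touching the detection rules.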
Changed
- Onboarding wizard — Simplified from 7 steps to 3 steps (Mode Selection → Redis → Done). See ADR-033.
- OpenAI provider — Migrated from Chat Completions API to Responses API for reasoning support and multi-turn continuity.
Fixed
- Onboarding wizard not shown in Tauri desktop app on first launch
- Uptime display showing incorrect values (was using stale module-level timestamp)
- 503 error during first few minutes after launch (`/api/health` now bypasses initialization gate)
- OpenAI tool calls silently dropped during streaming
- OpenAI stream cancellation for voice barge-in
- Router hardcoded `provider:'claude'` for all cloud decisions — now uses per-slot config
- Various OAuth UI remnants removed from Settings page
- Claude multi-turn thinking — stores full content blocks, rebuilds in buildMessages
- OpenAI multi-turn reasoning — EasyInputMessage, replay reasoning items, handle incomplete
- Ollama thinking capability check for non-thinking models
- Vision images sent on all agent steps, not just first
- History fetch merge for conversation browser
- Voice session propagation with unified activeSessionId
- Auth guards and input validation on history/session endpoints
- Billing tool errors with fallback status display
- Virtualized chat messages for long conversations
- Concurrent send guard preventing duplicate messages
Removed
- `AGTOS_CLOUD_PROVIDER` and `AGTOS_CHAT_PROVIDER` env vars — replaced by per-slot provider config in Model Slot Registry
- `forceCloudPatterns` and `forceLocalPatterns` routing config — replaced by `forceSlotPatterns`
- Two-tier LOCAL/CLOUD routing terminology