Unreleased

Added

  • Memory Maintenance (Memory Lint) — Dreamer.maintain() runs a periodic knowledge-base health sweep: stale conclusion confidence decay, Jaccard redundancy merge, orphan episode flagging, contradiction detection, and low-confidence pruning. Auto-registered cron task (default 0 3 * * * in AGTOS_MAINTENANCE_TIMEZONE, default UTC). On-demand via POST /api/memory/maintain or agtos memory maintain. memory-maintenance health check flags runs older than 48 hours. Killable with AGTOS_MAINTENANCE_ENABLED=false. (ADR-021)
  • NLI hybrid contradiction pipeline — Mandatory 3-stage pipeline for contradiction detection in Dreamer.maintain(). Stage 1 selects candidate pairs via cosine similarity over conclusion embeddings. Stage 2 runs a quantized DeBERTa-v3-base MNLI cross-encoder via onnxruntime-node with a Redis PairCache. Stage 3 sends the survivors to a batched LLM judge via the maintenance task slot. When AGTOS_NLI_ENABLED=false, Stage 2 is skipped and Stage 3 receives the full candidate list from Stage 1 (the legacy single-LLM detector has been removed). Prebuild with npm run prebuild:nli. New memory.contradiction.detected event. (ADR-027)
  • ResourceGuard — Gates every background LLM call (consolidation and maintenance) through a deterministic decision tree: policy override → cloud/remote short-circuit → active sessions → session cooldown → system load → Ollama VRAM probe (GET /api/ps). Configurable policy via AGTOS_BACKGROUND_WORK_POLICY (auto / always / idle-only). Retry-with-backoff for consolidation, skip-and-wait for maintenance. agtos_background_work_safe gauge and agtos_resource_guard_defer_count_total{reason} counter. (ADR-021)
  • Dreamer activation — triggerConsolidation() is now wired into endVoiceSession() with a server-level sessionEnded event listener as defense-in-depth. The 60+ test consolidation engine moves from dead code to production runtime — user profiles finally populate after sessions end.
  • Query-as-Ingest — New RESPONSE_INGEST episode type persists high-quality agent responses (heuristic scoring on tool calls, multi-step reasoning, length, and synthesis patterns) so synthesis compounds across sessions. Per-session rate limit and 5-second dedup window.
  • agtos memory maintain CLI command — Triggers an on-demand memory lint sweep. Exit codes: 0 (ok), 1 (request failed), 2 (RESOURCES_BUSY — retry later), 3 (PROFILE_DISCONNECTED — operator action). Supports --user <id> and --verbose.
  • maintenance task slot — 6th entry in TASK_SLOTS for the Stage 3 LLM judge in the NLI hybrid pipeline. Defaults to fallback: 'consolidation' so existing single-provider setups keep working unchanged. Dreamer resolves the maintenance provider at call time via resolveMaintenanceProvider(), so hot-swapping the slot takes effect on the next sweep without a restart.
  • ProviderCatalog interface — Cross-provider model discovery via listModels() / getAccountInfo() / validateModel(). Four implementations ship: OpenRouterCatalog, OllamaCatalog, ClaudeCatalog, OpenAICatalog. ModelInfo carries context length, max output tokens, per-1M-token pricing, and a 13-entry capability union including 'contradiction'. One-hour TTL caching. (ADR-026)
  • OpenRouter first-class provider — Promoted from “OpenAI with a different baseURL” to a full provider under src/providers/openrouter/ with its own credential scope (provider-openrouter), attribution headers (HTTP-Referer, X-Title), and rich /api/v1/models catalog with string-encoded per-token pricing. Can now be configured for any slot (conversation or task). (ADR-026)
  • Provider lifecycle events — Four canonical topics at src/core/providers/events.ts: provider.initialized, provider.failed, provider.catalog.refreshed (fires only on successful network fetch, not cache hits), and provider.credentials.updated (fires on create/rotate/delete in CredentialManager).
  • Memory maintenance history API — GET /api/memory/maintain/history lists recent reports (30-day TTL, 200-entry sorted-set index) and GET /api/memory/maintain/history/:timestamp fetches one by timestamp. Powers the dashboard Memory Browser’s maintenance widget.
  • Timezone-aware scheduler — CronSchedule.timezone (IANA) with croner@^10.0.1 (zero runtime dependencies) for next-run computation. Replaces process-local time semantics. AGTOS_MAINTENANCE_TIMEZONE exposes the timezone to operators. (ADR-023)
  • Atomic profile updates — UserProfileManager uses a node-redis v5 connection pool with WATCH/MULTI/EXEC optimistic locking via withOptimisticLock<T>(). All five mutating methods retry up to 3 times on WatchError, closing the audit M3 race window where concurrent maintenance + consolidation could lose conclusions. (ADR-024)
  • Multi-tenant-ready data layer — Every Redis key includes {userId} in tenant-first position, every vector search accepts a userId filter, and every business-logic call site goes through resolveUserId(). No operational multi-user today, but a clean migration path. (ADR-025)
  • qmd MCP integration — AGTOS_MCP_SERVERS JSON env var for external MCP servers. McpClientManager.discoverTools() now runs inferCategory() on each discovered tool so search tools auto-participate in intent-based tool selection (ADR-009).
  • memory-semantic health check — Probes the RediSearch vector index document count and size via FT.INFO with a 1-second timeout. Non-throwing on Redis failures.
  • Memory lifecycle events — memory.conclusion.restored, memory.consolidation.deferred, and memory.maintenance.failed complete the symmetry with the existing decay / prune / completed family so dashboards can wire alerts without parsing logs.
  • New Prometheus metrics — agtos_background_work_safe gauge, agtos_memory_consolidation_deferred_total{reason}, agtos_resource_guard_defer_count_total{reason}, agtos_pair_cache_lookups_total{result}, agtos_contradiction_pipeline_stage_duration_seconds{stage}, agtos_nli_inferences_total{result} (verdicts: contradiction/neutral/entailment), agtos_nli_inference_duration_seconds (latency summary with p50/p95/p99), agtos_provider_catalog_fetch_total{provider,status} (catalog refresh attempts per provider), and agtos_provider_catalog_models_count{provider} (current model count gauge per provider).
  • Model Slot Registry — named capability slots (chat, reasoning, coding, tool_calling, creative) with per-slot provider+model config and fallback chains. Replaces the global cloud provider env var. Configured in ~/.agtos/config.json. (ADR-020)
  • Encrypted credential storage — API keys encrypted with AES-256-GCM at ~/.agtos/credentials.json. scrypt key derivation (N=16384), AAD-bound ciphertext per provider, auto-generated machine secret at ~/.agtos/.secret. 149 credential-specific tests.
  • OpenAI cloud provider — GPT-4o and GPT-4o Mini as drop-in alternatives to Claude. Full streaming, tool calling, session management, and health checks. Configurable per slot. (ADR-019)
  • CLI API key validation — agtos setup validates API keys against the actual provider API before saving
  • First-run detection — agtos start guides you to agtos setup when no configuration exists
  • Structured startup progress — agtos start shows step-by-step service initialization with status icons
  • Doctor credential validation — agtos doctor checks credential file health, permissions, API key functionality, network connectivity, and reports feature degradation
  • Setup token auth — 30-min TTL token for credential storage during onboarding (X-Setup-Token header)
  • Onboarding mic test — real-time audio level visualization during desktop app setup
  • Settings credential management — update API keys inline with validation from the Settings page
  • Credential health check — per-provider source tracking exposed via /health endpoint with Prometheus metrics
  • ProviderLifecycleManager — Runtime provider hot-swap without server restart. Credential rotation triggers a provider.credentials.updated event, and the lifecycle manager atomically swaps the client provider instance. In-flight requests complete on the old client; new requests use the new credentials. Per-provider health checks (provider-claude, provider-openai, provider-ollama, provider-openrouter) report credential status, catalog freshness, and staleness.
  • PUT /api/slots model validation — PUT /api/slots now validates each slot’s model against the ProviderCatalog. Unknown models produce a warning but are allowed (private/unlisted models). Models with past deprecation dates are blocked with HTTP 400. Models with future deprecation dates produce a warning. Catalog fetch failures are non-blocking. Response includes a warnings array when applicable.
  • 6 Memory V2 / Provider Catalog config keys formalized — nliModelSize (AGTOS_NLI_MODEL_SIZE), nliContradictionThreshold (AGTOS_NLI_CONTRADICTION_THRESHOLD), pairCacheTtlSeconds (AGTOS_PAIR_CACHE_TTL_SECONDS), providerCatalogCacheTtlSeconds (AGTOS_PROVIDER_CATALOG_CACHE_TTL_SECONDS), and providerCatalogAutoRefresh (AGTOS_PROVIDER_CATALOG_AUTO_REFRESH) are now registered in CONFIG_KEY_META with typed schemas, validation ranges, and reload-type annotations. All are discoverable via the dashboard Settings page and GET /api/settings.
  • Tauri ORT sidecar bundling — Desktop builds bundle ONNX Runtime binaries via scripts/copy-ort-binaries.mjs. The Rust sidecar sets AGTOS_ORT_RUNTIME_DIR so the NLI pipeline resolves onnxruntime-node bindings from the packaged app. NLI works out-of-box in desktop builds when AGTOS_NLI_ENABLED=true.
  • OpenRouter embedding provider — First-class OpenRouterEmbeddingProvider with its own credential scope (provider-openrouter), deferred credential failure (constructor doesn’t throw), and request-level retry with exponential backoff (3 attempts, 250ms initial, 2× multiplier). Retries on HTTP 429 and 5xx; fails fast on 400/401/403/404.
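
The retry policy described for OpenRouterEmbeddingProvider (3 attempts, 250ms initial delay, 2× multiplier, retry on HTTP 429 and 5xx, fail fast on other 4xx) can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped implementation: HttpError, isRetryable, backoffDelayMs, and withRetry are hypothetical names, and the absence of jitter is an assumption.

```typescript
// Hypothetical error type carrying the HTTP status of a failed request.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// Retry on 429 and 5xx; any other status (400/401/403/404, ...) fails fast.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Delay before the retry with 0-based index `retryIndex`: 250ms, then 500ms.
function backoffDelayMs(retryIndex: number, initialMs = 250, multiplier = 2): number {
  return initialMs * Math.pow(multiplier, retryIndex);
}

// Runs `fn` up to `maxAttempts` times in total. The `sleep` parameter is
// injectable so callers (and tests) can avoid real waiting.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err instanceof HttpError && isRetryable(err.status);
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (!retryable || attempt >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt - 1));
    }
  }
}
```

With three attempts there are at most two waits, so a request that keeps returning 429 fails after roughly 750ms of accumulated backoff, while a 401 surfaces immediately.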

Changed

  • NLI contradiction pipeline mandatory — ContradictionPipelineDeps is now a required constructor argument on MemoryDreamer. The legacy single-LLM detectContradictionsLlm() fall-through and its buildMaintenancePrompt helper have been deleted. When AGTOS_NLI_ENABLED=false, Stage 2 (cross-encoder) is skipped and Stage 3 (LLM judge) receives the full candidate list from Stage 1. MaintenanceReport.summary.contradictionPipeline is now a required field (short-circuit paths produce zero-valued metrics).
  • triggerMaintenance() return shape — now returns a discriminated MaintenanceTriggerResult ({ok:true, report} or {ok:false, errorCode: 'PROFILE_DISCONNECTED' | 'RESOURCES_BUSY', message, reason?}) instead of MaintenanceReport | null. POST /api/memory/maintain 503 responses now carry errorCode so clients can distinguish transient (retry) from persistent (operator action) failures without parsing message strings.
  • Zod 4 upgrade — Zod 4 is the baseline. Removed all as never casts and @ts-ignore TS2589 suppressions from MCP tool registrations. z.record() now requires both a key and a value schema (e.g., z.record(z.string(), z.unknown())). No user-facing behavior change. (ADR-022)
  • Scheduler library swap — replaced node-cron@4 + the custom cron-parser implementation with croner@^10.0.1 (zero runtime dependencies, native IANA timezone and DST handling). Cron expressions that previously relied on the process-local timezone now fire in UTC by default; set AGTOS_MAINTENANCE_TIMEZONE to opt in to a different IANA zone. (ADR-023)
  • Consolidation / maintenance providers — now resolve via the consolidation and maintenance task slots in ~/.agtos/config.json. Legacy AGTOS_CONSOLIDATION_PROVIDER / AGTOS_CONSOLIDATION_MODEL env vars still work as fallbacks.
  • Model router v2.0 — replaced two-tier LOCAL/CLOUD dispatch with slot-based routing via Model Slot Registry. AGTOS_CLOUD_PROVIDER and AGTOS_CHAT_PROVIDER env vars removed in favor of per-slot config.
  • Auth overhaul — removed OAuth support. Authentication is now API key only for all providers. ANTHROPIC_AUTH_TOKEN and CLAUDE_CODE_OAUTH_TOKEN env vars are no longer recognized.
  • Setup wizard — now writes encrypted credentials to ~/.agtos/credentials.json and slot configuration to ~/.agtos/config.json. Migrates plaintext keys from .env.local.
  • Key derivation — migrated from PBKDF2-SHA256 to scrypt (N=16384, r=8, p=1). Auto-migrates existing PBKDF2 credential files on first access.
  • CLI transport restructured — opt-in via AGTOS_CLAUDE_TRANSPORT=cli. Passes ANTHROPIC_API_KEY to the subprocess instead of OAuth tokens.
  • Default speech engine — changed from speaches to sherpa-onnx (backward-compatible env var fallback)
  • /api/credentials — removed from auth-exempt paths. Now requires API key or setup token.
  • Desktop Chat UI — SSE streaming text chat (POST /api/chat/stream) with rAF token batching, deferred markdown rendering via react-markdown + remark-gfm, and tool call visualization inline. (ADR-036)
  • Provider-agnostic thinking/reasoning — Unified thinking/reasoning output across Claude (extended thinking), OpenAI (reasoning summary via Responses API), Ollama (Qwen3/DeepSeek-R1/Gemma 4 think tags), and OpenRouter (message.reasoning). Multi-turn reasoning continuity preserved via stored thinking tokens. (ADR-037)
  • Image/vision support in chat — Paste or drag-and-drop images into the chat. Per-provider formatting (Claude image blocks, OpenAI image_url, Ollama images array). Supported on all providers with vision capability. (ADR-037)
  • Syntax highlighting — Code blocks in chat responses rendered with react-shiki for accurate language-specific highlighting.
  • Conversation history persistence — GET /api/chat/history/:sessionId retrieves past messages. Dashboard Conversations page provides a browser with session resume capability.
  • Unified voice + chat session — App-level activeSessionId shared between voice and chat interfaces. Voice and text interactions contribute to the same session context.
  • OpenAI Responses API migration — OpenAI provider migrated from Chat Completions to Responses API with reasoning support (o-series models), EasyInputMessage format, and replay of reasoning items for multi-turn continuity. (ADR-037)
  • Ollama thinking support — Thinking/reasoning output from Qwen3, DeepSeek-R1, and Gemma 4 models streamed via the think field, with automatic strip for non-thinking-capable models.
  • Entity-centric memory (Knowledge Wiki) — Redis JSON property graph with NER-extracted entities, relationships, alias deduplication, and 9 API endpoints for CRUD, merge, and graph operations. Dashboard Knowledge page for browsing and editing. (ADR-030)
  • PACT Capture Protocol — Multimodal capture protocol with presence signals, per-modality consent envelopes, jurisdiction-aware validation, and 5 API endpoints. Local-first-only v1. (ADR-029)
  • Speaker Intelligence — sherpa-onnx speaker embedding extraction, diarization, and Redis persistence for multi-speaker attribution. (ADR-031)
  • Billing-aware model router — Runtime detection of billing exhaustion (Anthropic billing_error, OpenAI insufficient_quota, OpenRouter 402) with auto-fallback via BillingDetector and user-configured fallback strategies (cloud-backup/ollama-local/none). Billing dashboard UX, GET /api/billing/status, POST /api/billing/retry/:providerId. (ADR-035)
  • Three-tier health system — Health checks prioritized as critical/important/optional. Fresh installs show green when critical services are healthy. POST /api/system/reconnect-redis for hot-connecting Redis services without restart. (ADR-034)
  • Consumer onboarding — 3-step setup wizard (Mode → Redis → Done) replacing the 7-step developer flow. Just-in-time model downloads, hardware-aware recommendations, slot auto-configuration. (ADR-033)
  • App Management — Settings section with re-run setup (POST /api/system/reset-onboarding) and reset slots to defaults (POST /api/slots/reset).
  • Chat keyboard shortcuts + accessibility — Enter to send, Shift+Enter for newline, Escape to cancel streaming.
  • Fallback strategy configuration — POST /api/slots/auto-configure accepts a fallbackStrategy parameter for cloud-backup, ollama-local, or none.
  • Billing exhaustion Prometheus counter — agtos_provider_billing_exhaustion_total{provider} for monitoring billing events across providers.
  • Auto-connect to running Redis — Server probes for an existing Redis instance at boot before attempting to start a managed one.
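
The discriminated MaintenanceTriggerResult returned by triggerMaintenance(), described in the list above, lends itself to handling without string parsing. The sketch below mirrors the shape quoted in that entry and maps it to the exit codes listed for agtos memory maintain. The MaintenanceReport placeholder and the exitCodeFor and isTransient helpers are assumptions for illustration only. Exit code 1 (request failed) is left out because it covers failures that never produce a result.

```typescript
// Placeholder for the real report type; its fields are not shown here.
interface MaintenanceReport {
  summary: Record<string, unknown>;
}

type MaintenanceErrorCode = "PROFILE_DISCONNECTED" | "RESOURCES_BUSY";

// Shape quoted in the triggerMaintenance() changelog entry.
type MaintenanceTriggerResult =
  | { ok: true; report: MaintenanceReport }
  | { ok: false; errorCode: MaintenanceErrorCode; message: string; reason?: string };

// Hypothetical helper: RESOURCES_BUSY is transient (retry later), while
// PROFILE_DISCONNECTED is persistent (operator action required).
function isTransient(result: MaintenanceTriggerResult): boolean {
  return !result.ok && result.errorCode === "RESOURCES_BUSY";
}

// Hypothetical helper: maps a result to the documented CLI exit codes
// 0 (ok), 2 (RESOURCES_BUSY), and 3 (PROFILE_DISCONNECTED).
function exitCodeFor(result: MaintenanceTriggerResult): number {
  if (result.ok) return 0;
  return result.errorCode === "RESOURCES_BUSY" ? 2 : 3;
}
```

Because ok is the discriminant, the compiler only exposes report on the success branch and errorCode on the failure branch, which is what lets callers drop null checks on the old MaintenanceReport | null return type.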

Changed

  • Onboarding wizard — Simplified from 7 steps to 3 steps (Mode Selection → Redis → Done). See ADR-033.
  • OpenAI provider — Migrated from Chat Completions API to Responses API for reasoning support and multi-turn continuity.

Fixed

  • Onboarding wizard not shown in Tauri desktop app on first launch
  • Uptime display showing incorrect values (was using stale module-level timestamp)
  • 503 error during first few minutes after launch (/api/health now bypasses initialization gate)
  • OpenAI tool calls silently dropped during streaming
  • OpenAI stream cancellation for voice barge-in
  • Router hardcoded provider:'claude' for all cloud decisions — now uses per-slot config
  • Various OAuth UI remnants removed from Settings page
  • Claude multi-turn thinking — stores full content blocks, rebuilds in buildMessages
  • OpenAI multi-turn reasoning — EasyInputMessage, replay reasoning items, handle incomplete
  • Ollama thinking capability check for non-thinking models
  • Vision images sent on all agent steps, not just first
  • History fetch merge for conversation browser
  • Voice session propagation with unified activeSessionId
  • Auth guards and input validation on history/session endpoints
  • Billing tool errors with fallback status display
  • Virtualized chat messages for long conversations
  • Concurrent send guard preventing duplicate messages

Removed

  • AGTOS_CLOUD_PROVIDER and AGTOS_CHAT_PROVIDER env vars — replaced by per-slot provider config in Model Slot Registry
  • forceCloudPatterns and forceLocalPatterns routing config — replaced by forceSlotPatterns
  • Two-tier LOCAL/CLOUD routing terminology