Why Memory Matters
Without memory, a voice agent is stateless: it asks the same clarifying questions every session, cannot reference previous conversations, and feels impersonal. The memory system solves this by providing three layers of recall, each serving a different purpose.

Memory Tiers
Working Memory
Current session context: recent conversation turns, active tool results, current task state. Always available — no external dependencies.
Episodic Memory
Cross-session recall: conversation summaries, extracted facts, user corrections. Stored in Redis with configurable TTL.
Semantic Memory
Long-term knowledge: user preferences, learned facts, entity relationships. Embedding-based vector search for semantic retrieval.
Working Memory
Working memory is the conversation context available to the LLM during the current session. It lives directly in the LLM's context window.

- Scope: Current session only
- Storage: In-process (no external dependencies)
- Contents: Recent conversation turns, active tool results, current task state

Working memory is managed by the session manager and passed in the messages array to the LLM. When the conversation grows long, automatic summarization compresses older turns into a summary, keeping the context window focused on recent and relevant information.
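The summarization step can be sketched as follows — a minimal illustration, not the actual implementation; the turn threshold and the summarize() helper (an LLM call in the real system) are assumptions:

```python
MAX_TURNS = 20  # hypothetical threshold before summarization kicks in

def compress_history(messages, summarize):
    """Fold older turns into a single summary message once the
    conversation exceeds MAX_TURNS, keeping recent turns verbatim."""
    if len(messages) <= MAX_TURNS:
        return messages
    older, recent = messages[:-MAX_TURNS], messages[-MAX_TURNS:]
    summary = summarize(older)  # LLM call in the real system
    summary_msg = {"role": "system",
                   "content": f"Summary of earlier turns: {summary}"}
    return [summary_msg] + recent
```

The key property is that the output always fits a bounded number of turns plus one summary message, so the context window stays predictable.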
Episodic Memory
Episodic memory preserves conversation knowledge after a session ends. When a session completes, the system uses an LLM to summarize the conversation and extract key facts, then stores these in Redis.

- Scope: Retained across sessions
- Storage: Redis with TTL-based expiration (default 30 days)
- Contents: Conversation summaries, extracted facts, user corrections, task outcomes

How Memories Are Saved
Not every conversation is worth remembering. The episodic memory system uses heuristic save decisions to determine what to persist:

- Conversations with user corrections or preferences are always saved
- Task completions and their outcomes are saved
- Short, trivial exchanges (greetings, single-turn lookups) may be skipped
- An importance score (0.0-1.0) is assigned to each episode
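The heuristics above might be sketched as a simple scoring function; the specific signals and weights here are illustrative assumptions, not the real rules:

```python
def score_episode(has_correction, has_preference, task_completed, turn_count):
    """Assign an importance score in [0.0, 1.0]."""
    if has_correction or has_preference:
        return 1.0  # corrections and preferences are always saved
    score = 0.0
    if task_completed:
        score += 0.6  # task outcomes are worth keeping
    if turn_count > 2:
        score += 0.3  # longer exchanges carry more context
    return min(score, 1.0)

def should_save(episode_score, threshold=0.3):
    """Episodes below the threshold (e.g. single-turn greetings) are skipped."""
    return episode_score >= threshold
```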
Querying Episodes
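Retrieval comes in two flavors: newest-first recency, and keyword matching over stored summaries. A minimal sketch, assuming a redis-py-style client and a hypothetical per-user sorted-set key layout (the real system may use RediSearch server-side instead):

```python
import json

def recent_episodes(r, user_id, limit=20):
    """Newest-first lookup, assuming episode IDs live in a per-user
    sorted set scored by timestamp (a hypothetical layout)."""
    ids = r.zrevrange(f"episodes:{user_id}", 0, limit - 1)
    return [json.loads(r.get(f"episode:{eid}")) for eid in ids]

def keyword_search(r, user_id, query, limit=10):
    """Naive client-side keyword filter over recent summaries."""
    matches = [e for e in recent_episodes(r, user_id, limit=100)
               if query.lower() in e.get("summary", "").lower()]
    return matches[:limit]
```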
Episodic memories can be retrieved by recency or keyword search.

Semantic Memory
Semantic memory provides long-term knowledge storage with embedding-based vector search. Unlike episodic memory (which stores summaries of specific conversations), semantic memory stores distilled facts, preferences, and patterns that persist indefinitely.

- Scope: Cross-session, permanent until explicitly deleted
- Storage: Redis Vector Search (HNSW indexing)
- Contents: User preferences, learned facts, entity relationships, behavioral patterns

How It Works
- Embedding generation: When a memory is stored, its text is converted to a dense vector using an embedding model
- Vector indexing: The embedding is stored in Redis Vector Search with HNSW indexing for fast approximate nearest-neighbor search
- Semantic retrieval: When the agent needs context, the current query is embedded and compared against stored memories using cosine similarity
- Relevance threshold: Only memories above a configurable similarity threshold are included in the LLM context
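Steps 3 and 4 can be sketched with plain cosine similarity; the threshold and top-k values are assumptions, and the query vector stands in for whatever embedding model is configured:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, memories, threshold=0.7, top_k=5):
    """Rank stored (text, vector) pairs against the query embedding,
    keeping only the top-k results above the relevance threshold."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored.sort(reverse=True)
    return [text for score, text in scored[:top_k] if score >= threshold]
```

In production the nearest-neighbor search happens inside Redis (HNSW) rather than client-side; this sketch only shows the ranking-and-threshold logic.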
Embedding Providers
| Provider | Dimensions | Use Case |
|---|---|---|
| Ollama (nomic-embed-text) | 768 | Local/private, no external API calls |
| MiniLM-L6 | 384 | Lightweight local option |
| OpenAI text-embedding-3-small | 1536 | Highest quality, requires API key |
Semantic memory requires both Redis (with the RediSearch module) and an embedding provider (Ollama or OpenAI). If either is unavailable, semantic memory falls back to episodic keyword search.
Context Assembly
Before each LLM call, the Memory Coordinator assembles a context packet that combines all three memory tiers.

Episodic Recall
Relevant session summaries are retrieved from Redis based on semantic similarity to the current query.
Semantic Recall
Relevant long-term memories are retrieved from the vector store based on embedding similarity.
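A hedged sketch of what assembling the context packet might look like; the tier interfaces and field names here are illustrative assumptions:

```python
def assemble_context(query, working, episodic=None, semantic=None):
    """Combine the three tiers into one packet for the LLM prompt.
    Unavailable tiers simply contribute nothing."""
    packet = {"recent_turns": working.turns(), "episodes": [], "facts": []}
    if episodic is not None:
        packet["episodes"] = episodic.recall(query)  # session summaries
    if semantic is not None:
        packet["facts"] = semantic.recall(query)     # long-term memories
    return packet
```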
Memory Coordinator
The MemoryCoordinator is the unified facade for the three-tier system. It provides a single entry point for the orchestrator and agent loop, handling the complexity of coordinating across tiers.
Graceful degradation is built in:
- Working memory (in-process) always works — no external dependencies
- Episodic memory (Redis) is optional — logs warnings if unavailable
- Semantic memory (Redis + Ollama) is optional — falls back to episodic keyword search
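The degradation rules above could be sketched as follows; the method names are assumptions, and the real MemoryCoordinator interface may differ:

```python
import logging

log = logging.getLogger("memory")

class MemoryCoordinator:
    """Facade over the three tiers. Working memory always works;
    the other tiers fail soft with a logged warning."""

    def __init__(self, working, episodic=None, semantic=None):
        self.working, self.episodic, self.semantic = working, episodic, semantic

    def recall(self, query):
        context = {"recent": self.working.turns(), "memories": []}
        try:
            if self.semantic is None:
                raise RuntimeError("semantic tier unavailable")
            context["memories"] = self.semantic.search(query)
        except Exception as exc:
            log.warning("semantic recall failed (%s); falling back", exc)
            if self.episodic is not None:
                # fall back to episodic keyword search
                context["memories"] = self.episodic.keyword_search(query)
        return context
```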
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/memory/episodes | List recent episodic memories |
| GET | /api/memory/search?q=... | Semantic + keyword search across episodes |
Query Parameters
GET /api/memory/episodes

| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | number | 20 | Max results (1-100) |

GET /api/memory/search

| Parameter | Type | Default | Description |
|---|---|---|---|
| q | string | — | Required. Search query text |
| limit | number | 10 | Max results (1-50) |
Configuration
Working memory configuration
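A hypothetical sketch of these settings in YAML; the key names (other than max_turns, which is referenced below) are illustrative assumptions, not the actual schema:

```yaml
memory:
  working:
    max_turns: 20           # turns kept verbatim before summarization
    summary_model: default  # LLM used to condense older turns
```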
Working memory settings control how conversation history is managed within a session. When conversation length exceeds max_turns, older turns are summarized by the LLM into a condensed context, keeping the working memory focused and within token limits.

Privacy and Data Control
The memory system includes privacy controls:

- Explicit deletion: Memories can be removed via the forget() protocol method
- TTL expiration: Episodic memories expire after a configurable period (default 30 days)
- Per-user isolation: Memories are scoped to individual users via device-to-user mapping
- User preferences: Privacy settings allow users to opt out of memory persistence entirely
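A minimal sketch of explicit deletion; only forget() is named above, so the key layout and the redis-py-style client parameter are assumptions:

```python
def forget(r, user_id, episode_id=None):
    """Delete one episode, or all episodic memories for a user when
    no episode is given. `r` is a redis-py-style client."""
    if episode_id is not None:
        r.delete(f"episode:{episode_id}")
        r.zrem(f"episodes:{user_id}", episode_id)
        return
    for eid in r.zrange(f"episodes:{user_id}", 0, -1):
        r.delete(f"episode:{eid}")
    r.delete(f"episodes:{user_id}")
```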