Three-Tier Memory

Agents That Remember Everything

Most AI tools are stateless. Ask twice, get the same blank stare. EnGenAI agents have three layers of memory — instant, semantic, and permanent — that compound across every session.

The Stateless AI Problem

Every session with a typical AI agent is a fresh start. No accumulated knowledge. No compound advantage. Just endless context re-establishment.

Session 1

Agent learns your stack, patterns, and decisions over 45 minutes of back-and-forth.

Session ends

Context window closes. Everything the agent learned evaporates into nothing.

Session 2: Day 1 again

Agent asks the same questions. Learns the same things. Zero compound advantage.

The average developer spends 30–40% of every AI session re-establishing context the agent already had in the previous session. With stateless AI, every conversation starts at zero. That's not intelligence — that's expensive amnesia.

Three-Tier Memory Architecture

Three layers working in concert. Each tier serves a different speed and scope. First hit wins — so queries are always answered at the fastest available tier.

Tier 1 — Working Memory

Redis

Active session context. Fastest retrieval. Ephemeral.

Latency: ~1ms
Scope: current session
Capacity: ~100KB per agent
Persistence: session lifetime
Access: key-value lookup

What's stored

Current task context
Recent messages
Active tool calls
Agent state flags
Session metadata
Relative capacity: ~100KB
cache miss → query next tier
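Redis covers this tier with plain key-value commands and an expiry (e.g. `SET key value EX seconds`). As a minimal stdlib-only sketch of the same semantics — `WorkingMemory` is an illustrative name, not the product's API:

```python
import time

class WorkingMemory:
    """Tier 1 sketch: session-scoped key-value context that expires
    with the session. Redis provides this natively via TTLs."""

    def __init__(self, session_ttl: float = 3600.0):
        self.ttl = session_ttl
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expiry = entry
        if time.monotonic() > expiry:  # session lifetime elapsed
            del self._store[key]
            return default
        return value
```

When the TTL lapses, the entry is gone — which is exactly the point of this tier: working memory is ephemeral by design, and anything worth keeping flows down to Tiers 2 and 3.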

Tier 2 — Vector Memory

Milvus

Semantic knowledge. Cross-session. Similarity search.

Latency: ~50ms
Scope: cross-session
Capacity: unlimited
Search type: similarity (ANN)
How it works: text → embeddings

What's stored

Semantic knowledge
Past decisions
Code patterns
Architecture context
Research findings
ADR summaries
Relative capacity: configurable
cache miss → query next tier
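The core operation at this tier is similarity search: embed the query, rank stored memories by how close their embeddings are. A toy brute-force sketch with 2-d vectors — Milvus does the same ranking at scale with an ANN index instead of a full scan, and real embeddings have hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_embedding, memories, top_k=2):
    """Rank stored memories by similarity to the query embedding.
    Brute force here; an ANN index makes this sublinear at scale."""
    ranked = sorted(memories,
                    key=lambda m: cosine(query_embedding, m["embedding"]),
                    reverse=True)
    return ranked[:top_k]
```

This is why Tier 2 answers questions Tier 1 cannot: a query phrased differently from the stored memory still lands near it in embedding space.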

Tier 3 — Durable Memory

PostgreSQL

Permanent storage. Cross-project. Structured relationships.

Latency: ~20ms
Scope: cross-project
Capacity: unlimited
Persistence: permanent
Structure: relational + JSONB

What's stored

Structured data
Entity relationships
Audit log
Explicit memories
Project history
Agent decisions
Relative capacity: unlimited
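The "relational + JSONB" shape means typed columns for what you query on, and a JSON payload for the memory itself. A sketch of that pattern using stdlib `sqlite3` as a stand-in (PostgreSQL adds JSONB operators and indexes on top; the schema and helper names here are illustrative):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
con.execute("""
    CREATE TABLE memories (
        id      INTEGER PRIMARY KEY,
        project TEXT NOT NULL,
        kind    TEXT NOT NULL,   -- 'decision', 'entity', 'audit', ...
        body    TEXT NOT NULL    -- JSON payload (JSONB in PostgreSQL)
    )
""")

def remember(project, kind, payload):
    """Persist a structured memory, keyed by project and kind."""
    con.execute(
        "INSERT INTO memories (project, kind, body) VALUES (?, ?, ?)",
        (project, kind, json.dumps(payload)),
    )

def recall(project, kind):
    """Fetch all memories of one kind for one project."""
    rows = con.execute(
        "SELECT body FROM memories WHERE project = ? AND kind = ?",
        (project, kind),
    )
    return [json.loads(r[0]) for r in rows]
```

The relational columns make cross-project queries cheap ("every auth decision across all projects"), while the JSON body keeps each memory free-form.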

The Memory Query Waterfall

Every knowledge query flows through the tiers in order. Redis (~1ms) is checked first; on a miss, Milvus (~50ms) runs a semantic search; on a further miss, PostgreSQL (~20ms) performs a structured lookup. The first hit returns the result, and faster tiers are warmed on a miss so the next query for the same knowledge is quicker.

Query: "How does our auth middleware work?"

Tier 1 — Redis: checked first (1ms) → HIT
Tier 2 — Milvus: semantic search on miss (50ms) → SKIP
Tier 3 — PostgreSQL: structured lookup on miss (20ms) → SKIP

Result returned from first hit: ~1ms total
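The waterfall can be sketched with dict-backed stand-ins for the three stores — `MemoryWaterfall` is an illustrative name, not the product's API:

```python
from typing import Optional

class MemoryWaterfall:
    """Three-tier lookup sketch: check the fastest tier first,
    fall through on a miss, and warm Tier 1 on any lower-tier hit."""

    def __init__(self):
        self.working = {}  # Tier 1: Redis stand-in (~1ms)
        self.vector = {}   # Tier 2: Milvus stand-in (~50ms)
        self.durable = {}  # Tier 3: PostgreSQL stand-in (~20ms)

    def query(self, key: str) -> Optional[str]:
        # First hit wins.
        if key in self.working:
            return self.working[key]
        for tier in (self.vector, self.durable):
            if key in tier:
                value = tier[key]
                # Warm the cache so the next query answers from Tier 1.
                self.working[key] = value
                return value
        return None
```

After one durable-tier hit, the answer sits in working memory, so a repeat of the same question drops from tens of milliseconds to ~1ms.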

Trace a Query Live

Watch a real query flow through the memory system. Choose "Simulate HIT" to see a cached result returned in milliseconds, or "Simulate MISS" to watch it fall through to vector search and retrieve relevant documents.


Without Memory vs With Memory

The difference between an agent that forgets everything and one that builds on every session.

Without Memory

Stateless AI — every session starts from zero

Session 1 Monday

You:

"Build an API endpoint"

Agent:

"Sure! What framework are you using? What database? What's the auth mechanism? What's the endpoint path? What response format?"

5 questions before starting.

Session 2 Wednesday

You:

"Add tests to the endpoint"

Agent:

"What endpoint? What framework? What test library? Where are the existing tests? What's the expected response schema?"

Same blank stare. Remembered nothing.

Pattern: constant context re-establishment. Every session is Day 1. No compound advantage. Endless setup cost.

With 3-Tier Memory

Compound advantage — every session builds on all previous

Session 1 Monday

You:

"Build an API endpoint"

Agent:

"On it. Loading from vector memory: FastAPI + PostgreSQL, RS256 auth (architecture decision), existing pattern in auth_router.py."

Sources: Redis (active context), Milvus (stack knowledge)

Session 2 Wednesday

You:

"Add tests to the endpoint"

Agent:

"Retrieved endpoint code from durable memory. Loading test patterns from Milvus (pytest + AsyncClient, conftest fixtures). Writing tests now."

Sources: PostgreSQL (endpoint code), Milvus (test patterns)

Pattern: compound advantage. Each session smarter. Zero context setup. Knowledge compounds.

The Compound Advantage

Memory doesn't just help in the moment. It compounds. Each session makes every future session faster and smarter.

Week 1 — Foundations

Agents learn your framework choices, coding patterns, and architectural decisions. Everything goes into vector memory and durable storage.

Month 2 — Auth & Security

Agents already know your preferred auth patterns from week one. Earlier architecture decisions inform the security implementation. No re-learning. Faster execution.

Month 6 — Deep Expertise

Agents know your codebase better than most human engineers would after six months. They anticipate edge cases, reference past decisions, and produce first-draft code that passes review 80% of the time.

"Memory compounds. After six months, agents know your codebase better than most human engineers would in the same period."

Latest Release

Observational Memory

As conversations grow long, context windows fill up — and token costs compound. EnGenAI's Observer service automatically compresses long conversations into structured observations, preserving knowledge without the token overhead.

Automatic Compression Pipeline

Long Conversation (>30K tokens) → Observer Service (async, background) → Structured Observations (JSONB in PostgreSQL) → Future Context (retrieved at session start)

No configuration required. The Observer runs in the background via a scheduled task runner. Compression triggers automatically when conversation size crosses the threshold.

Phase 1 — Per-Turn Extraction

After each agent turn, the Observer extracts key facts, decisions, and entities from the message into JSONB observations. Runs asynchronously — zero latency impact on the conversation.

Phase 2 — Background Compression

When a conversation exceeds the token threshold, the background task scheduler triggers a compression job that collapses the full history into the accumulated observations. Future sessions load the compressed form instead of the raw transcript.
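The threshold check and the swap from raw transcript to observations can be sketched as follows. The ~4-characters-per-token estimate and the `load_context` name are assumptions for illustration, not the product's actual tokenizer or API:

```python
TOKEN_THRESHOLD = 30_000  # compression trigger from the pipeline above

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token (an assumption,
    # not the product's real tokenizer).
    return sum(len(m["text"]) for m in messages) // 4

def load_context(messages, observations):
    """Sketch of what a future session loads: the raw transcript while
    it is small, the compressed observations once it crosses the
    threshold."""
    if estimate_tokens(messages) <= TOKEN_THRESHOLD:
        return messages
    return [{"role": "system",
             "text": "Observations: " + "; ".join(observations)}]
```

The knowledge survives either way; only the token cost changes — a long history collapses to one system message built from the accumulated observations.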

What Gets Preserved

Decisions made

"Chose PostgreSQL over MongoDB for ACID guarantees"

Entities identified

"User: Sarah (Legal) — approves all contract write operations"

Constraints surfaced

"Must not call external payment APIs without compliance sign-off"

~1ms: working memory (Redis)
~50ms: semantic search (Milvus)
3 tiers, always in sync

Next: Knowledge Pipeline

Memory stores what agents know. The knowledge pipeline controls how they acquire it — through research, caching, and retrieval-augmented generation.