Three-Tier Memory

Agents That Remember Everything

Most AI tools are stateless. Ask twice, get the same blank stare. EnGenAI agents have three layers of memory — instant, semantic, and permanent — that compound across every session.

The Stateless AI Problem

Every session with a typical AI agent is a fresh start. No accumulated knowledge. No compound advantage. Just endless context re-establishment.

Session 1

Agent learns your stack, patterns, and decisions over 45 minutes of back-and-forth.

Session ends

Context window closes. Everything the agent learned evaporates into nothing.

Session 2: Day 1 again

Agent asks the same questions. Learns the same things. Zero compound advantage.

The average developer spends 30–40% of every AI session re-establishing context the agent already had in the previous session. With stateless AI, every conversation starts at zero. That's not intelligence — that's expensive amnesia.

Three-Tier Memory Architecture

Three layers working in concert. Each tier serves a different speed and scope. First hit wins — so queries are always answered at the fastest available tier.

Tier 1 — Working Memory

Redis

Active session context. Fastest retrieval. Ephemeral.

Latency: ~1ms
Scope: current session
Capacity: ~100KB per agent
Persistence: session lifetime
Access: key-value lookup

What's stored

Current task context
Recent messages
Active tool calls
Agent state flags
Session metadata
Relative capacity: ~100KB
cache miss → query next tier
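Redis covers this tier with plain key-value commands and an expiry (e.g. `SET key value EX seconds`). As a minimal stdlib-only sketch of the same semantics — `WorkingMemory` is an illustrative name, not the product's API:

```python
import time

class WorkingMemory:
    """Tier 1 sketch: session-scoped key-value context that expires
    with the session. Redis provides this natively via TTLs."""

    def __init__(self, session_ttl: float = 3600.0):
        self.ttl = session_ttl
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expiry = entry
        if time.monotonic() > expiry:  # session lifetime elapsed
            del self._store[key]
            return default
        return value
```

When the TTL lapses, the entry is gone — which is exactly the point of this tier: working memory is ephemeral by design, and anything worth keeping flows down to Tiers 2 and 3.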

Tier 2 — Vector Memory

Milvus

Semantic knowledge. Cross-session. Similarity search.

Latency: ~50ms
Scope: cross-session
Capacity: unlimited
Search type: similarity (ANN)
How it works: text → embeddings

What's stored

Semantic knowledge
Past decisions
Code patterns
Architecture context
Research findings
ADR summaries
Relative capacity: configurable
cache miss → query next tier
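The core operation at this tier is similarity search: embed the query, rank stored memories by how close their embeddings are. A toy brute-force sketch with 2-d vectors — Milvus does the same ranking at scale with an ANN index instead of a full scan, and real embeddings have hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_embedding, memories, top_k=2):
    """Rank stored memories by similarity to the query embedding.
    Brute force here; an ANN index makes this sublinear at scale."""
    ranked = sorted(memories,
                    key=lambda m: cosine(query_embedding, m["embedding"]),
                    reverse=True)
    return ranked[:top_k]
```

This is why Tier 2 answers questions Tier 1 cannot: a query phrased differently from the stored memory still lands near it in embedding space.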

Tier 3 — Durable Memory

PostgreSQL

Permanent storage. Cross-project. Structured relationships.

Latency: ~20ms
Scope: cross-project
Capacity: unlimited
Persistence: permanent
Structure: relational + JSONB

What's stored

Structured data
Entity relationships
Audit log
Explicit memories
Project history
Agent decisions
Relative capacity: unlimited
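The "relational + JSONB" shape means typed columns for what you query on, and a JSON payload for the memory itself. A sketch of that pattern using stdlib `sqlite3` as a stand-in (PostgreSQL adds JSONB operators and indexes on top; the schema and helper names here are illustrative):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
con.execute("""
    CREATE TABLE memories (
        id      INTEGER PRIMARY KEY,
        project TEXT NOT NULL,
        kind    TEXT NOT NULL,   -- 'decision', 'entity', 'audit', ...
        body    TEXT NOT NULL    -- JSON payload (JSONB in PostgreSQL)
    )
""")

def remember(project, kind, payload):
    """Persist a structured memory, keyed by project and kind."""
    con.execute(
        "INSERT INTO memories (project, kind, body) VALUES (?, ?, ?)",
        (project, kind, json.dumps(payload)),
    )

def recall(project, kind):
    """Fetch all memories of one kind for one project."""
    rows = con.execute(
        "SELECT body FROM memories WHERE project = ? AND kind = ?",
        (project, kind),
    )
    return [json.loads(r[0]) for r in rows]
```

The relational columns make cross-project queries cheap ("every auth decision across all projects"), while the JSON body keeps each memory free-form.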

The Memory Query Waterfall

Every knowledge query flows through the tiers in order. Redis (~1ms) is checked first; on a miss, Milvus (~50ms) runs a semantic search; on a further miss, PostgreSQL (~20ms) performs a structured lookup. The first hit returns the result, and faster tiers are warmed on a miss so the next query for the same knowledge is quicker.

Query: "How does our auth middleware work?"

Tier 1 — Redis: checked first (1ms) → HIT
Tier 2 — Milvus: semantic search on miss (50ms) → SKIP
Tier 3 — PostgreSQL: structured lookup on miss (20ms) → SKIP

Result returned from first hit: ~1ms total
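The waterfall can be sketched with dict-backed stand-ins for the three stores — `MemoryWaterfall` is an illustrative name, not the product's API:

```python
from typing import Optional

class MemoryWaterfall:
    """Three-tier lookup sketch: check the fastest tier first,
    fall through on a miss, and warm Tier 1 on any lower-tier hit."""

    def __init__(self):
        self.working = {}  # Tier 1: Redis stand-in (~1ms)
        self.vector = {}   # Tier 2: Milvus stand-in (~50ms)
        self.durable = {}  # Tier 3: PostgreSQL stand-in (~20ms)

    def query(self, key: str) -> Optional[str]:
        # First hit wins.
        if key in self.working:
            return self.working[key]
        for tier in (self.vector, self.durable):
            if key in tier:
                value = tier[key]
                # Warm the cache so the next query answers from Tier 1.
                self.working[key] = value
                return value
        return None
```

After one durable-tier hit, the answer sits in working memory, so a repeat of the same question drops from tens of milliseconds to ~1ms.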

Trace a Query Live

Watch a real query flow through the memory system. Choose "Simulate HIT" to see a cached result returned in milliseconds, or "Simulate MISS" to watch it fall through to vector search and retrieve relevant documents.


Without Memory vs With Memory

The difference between an agent that forgets everything and one that builds on every session.

Without Memory

Stateless AI — every session starts from zero

Session 1 Monday

You:

"Build an API endpoint"

Agent:

"Sure! What framework are you using? What database? What's the auth mechanism? What's the endpoint path? What response format?"

5 questions before starting.

Session 2 Wednesday

You:

"Add tests to the endpoint"

Agent:

"What endpoint? What framework? What test library? Where are the existing tests? What's the expected response schema?"

Same blank stare. Remembered nothing.

Pattern: constant context re-establishment. Every session is Day 1. No compound advantage. Endless setup cost.

With 3-Tier Memory

Compound advantage — every session builds on all previous

Session 1 Monday

You:

"Build an API endpoint"

Agent:

"On it. Loading from vector memory: FastAPI + PostgreSQL, RS256 auth (architecture decision), existing pattern in auth_router.py."

Sources: Redis (active context), Milvus (stack knowledge)

Session 2 Wednesday

You:

"Add tests to the endpoint"

Agent:

"Retrieved endpoint code from durable memory. Loading test patterns from Milvus (pytest + AsyncClient, conftest fixtures). Writing tests now."

Sources: PostgreSQL (endpoint code), Milvus (test patterns)

Pattern: compound advantage. Each session smarter. Zero context setup. Knowledge compounds.

The Compound Advantage

Memory doesn't just help in the moment. It compounds. Each session makes every future session faster and smarter.

Week 1 — Foundations

Agents learn your framework choices, coding patterns, and architectural decisions. Everything goes into vector memory and durable storage.

Month 2 — Auth & Security

Agents already know your preferred auth patterns from week one. Earlier architecture decisions inform the security implementation. No re-learning. Faster execution.

Month 6 — Deep Expertise

Agents know your codebase better than most human engineers would after six months. They anticipate edge cases, reference past decisions, and produce first-draft code that passes review 80% of the time.

"Memory compounds. After six months, agents know your codebase better than most human engineers would in the same period."

Latest Release

Observational Memory

As conversations grow long, context windows fill up — and token costs compound. EnGenAI's Observer service automatically compresses long conversations into structured observations, preserving knowledge without the token overhead.

Automatic Compression Pipeline

Long Conversation (>30K tokens) → Observer Service (async, background) → Structured Observations (JSONB in PostgreSQL) → Future Context (retrieved at session start)

No configuration required. The Observer runs in the background via a scheduled task runner. Compression triggers automatically when conversation size crosses the threshold.

Phase 1 — Per-Turn Extraction

After each agent turn, the Observer extracts key facts, decisions, and entities from the message into JSONB observations. Runs asynchronously — zero latency impact on the conversation.

Phase 2 — Background Compression

When a conversation exceeds the token threshold, the background task scheduler triggers a compression job that collapses the full history into the accumulated observations. Future sessions load the compressed form instead of the raw transcript.
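The threshold check and the swap from raw transcript to observations can be sketched as follows. The ~4-characters-per-token estimate and the `load_context` name are assumptions for illustration, not the product's actual tokenizer or API:

```python
TOKEN_THRESHOLD = 30_000  # compression trigger from the pipeline above

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token (an assumption,
    # not the product's real tokenizer).
    return sum(len(m["text"]) for m in messages) // 4

def load_context(messages, observations):
    """Sketch of what a future session loads: the raw transcript while
    it is small, the compressed observations once it crosses the
    threshold."""
    if estimate_tokens(messages) <= TOKEN_THRESHOLD:
        return messages
    return [{"role": "system",
             "text": "Observations: " + "; ".join(observations)}]
```

The knowledge survives either way; only the token cost changes — a long history collapses to one system message built from the accumulated observations.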

What Gets Preserved

Decisions made

"Chose PostgreSQL over MongoDB for ACID guarantees"

Entities identified

"User: Sarah (Legal) — approves all contract write operations"

Constraints surfaced

"Must not call external payment APIs without compliance sign-off"

~1ms: working memory (Redis)
~50ms: semantic search (Milvus)
3 tiers, always in sync

Next: Knowledge Pipeline

Memory stores what agents know. The knowledge pipeline controls how they acquire it — through research, caching, and retrieval-augmented generation.