Technology Deep Dive — Section 9

Research Once.
Forever Accessible.

Your agents read documentation, study APIs, and understand your codebase once. That knowledge is cached, vectorized, and available to every agent, in every future session, across every project.

70% cache hit rate · 5ms hit latency · Cross-project by default

The Research Tax

What agents do without a knowledge pipeline

Re-read the FastAPI documentation every session, every sprint

Re-study your ADRs and architecture decisions from scratch each time

Re-research third-party APIs that were already understood last sprint

Pay for the same input tokens, the same embedding calls, over and over

The cost is compounding

As your project grows, the research surface grows too. More APIs. More ADRs. More codebase to understand. Without a knowledge pipeline, every new session starts from zero — and every session pays the full research cost.

At scale, teams are paying for the same research hundreds of times across hundreds of sessions. It's the hidden tax on AI development that nobody talks about.

EnGenAI's knowledge pipeline eliminates the research tax. Research is done once, stored permanently, and available instantly to every agent across every future session.

CAG Before RAG

RAG Only
(most platforms)

Retrieval-Augmented Generation: every query is embedded, similarity-searched against a vector database, and top results are injected into the prompt. Works well, but every query pays the full cost.

Every query: embed → search → inject
Cost: ~50ms + embedding API call
No reuse of previous identical queries

CAG + RAG
(EnGenAI)

Cache-Augmented Generation sits in front of RAG. If we've seen a semantically similar query before, we serve the cached result in 5ms. Only on a true miss do we fall back to the full RAG pipeline.

Cache hit (70%): serve result in 5ms
Cache miss (30%): full RAG in ~55ms
Average: 5×0.7 + 55×0.3 = 20ms

The Pipeline

Two paths: a fast cache hit at 5ms, or a full RAG retrieval at ~55ms that gets cached for next time. Every miss makes future queries faster.

Cache HIT path (fast)
Cache MISS path (slower but still fast)

STEP 0

New Query from Agent

STEP 1 — Cache-Augmented Generation

CAG: Check Cache

Cache key: cag:{agent_id}:{sha256(query)[:16]}

Backend: Redis

TTL: 3600s (1 hour, configurable)

Hit latency: ~5ms
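The cache check can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: the key follows the `cag:{agent_id}:{sha256(query)[:16]}` scheme above, while `check_cache` and the duck-typed `redis_client` (any redis-py-compatible client) are assumed names. Note that hashing the raw query yields exact-match keys; any query normalization applied before hashing is not specified here.

```python
import hashlib


def cag_cache_key(agent_id: str, query: str) -> str:
    # Key scheme from Step 1: cag:{agent_id}:{sha256(query)[:16]}
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]
    return f"cag:{agent_id}:{digest}"


def check_cache(redis_client, agent_id: str, query: str):
    # Returns the cached context on a hit, or None on a miss.
    return redis_client.get(cag_cache_key(agent_id, query))
```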

HIT PATH (5ms)

STEP 1a

Cache Hit Detected

Semantically similar query found. Cached result retrieved from Redis in <5ms.

STEP 1b

Inject into Prompt

Cached context injected directly into the agent's working prompt. No embedding, no search, no LLM call for retrieval.

Done — total: ~5ms

No additional API calls needed
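On the hit path, injection can be as simple as prepending the cached context to the agent's working prompt. The `<context>` delimiter format and the function name below are purely illustrative assumptions:

```python
def inject_context(prompt: str, cached_context: str) -> str:
    # Prepend the cached research as a delimited context block.
    # No embedding, no search, no LLM call is needed on this path.
    return f"<context>\n{cached_context}\n</context>\n\n{prompt}"
```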

MISS PATH (~55ms)

STEP 2 — RAG

Vector Search in Milvus

1. Query → embedding (text-embedding-3-large)

2. Cosine similarity search (top-5)

3. Vector DB: Milvus (~50ms)
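The miss path can be sketched as follows, assuming the shapes of the `openai` and `pymilvus` Python clients; `collection`, the `embedding` and `text` field names, and `rag_search` itself are illustrative assumptions rather than EnGenAI's actual code. The pipeline's 0.75 cosine-similarity floor is applied as a post-filter:

```python
from typing import Dict, List


def filter_hits(hits: List[Dict], threshold: float = 0.75) -> List[Dict]:
    # Keep only matches at or above the cosine-similarity floor.
    return [h for h in hits if h["score"] >= threshold]


def rag_search(openai_client, collection, query: str, top_k: int = 5) -> List[Dict]:
    # 1. Query -> embedding (text-embedding-3-large)
    vec = openai_client.embeddings.create(
        model="text-embedding-3-large", input=query
    ).data[0].embedding
    # 2. Cosine-similarity search (top-5) against Milvus
    results = collection.search(
        data=[vec],
        anns_field="embedding",
        param={"metric_type": "COSINE"},
        limit=top_k,
        output_fields=["text"],
    )
    hits = [{"text": h.entity.get("text"), "score": h.score} for h in results[0]]
    # 3. Drop anything below the relevance threshold
    return filter_hits(hits)
```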

STEP 3

Results Injected into Prompt

Top-5 semantic matches injected as context. Relevance threshold: 0.75 cosine similarity minimum.

STEP 4 — Back to CAG

Result Cached for Future Hits

The RAG result is cached in Redis. Next similar query: 5ms hit instead of 55ms miss.
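The write-back step is a single Redis write, sketched here with a redis-py-style client; the JSON payload shape is an assumption. The key mirrors the `cag:{agent_id}:{sha256(query)[:16]}` scheme and the 3600s TTL from Step 1:

```python
import hashlib
import json


def cache_rag_result(redis_client, agent_id: str, query: str,
                     result: list, ttl_s: int = 3600) -> str:
    # Same key scheme as the hit path, so the next similar query hits.
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]
    key = f"cag:{agent_id}:{digest}"
    # SETEX stores the value with the configured 1-hour expiry.
    redis_client.setex(key, ttl_s, json.dumps(result))
    return key
```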

The Math

70%

cache hit rate

after warmup period

63%

latency reduction

vs. pure RAG on every query

70%

cost reduction

on repeated research queries

A 70% hit rate saves 50ms (55ms − 5ms) on each hit, or 35ms per query on average: a 63% reduction from the 55ms pure-RAG baseline. Cost savings are equivalent to skipping the embedding call and vector search on 70% of all queries.
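The arithmetic can be checked directly:

```python
hit_rate = 0.70
hit_ms, miss_ms = 5.0, 55.0

# Expected latency per query: 0.7 * 5 + 0.3 * 55 = 20.0 ms
avg_ms = hit_rate * hit_ms + (1 - hit_rate) * miss_ms

# Reduction vs. paying the full RAG cost on every query: 35/55 ~= 0.636
reduction = (miss_ms - avg_ms) / miss_ms
```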

Performance

The cache hit rate grows over time as more research is done. Cross-project reuse was the largest single jump.

70%

cache hit rate

across all agents

5ms

average hit latency

Redis cache read, no embedding required

63%

average latency reduction

vs. pure RAG on every query

Cache Hit Rate Over Time

Measured across production EnGenAI usage

Week 1: 0% (no cache yet)
Month 1: 20% (cache introduced)
Month 3: 45% (research reuse grows)
Month 4: 63% (knowledge graph added)
Month 6: 70% (cross-project reuse)

Hit rate grows as agents accumulate research. Cross-project reuse was the single largest improvement: FastAPI patterns learned by Sophi in one project are immediately available in new projects.

The Knowledge Graph

Powered by Memgraph. Agents can traverse relationships, not just search keywords. They understand how concepts relate, not just what they contain.

[Knowledge graph diagram: the EnGenAI Project node links five clusters: Architecture Decisions (memory strategy, steering, auth), Agent Knowledge (Keith, PROMI, Sophi, Marv, Sage), Code Patterns (FastAPI, React, migration patterns), Project History (Infra, Platform, Enterprise phases), and External Research (LLM docs, GKE docs, PG guides).]

Because Memgraph is a graph database, agents can follow edges rather than match keywords: they understand that the memory architecture informs Sophi's storage strategy, which in turn shapes how PROMI orchestrates context injection. Graph traversal surfaces connections that keyword search cannot.
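Memgraph speaks Cypher, so a traversal like "what relates to this concept within two hops" is one query. A hedged sketch: the `Concept` label, `name` property, and `related_knowledge_query` helper are hypothetical, not EnGenAI's actual schema.

```python
def related_knowledge_query(concept: str, max_hops: int = 2) -> tuple:
    # Variable-length pattern [*1..N] walks any relationship type,
    # surfacing concepts a keyword search would never connect.
    cypher = (
        "MATCH (c:Concept {name: $name})-[*1..%d]-(related:Concept) "
        "RETURN DISTINCT related.name" % max_hops
    )
    return cypher, {"name": concept}
```

The query and its parameters would be sent over Bolt with any Cypher client (e.g. the `neo4j` Python driver, which Memgraph supports).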

Cross-Project Reuse

Knowledge doesn't stay locked in one project

The FastAPI patterns Sophi learned in one project are immediately available in the next. The authentication patterns from one customer's project can inform another. The knowledge graph spans your entire organisation — not just the current session.

With permission controls

Knowledge is shared within your organisation by default. Cross-organisation sharing is opt-in and requires explicit approval. Sensitive knowledge (credentials, internal architecture) is excluded from cross-project indices automatically.

Same organisation

Shared by default

Cross-organisation

Opt-in, admin approved

Sensitive data

Auto-excluded, always
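The three rules above amount to a short policy check. A minimal sketch, assuming a per-item `sensitive` flag and an admin-granted `cross_org_approved` bit; the function and argument names are illustrative:

```python
def may_share(source_org: str, target_org: str, sensitive: bool,
              cross_org_approved: bool = False) -> bool:
    # Sensitive knowledge (credentials, internal architecture)
    # is always excluded from cross-project indices.
    if sensitive:
        return False
    # Same organisation: shared by default.
    if source_org == target_org:
        return True
    # Cross-organisation: opt-in, requires explicit admin approval.
    return cross_org_approved
```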

70%

cache hit rate

Across all agents in production

5ms

hit latency

Redis read, no embedding required

Cross-project

by default

Research spans all your projects

Next: Live Testing

Knowledge helps agents understand. Testing proves they built correctly. See how EnGenAI deploys to real infrastructure for production-grade feedback.