Your agents read documentation, study APIs, and understand your codebase once. That knowledge is cached, vectorized, and available to every agent, in every future session, across every project.
What agents do without a knowledge pipeline
Re-read the FastAPI documentation every session, every sprint
Re-study your ADRs and architecture decisions from scratch each time
Re-research third-party APIs that were already understood last sprint
Pay for the same input tokens, the same embedding calls, over and over
The cost is compounding
As your project grows, the research surface grows too. More APIs. More ADRs. More codebase to understand. Without a knowledge pipeline, every new session starts from zero — and every session pays the full research cost.
At scale, teams are paying for the same research hundreds of times across hundreds of sessions. It's the hidden tax on AI development that nobody talks about.
EnGenAI's knowledge pipeline eliminates the research tax. Research is done once, stored permanently, and available instantly to every agent across every future session.
Retrieval-Augmented Generation: every query is embedded, similarity-searched against a vector database, and top results are injected into the prompt. Works well, but every query pays the full cost.
Every query: embed → search → inject
Cost: ~50ms vector search plus an embedding API call (~55ms total)
No reuse of results, even for previously seen identical queries
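The per-query RAG path can be sketched as a small function. The helper names are illustrative, and the embedding and vector-store calls are passed in as stubs rather than real API clients:

```python
from typing import Callable, List

# Sketch of the pure-RAG path: every query pays for an embedding call
# and a similarity search before results are injected into the prompt.
# `embed` and `search` stand in for the embedding API and vector DB.
def rag_retrieve(query: str,
                 embed: Callable[[str], List[float]],
                 search: Callable[[List[float], int], List[str]],
                 top_k: int = 5) -> str:
    vector = embed(query)            # embedding API call (paid every time)
    matches = search(vector, top_k)  # similarity search (~50ms in the DB)
    context = "\n".join(matches)
    return f"Context:\n{context}\n\nQuery: {query}"  # injected prompt
```

Note that nothing in this path remembers previous queries: two identical calls do identical work, which is exactly the cost CAG removes.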
Cache-Augmented Generation sits in front of RAG. If we've seen a semantically similar query before, we serve the cached result in 5ms. Only on a true miss do we fall back to the full RAG pipeline.
Cache hit (70%): serve result in 5ms
Cache miss (30%): full RAG in ~55ms
Average: 5×0.7 + 55×0.3 = 20ms
Two paths: a fast cache hit at 5ms, or a full RAG retrieval at ~55ms that gets cached for next time. Every miss makes future queries faster.
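A minimal sketch of the two-path flow, with an in-memory dict standing in for Redis and `retrieve` as the full RAG fallback (class and method names are illustrative, not EnGenAI's actual API):

```python
import hashlib

class CAGFront:
    """Cache-Augmented Generation in front of RAG: serve hits from the
    cache, fall back to full retrieval on a miss, then cache the result."""
    def __init__(self, retrieve):
        self._retrieve = retrieve   # full RAG pipeline (the ~55ms path)
        self._cache = {}            # stand-in for Redis

    def _key(self, agent_id: str, query: str) -> str:
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]
        return f"cag:{agent_id}:{digest}"

    def lookup(self, agent_id: str, query: str) -> str:
        key = self._key(agent_id, query)
        if key in self._cache:          # hit: ~5ms cache read
            return self._cache[key]
        result = self._retrieve(query)  # miss: full RAG pipeline
        self._cache[key] = result       # every miss speeds up future queries
        return result
```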
STEP 0
New Query from Agent
STEP 1 — Cache-Augmented Generation
CAG: Check Cache
Cache key: cag:{agent_id}:{sha256(query)[:16]}
Backend: Redis
TTL: 3600s (1 hour, configurable)
Hit latency: ~5ms
STEP 1a
Cache Hit Detected
A matching query was seen before. The cached result is retrieved from Redis in ~5ms.
STEP 1b
Inject into Prompt
Cached context injected directly into the agent's working prompt. No embedding, no search, no LLM call for retrieval.
Done — total: ~5ms
No additional API calls needed
STEP 2 — RAG
Vector Search in Milvus
1. Query → embedding (text-embedding-3-large)
2. Cosine similarity search (top-5)
3. Vector DB: Milvus (~50ms)
STEP 3
Results Injected into Prompt
Top-5 semantic matches injected as context. Relevance threshold: 0.75 cosine similarity minimum.
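Steps 2 and 3 can be illustrated with plain cosine similarity over stand-in vectors (in production the vectors come from text-embedding-3-large and the search runs in Milvus, not in Python):

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_matches(query_vec: List[float],
                  docs: List[Tuple[str, List[float]]],
                  k: int = 5, threshold: float = 0.75):
    """Rank documents by cosine similarity, keep the top k, and drop
    anything below the 0.75 relevance threshold."""
    scored = sorted(((cosine(query_vec, vec), text) for text, vec in docs),
                    reverse=True)
    return [(score, text) for score, text in scored[:k] if score >= threshold]
```

The threshold means an irrelevant document is dropped even when it lands in the top 5, so weak matches never reach the prompt.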
STEP 4 — Back to CAG
Result Cached for Future Hits
The RAG result is cached in Redis. Next similar query: 5ms hit instead of 55ms miss.
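The write-back in step 4 is a single TTL'd write. This sketch uses the redis-py `setex` call shape, with a minimal `FakeRedis` stand-in so it runs without a server:

```python
import hashlib

TTL_SECONDS = 3600  # 1 hour, configurable

def cache_rag_result(client, agent_id: str, query: str, result: str) -> str:
    """Store the RAG result under the CAG key so the next similar query
    is a ~5ms hit instead of a ~55ms miss. Returns the key used."""
    key = f"cag:{agent_id}:{hashlib.sha256(query.encode()).hexdigest()[:16]}"
    client.setex(key, TTL_SECONDS, result)  # SET with an expiry, as in redis-py
    return key

class FakeRedis:
    """Stand-in implementing just setex/get for this sketch."""
    def __init__(self):
        self.store = {}
    def setex(self, key, ttl, value):
        self.store[key] = (value, ttl)
    def get(self, key):
        return self.store[key][0]
```

The TTL keeps the cache from serving stale research forever: an entry that is not re-queried within the hour simply falls back to the RAG path and is re-cached fresh.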
The Math
70% cache hit rate (after warmup period)
63% latency reduction (vs. pure RAG on every query)
70% cost reduction (on repeated research queries)
A 70% hit rate saves 55ms − 5ms = 50ms on seven of every ten queries: 0.7 × 50ms = 35ms, and 35/55 ≈ 63% average latency reduction. The cost savings are equivalent to skipping the embedding call and vector search on 70% of all queries.
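The arithmetic behind those headline numbers, worked out (the exact reduction is ≈63.6%, quoted as 63%):

```python
HIT_MS, MISS_MS, HIT_RATE = 5, 55, 0.70

# Expected latency per query under CAG: 0.7 * 5 + 0.3 * 55 = 20 ms
avg_ms = HIT_RATE * HIT_MS + (1 - HIT_RATE) * MISS_MS

# Average saving per query: 0.7 * (55 - 5) = 35 ms
saved_ms = HIT_RATE * (MISS_MS - HIT_MS)

# Relative reduction vs. pure RAG: 35 / 55 ≈ 0.636, i.e. ~63%
reduction = saved_ms / MISS_MS
```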
The cache hit rate grows over time as more research is done. Cross-project reuse was the largest single jump.
Cache Hit Rate Over Time
Measured across production EnGenAI usage
Hit rate grows as agents accumulate research. Cross-project reuse was the single largest improvement: FastAPI patterns learned by Sophi in one project are immediately available in new projects.
Powered by Memgraph (graph database). Agents can traverse relationships, not just search keywords. They understand that the memory architecture informs Sophi's storage strategy, which in turn shapes how PROMI orchestrates context injection. Graph traversal surfaces these connections that keyword search cannot.
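A toy illustration of the difference, using the concept chain from the text as a hypothetical graph (the real store is Memgraph, queried via graph traversals, not a Python dict):

```python
from collections import deque

# Tiny concept graph: memory architecture -> Sophi's storage strategy
# -> PROMI's context injection, the chain described above.
GRAPH = {
    "memory architecture": ["sophi storage strategy"],
    "sophi storage strategy": ["promi context injection"],
    "promi context injection": [],
}

def related_concepts(start: str) -> list:
    """Breadth-first traversal: surfaces transitively related concepts
    that a keyword search over individual documents would miss."""
    seen, queue, found = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found
```

A keyword search for "memory architecture" would only return documents containing that phrase; the traversal also surfaces PROMI's context injection, two hops away.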
Knowledge doesn't stay locked in one project
The FastAPI patterns Sophi learned in one project are immediately available in the next. The authentication patterns from one customer's project can inform another. The knowledge graph spans your entire organisation — not just the current session.
With permission controls
Knowledge is shared within your organisation by default. Cross-organisation sharing is opt-in and requires explicit approval. Sensitive knowledge (credentials, internal architecture) is excluded from cross-project indices automatically.
Same organisation: shared by default
Cross-organisation: opt-in, admin approved
Sensitive data: auto-excluded, always
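The three sharing rules can be sketched as a single filter over a cross-project index. The field names here are illustrative, not EnGenAI's actual schema:

```python
def visible_entries(entries, viewer_org, cross_org_approved=frozenset()):
    """Apply the sharing rules to a cross-project index: same-org entries
    are shared by default, cross-org entries require explicit approval,
    and entries flagged sensitive are always excluded."""
    visible = []
    for entry in entries:
        if entry.get("sensitive"):               # auto-excluded, always
            continue
        if entry["org"] == viewer_org:           # same org: default share
            visible.append(entry)
        elif entry["org"] in cross_org_approved: # cross-org: opt-in only
            visible.append(entry)
    return visible
```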
Knowledge helps agents understand; testing proves that what they built works. See how EnGenAI deploys to real infrastructure for production-grade feedback.