Testing Infrastructure

Real Infrastructure.
Real Results.

No mocking. No fake environments. EnGenAI deploys code to actual GKE clusters for testing. If it breaks in production, it breaks in the sandbox first.

The Mock Testing Problem

Mocked tests pass. Production fails. The classic scenario: your mock returns what you told it to return — not what the real system does.

Race Conditions

Mocks typically return instantly and in a fixed order. Real systems have network latency, concurrent writes, and transaction lock contention. Your mock never sees any of it.
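A minimal sketch of the kind of lost update a mock cannot show. The names (`Counter`, `racy_increment`) are hypothetical and not part of EnGenAI; a `threading.Barrier` forces the bad interleaving so the outcome is reproducible rather than left to chance.

```python
import threading
from unittest.mock import Mock


class Counter:
    """Stand-in for a shared row that two writers update (hypothetical)."""

    def __init__(self) -> None:
        self.value = 0


def racy_increment(counter: Counter, barrier: threading.Barrier) -> None:
    current = counter.value      # read
    barrier.wait()               # both writers have now read the same value
    counter.value = current + 1  # write back a stale result


def run_concurrent_increments() -> int:
    counter = Counter()
    barrier = threading.Barrier(2)
    threads = [
        threading.Thread(target=racy_increment, args=(counter, barrier))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


def run_mocked_increments() -> int:
    # The mock simply returns what it was told to return.
    store = Mock()
    store.increment.side_effect = [1, 2]
    store.increment()
    return store.increment()
```

Two increments against the mock report 2. Two concurrent increments against shared state leave the counter at 1, because the second write overwrites the first.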

Auth Edge Cases

Mocking auth means mocking your own assumptions. Token expiry during long requests, RBAC enforcement at the database level — invisible to mocked tests.
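As an illustration of token expiry during a long request, here is a small sketch with an injected clock. `Token`, `is_valid`, and `long_request` are hypothetical names, and the two-step validation flow is an assumption made for the example, not a description of EnGenAI's auth layer.

```python
from dataclasses import dataclass
from typing import Callable
from unittest.mock import Mock


@dataclass
class Token:
    expires_at: float  # seconds on the caller's clock


def is_valid(token: Token, now: float) -> bool:
    return now < token.expires_at


def long_request(
    token: Token,
    start: float,
    duration: float,
    validate: Callable[[Token, float], bool],
) -> str:
    """Check auth at the start and again just before committing."""
    if not validate(token, start):
        return "rejected at start"
    finish = start + duration
    if not validate(token, finish):
        return "expired mid-request"
    return "committed"


token = Token(expires_at=60.0)

# Real validation: the token expires 60s into a 90s request.
real_result = long_request(token, start=0.0, duration=90.0, validate=is_valid)

# Mocked validation: always says yes, so the expiry is never observed.
mocked_result = long_request(
    token, start=0.0, duration=90.0, validate=Mock(return_value=True)
)
```

The mocked validator encodes the assumption "the token is valid", so the test that uses it can only confirm that assumption.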

Real Network Latency

Timeout thresholds that work in mocks fail under real load. Resource limits that look fine locally hit hard in Kubernetes. Only real infrastructure reveals them.
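A rough sketch of a timeout threshold that passes against a mock and fails against a slow dependency. The timeout value and the simulated 200 ms latency are assumptions chosen for the example.

```python
import asyncio
from unittest.mock import AsyncMock

TIMEOUT_S = 0.05  # hypothetical client timeout


async def slow_backend() -> str:
    await asyncio.sleep(0.2)  # simulated latency, longer than the timeout
    return "ok"


async def call_with_timeout(backend) -> str:
    try:
        return await asyncio.wait_for(backend(), timeout=TIMEOUT_S)
    except asyncio.TimeoutError:
        return "timeout"


def run(backend) -> str:
    return asyncio.run(call_with_timeout(backend))


mocked = run(AsyncMock(return_value="ok"))  # the mock answers immediately
real = run(slow_backend)                    # the slow backend trips the timeout
```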

"A mock that passes is only as good as the assumptions you built it on."

How Live Testing Works

Push code → Docker build → GCR push → ArgoCD sync → Live endpoint ready. Under 5 minutes. Every time.

Developer / Agent: pushes feature branch

GitHub: feature branch push triggers CI

GitHub Actions: runs CI pipeline, builds Docker image, pushes to GCR (Artifact Registry)

ArgoCD: syncs manifest, deploys

GKE Sandbox Cluster: Pod engenai-app (test version), Pod engenai-backend (test version), Pod PostgreSQL (ephemeral test DB)

Live Endpoint Ready: https://sandbox-{id}.dev.engenai.app

Run Integration Tests: against real endpoint

Results → Agent → Canvas: pass/fail surfaced in real time

<5 minutes from commit to live endpoint
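The "wait for the live endpoint, then test" step of the flow above could be sketched roughly as follows. The URL pattern comes from the flow itself; the function names, the sandbox id format, the polling interval, and the 300-second deadline (mirroring the 5-minute figure) are assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def sandbox_url(sandbox_id: str) -> str:
    # URL pattern taken from the flow above; the id format is an assumption.
    return f"https://sandbox-{sandbox_id}.dev.engenai.app"


def http_probe(url: str) -> bool:
    """Return True if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    url: str,
    probe: Callable[[str], bool] = http_probe,
    deadline_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the endpoint until it responds or the deadline passes."""
    start = clock()
    while clock() - start < deadline_s:
        if probe(url):
            return True
        sleep(interval_s)
    return False
```

A test runner would call `wait_until_ready(sandbox_url(some_id))` and start the integration suite only when it returns `True`. The probe and sleep are injectable so the polling logic can be exercised without a network.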

Test Types That Run

924 tests across the platform. Every PR runs the full suite — no skipping, no selective execution in CI.

Unit Tests

pytest / Jest

Individual function testing with mocks for external dependencies. Validates business logic in isolation — no network, no database.

Test count: ~200 tests

Runtime: <30s
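For illustration, a unit test at this level might look like the sketch below. The business rule and the `loyalty_client` dependency are hypothetical; the point is only that the external call is replaced by a mock, so no network or database is involved.

```python
from unittest.mock import Mock


def apply_discount(price_cents: int, user_id: str, loyalty_client) -> int:
    """Business rule under test: loyal users get 10% off (hypothetical)."""
    if loyalty_client.is_loyal(user_id):
        return price_cents * 90 // 100
    return price_cents


def test_loyal_user_gets_discount() -> None:
    client = Mock()
    client.is_loyal.return_value = True
    assert apply_discount(1000, "u1", client) == 900
    client.is_loyal.assert_called_once_with("u1")


def test_regular_user_pays_full_price() -> None:
    client = Mock()
    client.is_loyal.return_value = False
    assert apply_discount(1000, "u2", client) == 1000
```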

Integration Tests

real PostgreSQL

API endpoint testing against a real PostgreSQL database. Catches constraint violations, transaction rollbacks, and auth edge cases invisible to unit tests.

Test count: ~150 tests

Runtime: ~2 min
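A sketch of a constraint violation that an in-memory fake hides. Important assumption: it uses Python's built-in `sqlite3` purely so the example runs without a server. It is a stand-in for a real database engine, not the PostgreSQL setup described here, and the `users` table and helper names are hypothetical.

```python
import sqlite3


class FakeUserStore:
    """In-memory fake: silently overwrites duplicates."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def add(self, email: str, name: str) -> None:
        self.rows[email] = name


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn


def add_user(conn: sqlite3.Connection, email: str, name: str) -> bool:
    """Insert a user; return False if the database rejects a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (email, name) VALUES (?, ?)", (email, name)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Adding the same email twice succeeds silently against `FakeUserStore`, while the database rejects the second insert and keeps the first row.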

Type Checking

mypy / TypeScript

Full static analysis across Python (mypy) and TypeScript. Zero errors enforced in CI: a single type error blocks the merge. One of those blocked errors turned out to be a real production bug.

Error tolerance: 0 errors

Enforced: CI gate
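To show the class of bug a zero-error type gate can stop, here is a hypothetical example (not the production bug mentioned above, whose details are not given). The commented-out version would be flagged by mypy because `find_user` may return `None`.

```python
from typing import Optional

USERS = {"u1": "Ada"}  # hypothetical data


def find_user(user_id: str) -> Optional[str]:
    return USERS.get(user_id)


# mypy would reject this version, because find_user may return None:
#
#     def greeting(user_id: str) -> str:
#         return "Hello, " + find_user(user_id).upper()
#
def greeting(user_id: str) -> str:
    name = find_user(user_id)
    if name is None:  # the branch the type checker forces us to handle
        return "Hello, guest"
    return "Hello, " + name.upper()
```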

Linting

ruff / ESLint

Code style enforcement on every pull request. Ruff for Python, ESLint for TypeScript and React. Violations block the PR — no exceptions, no overrides.

On violation: PR blocked

Runs on: every PR

Bugs That Only Real Infrastructure Catches

We've caught bugs in production-equivalent environments that no mock would have found.

Real Databases

Catch constraint violations, unique key conflicts, and transaction isolation problems that in-memory stores silently swallow.

Real Networks

Catch timeout assumptions that fail under real network conditions. Real retries, real circuit breakers, real latency spikes.
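A minimal sketch of retry with exponential backoff, the kind of logic that only gets exercised when calls really time out. The function name, defaults, and the choice to retry only on `TimeoutError` are assumptions for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry on TimeoutError with exponential backoff.

    Re-raises the last TimeoutError once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("loop always returns or raises")
```

Injecting `sleep` keeps the backoff schedule observable in a test without actually waiting.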

Real Kubernetes

Catch resource limit errors, pod evictions, and readiness probe failures that only appear when containers run under real K8s scheduling.

Real Load

Catch concurrency issues that only emerge under realistic request patterns. Connection pool exhaustion. Lock contention. Race conditions.
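Connection pool exhaustion can be sketched with a tiny fixed-size pool. `ConnectionPool` is hypothetical and its "connections" are plain integers; a real pool would hand out database connections, but the failure mode is the same: once every connection is checked out, the next caller waits and then times out.

```python
import queue


class ConnectionPool:
    """Tiny fixed-size pool (hypothetical); connections are integers here."""

    def __init__(self, size: int) -> None:
        self._free: "queue.Queue[int]" = queue.Queue()
        for conn_id in range(size):
            self._free.put(conn_id)

    def acquire(self, timeout_s: float) -> int:
        """Take a free connection, or raise TimeoutError if none frees up."""
        try:
            return self._free.get(timeout=timeout_s)
        except queue.Empty:
            raise TimeoutError("connection pool exhausted") from None

    def release(self, conn_id: int) -> None:
        self._free.put(conn_id)
```

With a pool of two, a third concurrent caller that never sees a `release` hits the timeout. A single-request test never reaches that state.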

<5 min from commit to live test endpoint

1,642+ tests in the current suite

0 mocks used in integration tests

