The Oracle for AI-written Python · est. 2026

Reality-checked AI code. Signed proof, every patch.

Your agent just shipped a patch. The Oracle sees what is actually in it — every byte that moved, every test outcome the Docker container returns, every prior exemplar it resembles. You get to know what is happening inside your own codebase, in the present, before you decide what ships next. Thirteen instruments to see with. The signed verdict is the receipt that you knew.

An oxblood wax seal pressed onto a Python diff. Copperplate caption: VERDICT: PASS · 0.866667.

The burning problem

You don't know what is actually in your code.

The agent shipped. The tests are green. And you have no way of seeing what really changed — what bytes moved, what state mutated, which past failure mode this resembles. The not-knowing has become the default; most teams have accepted it as the cost of moving fast. The Oracle lets you see the present. Once you see, you can shape what ships next — and prove to anyone downstream (auditor, insurer, board, customer) that you knew, in the moment, what was real.

Top-down view of a walnut notary desk. Five hands reach in holding a SOC 2 page, a cyber-insurance rider, an RFP, a release checklist, and a Rule 11 evidence sheet around a fanned-out Python diff sealed with oxblood wax.

How the Oracle works · three movements

01 · SEE

The Oracle reads the patch through 491 prior exemplars.

Closest-historical-exemplar lookup over an accepted label registry. On the Fail-row holdout, exemplar agreement is 1.0 on 113 of 114 rows. You don't get "the AI is uncertain." You get a named failure mode and the precedent it resembles.
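To make the lookup concrete, here is a minimal sketch of a closest-exemplar search. It is an illustration under stated assumptions, not Mejepa's predictor: the `Exemplar` fields, the three-number feature vectors, the failure-mode labels, and the choice of cosine similarity are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Exemplar:
    exemplar_id: str    # hypothetical identifier
    failure_mode: str   # hypothetical label from the registry
    features: tuple     # hypothetical numeric feature vector


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_exemplar(patch_features, registry):
    """Return (exemplar, similarity) for the most similar registry entry."""
    best = max(registry, key=lambda ex: cosine(patch_features, ex.features))
    return best, cosine(patch_features, best.features)


# Toy registry; the real one is described above as 491 prior exemplars.
registry = [
    Exemplar("ex-001", "off-by-one", (1.0, 0.0, 0.0)),
    Exemplar("ex-002", "unhandled-none", (0.0, 1.0, 0.0)),
    Exemplar("ex-003", "wrong-default", (0.0, 0.0, 1.0)),
]

match, score = closest_exemplar((0.1, 0.9, 0.2), registry)
print(match.exemplar_id, match.failure_mode, round(score, 3))
```

The result carries a named failure mode and the exemplar it came from, which is the shape of answer the paragraph above describes.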

02 · KNOW

The Docker oracle returns ground truth.

Per-instance python -m swebench.harness.run_evaluation in a sealed container. The report.json is the present, made legible. Not a model judging itself — the world judging the patch.

03 · SHAPE

Sign what you knew with ed25519 + SHAKE-256.

Every verdict appends to a chain that anyone holding the public key can verify. Anyone (your auditor, your insurer, your future self) can replay it offline in ~30 seconds and confirm what you saw. You decide what ships next. The chain is the proof you decided knowingly.
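A rough sketch of how an append-only verdict chain can be replayed offline, assuming each link is a SHAKE-256 digest over the previous digest plus the verdict serialized as canonical JSON. The record layout, the all-zero genesis value, and the function names are hypothetical. The ed25519 signature is left out because it needs a third-party library; in a full version it would be checked against the public key alongside the digests.

```python
import hashlib
import json

GENESIS = "00" * 32  # assumed all-zero starting link (hypothetical)


def link_digest(prev_digest: str, verdict: dict) -> str:
    """SHAKE-256 over the previous digest plus the canonical JSON verdict."""
    payload = json.dumps(verdict, sort_keys=True, separators=(",", ":"))
    h = hashlib.shake_256()
    h.update(bytes.fromhex(prev_digest))
    h.update(payload.encode("utf-8"))
    return h.hexdigest(32)  # request 32 bytes of output


def append(chain: list, verdict: dict) -> None:
    """Add a verdict, binding it to the digest of the previous entry."""
    prev = chain[-1]["digest"] if chain else GENESIS
    chain.append({"verdict": verdict, "digest": link_digest(prev, verdict)})


def replay(chain: list) -> bool:
    """Recompute every link from the start; any edit breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["digest"] != link_digest(prev, entry["verdict"]):
            return False
        prev = entry["digest"]
    return True


chain = []
append(chain, {"patch": "demo-1", "verdict": "PASS"})
append(chain, {"patch": "demo-2", "verdict": "FAIL"})
print(replay(chain))                      # True: untouched chain
chain[0]["verdict"]["verdict"] = "FAIL"   # tamper with an old entry
print(replay(chain))                      # False: digest no longer matches
```

Because each digest covers the one before it, rewriting any earlier verdict invalidates every later link, and that is what makes an offline replay meaningful.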

Why the Oracle can see

The Oracle is outside the agent

The predictor is not an LLM, not a fine-tune, not a wrapper. The voice that wrote the patch is not the voice that reads it. Two circuits — one to speak, one to see.

Thirteen instruments to see with

Externally defined, deterministic, immutable. AST diffs, data-flow, type graphs, Docker outcomes, chain hashes. The instruments read bytes; bytes do not lie.

Ground truth, not opinion

SWE-bench Lite, 300 × 8 mutation cells = 2,400 Python instances. The Docker container runs. The world returns its verdict. Nothing in between is interpreting anything.

Anyone can replay what you knew

ed25519 + SHAKE-256. The chain is portable, offline-verifiable, forever. You do not have to trust Mejepa to confirm what Mejepa saw — the public key is enough.

The ship-gate · live

Panel A: 0.866667 · target 0.95

We measure the same number every week on the same 8-of-30 holdout from the SWE-bench Lite 300 × 8 corpus. Until Panel A holds at 0.95 stable across 4 rolling windows — and Panel B (cross-panel, non-overlapping encoders) clears its first measurement — Mejepa stays pre-production. There is no "soon." The gate fires when the number fires.
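The gate rule can be written down as a small check. This is a sketch of one reading of it, not the actual gate: it assumes "holds at 0.95 stable across 4 rolling windows" means the four most recent Panel A measurements are each at or above 0.95, and it treats Panel B as a simple yes/no flag. The function and parameter names are hypothetical.

```python
TARGET = 0.95          # Panel A target stated on this page
WINDOWS_REQUIRED = 4   # rolling windows that must hold


def gate_fires(panel_a_windows, panel_b_cleared):
    """True only if the last four Panel A windows all meet the target
    and Panel B has cleared its first measurement."""
    recent = panel_a_windows[-WINDOWS_REQUIRED:]
    if len(recent) < WINDOWS_REQUIRED:
        return False  # not enough history yet
    return panel_b_cleared and all(score >= TARGET for score in recent)


print(gate_fires([0.866667], panel_b_cleared=False))               # False
print(gate_fires([0.95, 0.96, 0.95, 0.97], panel_b_cleared=True))  # True
```

Under this reading, a single window below 0.95 resets nothing but blocks the gate until four passing windows are again the most recent four.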

A brass-rimmed analog dial. Pointer at 0.866667 on a 0.85-to-0.95 arc. Brass plaque: SHIP-GATE · PANEL A · 8 OF 30 HOLDOUT · 2026-05-20.
A half-open cream envelope with a broken oxblood wax seal. A copperplate verdict slip inside.

See what the Oracle sees. In your browser. No install.

Pick a real SWE-bench Lite scenario. The Oracle returns the same shape the production predictor returns — Pass / Fail / Abstain, named failure mode, closest exemplar, signed packet. Copy the packet out and replay it offline with the public key. The present is something you can actually read.

Consult the Oracle →