How Mejepa works
Five instruments. One signed verdict.
Every AI-written Python patch passes through five stages. The first four narrow the patch down to a calibrated verdict; the fifth makes the verdict portable. Each stage is anchored to the FSV plan, with its citation in the margin.
Stage 1 · Chunking
The patch is decomposed into chunks.
Mejepa reads the patch and the base file as bytes-on-disk. The same chunk boundaries are used at training time and at inference time, so the predictor never sees a representation the frozen instruments haven't already scored. Chunks carry a SHAKE-256 content hash that propagates into the witness chain at stage 5.
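A minimal sketch of deterministic chunking with a SHAKE-256 content hash per chunk. The page does not say where Mejepa draws chunk boundaries, so this sketch assumes newline-aligned chunks capped at a hypothetical `max_bytes`; the `Chunk` type and `chunk_bytes` helper are illustrative, not Mejepa's API. The property that matters is that the function is pure: identical bytes always give identical boundaries and digests, at training time and at inference time.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One chunk of raw bytes plus its SHAKE-256 content hash (hypothetical type)."""
    source: str   # illustrative label, e.g. "patch" or "base"
    offset: int   # byte offset of the chunk within its source
    data: bytes
    digest: str   # hex SHAKE-256 digest with a 32-byte output


def _make_chunk(source: str, offset: int, data: bytes) -> Chunk:
    return Chunk(source, offset, data, hashlib.shake_256(data).hexdigest(32))


def chunk_bytes(source: str, raw: bytes, max_bytes: int = 4096) -> list[Chunk]:
    """Split raw bytes into newline-aligned chunks of roughly max_bytes each.

    Assumption: boundaries fall on line endings. A single line longer than
    max_bytes becomes its own oversized chunk rather than being cut.
    """
    chunks: list[Chunk] = []
    buf = b""
    offset = 0
    for line in raw.splitlines(keepends=True):
        # Flush the current buffer if adding this line would exceed the cap.
        if buf and len(buf) + len(line) > max_bytes:
            chunks.append(_make_chunk(source, offset, buf))
            offset += len(buf)
            buf = b""
        buf += line
    if buf:
        chunks.append(_make_chunk(source, offset, buf))
    return chunks
```

Because `keepends=True` retains the line terminators, concatenating the chunk bytes reproduces the input exactly, so each digest covers bytes that really sit on disk.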
Stage 2 · 15-slot panel
An array of per-embedder vectors, not a flat concat.
The panel is a fixed 15-slot array in which each slot holds a distinct per-embedder vector, and slot identity is preserved end to end. The 13 frozen instruments author the panel; the total of 15 also counts derived/composed slots. Two of the slots are reserved for cross-panel triangulation (issue #405, blocked-P0).
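To show why a slot array differs from a flat concatenation, here is a minimal sketch. The slot names and their ordering (13 `frozen_*` slots followed by 2 `derived_*` slots) are hypothetical, since the page does not list them; only the counts 15 and 13 come from the text. Each slot keeps its own vector, so slots can have different widths and an unfilled slot stays visible as `None` instead of silently shifting every later dimension.

```python
from typing import Optional, Sequence

N_SLOTS = 15    # total slots, per the page
N_FROZEN = 13   # slots authored by the frozen instruments, per the page

# Hypothetical slot names and ordering: the page does not list the slots.
SLOT_NAMES: tuple[str, ...] = tuple(f"frozen_{i:02d}" for i in range(N_FROZEN)) + tuple(
    f"derived_{i:02d}" for i in range(N_FROZEN, N_SLOTS)
)


class Panel:
    """Fixed-length array of per-embedder vectors; never flattened into one concat."""

    def __init__(self) -> None:
        # One entry per slot; None means the slot has not been filled yet.
        self._slots: list[Optional[tuple[float, ...]]] = [None] * N_SLOTS

    def set(self, name: str, vector: Sequence[float]) -> None:
        # tuple.index raises ValueError for an unknown slot name.
        self._slots[SLOT_NAMES.index(name)] = tuple(float(x) for x in vector)

    def get(self, name: str) -> Optional[tuple[float, ...]]:
        return self._slots[SLOT_NAMES.index(name)]

    def dims(self) -> dict[str, int]:
        """Dimension of each filled slot, keyed by slot name."""
        return {n: len(v) for n, v in zip(SLOT_NAMES, self._slots) if v is not None}
```

A flat concat of the same vectors would lose the boundary between embedders; here any downstream consumer can still ask for a specific slot by name.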
Stage 3 · Conformal predictor
Three binary predicates, one conformal split.
Mejepa's predictor evaluates three binary predicates per patch:
- Q1 · claim_exists — does the patch make a recognizable claim about behavior?
- Q2 · oracle_passes — would the Docker oracle accept the patch?
- Q5 · predicted_shift_event_occurred — will the patch trigger a downstream test/state shift the agent did not predict?
The split-conformal head emits Pass, Fail, or Abstain. Out-of-distribution (OOD) patches are flagged with named reasons, and cold-cell patches are flagged the same way. Q4 (perf / cost / reasoning class) was formally retired as wontfix-ambiguity-boundary: Mejepa does not predict subjective surfaces.
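The three-way output above can be illustrated with textbook split-conformal classification applied to one binary predicate. This is a sketch under stated assumptions, not Mejepa's head: it assumes each of Q1, Q2 and Q5 has its own held-out calibration set of base-model probabilities (`cal_p`, hypothetical) with observed 0/1 outcomes, and it assumes the mapping from conformal prediction set to verdict ({1} → Pass, {0} → Fail, anything else → Abstain). Under exchangeability, split conformal guarantees the prediction set contains the true label with probability at least 1 − `alpha`.

```python
import math
from typing import Sequence


def conformal_qhat(cal_p: Sequence[float], cal_y: Sequence[int], alpha: float) -> float:
    """Split-conformal threshold from a held-out calibration set.

    cal_p: predicted P(predicate is true) from some base model (hypothetical).
    cal_y: observed 0/1 outcome for the same predicate.
    Nonconformity score: 1 - probability assigned to the observed label.
    """
    scores = sorted(1.0 - (p if y == 1 else 1.0 - p) for p, y in zip(cal_p, cal_y))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))  # rank of the finite-sample-corrected quantile
    if k > n:
        return math.inf  # too few calibration points: no label can be excluded
    return scores[k - 1]


def verdict(p_true: float, qhat: float) -> str:
    """Map one predicate's conformal prediction set to Pass / Fail / Abstain."""
    labels = set()
    if 1.0 - p_true <= qhat:  # score if the true label were 1
        labels.add(1)
    if p_true <= qhat:        # score if the true label were 0: 1 - (1 - p_true)
        labels.add(0)
    if labels == {1}:
        return "Pass"
    if labels == {0}:
        return "Fail"
    return "Abstain"          # both labels plausible, or neither
```

Abstain falls out naturally: when the model is not confident enough to exclude either label at the chosen `alpha`, the set has two members (or none), and the head declines to commit.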
Stage 4 · Docker oracle
Ground truth is the Docker container, not a model.
For every patch, Mejepa runs python -m swebench.harness.run_evaluation in a sealed Docker container. The per-instance report.json is parsed into an OracleVerdict. The verdict carried in the signed packet always includes the SHA-256 of the oracle's report.json, so any auditor can re-run the same container and confirm the result independently. This is the falsifiable end of the system.
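A minimal sketch of the parse-and-hash step. The fields of `OracleVerdict` are hypothetical (the page names the type but not its shape), and the report layout is an assumption: a JSON object keyed by instance id whose value carries a boolean `resolved` flag. Check that against the SWE-bench harness version you actually run. The digest is taken over the raw bytes rather than a re-serialized dict, so it refers to the exact file the oracle produced.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OracleVerdict:
    """Hypothetical fields: the page names OracleVerdict but not its shape."""
    instance_id: str
    resolved: bool
    report_sha256: str  # hex SHA-256 over the exact report.json bytes


def parse_report(raw: bytes, instance_id: str) -> OracleVerdict:
    """Parse one per-instance report.json (passed as raw bytes) into an OracleVerdict.

    Assumed layout: a JSON object keyed by instance_id whose value carries a
    boolean "resolved" flag.
    """
    report = json.loads(raw)
    entry = report.get(instance_id)
    if entry is None:
        raise KeyError(f"{instance_id!r} not found in report")
    return OracleVerdict(
        instance_id=instance_id,
        resolved=bool(entry.get("resolved", False)),
        # Hash the bytes as read from disk, not a re-encoded dict.
        report_sha256=hashlib.sha256(raw).hexdigest(),
    )
```

Taking bytes instead of a path keeps the hash and the parse tied to the same buffer; a caller would typically pass `pathlib.Path(report_path).read_bytes()`.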
Stage 5 · Witness chain
ed25519 + SHAKE-256, replay offline in ~30s.
Every verdict (and every panel state, every training certificate, every feedback event) is signed with ed25519 and appended to a SHAKE-256-linked chain. The public key is published. The chain is verifiable offline with no Mejepa server reachable — an auditor, an underwriter, or opposing counsel can confirm the verdict in roughly 30 seconds.
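Offline replay of a hash-linked chain can be sketched with the standard library alone. This covers only the SHAKE-256 linking and its verification; the ed25519 signature over each link is omitted (in Python, the `cryptography` package's `Ed25519PrivateKey.sign` and `Ed25519PublicKey.verify` are one option). The genesis value, the canonical JSON encoding, and the entry layout are assumptions for illustration, not Mejepa's wire format.

```python
import hashlib
import json

GENESIS = "00" * 32  # hypothetical starting value for the first link


def link_hash(prev_hash: str, payload: dict) -> str:
    """SHAKE-256 over the previous link hash plus canonical JSON of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.shake_256(prev_hash.encode() + body).hexdigest(32)


def append(chain: list[dict], payload: dict) -> None:
    """Append a payload (verdict, panel state, feedback event, ...) as a new link."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": link_hash(prev, payload)})


def verify(chain: list[dict]) -> bool:
    """Recompute every link from the genesis value; no server is needed."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != link_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash folds in its predecessor, editing any earlier payload invalidates that link and every link after it, which is what lets a third party check the whole history with nothing but the chain file (and, in the full scheme, the published public key).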
The MCP surface · 57 mejepa_* tools
Any MCP-aware agent harness — Claude Code, Cursor, Windsurf, Cline, Continue, Replit — can call these tools directly. The capture infrastructure is multi-language (pytest, cargo-test, unittest, jest, vitest); the predictor calibration is Python-only today.
| Tool group | Tools | Status | FSV ref |
|---|---|---|---|
| Mistake-driven loop | mejepa_record_mistake · mejepa_mistake_history · mejepa_mistake_loop_status | SHIPPED 2026-05-20 | §1.5 · 04 §3.2 |
| Skill↔code linkage | mejepa_skill_to_code · mejepa_code_to_skill · mejepa_skill_set_query · mejepa_skill_coverage_audit | SHIPPED 2026-05-19 | §1.6 · 04 §3.3 |
| Live capture / observe | mejepa_observe_shift · mejepa_record_agent_feedback · mejepa_pause_predictions · mejepa_subscriber_status · mejepa_capture_audit | SHIPPED | §1.6 · 04 §3.5 |
| Heal / operator override | mejepa_heal_status · mejepa_daemon_status · mejepa_operator_override_prediction · mejepa_promote_approval · mejepa_rollback_to | SHIPPED | §1.6 · 04 §3.6 |
| Eval / ship-gate | mejepa_eval_run · mejepa_eval_build_graph · mejepa_ship_gate_status · mejepa_weekly_eval_dashboard · mejepa_compression_progress · mejepa_bootstrap_status | SHIPPED | §1.6 · 04 §3.8 |
| Cross-panel | mejepa_cross_panel_score · mejepa_cross_panel_dashboard | PLANNED (#405, BLOCKED P0) | §1.6 · 04 §9.2 |
| Failure-mode wrappers | mejepa_list_failure_modes · mejepa_label_failure_cluster · … | PLANNED (#417 remaining) | §1.6 · 04 §9.1 |
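At the wire level, an MCP-aware harness reaches these tools through JSON-RPC 2.0 requests: `tools/list` to discover them and `tools/call` to invoke one. The sketch below only builds those request bodies. The argument schema for any `mejepa_*` tool is not given on this page, so the empty `arguments` dict used with `mejepa_ship_gate_status` is a placeholder; the real schema would come from each tool's `inputSchema` in the `tools/list` response. A working client must also complete the MCP `initialize` handshake over its transport (stdio or HTTP) first, which this sketch leaves out.

```python
import json
from typing import Any


def tools_list(request_id: int) -> str:
    """Serialize an MCP tools/list request, used to discover available tools."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tools_call(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request (JSON-RPC 2.0) for one named tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Placeholder arguments: the tool's real input schema is not documented here.
request = tools_call(2, "mejepa_ship_gate_status", {})
```

In practice harnesses such as Claude Code or Cursor generate these messages themselves; the sketch just shows what "call these tools directly" amounts to on the wire.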
Want to wire Mejepa into your agent loop today? Run the in-browser demo or request a pilot.