MEJEPA / VOL · I · № 04
VERDICT №0004 · ENGINEERING FOLIO
[email protected] · ED25519 · SHA-256

Your agent says "done."
Mejepa checks reality.

Send one AI-generated PR, repo change, or agent session. From 2 hours — scaled to the size of the change — Mejepa returns whether the claimed change actually exists, whether the tests actually prove it, and where the edge cases hide, plus a signed verification record any reviewer can replay.

from 2 hr per audit · $500–$2,500 flat · 13 frozen instruments · ed25519-signed record · line-level provenance
SESSION TRANSCRIPT · agent=claude-code · 2026-05-10T22:14Z · VERIFIABLE · REPLAYABLE
// submit
$ mejepa audit --pr github.com/acme/api/pull/2418 --tier express
// 2h 14m later
✓ packet ready · mejepa://packet/0x9a3f...d7c1
Pass · Claimed change exists · 4 files modified · 127 lines net · sig 0x9a3f…
Fail · "Added test coverage for edge case X" · no such test in the diff · sig 0x7e21…
Abstain · "Refactored without behavior change" · differential semantics unclear · reviewer needed · sig 0x4c08…
Out of distribution · Modified a file outside the codebase's typical agent envelope · flag for senior review · sig 0x1d77…
§1 THE DONE-CLAIM PROBLEM

"Done" should mean something.

Coding agents are productive at a level that demolishes per-line code review. They are also confidently wrong at a level that demolishes per-line code review. The current default is to trust the green checkmark — and discover the lie in production.

Coding agents now ship features, fix bugs, refactor modules, and write tests. They also routinely claim work they did not do: tests written but not run, edge cases acknowledged but not handled, refactors that quietly change behavior, files modified that were never named in the request.

A senior reviewer used to be the verification layer. Senior reviewers do not scale to ten merged PRs per agent per day across thirty repos. The team's choices are now: (a) trust the agent, (b) re-review everything (negating the speed gain), or (c) get an independent verification record for the PRs that matter — the ones touching production, security, billing, or compliance code.

Mejepa is option (c). Not another linter, not another CI step, not a model-as-judge that hand-waves "looks good." A signed record, produced by the same frozen-instrument system Mejepa runs in regulated industries — applied to one PR or one agent session at a time.

You don't have to deploy anything. You submit the PR. We return the record.

§2 THE PROCESS · RFC-STYLE

Four steps. No deployment. No infra.

Submit a public PR URL or upload a private patchset and agent transcript. Mejepa returns a packet you can attach to the PR, commit to the repo, or hand to your compliance team. That is the entire workflow.

01 / SUBMIT

One PR or session

Public GitHub/GitLab/Bitbucket URL, or an encrypted patchset upload. Optional: include the agent transcript (Claude Code, Cursor, Aider, etc.) for richer claim extraction.

02 / READ

13 frozen instruments

The diff, the test results, the file structure, and the agent's stated claims flow through Mejepa's deterministic instrument panel — the same panel that runs in the regulated production tier.

03 / VERIFY

Per-claim verdict

Each claim is classified Pass, Fail, Abstain, or Out-of-distribution. Edge cases route to a human reviewer (Counsel-tier audits add this by default).

04 / SIGN

Append to chain

Signed packet returned: PDF cover, machine-verifiable JSON, hash chain link. Drop it on the PR. Verify it offline with our public key whenever you need to.

Human Oversight Rule

Express audits are AI + frozen-instrument. Counsel audits (the $2,500 tier) add a human reviewer with a published engineering bar — the reviewer's identity is part of the witness chain. You always know who signed.
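The routing rule above can be sketched in a few lines. This is an illustrative model of the verdict taxonomy and reviewer routing, not Mejepa's production code; the function name, tier strings, and enum values are hypothetical.

```python
from enum import Enum

class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    ABSTAIN = "abstain"
    OOD = "out_of_distribution"

def needs_human_review(verdict: Verdict, tier: str) -> bool:
    """Hypothetical routing sketch: abstentions and out-of-distribution
    claims always go to a human; the Counsel tier also routes failed
    claims to the reviewer, per the tier descriptions in this folio."""
    if verdict in (Verdict.ABSTAIN, Verdict.OOD):
        return True
    return tier == "counsel" and verdict is Verdict.FAIL
```

Express leaves Pass/Fail verdicts machine-signed; Counsel adds the human signature on top of the same record.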

§3 A REAL RECORD

What you attach to the PR.

No screenshots, no dashboards. The deliverable is a signed JSON + a one-page PDF cover. Both verifiable offline.

AUDIT RECORD · github.com/acme/api/pull/2418 · TIER · EXPRESS · REPLAY · mejepa verify < record.json
{
  "record":     "mejepa-code-acme-2418",
  "tier":       "express",
  "issued":     "2026-05-10T23:01:14Z",
  "subject":    "github.com/acme/api/pull/2418",
  "diff":       { "files": 4, "added": 312, "removed": 185 },

  "agent": {
    "tool":          "claude-code",
    "session":       "0x4a9...e21",
    "model":         "claude-opus-4-7",
    "prompt_hash":   "sha256:7e2...c01",
    "began":         "2026-05-10T19:42:08Z"
  },

  "provenance": {
    "lines_total":             497,
    "lines_agent_authored":    441,
    "lines_human_edited_after": 56,
    "lines_unattributable":     0,
    "license_scan":            "spdx:Apache-2.0",
    "sbom_delta":              "clean",
    "per_file_attribution":    "see appendix/provenance.csv"
  },

  "claims": [
    { "id": "C1", "text": "Added tests for edge case X",
      "verdict": "FAIL", "reason": "No test references X; coverage flat" },
    { "id": "C2", "text": "Refactored without behavior change",
      "verdict": "ABSTAIN", "reason": "Two callers' semantics changed under load; reviewer needed" },
    { "id": "C3", "text": "Fixed null-pointer at handler.go:88",
      "verdict": "PASS", "reason": "Failing test now green; no regressions in suite" },
    { "id": "C4", "text": "No security implications",
      "verdict": "PASS", "reason": "Diff did not touch authn/authz/crypto paths" },
    { "id": "C5", "text": "Stayed within /pkg/handlers",
      "verdict": "FAIL", "reason": "Edited /pkg/internal/config — outside stated envelope" },
    { "id": "C6", "text": "Provenance — agent-vs-human attribution intact",
      "verdict": "PASS", "reason": "All 497 lines attributable; SBOM delta clean" }
  ],

  "summary": { "pass": 3, "fail": 2, "abstain": 1 },
  "coverage": 0.91,

  "instruments":  13,
  "chain_prev":   "0x4c08...2a1e",
  "chain_next":   "0x7e21...b9f0",
  "ed25519_sig":  "0x... (64 bytes)"
}

Verify any Mejepa record offline: curl mejepa.com/keys | mejepa verify < record.json
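The offline check follows the common sign-over-canonical-hash pattern. A minimal sketch, assuming the signature covers a SHA-256 digest of the record with the signature field removed and keys sorted — the actual `mejepa verify` wire format is defined by the CLI, and the ed25519 check against the key from mejepa.com/keys is omitted here.

```python
import hashlib
import json

def canonical_digest(record: dict) -> str:
    """SHA-256 over the record with the signature field removed and keys
    sorted -- one common canonicalization scheme (illustrative only)."""
    body = {k: v for k, v in record.items() if k != "ed25519_sig"}
    blob = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

# An offline verifier would check the ed25519 signature over this digest
# against the published key, and that chain_prev matches the previous
# packet's digest. Both steps are omitted; this shows determinism only.
record = {"record": "mejepa-code-acme-2418", "tier": "express",
          "chain_prev": "0x4c08...2a1e", "ed25519_sig": "0x..."}
digest = canonical_digest(record)
assert digest == canonical_digest(dict(record, ed25519_sig="other"))
```

Determinism is the point: the same inputs always yield the same digest, so the record is replayable months later.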

§3.1 CODE PROVENANCE

Every line. Who wrote it. When. With which model.

The Mejepa packet doesn't just verify the claim — it attributes every line in the diff. Agent-authored, human-edited-after, or pre-existing. Tied to the agent session, model version, prompt hash, and timestamp. Line-level. Signed. Replayable six months later.
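Conceptually, attribution is a per-line tally into the three buckets the packet's `provenance` object reports. A toy sketch with hypothetical inputs — the real attribution is reconstructed from the agent session transcript and subsequent commit history, not from sets of line numbers:

```python
def attribute_lines(diff_lines, agent_authored, human_edited_after):
    """Tally each diff line into the three provenance buckets.
    Inputs are hypothetical sets of line numbers for illustration."""
    counts = {"agent_authored": 0, "human_edited_after": 0, "unattributable": 0}
    for line in diff_lines:
        if line in human_edited_after:
            counts["human_edited_after"] += 1  # agent wrote, human touched after
        elif line in agent_authored:
            counts["agent_authored"] += 1
        else:
            counts["unattributable"] += 1      # zero in a clean packet
    return counts
```

In the sample record above, 497 lines split 441 / 56 / 0 across exactly these three buckets.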

USE CASE · I — Audit & compliance

SOC 2, SOX, FedRAMP, PCI: assessors increasingly ask "which commits were AI-generated, and how were they reviewed?" The provenance metadata is the audit-trail entry that answers without you reconstructing it from chat logs.

USE CASE · II — Open-source license compliance

SPDX, REUSE, SBOM: the packet flags AI-generated lines so your SBOM and license-scan tooling can treat them under whatever policy your legal team has set — including outright exclusion from copyleft-licensed projects.

USE CASE · III — E&O and tech-errors insurance

Carrier defensibility: when an AI-generated change causes an incident, the signed provenance record establishes the chain of custody — what the agent produced, what the human accepted, when the abstention was raised, and who approved the merge anyway.

USE CASE · IV — Patent & prior-art

Inventorship defensibility: the ed25519-timestamped record establishes when code was written and by what agent — a defensible artifact in patent-prior-art disputes and inventorship challenges around AI-assisted innovation.

Provenance is included in every audit at no extra cost. The same instruments that verify the claim also attribute the lines. The witness chain doesn't care which job it's doing.

§4 PRICING

One PR. From 2 hours up.

Express handles the high-volume case. Counsel-Reviewed adds a human reviewer for the PR you would lose sleep over.

Express Audit

$500 / PR · flat

Mejepa's frozen-instrument panel + per-claim verdict + signed record. Starting at 2 hours, longer for larger PRs. Built for high-volume agent workflows where you want a record on every consequential PR.

From 2 hours · scaled by size · Signed · ED25519

Counsel-Reviewed Audit

$2,500 / PR · flat

Express verdict plus human reviewer (engineering bar published) inspecting abstention queue and any failed claims. Reviewer signs as part of the witness chain. Recommended for PRs touching auth, billing, crypto, customer data, or migrations.

From 3 hours · scaled by size · Signed · Reviewer-Witnessed

Volume pricing for dev agencies and AI coding shops · 100+ audits/month → custom rate · annual prepay available

§5 WHO USES THIS

Built for teams using agents at scale.

If you have one agent generating one PR a week, Mejepa is overkill. If you have ten engineers each running an agent through 30 PRs a sprint, the per-PR record is how the team retains review discipline.

SEGMENT / 01

Dev agencies + AI coding shops

You ship client code generated by agents. The audit record protects you when "but the AI said it worked" is the client's first question post-incident.

SEGMENT / 02

Software teams using coding agents

You merged Claude Code, Cursor, Aider, or Devin into the workflow. Senior reviewers can't audit every PR. Mejepa audits the ones touching production-critical paths.

SEGMENT / 03

Regulated codebases

HIPAA, SOC 2, PCI, SOX, FedRAMP scope. The signed record is the audit-trail entry assessors will increasingly ask for as AI-assisted commits proliferate.

Provenance · Built on Teleox.ai

Mejepa is the commercial productization of Teleox.ai — an independent research framework on meaning compression. The 13-instrument panel and witness chain that power this audit are open-research primitives (DDA + TCT), shipping in production for code since Phase 1. Reproducible. Citable. Not a black box.

§6 WHAT WE PROMISE · WHAT WE'RE BUILDING

What ships today. What's on the roadmap.

Engineers will sniff out a roadmap dressed up as a feature list. Today is what runs in production right now. Tomorrow names the gated work — Phase B and beyond — that has earned mention but not yet a promise.

Today · The promise
  • Line-level attribution — every prediction carries line_range: (u32, u32) + contributing_embedders. The verdict is grounded in named bytes.
  • Named ReasoningClass::Hallucination — first-class enum variant on every patch (Correct / MostlyCorrect / PlausibleButWrong / Hallucination / Hedging / Overclaiming / UnderClaiming / Mute). Most tools won't name the failure mode. We do.
  • Full-State Verification (FSV) — every claim has a corresponding RocksDB row, file SHA-256, and a separate verifier code path. The verifier never shares code with the writer.
  • Replay reproducibility — full chain replay runs every 10,000 attempts. The packet you receive can be regenerated bit-for-bit from inputs alone. Single-GPU on commodity hardware.
  • Published research — Zenodo DOI 10.5281/zenodo.19977981. The methodology is peer-reviewable.
Tomorrow · The vision
  • Counterfactual minimum edit — mejepa_explain_prediction returns the smallest edit that flips the verdict to Pass. Phase B; partial pipeline available to design partners today.
  • Pre-commit simulation — mejepa_predict_what_if: hypothetical verdict before the patch lands.
  • Patch-similarity graph across customers — anonymized: "your bug looks like 47 others — here's the recurring fix pattern."
  • ME-JEPA-Voice / Image / Robotics / Math — same external-frozen-instrument architecture, new domains. Domain packs ship as TOML, not new binaries.
Phase B / future-domain items are not in the audit-packet contract. They're the destination, not the deliverable.
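As a reader's mental model, the per-patch record described in the Today list can be sketched as a small schema. This is an illustrative Python rendering of the Rust-style types named above (the enum variants and `line_range` pair come from the text; the class and field layout is hypothetical):

```python
from dataclasses import dataclass
from enum import Enum, auto

class ReasoningClass(Enum):
    """The eight classes named in the Today list; Python rendering is
    illustrative, the variants are not."""
    CORRECT = auto()
    MOSTLY_CORRECT = auto()
    PLAUSIBLE_BUT_WRONG = auto()
    HALLUCINATION = auto()
    HEDGING = auto()
    OVERCLAIMING = auto()
    UNDERCLAIMING = auto()
    MUTE = auto()

@dataclass(frozen=True)
class Prediction:
    claim_id: str
    reasoning: ReasoningClass
    line_range: tuple             # (start, end) -- a u32 pair in the real schema
    contributing_embedders: tuple

# Hypothetical example: a hallucinated claim grounded to named lines.
p = Prediction("C1", ReasoningClass.HALLUCINATION, (88, 94), ("embedder_03",))
assert p.reasoning is ReasoningClass.HALLUCINATION
```

The point of the shape: every verdict is pinned to a byte range and the instruments that produced it, so "the model said so" is never the whole answer.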

§7 FREQUENTLY ASKED

Engineers ask. Mejepa answers.

What is an AI code done-claim audit?

A productized engagement that verifies what an AI coding agent actually did versus what it claimed. Submit one PR, repo change, or agent session → receive a signed record: does the claimed change exist, do the tests prove it, where edge cases hide, line-level attribution (agent vs human), SBOM delta + license scan, and per-claim verdict (Pass / Fail / Abstain / OOD). From 2 hours, scaled by PR size. $500 Express or $2,500 Counsel-Reviewed.

How do you verify what an AI coding agent actually did?

Mejepa parses the agent's natural-language claims via AgentClaimGraph (deterministic, no LLM), runs the diff through 13 frozen instruments and the conformal Gτ guard, executes the test suite to confirm test claims, scans the codebase for license + SBOM deltas, and matches every prediction to chunk_id + line_range + contributing_embedders. Signed with ed25519, committed to an append-only witness chain. Verifiable offline via mejepa.com/keys.
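The append-only witness chain follows the standard hash-chain construction: each link commits to the hash of the previous link, so editing any past record invalidates every later one. A minimal sketch — the structure is the generic pattern, the field names are not Mejepa's actual schema:

```python
import hashlib

def append_to_chain(chain, record_digest):
    """Standard hash-chain append. Each link binds the new record digest
    to the previous link's hash; no past record can change silently."""
    prev = chain[-1]["link_hash"] if chain else "0" * 64  # genesis value
    link_hash = hashlib.sha256((prev + record_digest).encode()).hexdigest()
    link = {"chain_prev": prev, "record": record_digest, "link_hash": link_hash}
    chain.append(link)
    return link

chain = []
a = append_to_chain(chain, "aa" * 32)
b = append_to_chain(chain, "bb" * 32)
assert b["chain_prev"] == a["link_hash"]  # tamper with a, and b's link breaks
```

This is why the sample record carries both `chain_prev` and `chain_next`: each packet is pinned between its neighbors.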

What is line-level provenance for AI-generated code?

Attribution of every line in the diff to its origin: agent-authored, human-edited-after, or unattributable. Each prediction carries line_range:(u32, u32) + contributing_embedders, plus agent metadata (tool, session, model_version, prompt_hash, started_at). Consumed by SOC 2 / SOX / FedRAMP / PCI auditors, E&O carriers, SPDX/REUSE/SBOM toolchains, and patent counsel.

How fast can Mejepa audit an AI-generated PR?

Express ($500) starts at 2 hours for small PRs and scales upward with size (LoC, file count, test surface area). Counsel-Reviewed ($2,500) starts at 3 hours with the same scaling, plus a human reviewer with published engineering bar; reviewer signs as part of the witness chain. The SLA scales honestly with the work — no fixed cap, no false promises.

How much does an AI code audit cost?

Express Audit — $500 flat per audit (from 2 hours, scaled by PR size). Counsel-Reviewed Audit — $2,500 flat per audit (from 3 hours, scaled, with human reviewer signature in witness chain). Volume pricing for dev agencies and AI coding shops running 100+ audits/month with annual prepay. Counsel-Reviewed recommended for PRs touching authn / authz / crypto / customer-data / migration code.

How does Mejepa detect AI hallucinations in code?

Mejepa carries ReasoningClass as a first-class enum on every patch: Correct, MostlyCorrect, PlausibleButWrong, Hallucination, Hedging, Overclaiming, UnderClaiming, Mute. The Hallucination class flags claims that are plausible but unsupported by the diff, the test suite, or the codebase. Grounded in the 13 frozen instruments and bounded by the conformal Gτ guard. Most competitors do not name the failure mode. Mejepa does.

Does Mejepa work with Claude Code, Cursor, Aider, and Devin?

Yes — coding-agent-agnostic. Submit a public GitHub/GitLab/Bitbucket PR URL or upload an encrypted patchset + optional agent transcript. Mejepa supports transcripts from Claude Code, Cursor, Aider, Devin, and other agentic coding tools. Agent metadata (tool, session_id, model_version, prompt_hash) becomes part of the witness chain. The audit verifies claims regardless of which model produced them.

What is Full-State Verification (FSV)?

FSV is Mejepa's central methodological contribution, published in the Dynamic / ME-JEPA research paper (Zenodo DOI 10.5281/zenodo.19977981). Every claim has a corresponding RocksDB row, file SHA-256, and decoded readback through a separate code path. The verifier never shares code with the writer — eliminating self-confirming bugs by construction. Full replay reproducibility runs every 10,000 attempts; packet is regenerable bit-for-bit from inputs alone on a single commodity GPU.
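The writer/verifier separation can be sketched as two code paths that share data but no logic. Below, a plain dict stands in for the RocksDB row store, and the function names are hypothetical; the principle shown — the verifier recomputes the hash from stored bytes rather than trusting the writer's bookkeeping — is the one the text describes.

```python
import hashlib

# Writer path: stores the payload alongside its content hash.
def write_claim(store, claim_id, payload: bytes):
    store[claim_id] = {"payload": payload,
                       "sha256": hashlib.sha256(payload).hexdigest()}

# Verifier path: shares the data, not the code -- it recomputes the
# hash from the stored bytes instead of reusing the writer's logic.
def verify_claim(store, claim_id) -> bool:
    row = store[claim_id]
    return hashlib.sha256(row["payload"]).hexdigest() == row["sha256"]

store = {}  # toy stand-in for the RocksDB row store
write_claim(store, "C3", b"fix null-pointer at handler.go:88")
assert verify_claim(store, "C3")
```

A bug in the writer cannot confirm itself, because no writer code runs on the verification path.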

§8 NEXT

Send the PR. We return the record.

One AI-generated PR or agent session. Quoted price. Signed from 2 hours up, scaled to PR size. Drop the record on the PR — your reviewer, your compliance team, and the auditor six months from now will all be able to verify it offline.

ENGINEERING FOLIO · VOL · I · № 04 · [email protected]