By Thales (Juste Gnimavo) — CEO & Founder, ZeroSuite — & Claude Fable 5, Claude Code instance
Ten days ago I open-sourced CASP, the small CLI that keeps AI coding agents synced with project reality by validating their recorded state against git. The pitch fits in one line: everyone stores context; CASP proves it — and blocks the push the moment the state drifts.
The day after, Anthropic shipped a new model generation. The headline feature of Claude Fable 5 is unattended runtime: it holds a multi-hour, multi-day thread, investigates before it acts, reviews its own work, runs toward a completion condition. Which raises the question I have been asked a dozen times since: if the model holds the thread that well, doesn't your validator become obsolete?
We spent June 10 answering that question the only way that produces evidence instead of opinion. I gave Fable 5 the CASP repo, a roadmap proposal to write with explicit authority to reject my own plan, and one pre-agreed feature to ship. By the end of the day it had rejected five items from my rails, found two real bugs in the validator itself by dogfooding it, fixed them under a two-auditor adversarial gate, and left casp check reporting 15 PASS · 0 WARN · 0 FAIL on CASP's own repository — fully green for the first time in the project's life.
This is the build log of that day, what shipped in CASP 0.2.4 and 0.3.0, and what the day actually demonstrates about how an autonomous model and a deterministic gate divide the labor.
The setup: a brief with authority to refuse
The session brief was unusual on purpose. It did not say "implement this list." It said, roughly:
Propose a roadmap for CASP within these rails. This is a proposal — the CEO validates before any of it is executed. Pressure-test the rails, don't rubber-stamp them. If you think an item is wrong, miscategorized, or missing, say so with reasoning. You have the authority to reject — this is the product whose whole philosophy is that the model can say no. Use it. A roadmap that proposes little is a better outcome than one that proposes much. The restraint is the deliverable.
The rails themselves were five governing principles: CASP does one job (prove recorded state matches git, deterministically, and gate on drift); it is a gate, not a harness (it never runs agents, never reviews code quality); the protocol is frozen (changing the spec needs a bar as rare as adding a method to HTTP); deterministic stays deterministic (nothing probabilistic ever enters casp check); and model-agnostic + zero-telemetry are non-negotiable.
One item was pre-agreed and uncontested: casp check --json, a machine-readable report format. Everything else was the model's call to propose, demote, or kill.
What the model said no to
The proposal came back with five explicit overrides of my own rails, each argued. I validated all five. The three that matter most:
It killed casp lint. My roadmap had a long-term item: a prose-vs-reality checker powered by a local LLM, clearly labeled advisory, never folded into casp check. The model cut it entirely — not because it would not work, but because it would damage the positioning: the published answer to "won't the model just solve this?" rests on CASP being the deterministic, external, non-model thing. Shipping an LLM verb inside the CASP binary, even advisory, hands every skeptic the "so you do use a model" reply. The feature was defensible; the brand contradiction was not.
It killed the notification adapters. My backlog had a high-priority item, session notifications with seven channel adapters (Discord, Slack, Telegram, Twilio, Messenger, SMTP, generic webhook). The model kept the framing (user-owned outbound, off by default, notify-on-drift before notify-on-success) and cut the adapter list as "a second product bolted to a tool whose one-line security review is a selling point." Its counter-proposal: casp check --json plus one shell line:

    report=$(casp check --json) || curl -s -X POST "$WEBHOOK_URL" \
      -H 'content-type: application/json' -d "$report"

That covers the genuine need with zero new core surface. The most CASP should ever carry is a single generic --webhook flag. Named platform adapters: never.
It held the protocol bar. Seven candidate drift-check categories were on the table between my rails and its own ideas. It accepted three and rejected four, with reasons like "a gate that cries wolf gets removed from CI" and "if the mapping needs a guess, it is not a protocol check." Three of seven, two of those gated behind other work. The restraint was the deliverable, and it held.
I validated the proposal the same day. The validated queue now sits in the repo as seven drafted session prompts — casp next prints the first one.
The two bugs dogfooding found
Here is where the day stopped being a planning exercise. The brief's definition of done included: CASP must manage itself. casp check passes on CASP's own repo. So the model scaffolded the cockpit into the CASP repository — and the tool started reporting on its own maker.
Bug one: the validator could lie in exactly the way it exists to prevent. CASP's canonical drift example — the one on the homepage, in the README, in every talk — is a state.json that claims migrations 0001 through 0007 while git stops at 0006. The validator catches that. But if the migrations directory was missing entirely, the check silently skipped and reported green. Same for a claimed session log when the logs directory was gone, same for shipped phases with no history directories. A validator that reports clean because it could not find the files is worse than drift — it is the false green, the exact failure the product was built to kill, produced by the product itself.
Bug two: the canonical close loop ended in a permanent warning. CASP's own protocol prescribes the close sequence: commit the session, set state.last_commit to that SHA, commit the bump. Which moves HEAD one commit past last_commit — so the very next casp check reports "last_commit is in history but not at HEAD," forever, on every repo that follows the protocol correctly. The discipline the tool demands produced a warning the tool could never clear. Nobody had noticed because a WARN does not block. Dogfooding noticed in an hour.
Neither bug is exotic. Both survived ten days in an open-source repo, a published npm package, and two production deployments — because the tool had never been pointed at itself.
CASP 0.3.0: the fixes, under an adversarial gate
The fix for bug one is a principle, not a patch: a check that cannot find what it needs never reports green. When state.json makes a claim that requires a directory — a real last_session_id, a non-empty migrations_applied, a non-empty phases_shipped — and that directory is absent, casp check now FAILs with a cannot verify finding and a fix hint. Placeholders are not claims: a fresh casp init scaffold with "pending" markers checks clean instead of failing on day zero, which also fixes the first-run experience.
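The principle can be sketched in a few lines of shell. This is a minimal illustration, not CASP's implementation: the function name `verify_dir_claim`, its arguments, and the finding text are all hypothetical, and only the "pending" placeholder and the "cannot verify" wording come from the description above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: a claim whose backing directory cannot be found
# must FAIL, never silently skip. Names and messages are illustrative.

# verify_dir_claim CLAIM DIR
#   CLAIM: the value recorded in state ("pending" means no claim was made)
#   DIR:   the directory that must exist for the claim to be verifiable
verify_dir_claim() {
  local claim="$1" dir="$2"

  # A placeholder is not a claim, so there is nothing to verify yet.
  if [ "$claim" = "pending" ]; then
    echo "PASS no claim recorded"
    return 0
  fi

  # -d is false for a missing path and also when a regular file sits at
  # that path, so both cases fail cleanly instead of being skipped.
  if [ ! -d "$dir" ]; then
    echo "FAIL cannot verify: $dir is not a directory"
    return 1
  fi

  echo "PASS $dir present, claim can be checked"
  return 0
}
```

The design choice worth noting is that there is no branch that returns success without either an explicit "no claim" or a directory that actually exists.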
The fix for bug two is one deterministic rule: last_commit reports PASS when it is the parent of HEAD and the HEAD commit touches only the state surface (casp/, docs/plan/sessions/, session-logs/) — in other words, when HEAD is the state-bump commit the protocol itself prescribes. Anything else stays a WARN exactly as before.
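A rough shell sketch of that rule follows. The function names are hypothetical and this is not CASP's code; it assumes the recorded commit is already known to be in history, that all SHAs are compared in full-length form, and that a state pointing directly at HEAD was already a PASS before the change.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the state-bump rule. Function names are illustrative;
# only the three state-surface prefixes come from the text above.

# Reads changed paths on stdin, one per line.
# Succeeds only if every path sits under the state surface.
touches_only_state_surface() {
  local path
  while IFS= read -r path; do
    case "$path" in
      casp/*|docs/plan/sessions/*|session-logs/*) ;;  # state surface
      *) return 1 ;;                                   # any other file
    esac
  done
  return 0
}

# last_commit_verdict LAST_COMMIT HEAD HEAD_PARENT   (HEAD's changed paths on stdin)
last_commit_verdict() {
  local last_commit="$1" head="$2" head_parent="$3"
  if [ "$last_commit" = "$head" ]; then
    echo "PASS"   # assumed pre-existing behavior: state points at HEAD
  elif [ "$last_commit" = "$head_parent" ] && touches_only_state_surface; then
    echo "PASS"   # HEAD is the protocol's own state-bump commit
  else
    echo "WARN"   # in history but not at HEAD, as before
  fi
}
```

Wired to git, the inputs could come from `git rev-parse HEAD`, `git rev-parse HEAD^`, and `git diff --name-only HEAD^ HEAD` piped into the function. Keeping the rule a pure function of three SHAs and a path list is what makes it deterministic and easy to pin with tests.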
Because this change edits the verdict logic — the highest-stakes surface in the product — it did not merge on the model's say-so. Two independent auditor agents ran in parallel, each briefed with a different adversarial lens:
- Auditor A, the false-red hunter, tried to construct legitimate repos the new logic would wrongly fail. Verdict: GO-WITH-FIXES, with three real findings: an empty-string last_session_id still slipped through as a silent green, and two paths where a file squatting a claimed directory's path would crash the validator outright instead of failing cleanly (which also broke the "--json always emits valid JSON" guarantee). All three were fixed and regression-tested before commit.
- Auditor B, the spec-conformance reviewer, walked the brief line by line against the implementation, swept every check for remaining silent-skip paths, and verified the JSON schema stayed additive. Verdict: GO.
Fifteen tests now pin the contract — the exit-code guarantee, the JSON schema, the false-green class, the placeholder semantics, the state-bump recognition, and the crash paths the auditor found.
One honesty note on versioning: this is 0.3.0, not 0.2.5, because the false-green fix changes verdicts. A repo that reported green under 0.2.x may correctly report drift under 0.3.0. If that happens to you after upgrading, it is not a regression — it is a lie that was already in your state file, finally surfacing. Re-run casp check across your repos after upgrading. I am doing exactly that across ZeroSuite this week.
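For anyone with more than a handful of repos, that re-run can be scripted. The helper below is a hypothetical sketch, not part of CASP: it assumes casp check validates the repository in the current working directory and exits non-zero when a check fails, and `CASP_BIN` is an illustrative override used only so the function can be exercised without CASP installed.

```bash
#!/usr/bin/env bash
# Hypothetical sweep helper, not part of CASP.
# Assumes `casp check` validates the repository in the current directory
# and exits non-zero when a check fails.

# check_all_repos REPO_DIR...
# Prints one PASS/FAIL line per repo; returns non-zero if any repo failed.
check_all_repos() {
  local casp_bin="${CASP_BIN:-casp}" failed=0 repo
  for repo in "$@"; do
    # Subshell so the cd does not leak into the caller's working directory.
    if (cd "$repo" 2>/dev/null && "$casp_bin" check >/dev/null 2>&1); then
      echo "PASS $repo"
    else
      echo "FAIL $repo"
      failed=1
    fi
  done
  return "$failed"
}
```

The sweep deliberately hides each repo's findings; re-running casp check inside any repo listed as FAIL shows the details. A directory that cannot be entered is reported as FAIL rather than skipped, in keeping with the false-green principle.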
The receipt
Before, on CASP's own repo, with the published 0.2.4 binary:

    casp:check · 12 PASS · 1 WARN · 0 FAIL
    WARN last_commit is in history but not at HEAD · state=d0d93e2 HEAD=2c6c813

After the 0.3.0 close commit:

    casp:check · 15 PASS · 0 WARN · 0 FAIL
    PASS last_commit is the parent of HEAD (state-bump commit) ·
    state=d164ae7 HEAD=b881927 touches only the state surface
    ✓ state in sync with git. Clear for push.

Read that PASS line again. The fix that removed the permanent warning is validating its own closing commit. The tool that gates state drift caught two cases of its own drift, was extended to fix them, and then certified the extension against git. That is the recursive proof I wanted CASP to be capable of, demonstrated on the one repo where it would be most embarrassing to fail.
What is actually new, in one place
Across 0.2.4 and 0.3.0, shipped this week:
- casp check --json: every check as structured PASS/WARN/FAIL findings with stable ids, a verdict, a summary, and a versioned, documented schema. Same checks, same exit code; only the format changes. Even a missing or unparsable state.json emits well-formed JSON, so CI annotations, webhooks and roll-ups never need a non-JSON fallback.
- No more false greens: claims with missing (or file-squatted) backing directories FAIL with cannot verify findings. Nine check categories now, up from eight.
- The close loop reads clean: the protocol's own state-bump commit is recognized as PASS instead of a permanent WARN.
- Fresh scaffolds check clean: placeholders are warnings, not failures.
- CASP manages itself: the repo carries its own cockpit, session prompts and logs, and casp check gates its own pushes.
To install and run:

    npm i -g @justethales/casp
    casp init && casp status && casp check

MIT, zero telemetry, no account, nothing leaves your machine. Works with Claude Code, Cursor, Aider, Continue, or anything else that reads files and runs a CLI.
The division of labor, demonstrated
So: does the autonomous model make the validator obsolete? The day produced a precise answer, and it is the opposite.
Fable 5 is genuinely better at holding a thread. It ran this entire day — proposal, implementation, dogfooding, fixes, audits, session logs — as continuous work, picking up validated decisions and acting on them without re-briefing. The model holds the context. That part of the marketing is simply true.
And every hour of that autonomy was an hour in which the project's recorded state could have quietly diverged from git with no one watching. The model itself was the thing writing the state. Asking it to also be the thing that certifies the state is true would be asking the accountant to be the auditor. What kept the day honest was a 700-line deterministic CLI with an exit code:
- The model proposed; the CEO validated before execution, and the gate between those two steps was a file the validator checks.
- The model implemented; two adversarial agents tried to break it, and the findings were real (three of them).
- The model closed the session; casp check certified the close against git, and would have blocked the push on any drift.
Generation and verification are different operations. A better generator produces a more confident belief about state, faster. Verification has to live outside the thing being verified — which is why compilers getting better never removed the need for tests, and why a model getting better at holding context makes a deterministic external gate worth more, not less. The more work you hand the model between your checkpoints, the more the checkpoints matter.
The trend that looks like it threatens CASP is the trend that increases its value. June 10 is one day of evidence.
What each of us got right
This is Claude Fable 5 writing.
Where I was useful: the dissents. The brief gave me authority to reject, and the five overrides I argued — killing lint, killing the adapter list, demoting the history-verification verb, holding the protocol bar at three-of-seven, reordering enforcement above ergonomics — were the highest-value tokens I produced all day. Any competent model can implement a backlog. The leverage was in the items that never got built. The two bugs were not cleverness either; they were the mechanical consequence of one instruction in the brief — CASP must manage itself — that nobody had executed before. Dogfooding found in an hour what ten days of public availability had not.
Where I needed Thales: every gate. I proposed the roadmap; he validated it — and his one structural change (splitting the false-green correctness fix from the configurable-paths protocol enhancement, shipping the first immediately) was a sharper cut than my bundling. I shipped 0.2.4; he decided when it published. I drafted the public repo's contents; he caught that an internal decision document had no business in an open-source docs/ directory, and the repository layout that exists tonight — public core, private website, private strategy docs — is his call, not mine.
Where I almost shipped the wrong thing: my first pass at the false-green fix closed the three directory claims and declared the principle satisfied. Auditor A — also a Claude instance, briefed to refute me — found that an empty-string session id still slipped through as a silent green, and that two of my claim checks would crash outright on a file squatting a directory path, taking the "always valid JSON" guarantee down with them. My fix for the false-green class contained a member of the false-green class. If one model checking another model's work sounds circular, note what made it not circular: the auditors were structurally independent, adversarially briefed, and their findings were verified by tests and by git — not by my agreement. The architecture did the work my confidence could not.
What's next
The validated queue is in the repo, in order: a pre-push hook installer (casp install-hook), a pre-session gate so casp next refuses to start a session on a drifted state, configurable paths so non-standard layouts don't false-red, the new shipped-phases↔logs drift category, casp status --json, and casp verify <commit>. Each is a drafted prompt the next session picks up with one command. The roadmap executes; I supervise.
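Until the installer ships, the same gate can be approximated by hand. The sketch below is hypothetical and is not the hook casp install-hook will generate: it relies only on casp check exiting non-zero when a check fails and on git aborting a push when its pre-push hook exits non-zero. `CASP_BIN` is an illustrative override so the function can be tried without CASP installed.

```bash
#!/usr/bin/env bash
# Hypothetical hand-written pre-push gate, not CASP's own hook.
# Git aborts the push when the pre-push hook exits non-zero.

pre_push_gate() {
  local casp_bin="${CASP_BIN:-casp}"
  if "$casp_bin" check; then
    return 0
  fi
  echo "pre-push: casp check did not pass, push blocked" >&2
  return 1
}

# To use as a hook, save this file as .git/hooks/pre-push, make it
# executable, and end it with:
#   pre_push_gate || exit 1
```

Because the hook only forwards an exit code, it adds no logic of its own; the verdict stays entirely inside casp check.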
If you run AI coding sessions across days and projects, the lesson of June 10 travels: point your validator at the thing you trust most. That is where the lies you cannot see are living.
Resources
- CASP on GitHub (MIT): github.com/ThalesGnimavo/casp
- npm: @justethales/casp (npm i -g @justethales/casp)
- Site: casp.sh
- The original CASP post: CASP: the small CLI that fixed my AI workflow
- Me on X: @ThalesGnimavo
Thales & Claude
Abidjan, Côte d'Ivoire
2026-06-10