
Cockpit: the small CLI that fixed my AI workflow

Five months ago I shared the workflow that built six production products with zero engineers. Here is the one piece I missed and the small open-source CLI that fixes it. MIT-licensed, source on GitHub today, npm package live: npx @justethales/cockpit init.

Juste A. Gnimavo (Thales) | May 30, 2026 | 17 min

By Thales (Juste Gnimavo) — CEO & Founder, ZeroSuite, Inc.


In March, I published a long article explaining the workflow I use to run six production products with Claude as my CTO and zero human engineers. It was the most-read post I have ever written. It described five pillars: the CLAUDE.md constitution, the session architecture, phase-based development, the multi-agent audit loop, and the authority structure that lets Claude say no to me.

I stand by that article. Those five pillars built sh0, FLIN, Déblo, 0fee, 0cron, and 0diff. They still ship working code today.

But the article had a hole I did not see at the time. It described how to run a single session well. It said nothing about how to keep dozens of sessions coherent over months of work.

That is a separate problem. It is also the problem that nearly broke a session last Friday afternoon.


The bug that almost wasted a session

The setup. I was deep into Phase 4.6 of Conductor, our internal AI workspace. The day's session was Section C of that phase, a conversation API surface. The previous session, two days earlier, had shipped Section B (the database schema). I opened a fresh Claude Code session, typed /next, and watched the agent start executing.

About 90 seconds in, the agent paused. It had read the cockpit (the state file I will explain in a moment) and noticed something I had not.

state.json.next_prompt points at the parent PHASE-4.6-WORKSPACE-V1.md, but the sub-prompt drafted in the most recent commit is PHASE-4.6C-CONVERSATION-API.md and next_phase names Section C. Executing the C sub-prompt is the real next slice.

The cockpit had drifted. The metadata I had updated at the end of session B pointed at the wrong file. The agent caught it by cross-referencing two fields. If it had not, if the agent had blindly executed what next_prompt named, I would have wasted the session redoing parent-prompt scope that was not even queued.

This is the failure mode my five-pillar article never described. Each session in isolation was solid. The handoff between sessions was a manual, error-prone copy-paste that worked 95% of the time and silently broke the other 5%. The cost of that 5% was not a syntax error. It was thirty minutes of confusion, a wrong direction taken, and a quiet erosion of trust in the cockpit itself.

I sat with the problem for an hour. Then I built the fix. I am shipping it tonight as MIT-licensed open source. Anyone reading this can install it in 30 seconds and use it on any project.

It is called Cockpit. Below is what it is, why it works, and why I think every developer running AI-driven sessions needs something like it.


What a cockpit actually is

I have been using the word "cockpit" informally in our repos for about three months. The idea is small: a directory at the root of every project that answers the question "where am I, what am I doing, what comes next" without making the agent re-read the codebase.

A cockpit holds five markdown files, one JSON file, and a templates folder:

state.json is the machine-readable single source of truth. current_phase, next_phase, next_prompt, last_session_id, last_commit, phases_shipped, migrations_applied. Every field is a fact. Agents read it before doing anything.

now.md is the human-readable focus. One paragraph describing what just shipped, what to do next, what to NOT do. Updated at every session close. Future-me reads this paragraph and reconstructs my working state without reading any code.

roadmap.md is the forward surface: Next-3, in-flight, blocked, queued, shipped-this-week, phase scoreboard. References prompts under docs/plan/sessions/ and logs under session-logs/.

architecture.md numbers the segments of the codebase (R1, A2, T3 and so on) so a session can delegate by ID ("audit T3", "extend A2") instead of by directory.

carte.md is an ASCII overview of how routes, libs and DB hang together. Eyes-on-glass, no scrolling.

README.md is the session-start and session-close protocols, written for the agent that is about to act on this cockpit.

templates/ holds three canonical scaffolds: session-prompt.md, session-log.md, audit-brief.md. Every new session draft starts from there.
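To make the state.json contract concrete, here is a minimal sketch of how its fields could be typed and shape-checked before any other validation runs. The field names come from the list above; the value types, the parseState helper and its messages are assumptions for illustration, not the shipped Cockpit source.

```typescript
// Hypothetical shape for cockpit/state.json, inferred from the field names above.
// The value types are assumptions; the real file may differ.
interface CockpitState {
  current_phase: string;
  next_phase: string;
  next_prompt: string; // path to the next session prompt
  last_session_id: string;
  last_commit: string; // git SHA
  phases_shipped: string[];
  migrations_applied: string[];
}

const STRING_FIELDS = [
  "current_phase",
  "next_phase",
  "next_prompt",
  "last_session_id",
  "last_commit",
] as const;
const LIST_FIELDS = ["phases_shipped", "migrations_applied"] as const;

// Parse raw JSON text and report every missing or mistyped field.
function parseState(raw: string): { state?: CockpitState; problems: string[] } {
  const problems: string[] = [];
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { problems: ["state.json is not valid JSON"] };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { problems: ["state.json must contain a JSON object"] };
  }
  const record = data as Record<string, unknown>;
  for (const key of STRING_FIELDS) {
    if (typeof record[key] !== "string" || record[key] === "") {
      problems.push(`${key} must be a non-empty string`);
    }
  }
  for (const key of LIST_FIELDS) {
    const value = record[key];
    if (!Array.isArray(value) || !value.every((v) => typeof v === "string")) {
      problems.push(`${key} must be an array of strings`);
    }
  }
  if (problems.length > 0) return { problems };
  return { state: record as unknown as CockpitState, problems };
}
```

Collecting every problem instead of stopping at the first one matches the spirit of the cockpit: a session should see the full list of what is wrong with the state file in one pass.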

Maybe 1,000 lines of markdown across the project's lifetime. Costs about 3 to 5 thousand tokens to load (about 1k via the new status CLI). Costs ninety seconds to update at session close.

That was the shape for three months. It worked.

What it was missing, and what last Friday's bug surfaced, was a machine-verifiable gate. The cockpit's discipline was 100% manual. If I forgot to bump one field, nothing caught me. The agent at session start trusted what state.json said, and state.json said what the previous me had typed, and the previous me had been distracted.

The fix is also small in spirit, if not in line count: a single-file TypeScript validator (about 500 lines today) that compares state.json against the filesystem and against git, exits 1 on any inconsistency, and prints a → fix hint for each finding. Run it before every push. The day you skip it is the day you ship a state file that lies.


What the validator catches

I built it on a Friday afternoon. It catches nine failure modes I have observed in three months of cockpit drift across ZeroSuite:

  1. state.json.next_prompt points at a missing file. The session forgot to draft the next prompt.
  2. state.json.next_prompt points at a prompt with status: shipped. The exact bug that cost me half an hour. The cockpit was never bumped after the previous session.
  3. state.json.last_session_id does not map to a session-log file. The log was never written.
  4. state.json.last_commit not found in git log. Bogus SHA, deleted branch, typo.
  5. state.json.phases_shipped[] has duplicates. The close protocol ran twice.
  6. state.json.migrations_applied[] does not match drizzle/*.sql. A migration shipped without bumping state, or state lists a ghost.
  7. A session prompt has status: shipped but session_log: pending. The flip-on-close was done lazily.
  8. A shipped prompt's session_log: points at a missing file. Typo or rename.
  9. Uncommitted changes in cockpit/, docs/plan/sessions/ or session-logs/. A previous session forgot the commit step.
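Two of the nine checks (5 and 6) are pure list comparisons, which makes them easy to sketch. The functions below are an illustration under assumptions: the Finding shape, the function names and the fix wording are hypothetical, and the list of migration files on disk is passed in rather than read from drizzle/ so the logic stays testable.

```typescript
// Hypothetical finding record: a severity, a message, and a fix hint.
interface Finding {
  severity: "FAIL" | "WARN";
  message: string;
  fix: string;
}

// Check 5: the same phase recorded more than once in phases_shipped.
function findDuplicatePhases(phasesShipped: string[]): Finding[] {
  const repeats = phasesShipped.filter((p, i) => phasesShipped.indexOf(p) !== i);
  const unique = repeats.filter((p, i) => repeats.indexOf(p) === i);
  return unique.map((phase) => ({
    severity: "FAIL" as const,
    message: `phases_shipped lists ${phase} more than once`,
    fix: `remove the duplicate ${phase} entry from state.json`,
  }));
}

// Check 6: migrations_applied must match the migration files on disk, in both directions.
function compareMigrations(applied: string[], onDisk: string[]): Finding[] {
  const findings: Finding[] = [];
  for (const file of onDisk) {
    if (applied.indexOf(file) === -1) {
      findings.push({
        severity: "FAIL",
        message: `${file} exists on disk but is missing from migrations_applied`,
        fix: `add ${file} to state.json.migrations_applied`,
      });
    }
  }
  for (const name of applied) {
    if (onDisk.indexOf(name) === -1) {
      findings.push({
        severity: "FAIL",
        message: `migrations_applied lists ${name} but no such file exists`,
        fix: `remove ${name} from state.json or restore the migration file`,
      });
    }
  }
  return findings;
}
```

Comparing in both directions is what separates "a migration shipped without bumping state" from "state lists a ghost".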

Each finding prints a → fix line so the next session can resolve it without re-reading the documentation:

cockpit:check · 22 PASS · 1 WARN · 1 FAIL
──────────────────────────────────────────────────────────────────────
  FAIL  next_prompt is already SHIPPED · docs/plan/sessions/PHASE-4.6C-CONVERSATION-API.md has status: shipped — cockpit was not bumped after that session
        → either (a) update state.json.next_prompt to the real next slice, or (b) re-execute the shipped prompt explicitly
  WARN  last_commit is in history but not at HEAD · state=81a5138 HEAD=e8795b1
        → bump state.last_commit to e8795b1 if the new commits are uncockpitted work

✗ 1 drift detected. Fix before push.

That is the whole interaction surface. No dashboard. No SaaS. No login. Pipe it into a pre-push hook, into CI, into a Makefile target, whichever fits your project. Exit code 1 on drift, 0 otherwise.
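The report format and the exit-code rule are simple enough to sketch in a few lines. This is an illustration that mirrors the sample output above, not the actual implementation: the Finding shape and renderReport are hypothetical, and it assumes only FAIL findings count as drift (a WARN alone still exits 0), which is how the sample summary reads.

```typescript
// Hypothetical finding record, as in the sample report: severity, message, fix hint.
interface Finding {
  severity: "FAIL" | "WARN";
  message: string;
  fix: string;
}

// Build the report text and decide the process exit code.
// Assumption: only FAIL findings count as drift; WARN findings are informational.
function renderReport(passCount: number, findings: Finding[]): { text: string; exitCode: 0 | 1 } {
  const fails = findings.filter((f) => f.severity === "FAIL").length;
  const warns = findings.length - fails;
  const lines: string[] = [
    `cockpit:check · ${passCount} PASS · ${warns} WARN · ${fails} FAIL`,
  ];
  for (const finding of findings) {
    lines.push(`  ${finding.severity}  ${finding.message}`);
    lines.push(`        → ${finding.fix}`);
  }
  if (fails > 0) {
    lines.push("", `✗ ${fails} drift detected. Fix before push.`);
  }
  const exitCode: 0 | 1 = fails > 0 ? 1 : 0;
  return { text: lines.join("\n"), exitCode };
}
```

A caller would print text and pass exitCode to the process exit, which is all a pre-push hook or a CI step needs to block on drift.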


Field test: two production projects, 90 seconds each

Before I shipped 0.1.0 to npm I ran npx @justethales/cockpit init against two live ZeroSuite projects I had not opened in weeks. Same command, two very different shapes. Both surfaced concrete bugs in under two minutes.

Test 1 — 0seat.dev (SvelteKit + Prisma, a project I had paused six weeks earlier)

The project is a SvelteKit 2 + Prisma 6 + Stripe + Anthropic SDK app, originally extracted from a Solid.js + Bun + InstantDB monorepo in March. Last commit 2eaf2dd was Mar 31. I had not touched it since.

I ran:

cd ~/ZeroSuite/0seat.dev
npx @justethales/cockpit@latest init
npx @justethales/cockpit@latest status
npx @justethales/cockpit@latest check

Three findings in 90 seconds, all of which I had forgotten:

  1. REFACTORING-PLAN.md had been sitting uncommitted in the working tree for two months. The plan called for migrating from InstantDB to Prisma. The most recent commit message said "going to start added real instantdb client". The plan and the work disagreed. No file in the project told you which one was current. The cockpit next_prompt field forced me to pick. (I picked Prisma — the plan won.)
  2. Eight untracked files (.auth-keys/, _logs/, 0seat.zip, .nx/, .vscode/, and three more) had been silently rotting. cockpit status surfaced them in a single screen.
  3. The CLAUDE.md was 17 KB of marketing prose ("The Four-Engine AI-Mesh System"), zero engineering rules. The mismatch became obvious as soon as cockpit/now.md asked me to write a one-paragraph focus and I realised I had nowhere to put it.

The check went from 8 PASS · 2 FAIL (fresh scaffold) to 14 PASS · 0 FAIL · 1 WARN after I wired the real last_commit, drafted a real PHASE-1-PRISMA-AUTH-MAGIC-LINK.md prompt, and backfilled a session log for the March work. Total elapsed: under five minutes.

Test 2 — Poponi (Rust + React Native + SvelteKit + Python, no git)

Poponi is the voice-first moto-taxi app for Abidjan. Multi-stack: Rust microservices on the backend, React Native client and pro apps, SvelteKit dashboard, Python voice server (Pipecat + Ultravox), infra. Last file mtime: Apr 15. Six weeks of silence.

I ran the same three commands. Cockpit immediately printed branch (no git) · HEAD (no git) — and I realised the worst possible state. The local project had no .git folder at all, not at the root, not in any of the six subdirectories. The private GitHub repo at github.com/zerosuite-inc/poponi-source-code did exist — I had created it months ago with intent — but it was never initialised locally and nothing had ever been pushed. The remote page was still showing the default git init / git add / git commit / git push quick-setup instructions. Six weeks of Rust microservices, React Native client and pro apps, SvelteKit dashboard, and Python voice server lived exclusively on this laptop. One spilled coffee and the project ended.

Cockpit did not fix that automatically. But the moment its status header said (no git), I noticed something I had been missing for weeks. The remote was created, the local never happened, nothing on either side raised a flag.

I fixed it on the spot. Same session, ten minutes after the discovery: secret audit (.env was already in .gitignore, no hardcoded keys in source, no service-account JSONs in the tree), git init -b main, 151 files staged, single monolith commit cf1b77c titled "initial commit — 6 weeks of multi-stack scaffolding", remote added, git push -u origin main. The repo is no longer a single point of failure. The cost was ten minutes I had been avoiding for weeks because nothing was reminding me to spend them.

The second finding was that the project's own claude.md mandates a work-sessions/ folder with a four-phase audit cycle for every feature. The folder does not exist. The discipline is documented in the constitution but never enforced. Cockpit's session-logs/ shape is the 90%-overlap fix; pointing at it forced the obvious question — am I writing the reports or not?

check --no-git correctly dropped the four git-dependent assertions and ran the metadata ones. The two real failures (no prompt drafted, no log written) were both legitimate. The CLI degrades gracefully when git is missing.
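Graceful degradation of this kind can be sketched by tagging each check with whether it needs git and skipping the tagged ones when no repository is present. The Check descriptor, the requiresGit flag and runChecks below are hypothetical names for illustration; the real CLI may organise its assertions differently.

```typescript
// Hypothetical check descriptor; the names and the requiresGit flag are illustrative.
interface Check {
  name: string;
  requiresGit: boolean;
  run: () => boolean; // true means the assertion passed
}

interface RunSummary {
  passed: string[];
  failed: string[];
  skipped: string[];
}

// Run every check that can run; skip git-dependent ones when git is unavailable.
function runChecks(checks: Check[], gitAvailable: boolean): RunSummary {
  const summary: RunSummary = { passed: [], failed: [], skipped: [] };
  for (const check of checks) {
    if (check.requiresGit && !gitAvailable) {
      summary.skipped.push(check.name); // never executed, so it cannot crash
      continue;
    }
    if (check.run()) {
      summary.passed.push(check.name);
    } else {
      summary.failed.push(check.name);
    }
  }
  return summary;
}
```

Skipped checks are recorded rather than silently dropped, so a report can still say which assertions were not evaluated on a project like Poponi.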

The pattern both tests revealed

Cockpit's value is not in the lines of TypeScript. It is in the questions the scaffold forces you to answer:

  • What is the next concrete slice? → drafts the file or fails.
  • Where is the last session log? → expects it on disk or fails.
  • Which commit are you anchored at? → matches git log or fails.
  • Is the working tree clean enough to push? → warns if not.

In both tests, the questions were the value. The validator was just the part that refused to let me skip them.


The two other pieces

Once the validator existed, I realised it solved a second problem I had not named yet: templates as implicit knowledge.

Every session produces three artifacts: the session prompt (drafted at the end of the previous session), the session log (written at the end of this session), and the audit brief (used to spawn the post-implementation audit agent). All three had a shape (frontmatter format, section order, mandatory blocks such as "Observed divergence") that I had refined over eleven sessions. New sessions discovered the shape by mirroring the most recent prior file. Which meant the most recent prior file's quirks propagated. Which meant the shape drifted.

The cockpit now ships those three scaffolds under cockpit/templates/. Copy them when you draft a new artifact. The shape stays stable because nobody is mirroring anymore. Everyone is copying from the same canonical source.
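A stable frontmatter shape is also what lets a validator read prompts mechanically, as checks 7 and 8 do. The sketch below assumes frontmatter is a block of key: value lines between two --- delimiters at the top of the file; that layout, along with readFrontmatter and checkShippedPrompt, is an assumption for illustration. The status and session_log field names come from the failure modes listed earlier.

```typescript
// Read simple "key: value" frontmatter between two "---" lines (assumed layout).
function readFrontmatter(markdown: string): Record<string, string> {
  const lines = markdown.split(/\r?\n/);
  const fields: Record<string, string> = {};
  if (lines[0] !== "---") return fields;
  for (const line of lines.slice(1)) {
    if (line === "---") break;
    const sep = line.indexOf(":");
    if (sep > 0) {
      fields[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
    }
  }
  return fields;
}

// Checks 7 and 8 for one prompt file. fileExists is injected so the logic stays testable.
function checkShippedPrompt(
  path: string,
  markdown: string,
  fileExists: (p: string) => boolean,
): string[] {
  const frontmatter = readFrontmatter(markdown);
  if (frontmatter["status"] !== "shipped") return [];
  const log = frontmatter["session_log"];
  if (log === undefined || log === "" || log === "pending") {
    return [`${path}: status is shipped but session_log is pending`];
  }
  if (!fileExists(log)) {
    return [`${path}: session_log points at missing file ${log}`];
  }
  return [];
}
```

In a real run, fileExists would wrap a filesystem lookup; here it is a parameter so the two rules can be exercised without touching disk.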

The third piece is pnpm cockpit:status: a one-screen snapshot of the project state, printed in about 300 ms, that replaces the four or five cat commands I used to run at session start. It prints the package name and version, the git branch and HEAD, the current and next phase, the next prompt's status and a preview, the last 10 commits, the current focus from now.md, and the Next-3 from roadmap.md. Color-aware, TTY-aware, plays nicely in a Claude Code session or a regular terminal.

That is the entire CLI surface today: cockpit init (scaffold), cockpit status (read), cockpit check (validate), cockpit new prompt|log (template). Four verbs. Each does one thing. None does anything magical.


Why I am open-sourcing it

Three reasons.

First, it is small enough to read. The two TypeScript scripts come to roughly 700 lines together, plus three templates and a README. You can read the whole thing in thirty minutes and understand exactly what it does. No proprietary algorithm, no clever indexing, no hidden complexity. If it stopped existing tomorrow, you could rewrite it in an afternoon.

Second, the value lives in the discipline, not the tool. The tool only makes the discipline machine-verifiable. Separating session state from project state, drafting the next prompt at every session close, validating before push: those rules work regardless of which tool enforces them. Take the discipline and either use my tool or build your own. Either outcome is fine. The bad outcome is running AI-driven sessions with no discipline at all, which is what I see most teams doing today.

Third, the pattern is missing from the field. I read AI workflow articles every week. I have seen exactly zero of them describe session-state management. They describe prompts, tools, sub-agents, system messages, model selection: every layer except the one that matters when you are running session 47 of a project that started in February. The cockpit fills that gap.

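Where to find it: the package is on npm as @justethales/cockpit (npx @justethales/cockpit init), and the source is on GitHub under the MIT license.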

Zero network calls, zero telemetry, zero auth.


What this means for the five pillars

The original article described five pillars: CLAUDE.md, session architecture, phase-based development, the multi-agent audit loop, and the authority structure. Those pillars are still right. Cockpit is not a sixth pillar. It sits underneath them as the connective tissue between sessions.

The CLAUDE.md tells Claude how to be a CTO. The session architecture tells Claude how to work on a feature. Phase-based development tells Claude how to decompose a problem. The audit loop tells Claude how to verify its own output. The authority structure tells Claude when to refuse.

The cockpit tells Claude what day it is in the life of the project, what was done yesterday, what is due today, and whether the handoff from yesterday's session is intact. Without the cockpit, the five pillars work for one session. With the cockpit, they work for months of sessions linked together.

If you read the original article and built your own version of the workflow, you probably have some informal version of the cockpit already: a STATUS.md, a NEXT.md, a habit of writing session logs. What I am releasing today is that informal version formalised, validated, and made shareable.


How to adopt it

Three paths, depending on where you are.

If you have no AI workflow yet: read the five-pillar article first. The cockpit is a layer on top of the pillars. Without the pillars, the cockpit is just a state.json with no sessions to manage. Build the workflow first.

If you have a workflow but no cockpit: run npx @justethales/cockpit init in your project root. It scaffolds the cockpit/ directory with the templates pre-filled, drops the two CLI scripts in place, and tells you the four lines you need to add to your package.json. Edit cockpit/now.md and cockpit/state.json to describe where you actually are. Start your next session with npx @justethales/cockpit status instead of pasting the previous session's end message. You will feel the difference by session three.

If you already have a cockpit-like discipline: scaffold into a temporary directory with npx @justethales/cockpit init, compare what it ships against what you have, take the parts you do not have, ignore the parts you have improved on. The validator is the most portable piece. Bolt it onto whatever shape you have already built.


What is still open

The honest review of my own work, the same review that surfaced last Friday's bug, identified four weaknesses I have not addressed yet:

The cockpit pollutes git log with chore(cockpit): commits. Every state.json bump is its own commit. Over a year of sessions, that is noise in the changelog. I have not solved this yet. The two paths are: move the cockpit to an orphan branch, or accept the noise.

The validator catches metadata drift, not intent drift. If now.md lies about what was built, the validator will not catch it. Nothing will, except humans and the post-implementation audit. I think this is fundamental.

The CLI is Node-only. Python, Rust, Go projects can run it via npx but pay a Node dependency they would not otherwise want. A binary distribution is a one-day port if the demand exists. Right now it does not.

No rollback path. If a session ships a phase that turns out broken in production, the cockpit has no concept of "Section C un-shipped, revert and retry." state.json has to be walked back by hand. I have not needed this yet. The day I do, I will have to invent it.

I am shipping these as known limitations because hiding them would be dishonest. The cockpit is good. It is not finished.


The bigger point

I am writing this on a Friday night instead of shipping more code because I think the pattern matters more than the tool.

Software development is entering a phase where the bottleneck is no longer typing speed, architectural intuition, or the ability to read a stack trace. The new bottleneck is the ability to maintain coherent state across long-running AI-driven workflows. Developers who figure out how to keep dozens of sessions aligned, without burning hours a week on context reconstruction, will ship more and worry less than the ones who keep treating each session as a fresh start.

The cockpit is one answer to that problem. There are probably better ones. The problem itself is not in doubt.

Five months ago I wrote that the way you use Claude is the reason you are not getting what you want from it. I would amend that today: the way you manage state between sessions is the reason you are not shipping. Fix the state-management problem and the rest gets easier.

The cockpit is free. The discipline is the price.


If this saves you half an hour a week, that is enough. If it saves you more, tell me about it. I want to hear which failure modes it catches that I have not anticipated.

Thales, Abidjan, Côte d'Ivoire, 2026-05-30
