Building a programming language across hundreds of sessions generates a meta-problem: keeping track of what has been built. When you have 237 sessions, 3,537 tests, 75 sub-tasks across 7 milestones, version numbers in 6 different files, and progress percentages that change with every session, the tracking data itself becomes a system that needs maintenance.
Session 238 was a documentation synchronization session. No code was written. No tests were added. Instead, we reconciled every tracking file, every version number, and every progress metric with the actual state of the project after Sessions 228 through 237. This article documents why tracking sync matters, how we structured it, and what we learned about the meta-engineering of building a large software project.
The Tracking Problem
FLIN's development is tracked across several files:
- Cargo.toml -- the Rust package version
- CLAUDE.md -- the project instructions, including version and test counts
- README.md -- the public-facing documentation with version, test counts, and feature lists
- PROJECT.md -- the project overview with session count and capabilities
- install.sh -- the installer script with version number
- src/main.rs -- the HTTP server's reported version
- RELEASES.md -- the release notes history
- Tracking files in _private/todo/ -- task completion percentages per milestone
When Session 237 completed the GC CLI integration, the following things needed to change:
1. FM-7 milestone status needed to flip from "in progress" to "complete."
2. Test count needed to update from the pre-Session-228 number to 3,537.
3. Session count needed to update from 210 (last sync) to 237.
4. Version needed to bump from v0.9.0-alpha to v0.9.2-alpha.
5. The feature list needed to include the 10 new capabilities added in Sessions 228-237.
If any of these updates are missed, the tracking data becomes inconsistent. The README says 2,901 tests but the actual count is 3,537. The install script downloads v0.9.0-alpha but the server reports v0.9.2. The tracking file says FM-7 is 75% complete but all 8 tasks are done.
Inconsistent tracking data is worse than no tracking data. It erodes trust in all the other numbers.
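A drift check of this kind can be sketched in a few lines of Rust. This is a minimal illustration, not FLIN's actual tooling: the function names are hypothetical, and it assumes the version string has already been extracted from each file.

```rust
use std::collections::BTreeMap;

/// Groups tracked files by the version string each one reports.
/// A consistent project yields exactly one group.
/// (Hypothetical helper: the (file, version) pairs are assumed to be
/// extracted from the files beforehand.)
fn group_by_version<'a>(reported: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (file, version) in reported {
        groups.entry(*version).or_default().push(*file);
    }
    groups
}

/// Returns true when every tracked file reports the same version.
fn is_consistent(reported: &[(&str, &str)]) -> bool {
    group_by_version(reported).len() <= 1
}
```

Given install.sh at 0.9.0-alpha and src/main.rs at 0.9.2-alpha, `is_consistent` returns false and `group_by_version` shows which files sit on which side of the split, which is the information needed to repair the drift.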
The Sync Process
We developed a systematic sync process that we execute after every significant batch of sessions:
```
// Pseudocode for the tracking sync process
fn sync_tracking_files() {
    // Step 1: Count actual tests
    actual_tests = run("cargo test --lib 2>&1 | tail -1")
    integration_tests = run("cargo test --test integration_e2e 2>&1 | tail -1")
    total = actual_tests.passed + integration_tests.passed

    // Step 2: Count actual sessions
    session_files = glob("_private/session-logs/SESSION-*.md")
    latest_session = session_files.max_by(number)

    // Step 3: Determine version
    // Major changes (new subsystem) = minor version bump
    // Bug fixes and improvements = patch version bump
    new_version = determine_version(changes_since_last_sync)

    // Step 4: Update all files
    update_file("Cargo.toml", version: new_version)
    update_file("CLAUDE.md", version: new_version)
    update_file("README.md", version: new_version, tests: total)
    update_file("PROJECT.md", sessions: latest_session)
    update_file("install.sh", version: new_version)
    update_file("src/main.rs", server_version: new_version)

    // Step 5: Update milestone tracking
    for milestone in milestones {
        completed = count_completed_tasks(milestone)
        total_tasks = count_total_tasks(milestone)
        update_tracking(milestone, completed, total_tasks)
    }

    // Step 6: Write release notes
    append_release_notes(new_version, changes)
}
```
This process is manual today. We execute it by reading each file, verifying the numbers, and making the updates. In a future session, we plan to automate it as a flin sync command that reads the codebase and updates all tracking files automatically.
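As a first step toward that automation, Step 1 could be made concrete by parsing the summary lines that the Rust test harness prints. The sketch below is an assumption about how that might look, not the planned flin sync implementation. It relies on the summary format `test result: ok. N passed; ...`, and both function names are hypothetical.

```rust
/// Extracts the `passed` count from one test-harness summary line, e.g.
/// "test result: ok. 2920 passed; 0 failed; 0 ignored; ...".
/// Returns None for any line that is not a summary line.
fn passed_in_line(line: &str) -> Option<u32> {
    let rest = line.trim().strip_prefix("test result:")?;
    // Fields are separated by ';'. Find the one that ends in "passed".
    let field = rest.split(';').find(|f| f.trim_end().ends_with("passed"))?;
    // The count is the token right before the word "passed".
    let mut tokens = field.split_whitespace().rev();
    tokens.next()?; // skip "passed"
    tokens.next()?.parse().ok()
}

/// Sums the passed counts over every summary line in the captured output.
/// Scanning all lines avoids depending on which line happens to be last.
fn total_passed(output: &str) -> u32 {
    output.lines().filter_map(passed_in_line).sum()
}
```

Feeding the captured output of both test runs through `total_passed` yields the single total that the README and CLAUDE.md need, without anyone copying a number by hand.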
Version Numbering Strategy
FLIN follows semantic versioning with an alpha qualifier:
- v0.9.0-alpha -- the initial MVP (Session 210)
- v0.9.1-alpha -- post-MVP improvements (Sessions 210-227)
- v0.9.2-alpha -- file management complete (Session 237)
The version number is not just a label. It communicates the project's maturity:
- 0.x -- pre-1.0, breaking changes expected
- 0.9.x -- close to 1.0, feature-complete but still polishing
- alpha -- not yet recommended for production use
We chose to move from v0.9.1-alpha to v0.9.2-alpha (not v0.9.1.1 or v0.9.1-alpha.2) because the changes in Sessions 228-237 were substantial: an entire milestone completed (FM-7), 9 document formats added, a new CLI command, and 263 new tests. That warrants incrementing the version number itself, not appending a sub-revision to it.
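The numbering scheme can be illustrated with a small parser. This is a sketch under stated assumptions: the `Version` type and its methods are hypothetical and exist only for this example, and the parser accepts exactly three numeric components plus an optional qualifier, so a four-part string such as v0.9.1.1 is rejected.

```rust
use std::fmt;

/// A FLIN-style version such as "v0.9.1-alpha" (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
    qualifier: Option<String>,
}

impl Version {
    /// Parses "v0.9.1-alpha" or "0.9.1"; returns None on malformed input.
    fn parse(text: &str) -> Option<Version> {
        let text = text.strip_prefix('v').unwrap_or(text);
        let (numbers, qualifier) = match text.split_once('-') {
            Some((n, q)) => (n, Some(q.to_string())),
            None => (text, None),
        };
        let mut parts = numbers.split('.').map(|p| p.parse::<u32>().ok());
        let version = Version {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
            qualifier,
        };
        // Reject a fourth numeric component such as "0.9.1.1".
        if parts.next().is_some() {
            return None;
        }
        Some(version)
    }

    /// Increments the third component and keeps the qualifier unchanged.
    fn bump_patch(&self) -> Version {
        Version { patch: self.patch + 1, ..self.clone() }
    }
}

impl fmt::Display for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "v{}.{}.{}", self.major, self.minor, self.patch)?;
        if let Some(q) = &self.qualifier {
            write!(f, "-{}", q)?;
        }
        Ok(())
    }
}
```

Parsing v0.9.1-alpha and calling `bump_patch` produces v0.9.2-alpha, the step taken in Session 238.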
The 10-Session Summary
Session 238 documented the accomplishments of Sessions 228-237, which had not yet been reflected in tracking files:
| Session | Feature | Tests Added |
|---|---|---|
| 228 | CSV and XLSX extraction | +34 |
| 229 | JSON and YAML extraction | +47 |
| 230 | RTF extraction | +22 |
| 231 | XML and XPath extraction | +61 |
| 232 | Semantic auto-conversion | +8 |
| 233 | Zstd compression | +25 |
| 234 | Blob GC infrastructure | +17 |
| 235 | File preview generation | +33 |
| 236 | HTTP preview integration | +6 |
| 237 | GC CLI and HTTP integration | +10 |
Total: 263 new tests across 10 sessions. The test count went from 3,274 (post-Session 227) to 3,537 (post-Session 237).
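The totals are easy to re-derive from the table, which is the kind of check a sync session performs by hand. A minimal sketch, with the per-session numbers copied from the table above and hypothetical function names:

```rust
/// (session number, tests added), copied from the table above.
const TESTS_ADDED: [(u32, u32); 10] = [
    (228, 34), (229, 47), (230, 22), (231, 61), (232, 8),
    (233, 25), (234, 17), (235, 33), (236, 6), (237, 10),
];

/// Test count recorded after Session 227, as stated in the article.
const BASELINE_TESTS: u32 = 3_274;

/// Sum of the "Tests Added" column.
fn tests_added_total() -> u32 {
    TESTS_ADDED.iter().map(|&(_, added)| added).sum()
}

/// Expected test count after the last session in the table.
fn expected_total() -> u32 {
    BASELINE_TESTS + tests_added_total()
}

/// Mean number of tests added per session.
fn average_per_session() -> f64 {
    f64::from(tests_added_total()) / TESTS_ADDED.len() as f64
}
```

If `expected_total()` ever disagrees with what the test runner reports, either a session's count was recorded incorrectly or tests were added outside a logged session.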
These 10 sessions added support for 9 document formats (PDF, DOCX, HTML, CSV, XLSX, JSON, YAML, RTF, XML), a complete compression system using Zstd, file preview generation, and the garbage collection pipeline. In aggregate, this gives FLIN a full pipeline for the document types a web application most commonly encounters -- ingest, parse, index, search, compress, preview, and clean up.
Milestone Progress After Sync
After the tracking sync, the file management milestones stood at:
| Milestone | Tasks | Status |
|---|---|---|
| FM-1: File Upload HTTP | 12/12 | COMPLETE |
| FM-2: File Field Type | 8/8 | COMPLETE |
| FM-3: Storage Backends | 16/16 | COMPLETE |
| FM-4: Document Parsing | 13/13 | COMPLETE (updated) |
| FM-5: Chunking and RAG | 5/10 | IN PROGRESS |
| FM-6: Semantic File Search | 7/8 | IN PROGRESS |
| FM-7: Compression and GC | 8/8 | COMPLETE (updated) |

Overall: 69 of 75 tasks complete (92%). The remaining 6 tasks are in FM-5 (advanced chunking strategies) and FM-6 (search scoring refinements), both of which enhance existing functionality rather than adding new capabilities.
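Deriving the status labels and the overall percentage from the task counts, rather than typing them in, removes one source of drift. The following is a sketch only: the `Milestone` struct and helper functions are hypothetical, and the counts are copied from the milestone figures above.

```rust
/// One file-management milestone with its completed and total task counts.
struct Milestone {
    id: &'static str,
    done: u32,
    total: u32,
}

/// Counts copied from the milestone figures above.
const MILESTONES: [Milestone; 7] = [
    Milestone { id: "FM-1", done: 12, total: 12 },
    Milestone { id: "FM-2", done: 8, total: 8 },
    Milestone { id: "FM-3", done: 16, total: 16 },
    Milestone { id: "FM-4", done: 13, total: 13 },
    Milestone { id: "FM-5", done: 5, total: 10 },
    Milestone { id: "FM-6", done: 7, total: 8 },
    Milestone { id: "FM-7", done: 8, total: 8 },
];

/// Status label derived from the counts instead of being stored by hand.
fn status(m: &Milestone) -> &'static str {
    if m.done == m.total { "COMPLETE" } else { "IN PROGRESS" }
}

/// Returns (done, total, percent), with percent rounded to the nearest integer.
fn overall(milestones: &[Milestone]) -> (u32, u32, u32) {
    let done: u32 = milestones.iter().map(|m| m.done).sum();
    let total: u32 = milestones.iter().map(|m| m.total).sum();
    let percent = if total == 0 { 0 } else { (done * 100 + total / 2) / total };
    (done, total, percent)
}

/// Milestones that still have open tasks, paired with the number remaining.
fn remaining(milestones: &[Milestone]) -> Vec<(&'static str, u32)> {
    milestones
        .iter()
        .filter(|m| m.done < m.total)
        .map(|m| (m.id, m.total - m.done))
        .collect()
}
```

With these counts, `overall` returns 69 of 75 at 92%, and `remaining` lists 5 open tasks in FM-5 and 1 in FM-6, matching the 6 remaining tasks stated above.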
Why Tracking Sync Matters
The tracking sync session produced no new functionality. From a pure feature perspective, it was a zero-output session. But from an engineering perspective, it was essential.
For planning: Accurate progress metrics enable realistic planning. If the tracking file says FM-5 is 50% complete when it is actually 75% complete, the planning process overestimates remaining work and misallocates effort.
For communication: When someone asks "how close is FLIN to v1.0?", the answer should come from verified data, not memory. Tracking files that are 27 sessions out of date cannot answer this question accurately.
For motivation: Progress tracking provides a tangible sense of advancement. Going from 58% to 92% completion on the file management system is motivating in a way that "we did some more work" is not. The numbers make the progress concrete.
For debugging: When a test count discrepancy surfaces -- the README says 3,274 but cargo test reports 3,537 -- it signals that something was not tracked. The discrepancy itself is informative: 263 tests were added in 10 sessions, suggesting an average of 26 tests per session, which is a useful metric for estimating future work.
The Meta-Engineering Lesson
Building FLIN taught us something about software engineering that is rarely discussed: the engineering of the engineering process itself. The code is the product. The tests verify the code. The tracking files verify the tests. The sync process verifies the tracking files. Each layer adds confidence that the lower layers are correct.
This is not overhead. This is how reliable systems are built. A space mission does not just build the rocket -- it builds the tracking systems that monitor the rocket, the verification procedures that validate the tracking systems, and the review processes that audit the verification procedures. Each layer catches errors that the lower layers miss.
For FLIN, the tracking sync catches a specific category of error: drift between reality and documentation. Over 10 sessions, the project accumulated 263 new tests, 1,100 new lines of code, 2 completed milestones, and 9 new document formats -- none of which were reflected in the project's external-facing documentation. Without the sync, a new contributor reading the README would have an inaccurate picture of the project's capabilities.
Session 238 took approximately 30 minutes. It updated 12 files. It wrote zero lines of production code. And it was one of the most valuable sessions in the entire project.
The State After Sync
After Session 238, the project state was fully consistent:
- Version: v0.9.2-alpha (all 6 files)
- Tests: 3,537 (2,920 library + 617 integration)
- Sessions: 238
- FM progress: 69/75 (92%)
- Supported document formats: 9
- CLI commands: dev, build, test, migrate, gc
- Native functions: 409+
- Embedded components: 180
- Embedded icons: 1,675
Every number in every file matched reality. The project was ready for the next phase of development -- not because it had new features, but because we knew exactly where we stood.
---
This is Part 189 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.
Series Navigation:
- [188] GC, CLI, and HTTP Integration Testing
- [189] Tracking Sync and State Management (you are here)
- [190] From Alpha to Stable: The Remaining Work