
What the Audit Taught Us About Building a Language

Lessons learned from auditing a 186,000-line programming language built in 42 days.

Thales & Claude | March 25, 2026 | 10 min read

Auditing 186,252 lines of code is not just a defect-finding exercise. It is a mirror held up to every architectural decision, every process choice, and every trade-off made across 301 sessions of development. The defects themselves are symptoms. The lessons are in the patterns -- what categories of defects recur, which subsystems accumulate technical debt fastest, and where the development process creates systematic blind spots.

FLIN's audit taught us seven lessons about building a programming language. Some confirmed what we already suspected. Others surprised us. All of them will shape how we build software going forward.

Lesson 1: Dual Dispatch Tables Are a Design Flaw

The most critical finding of the audit -- the duplicate CreateMap opcode -- was not an individual bug. It was a structural vulnerability. FLIN's VM has two execution functions (run() and execute_until_return()) that both need to handle the same opcodes. This dual-dispatch architecture is the root cause of an entire category of defects.

When Session 273 audited execute_until_return() for opcode coverage, it found that only 59 of 170+ opcodes were handled. That is a 35% coverage rate. Every missing opcode was a silent failure -- a continue statement that skipped the operation as if it had never been emitted by the compiler.

```
// The architectural problem: two loops that must stay synchronized
// but have no mechanism to enforce it

// Solution 1: shared dispatch function
fn dispatch_opcode(
    &mut self,
    opcode: OpCode,
    code: &[u8],
) -> Result {
    match opcode {
        OpCode::CreateMap => { /* single implementation */ }
        OpCode::Add => { /* single implementation */ }
        // ... all opcodes in one place
    }
}

// Solution 2: macro-generated match arms
macro_rules! opcode_dispatch {
    ($self:expr, $opcode:expr, $code:expr) => {
        match $opcode {
            OpCode::CreateMap => $self.handle_create_map($code)?,
            OpCode::Add => $self.handle_add($code)?,
            // ... generated from a single definition
        }
    };
}
```

The lesson is not specific to FLIN. Any system with parallel dispatch tables -- event handlers in two locations, protocol parsers with multiple entry points, command processors with different execution modes -- is vulnerable to the same divergence. The fix is always the same: share a single implementation, either through a common function, a macro, or code generation.
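The "share a single implementation" fix can be made self-enforcing. The sketch below is illustrative, not FLIN's actual code: a two-variant `OpCode` and a toy integer stack stand in for the real VM. Because the `match` is exhaustive with no catch-all arm, adding a new opcode variant without handling it in the shared dispatch function is a compile error rather than a silent skip, which is exactly the divergence the audit found.

```rust
// Illustrative sketch: both execution loops call this one dispatch function.
#[derive(Debug, Clone, Copy)]
enum OpCode {
    CreateMap,
    Add,
}

struct Vm {
    stack: Vec<i64>,
}

impl Vm {
    // Deliberately exhaustive: no `_ => {}` escape hatch. A new OpCode
    // variant will not compile until it is handled here, for both loops.
    fn dispatch(&mut self, opcode: OpCode) -> Result<(), String> {
        match opcode {
            OpCode::CreateMap => {
                self.stack.push(0); // placeholder for a real map value
                Ok(())
            }
            OpCode::Add => {
                let b = self.stack.pop().ok_or("stack underflow")?;
                let a = self.stack.pop().ok_or("stack underflow")?;
                self.stack.push(a + b);
                Ok(())
            }
        }
    }
}
```

The design choice worth noting is the absence of a wildcard arm: the compiler, not a code review, becomes the mechanism that keeps the two loops synchronized.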

Lesson 2: Silent Failures Are the Most Expensive Bugs

Three of the five most severe audit findings involved silent failures:

  • CreateMap silently dropping keys for Value::Text inputs
  • Entity.where() silently returning all entities instead of filtered results
  • Validators silently rejecting saves with no error or warning

Each of these bugs was expensive not because of the damage it caused, but because of the time spent diagnosing it. When a FLIN developer's translations do not work, they do not suspect the opcode layer. When an entity query returns too many results, they blame their filter syntax. When a save appears to succeed but the data is gone on refresh, they question the database.

```
// The cost of silence: a developer's debugging journey
//
// "My translations don't work"
//   -> Check translation map: looks correct
//   -> Check t() function: works in console
//   -> Check template rendering: correct syntax
//   -> Check scope: variables accessible
//   -> Hours later: the map construction silently dropped keys
//
// "My filter doesn't work"
//   -> Check predicate syntax: correct
//   -> Check entity data: present
//   -> Check query: returns results (too many)
//   -> Hours later: the predicate was popped and discarded
```

```
// The correct pattern: fail loudly
OpCode::QueryWhere => {
    let predicate = self.pop()?;
    let entity_type = self.pop()?;

    match self.apply_predicate(&entity_type, &predicate) {
        Ok(filtered) => self.push(Value::List(filtered))?,
        Err(e) => {
            // NEVER silently fall back to returning all entities
            return Err(VmError::QueryError {
                entity: entity_type,
                predicate: format!("{:?}", predicate),
                cause: e.to_string(),
            });
        }
    }
}
```

The principle we adopted after the audit: every operation that can fail must either succeed with the correct result or fail with an error message that points to the cause. There is no acceptable middle ground where an operation "succeeds" with wrong data.
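The principle can be reduced to a signature discipline: an operation returns either the correct result or a structured error naming the failing piece, never a plausible default. The sketch below is a minimal standalone illustration with invented types (`VmError`, a slice of integers standing in for entities), not FLIN's real query machinery.

```rust
// Invented error type for the sketch; the point is that it carries enough
// context (entity, cause) to point the developer at the failure.
#[derive(Debug, PartialEq)]
enum VmError {
    QueryError { entity: String, cause: String },
}

fn query_where(
    entities: &[i64],
    entity_type: &str,
    predicate: impl Fn(&i64) -> Result<bool, String>,
) -> Result<Vec<i64>, VmError> {
    let mut filtered = Vec::new();
    for e in entities {
        match predicate(e) {
            Ok(true) => filtered.push(*e),
            Ok(false) => {}
            // The tempting fallback here -- `return Ok(entities.to_vec())` --
            // is exactly the silent failure described above: a "success"
            // carrying wrong data.
            Err(cause) => {
                return Err(VmError::QueryError {
                    entity: entity_type.to_string(),
                    cause,
                })
            }
        }
    }
    Ok(filtered)
}
```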

Lesson 3: Value Representation Must Be Transparent to Operations

FLIN has two representations for strings: Value::Text(String) for short inline strings and Value::Object(ObjectId) pointing to heap-allocated ObjectData::String. This optimization reduces allocation pressure for common small strings. But it created a contract that every string operation must honor: both representations must produce identical behavior.

The audit found violations of this contract in OpCode::Trim, in extract_string(), and in OpCode::CreateMap. Each violation was a separate bug with a separate symptom, but they all had the same root cause -- a match block that handled Value::Object but forgot Value::Text.

```
// The pattern that creates bugs
match value {
    Value::Object(id) => self.get_string(id)?.do_something(),
    _ => String::new(),  // WRONG: Value::Text is not handled
}

// The pattern that prevents bugs
match value {
    Value::Object(id) => self.get_string(id)?.do_something(),
    Value::Text(s) => s.do_something(),
    _ => return Err(VmError::TypeError { ... }),
}
```

The broader lesson: when a system has multiple representations for the same semantic concept, there must be a single function that normalizes them. Every consumer should call the normalizer rather than pattern-matching on representations directly.

```
// The normalizer pattern
fn as_string(&self, value: &Value) -> Result<Cow<'_, str>, VmError> {
    match value {
        Value::Text(s) => Ok(Cow::Borrowed(s)),
        Value::Object(id) => {
            let s = self.get_string(*id)?;
            Ok(Cow::Borrowed(s))
        }
        _ => Err(VmError::TypeError {
            expected: "text",
            got: value.type_name(),
        })
    }
}
```

Lesson 4: Session-Based Development Creates Accessibility Gaps

The function audit revealed that 95% of built-in functions were implemented in bytecode but only 12% were accessible from templates. This gap arose because each session focused on making its specific feature work end-to-end, which meant implementing the function in bytecode and testing it in bytecode context. The template exposure was a separate step that was routinely deferred.

This is a structural consequence of session-based development. Each session has a goal. The goal is always "make X work," not "make X work from every possible calling context." The result is a codebase where features work in the context where they were developed but not in the contexts where users will actually use them.

The mitigation is a checklist. For every function added to FLIN, the checklist asks: Does this work from bytecode? Does this work from templates? Does this work from routes? Does this work from WebSocket handlers? The audit showed us that without the checklist, the answer to the second and subsequent questions was usually "not yet."
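A checklist becomes far harder to skip when it is encoded as data that a test can fail on. The sketch below is hypothetical, assuming a registry of builtins with per-context exposure flags; the names (`Builtin`, the four context fields) are illustrations, not FLIN's real registry.

```rust
// Each builtin records where it is actually reachable from.
struct Builtin {
    name: &'static str,
    bytecode: bool,
    templates: bool,
    routes: bool,
    websockets: bool,
}

// Returns the names of functions that work in bytecode but are missing from
// at least one user-facing context -- the 95%-vs-12% gap the audit found.
fn accessibility_gaps(builtins: &[Builtin]) -> Vec<String> {
    builtins
        .iter()
        .filter(|b| b.bytecode && !(b.templates && b.routes && b.websockets))
        .map(|b| b.name.to_string())
        .collect()
}
```

A test asserting that `accessibility_gaps` returns an empty list turns the checklist from a habit into a gate: a session that implements a function only in bytecode context breaks the build instead of deferring the exposure step indefinitely.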

Lesson 5: The Compiler's Optimizations Must Be Invisible

The CreateMap bug only manifested when the compiler chose to emit a string as Value::Text instead of Value::Object. This choice was an optimization -- Value::Text avoids a heap allocation. But the optimization was not invisible to the rest of the system. The opcode handler in run() could not process Value::Text keys, creating a correctness bug that depended on a compiler optimization decision.

The principle: compiler optimizations must be transparent to the runtime. If the compiler chooses to represent a value differently for performance reasons, every part of the runtime that touches that value must handle all possible representations. If this contract is too expensive to maintain, the optimization should not be made.

```
// This optimization is only safe if EVERY consumer handles both forms
enum Value {
    Text(String),       // Optimization: inline string
    Object(ObjectId),   // Standard: heap-allocated string
    // ...
}

// Compiler's choice must be invisible:
// compile("hello") -> Value::Text("hello") OR Value::Object(alloc("hello"))
// Both must produce identical behavior everywhere
```

Lesson 6: An Audit Is a Knowledge Transfer

Before the audit, FLIN's codebase existed in a distributed state -- partially in session logs, partially in Claude's training data, partially in the code itself, but not in any single entity's complete understanding. The audit changed that. By reading every line, the auditor built a complete mental model of the system: how modules connect, where state flows, what invariants are maintained, and where they are violated.

This knowledge transfer is as valuable as the bug fixes. Future development sessions can reference the audit results to understand not just what a piece of code does, but how it relates to the rest of the system. The module dependency graph, the compilation pipeline diagram, the key execution paths -- these are navigation aids for a codebase that is too large for any single session to hold in context.

Architecture (from the audit):

```
lib.rs
lexer/       (5,877 lines)  -- Clean. No issues.
parser/      (21,735 lines) -- Clean. Well-tested.
resolver/    (1,858 lines)  -- Small. Stable.
typechecker/ (9,925 lines)  -- Medium issues. Now fixed.
codegen/     (11,936 lines) -- Had stale TODOs. Now fixed.
vm/          (61,054 lines) -- Most issues concentrated here.
server/      (17,908 lines) -- WebSocket gaps. Now fixed.
database/    (28,395 lines) -- Persistence bugs. Now fixed.
storage/     (7,866 lines)  -- S3 missing. Now fixed.
ai/          (2,208 lines)  -- Clean. Small surface.
```

Lesson 7: Zero Security Issues Is Not an Accident

The audit found zero security vulnerabilities across 186,252 lines. This was not luck. It was a consequence of architectural decisions made in the earliest sessions:

  • FLIN does not use SQL, so SQL injection is structurally impossible.
  • FLIN's template engine escapes output by default, so XSS requires explicit opt-in.
  • FLIN's file operations validate paths against allowed directories, so path traversal is caught.
  • Rust eliminates buffer overflows and use-after-free, so memory safety bugs do not exist.

The lesson is that security is not primarily a matter of careful coding. It is a matter of choosing architectures that eliminate categories of vulnerability entirely. By avoiding SQL, FLIN does not need to worry about injection. By defaulting to escaped output, FLIN does not need to remember to escape. The safest code is the code you do not have to write.
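The escape-by-default point can be sketched in a few lines. This is an illustration of the idea, not FLIN's actual template engine: the default rendering path always escapes, and raw output requires constructing an explicit wrapper type, so forgetting to escape is structurally impossible rather than merely unlikely.

```rust
// Minimal HTML entity escaping over the five significant characters.
fn escape_html(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '&' => "&amp;".to_string(),
            '<' => "&lt;".to_string(),
            '>' => "&gt;".to_string(),
            '"' => "&quot;".to_string(),
            '\'' => "&#x27;".to_string(),
            other => other.to_string(),
        })
        .collect()
}

// Explicit opt-in: the only way to emit unescaped HTML is to deliberately
// wrap the value, making the dangerous path visible at the call site.
struct Raw(String);

fn render(value: &str) -> String {
    escape_html(value) // the default path always escapes
}

fn render_raw(value: Raw) -> String {
    value.0
}
```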

Looking Forward

The audit was not the end. It was the transition from building to hardening. The 301 sessions that built FLIN were an act of creation -- messy, fast, iterative, occasionally brilliant, occasionally flawed. The audit and its fix sessions were an act of discipline -- systematic, thorough, unromantic, essential.

FLIN emerged from the audit as a stronger system. Not because it had no bugs -- it still had room for improvement in fuzz testing, concurrency verification, and property-based testing. But because every bug it did have was now known, documented, and either fixed or tracked. The unknown unknowns had been converted to known quantities. And known quantities can be managed.

The next article turns to a specific category of audit findings that deserved its own investigation: production panic calls and the systematic effort to eliminate them from FLIN's runtime.

---

This is Part 153 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.

Series Navigation:
- [152] 3,452 Tests, Zero Failures
- [153] What the Audit Taught Us About Building a Language (you are here)
- [154] Production Panic Calls: Tracking and Elimination
