
The Complete Compilation Pipeline, End to End

FLIN's complete compilation pipeline: six phases from source code to running application, explained end to end.

Thales & Claude | March 25, 2026 | 14 min read

Source code enters. A running application exits. Between them, six phases transform text into execution.

The previous articles in this series examined individual phases: the code generator, the bytecode format, the error diagnostic system. This article steps back and traces a single FLIN program through the entire compilation pipeline, from the first character the lexer reads to the last instruction the virtual machine executes. The goal is to show how the phases connect -- how each phase's output becomes the next phase's input, what information is preserved across boundaries, what is discarded, and why.

The program we will follow is a simplified FLIN application:

```
entity Todo {
    title: text
    done: bool = false
}

todos = Todo.all

<div>
    {for todo in todos}
        <p class={if todo.done "completed" else "pending"}>
            {todo.title}
        </p>
    {/for}
</div>
```

This program declares an entity, queries all instances from the database, and renders them in a loop with conditional styling. It exercises five of FLIN's distinctive features: entities, queries, views, loops, and conditionals. Let us trace it through the pipeline.

Phase 1: Lexical Analysis

The lexer reads the source as a stream of characters and produces a stream of tokens. Each token records what it is (its TokenKind), where it is (its Span), and what the original text looked like (its lexeme).
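In Rust, the compiler's implementation language, the token shape described above might look like the following sketch. The field and type names here are illustrative assumptions, not FLIN's exact definitions:

```rust
// Illustrative sketch of a token: kind, original text, and position.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    line: u32,   // 1-based line, for human-readable diagnostics
    column: u32, // 1-based column
    offset: u32, // byte offset, for machine processing
    len: u32,    // length of the lexeme in bytes
}

#[derive(Debug, Clone, PartialEq)]
enum TokenKind {
    Keyword,
    Identifier,
    LeftBrace,
    Colon,
    Equal,
    Newline,
}

#[derive(Debug, Clone, PartialEq)]
struct Token {
    kind: TokenKind,
    lexeme: String,
    span: Span,
}

fn main() {
    // The first token of the example program: `entity` at line 1, column 1.
    let tok = Token {
        kind: TokenKind::Keyword,
        lexeme: "entity".to_string(),
        span: Span { line: 1, column: 1, offset: 0, len: 6 },
    };
    assert_eq!(tok.lexeme, "entity");
    assert_eq!(tok.span.column, 1);
    println!("ok");
}
```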

The lexer operates in three modes. It begins in Code mode. When it encounters <div>, the < followed by an alphabetic character triggers a switch to View mode. Inside View mode, { switches to ViewExpression mode, and } switches back to View mode.

For our program, the token stream begins:

```
Keyword(Entity)     "entity"    1:1
Identifier("Todo")  "Todo"      1:8
LeftBrace           "{"         1:13
Newline                         1:14
Identifier("title") "title"     2:5
Colon               ":"         2:10
Keyword(Text)       "text"      2:12
Newline                         2:16
Identifier("done")  "done"      3:5
Colon               ":"         3:9
Keyword(Bool)       "bool"      3:11
Equal               "="         3:16
Keyword(False)      "false"     3:18
Newline                         3:23
RightBrace          "}"         4:1
Newline                         4:2
Identifier("todos") "todos"     6:1
Equal               "="         6:8
Identifier("Todo")  "Todo"      6:10
Dot                 "."         6:14
Keyword(All)        "all"       6:15
Newline                         6:18
...
```

Two observations. First, every token carries a precise position -- line and column for human-readable error messages, byte offset for machine processing. This position information propagates through the entire pipeline and ultimately appears in error diagnostics and debug information. Second, the lexer resolves the ambiguity between < as a comparison operator and < as a tag opener based on context. In code mode, < followed by a letter produces TagOpen. In expression mode, < followed by a digit or space produces Less.

The lexer's output is a Vec<Token> -- a flat, ordered sequence with no hierarchy. All structure is gone. The lexer has no idea what an entity declaration looks like; it just knows that entity is a keyword, Todo is an identifier, and { is a left brace. Structure is the parser's job.
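The TagOpen-versus-Less disambiguation described above comes down to one character of lookahead. A minimal sketch, with hypothetical names (classify_lt is not FLIN's real API):

```rust
// Sketch of the `<` disambiguation in Code mode: peek at the next
// character and decide between a tag opener and a comparison operator.
#[derive(Debug, PartialEq)]
enum TokenKind {
    TagOpen, // `<` starting a view element, e.g. <div
    Less,    // `<` as the comparison operator, e.g. x < 3
}

fn classify_lt(next: Option<char>) -> TokenKind {
    match next {
        // `<` followed by an alphabetic character opens a tag.
        Some(c) if c.is_alphabetic() => TokenKind::TagOpen,
        // Anything else (digit, space, end of input) is a comparison.
        _ => TokenKind::Less,
    }
}

fn main() {
    assert_eq!(classify_lt(Some('d')), TokenKind::TagOpen); // <div
    assert_eq!(classify_lt(Some('3')), TokenKind::Less);    // x < 3
    assert_eq!(classify_lt(Some(' ')), TokenKind::Less);    // x < y
    assert_eq!(classify_lt(None), TokenKind::Less);
    println!("ok");
}
```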

Phase 2: Syntactic Analysis

The parser consumes the token stream and builds an Abstract Syntax Tree. Where the lexer sees a flat sequence, the parser sees hierarchical structure: statements containing expressions, view elements containing children, blocks containing statements.

The parser uses recursive descent for statements and a Pratt parser for expressions. Statement parsing dispatches on the leading token:

  • Keyword(Entity) -- parse entity declaration
  • TagOpen -- parse view element
  • Keyword(Save) -- parse save statement
  • Identifier followed by Equal -- parse variable declaration or assignment
  • Identifier followed by Dot -- parse expression (entity query, field access)
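The dispatch above can be sketched as a match on the leading token, with one token of lookahead for identifiers. Token and routine names here are illustrative, not FLIN's actual parser API:

```rust
// Sketch of statement-level dispatch in a recursive-descent parser.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    KwEntity,
    TagOpen,
    KwSave,
    Ident,
    Equal,
    Dot,
    Other,
}

// Returns the name of the parse routine the dispatcher would invoke.
fn dispatch(first: Tok, second: Tok) -> &'static str {
    match (first, second) {
        (Tok::KwEntity, _) => "entity_decl",
        (Tok::TagOpen, _) => "view_element",
        (Tok::KwSave, _) => "save_stmt",
        // One token of lookahead distinguishes assignment from expression.
        (Tok::Ident, Tok::Equal) => "var_decl_or_assign",
        (Tok::Ident, Tok::Dot) => "expression",
        _ => "expression",
    }
}

fn main() {
    assert_eq!(dispatch(Tok::KwEntity, Tok::Ident), "entity_decl");
    assert_eq!(dispatch(Tok::Ident, Tok::Equal), "var_decl_or_assign");
    assert_eq!(dispatch(Tok::Ident, Tok::Dot), "expression");
    println!("ok");
}
```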

For our program, the AST looks like this:

```
Program
  Stmt::EntityDecl
    name: "Todo"
    fields:
      FieldDecl { name: "title", type: Text, default: None }
      FieldDecl { name: "done", type: Bool, default: Some(Bool(false)) }

  Stmt::VarDecl
    name: "todos"
    type_ann: None
    value: Expr::EntityQuery { entity: "Todo", operation: All }

  Stmt::View
    ViewElement
      tag: "div"
      attributes: []
      children:
        ViewChild::For
          variable: "todo"
          iterable: Expr::Identifier("todos")
          body:
            ViewChild::Element
              tag: "p"
              attributes:
                ViewAttribute
                  name: "class"
                  value: Dynamic(Expr::If {
                    condition: Expr::FieldAccess { object: Identifier("todo"), field: "done" },
                    then: Expr::String("completed"),
                    else: Expr::String("pending")
                  })
              children:
                ViewChild::Expression(
                  Expr::FieldAccess { object: Identifier("todo"), field: "title" }
                )
```

The parser has resolved every ambiguity. Todo.all is not a field access followed by an identifier -- it is an EntityQuery with operation All. The {for todo in todos} block is not a series of identifiers -- it is a ViewFor with a bound variable, an iterable expression, and a body of view children. The {if todo.done "completed" else "pending"} inside the class attribute is an inline conditional expression.

The AST preserves source spans on every node. When the type checker later encounters an error in the conditional expression, it can point to the exact position in the source file because the Expr::If node carries the span from the if keyword to the closing "pending" string.

Phase 3: Semantic Analysis

The type checker walks the AST and verifies that the program is semantically valid. It maintains a symbol table that maps names to types and scopes.

For our program, the type checker performs these steps:

1. Register entity schema. entity Todo { title: text, done: bool = false } adds Todo to the entity registry with two fields.

2. Type the query. Todo.all -- look up Todo in the entity registry, verify it exists, determine that .all returns [Todo] (a list of Todo entities).

3. Type the variable. todos = Todo.all -- infer that todos has type [Todo].

4. Enter the for loop. for todo in todos -- verify that todos is iterable (it is a list), bind todo with type Todo in the loop scope.

5. Type the conditional. if todo.done -- verify that todo has field done, verify that done is Bool (suitable for a condition). The then-branch ("completed") is Text, the else-branch ("pending") is Text, so the conditional expression has type Text.

6. Type the attribute. class={...} -- verify that the dynamic attribute value is Text. It is, because the conditional expression resolves to Text.

7. Type the text binding. {todo.title} -- verify that todo has field title, verify that the result is displayable. Text is displayable.
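The symbol table underlying steps 3 through 5 can be sketched as a stack of scopes, with lookup walking innermost-first so loop variables shadow outer bindings. Type and method names here are illustrative assumptions:

```rust
use std::collections::HashMap;

// Minimal sketch of a scoped symbol table.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Text,
    Bool,
    Entity(String),
    List(Box<Ty>),
}

struct SymbolTable {
    scopes: Vec<HashMap<String, Ty>>,
}

impl SymbolTable {
    fn new() -> Self {
        Self { scopes: vec![HashMap::new()] }
    }
    fn enter(&mut self) {
        self.scopes.push(HashMap::new());
    }
    fn exit(&mut self) {
        self.scopes.pop();
    }
    fn define(&mut self, name: &str, ty: Ty) {
        self.scopes.last_mut().unwrap().insert(name.to_string(), ty);
    }
    // Lookup walks scopes innermost-first.
    fn lookup(&self, name: &str) -> Option<&Ty> {
        self.scopes.iter().rev().find_map(|s| s.get(name))
    }
}

fn main() {
    let mut table = SymbolTable::new();
    // todos = Todo.all  =>  todos: [Todo]
    table.define("todos", Ty::List(Box::new(Ty::Entity("Todo".into()))));
    // for todo in todos  =>  todo: Todo, bound in the loop scope
    table.enter();
    table.define("todo", Ty::Entity("Todo".into()));
    assert_eq!(table.lookup("todo"), Some(&Ty::Entity("Todo".into())));
    table.exit();
    // After the loop, 'todo' is out of scope but 'todos' remains.
    assert_eq!(table.lookup("todo"), None);
    assert!(table.lookup("todos").is_some());
    println!("ok");
}
```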

If the programmer had written {todo.titl} (a typo), the type checker would report:

```
error[T0005]: entity 'Todo' has no field 'titl'
  --> app.flin:11:14
   |
11 |             {todo.titl}
   |                   ^^^^ unknown field
   |
   = help: Did you mean 'title'?
```

The type checker does not transform the AST -- it validates it. The output of Phase 3 is the same AST with type annotations attached to expression nodes. The code generator can rely on these annotations to emit correct bytecode without re-analyzing types.

Phase 4: Code Generation

The code generator walks the typed AST and emits bytecode. Each AST node translates to a sequence of opcodes. The output is a Chunk: a constant pool, a byte array of instructions, and a line table mapping instruction offsets to source positions.
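The Chunk described above can be sketched as a plain struct. The field names and the per-byte line table are simplifying assumptions for illustration, not FLIN's exact layout:

```rust
// Sketch of a bytecode chunk: constant pool, code bytes, line table.
struct Chunk {
    constants: Vec<String>, // simplified: real constants are typed values
    code: Vec<u8>,
    lines: Vec<u32>,        // lines[i] = source line of code[i]
}

impl Chunk {
    // Map an instruction offset back to a source line, for diagnostics.
    fn line_for(&self, offset: usize) -> Option<u32> {
        self.lines.get(offset).copied()
    }
}

fn main() {
    // Hypothetical encoding of `todos = Todo.all` (opcodes are made up).
    let chunk = Chunk {
        constants: vec!["Todo".into(), "todos".into()],
        code: vec![0x10, 0x00, 0x00, 0x20, 0x00, 0x01],
        lines: vec![6, 6, 6, 6, 6, 6], // all bytes come from source line 6
    };
    assert_eq!(chunk.line_for(3), Some(6));
    assert_eq!(chunk.constants.len(), 2);
    println!("ok");
}
```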

For our program, the code generation proceeds top-down:

Entity declaration. The entity Todo declaration does not emit runtime instructions. Instead, it registers the schema in the code generator's entity table, which is later serialized into the .flinc file's entity schema section. At VM startup, the runtime reads this section and initializes FlinDB's schema registry.

Variable declaration with query. todos = Todo.all emits:

```
QueryAll    [Todo_idx]    ; Push list of all Todo entities
StoreGlobal [todos_idx]   ; Store in global 'todos'
```

View element. The opening <div> emits:

```
CreateElement [div_idx]   ; Create <div> element
```

For loop in view. {for todo in todos} emits:

```
LoadGlobal [todos_idx]    ; Push the todos list
StartFor [end_addr]       ; Begin iteration, jump to end_addr if empty
NextFor [todo_slot]       ; Bind current item to local 'todo'
```

Nested view element with conditional attribute. The <p> element with its conditional class emits:

```
CreateElement [p_idx]     ; Create <p> element
LoadLocal [todo_slot]     ; Load 'todo'
GetField [done_idx]       ; Get .done field
JumpIfFalse [else_addr]   ; If false, jump to else
LoadConst [completed_idx] ; Push "completed"
Jump [end_attr]           ; Jump past else
; else_addr:
LoadConst [pending_idx]   ; Push "pending"
; end_attr:
BindAttr [class_idx]      ; Bind to 'class' attribute
```

Text binding. {todo.title} emits:

```
LoadLocal [todo_slot]     ; Load 'todo'
GetField [title_idx]      ; Get .title field
BindText                  ; Bind as reactive text
```

Closing elements and loop. The closing tags emit:

```
CloseElement              ; Close <p>
EndFor                    ; Loop back to NextFor
CloseElement              ; Close <div>
Halt                      ; End program
```

The complete bytecode for this program is approximately 40-50 bytes of instructions, referencing 8-10 constants in the pool. A full application that displays, creates, edits, and deletes Todo items might be 200-300 bytes.

Phase 5: Bytecode Serialization

The code generator produces an in-memory Chunk. To persist it as a .flinc file, the serializer writes the 64-byte header, the constant pool, the code section, the debug info (in development mode), and the entity schema section.

```
pub fn serialize(chunk: &Chunk, entities: &[EntitySchema]) -> Vec<u8> {
    let mut output = Vec::new();

    // Magic
    output.extend_from_slice(b"FLIN");

    // Version
    output.push(0); // major
    output.push(1); // minor
    output.push(0); // patch

    // Flags
    let flags = Flags::DEBUG_INFO | Flags::HAS_VIEWS | Flags::HAS_ENTITIES;
    output.push(flags);

    // Section offsets (calculated after writing sections)
    // ...

    // Constant pool
    for constant in &chunk.constants {
        serialize_constant(&mut output, constant);
    }

    // Code section
    output.extend_from_slice(&chunk.code);

    // Debug info
    for (offset, line) in chunk.lines.iter().enumerate() {
        serialize_line_entry(&mut output, offset as u32, *line);
    }

    // Entity schemas
    for schema in entities {
        serialize_entity_schema(&mut output, schema);
    }

    output
}
```

The serialized .flinc file is portable. It contains no absolute paths, no platform-specific code, and no references to the build environment. The same .flinc file can be executed by a FLIN VM on any platform.

Phase 6: Virtual Machine Execution

The VM loads the .flinc file, initializes its subsystems, and begins executing at instruction offset 0.

At startup, the VM:

1. Reads the header and validates the magic number and version.
2. Loads the constant pool into memory.
3. Reads the entity schemas and registers them with FlinDB.
4. Initializes the operand stack, call stack, and global variable table.
5. Sets the instruction pointer to the code section offset.
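Step 1 can be sketched as a small validation function. The error type and the version-compatibility policy here are assumptions for illustration:

```rust
// Sketch of header validation: check the magic number and version
// before trusting the rest of the .flinc file.
fn validate_header(bytes: &[u8]) -> Result<(u8, u8, u8), String> {
    if bytes.len() < 8 {
        return Err("truncated header".into());
    }
    if &bytes[0..4] != b"FLIN" {
        return Err("bad magic number".into());
    }
    let (major, minor, patch) = (bytes[4], bytes[5], bytes[6]);
    // Assumed policy: only major version 0 is understood by this VM.
    if major != 0 {
        return Err(format!("unsupported major version {}", major));
    }
    Ok((major, minor, patch))
}

fn main() {
    let header = [b'F', b'L', b'I', b'N', 0, 1, 0, 0];
    assert_eq!(validate_header(&header), Ok((0, 1, 0)));
    assert!(validate_header(b"ELF\0....").is_err());
    println!("ok");
}
```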

Then the execution loop begins:

```
loop {
    let opcode = read_byte(ip);
    ip += 1;

    match opcode {
        OpCode::QueryAll => {
            let type_idx = read_u16(ip); ip += 2;
            let entity_name = constants[type_idx].as_identifier();
            let results = flindb.query_all(entity_name);
            let list_id = heap.alloc_list(results);
            stack.push(Value::Object(list_id));
        }

        OpCode::StoreGlobal => {
            let name_idx = read_u16(ip); ip += 2;
            let name = constants[name_idx].as_identifier();
            let value = stack.pop();
            globals.insert(name.to_string(), value);
        }

        OpCode::CreateElement => {
            let tag_idx = read_u16(ip); ip += 2;
            let tag = constants[tag_idx].as_identifier();
            element_stack.push(Element::new(tag));
        }

        OpCode::StartFor => {
            let end_addr = read_u16(ip); ip += 2;
            let list = stack.pop();
            let items = heap.get_list(list);
            if items.is_empty() {
                ip = end_addr as usize; // Skip loop body
            } else {
                // Push iterator state
                iterators.push(Iterator::new(items, 0));
            }
        }

        OpCode::BindText => {
            let value = stack.pop();
            let text = value.to_display_string(&heap);
            let element = element_stack.current();
            element.add_text_binding(text, /* dependency info */);
        }

        OpCode::Halt => break,

        // ... 70+ more opcodes
    }
}
```

For our Todo program, the execution sequence is:

1. QueryAll -- FlinDB retrieves all Todo entities; the VM allocates a list on the heap.
2. StoreGlobal -- the list is stored as the global variable todos.
3. CreateElement -- a <div> element is created on the element stack.
4. LoadGlobal -- todos is pushed onto the operand stack.
5. StartFor -- the list is popped and an iterator is created. If the list is empty, execution jumps to EndFor.
6. NextFor -- the current Todo entity is bound to the local variable todo.
7. CreateElement -- a <p> element is created.
8. LoadLocal, GetField -- todo.done is evaluated.
9. JumpIfFalse -- if done is false, jump to the "pending" branch.
10. LoadConst -- push the class string ("completed" or "pending").
11. BindAttr -- bind the class attribute to the <p> element.
12. LoadLocal, GetField -- todo.title is evaluated.
13. BindText -- bind the title as reactive text content.
14. CloseElement -- close the <p> element.
15. EndFor -- advance the iterator and jump back to NextFor, or fall through if exhausted.
16. CloseElement -- close the <div> element.
17. Halt -- execution ends.

The result is a tree of elements with reactive bindings. If a Todo's done field changes, the reactivity system knows which <p> element to update and which class attribute to rebind -- because the BindAttr and BindText instructions recorded the dependencies at execution time.

What Crosses Phase Boundaries

Understanding the pipeline means understanding what information survives each phase transition and what is discarded.

Lexer to Parser. The token stream preserves: the kind of each token, its exact position in the source, and its original text. It discards: whitespace (except newlines, which are significant for statement termination), comments, and character-level detail (the lexer does not tell the parser that >= was formed from two characters).

Parser to Type Checker. The AST preserves: the full hierarchical structure, source spans on every node, identifier names, literal values, and operator types. It discards: token-level detail (the parser does not tell the type checker that entity was the keyword token), punctuation (braces, commas, colons that were consumed during parsing), and newlines.

Type Checker to Code Generator. The typed AST preserves: everything from the AST plus type annotations on expression nodes and resolved entity schemas. It discards: nothing. The type checker is a validation pass, not a transformation pass.

Code Generator to VM. The bytecode preserves: the execution semantics of the program (operations, control flow, data), constant values, entity schemas, and (in debug mode) source location mappings. It discards: the tree structure (bytecode is flat), variable names (replaced by stack slots and constant pool indices), type information (the VM is dynamically typed at runtime), and syntactic details (the VM does not know whether count++ was prefix or postfix -- it just sees LoadGlobal, Dup, Incr, StoreGlobal).

This progressive refinement -- each phase extracting what it needs and discarding what it does not -- is what makes the pipeline efficient. The VM does not carry the weight of the AST. The type checker does not carry the weight of raw tokens. Each phase operates on exactly the representation it needs.

The Pipeline as a Product

The compilation pipeline is not just an implementation detail -- it is a product feature. Each phase produces inspectable output:

  • flin emit-tokens app.flin -- prints the token stream
  • flin emit-ast app.flin -- prints the AST as indented text
  • flin check app.flin -- type-checks without generating bytecode
  • flin emit-bytecode app.flin -- prints the bytecode disassembly
  • flin build app.flin -- produces the .flinc binary
  • flin run app.flin -- compiles and executes in one step

These modes exist because debuggability was a design principle from the start. When a program does not behave as expected, the developer can inspect the output at any phase to find where the problem is. Is the lexer producing wrong tokens? Is the parser building the wrong tree? Is the type checker accepting something it should reject? Is the code generator emitting the wrong opcodes? Each phase's output is human-readable, and each phase can be tested in isolation.

This is the complete pipeline. Six phases. Source text to running application. Each phase does one thing well, passes its output to the next, and can be inspected independently. It was built in ten sessions, is covered by 590 tests, and runs programs with entities, views, temporal queries, and AI-powered search -- all compiled to a bytecode format that fits in a few hundred bytes.

---

This is Part 20 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO built a programming language compiler in sessions measured in minutes, not months.

Next in the series: The virtual machine's internals -- stack frames, heap allocation, garbage collection, and how the VM executes FLIN bytecode at runtime.
