#014 -- The Abstract Syntax Tree: FLIN's Internal Representation

Between your source code and the running program lives a tree. Every node is a decision. Every branch is a relationship between parts of your program. The Abstract Syntax Tree -- AST -- is the compiler's internal representation of what you wrote, stripped of all the syntactic noise that humans need to read code but machines do not.

Parentheses are gone. Semicolons are gone. Whitespace is gone. What remains is pure structure: this expression is a binary addition of these two sub-expressions. This statement declares a variable with this name and this initial value. This view element has these attributes and these children. The AST captures the meaning of the program, not its formatting.

FLIN's AST has an unusual challenge. Most programming languages have one kind of thing: code. FLIN has three: imperative code (variables, control flow, operations), declarative data (entity definitions with typed fields), and reactive views (HTML-like templates with embedded expressions). The AST must represent all three uniformly enough for the type checker and code generator to process them, while preserving enough domain-specific structure for each to retain its semantics.

This article examines how we designed FLIN's AST, what nodes it contains, and the trade-offs we made along the way.

The Top Level: Programs and Statements

A FLIN program is a sequence of statements. There is no main function, no module header, no boilerplate. You open a .flin file, write statements, and the compiler processes them top to bottom.

The AST represents this with two types:

rustpub struct Program {
    pub statements: Vec<Stmt>,
}

pub enum Stmt {
    VarDecl {
        name: String,
        type_annotation: Option<FlinType>,
        initializer: Expr,
        span: Span,
    },
    EntityDecl {
        name: String,
        fields: Vec<FieldDecl>,
        span: Span,
    },
    Save {
        expr: Expr,
        span: Span,
    },
    Delete {
        expr: Expr,
        where_clause: Option<Expr>,
        span: Span,
    },
    If {
        condition: Expr,
        then_body: Vec<Stmt>,
        else_body: Option<Vec<Stmt>>,
        span: Span,
    },
    For {
        variable: String,
        index: Option<String>,
        iterable: Expr,
        body: Vec<Stmt>,
        span: Span,
    },
    Route {
        method: HttpMethod,
        path: String,
        body: Vec<Stmt>,
        span: Span,
    },
    ExprStmt {
        expr: Expr,
        span: Span,
    },
    ViewElement(ViewNode),
}

Every statement variant carries a span -- the source location where it was parsed. This span propagates through the entire compilation pipeline: if the type checker finds an error in a variable declaration, it reports the span of that declaration, not a generic "somewhere in the file." If the code generator emits bytecode for a save statement, it records the span in the line-number table so that runtime errors can point back to the source.

Variable Declarations

The simplest statement is also the most common:

flincount = 0
name: text = "Juste"

Both produce Stmt::VarDecl nodes. The first has type_annotation: None -- the type will be inferred by the type checker. The second has type_annotation: Some(FlinType::Text) -- the developer has explicitly stated the type.

This optional annotation is a key design decision. FLIN's philosophy is "types you rarely write, safety you always get." The type checker will infer count as int from the literal 0. But when inference would be ambiguous (an empty list, a number that could be int or float), the developer can add an explicit annotation. The AST preserves both forms without preference, letting the type checker decide how to handle each case.

Entity Declarations

Entities are FLIN's data model. They are not classes, not structs, not tables -- they are all three at once. An entity declaration defines a persistent data type with typed fields, optional default values, and automatic versioning.

flinentity Todo {
    title: text
    done: bool = false
    created: time = now
}

The AST represents this as:

rustpub struct FieldDecl {
    pub name: String,
    pub field_type: FlinType,
    pub default: Option<Expr>,
    pub span: Span,
}

Each field has a mandatory type annotation (unlike variables, where annotation is optional). The default is an expression, not a value -- now is a keyword that evaluates to the current timestamp at the moment the entity is created. The AST preserves it as an Expr::Keyword(Keyword::Now) node, which the code generator will emit as a LoadNow bytecode instruction.

Entity declarations are pure data at the AST level. They do not generate control flow or expressions. The code generator handles them by registering the entity schema in a lookup table, so that subsequent save, delete, and find operations know the entity's structure.

Expression Nodes

Expressions are the core of any programming language, and FLIN's expression enum is the largest type in the AST:

rustpub enum Expr {
    // Literals
    Literal(LiteralValue),
    Identifier(String),
    List(Vec<Expr>),
    Map(Vec<(Expr, Expr)>),

    // Operations
    Binary { left: Box<Expr>, op: BinOp, right: Box<Expr> },
    Unary { op: UnaryOp, operand: Box<Expr> },
    Assign { target: Box<Expr>, value: Box<Expr> },

    // Access
    Member { object: Box<Expr>, field: String },
    Index { object: Box<Expr>, index: Box<Expr> },
    Call { callee: Box<Expr>, args: Vec<Expr> },
    MethodCall { object: Box<Expr>, method: String, args: Vec<Expr> },

    // FLIN-specific
    EntityConstruct { name: String, fields: Vec<(String, Expr)> },
    Ask { query: Box<Expr> },
    Search { query: Box<Expr>, entity: String, field: String, limit: Option<Box<Expr>> },
    Temporal { expr: Box<Expr>, time: Box<Expr> },
    Match { subject: Box<Expr>, arms: Vec<MatchArm> },

    // View (embedded in expressions when used inside views)
    ViewInterpolation(Box<Expr>),
}

Several aspects of this design are worth examining.

Box everywhere. Recursive types in Rust require heap allocation through Box, because the compiler needs to know the size of every type at compile time. A Binary expression contains two sub-expressions, each of which could themselves be Binary. Without Box, the type would have infinite size. This is not a performance concern -- the allocator cost of boxing a few hundred expression nodes is negligible compared to the work the type checker and code generator will do with them.

FLIN-specific variants. EntityConstruct, Ask, Search, and Temporal are not found in general-purpose language ASTs. They encode FLIN's domain-specific semantics directly in the tree structure. An Ask node does not desugar to a function call -- it is a first-class concept that the type checker validates (the query must be a text expression) and the code generator emits as a dedicated Ask bytecode instruction.

No parentheses, no precedence. The tree structure encodes precedence implicitly. The expression 2 + 3 * 4 is represented as Binary(Literal(2), Add, Binary(Literal(3), Mul, Literal(4))) -- the multiplication is a child of the addition, so it evaluates first. The parser did the work of encoding precedence into the tree structure; the AST does not need to record it separately.

View Nodes

FLIN's view layer is where the AST departs most dramatically from traditional language ASTs. A view node represents an HTML-like element with attributes, event handlers, and children:

rustpub struct ViewNode {
    pub tag: String,
    pub attributes: Vec<ViewAttribute>,
    pub children: Vec<ViewChild>,
    pub span: Span,
}

pub enum ViewAttribute {
    Static { name: String, value: String },
    Dynamic { name: String, expr: Expr },
    EventHandler { event: String, handler: Expr },
}

pub enum ViewChild {
    Element(ViewNode),
    Text(String),
    Expression(Expr),
    If { condition: Expr, then_children: Vec<ViewChild>, else_children: Option<Vec<ViewChild>> },
    For { variable: String, index: Option<String>, iterable: Expr, children: Vec<ViewChild> },
}

Consider this FLIN code:

flin<button class="primary" click={count++}>
    {count}
</button>

The AST representation:

ViewNode with tag "button"
- ViewAttribute::Static { name: "class", value: "primary" }
- ViewAttribute::EventHandler { event: "click", handler: Expr::Unary(PostIncrement, Identifier("count")) }
- ViewChild::Expression(Expr::Identifier("count"))

The view node structure mirrors the HTML structure, but with FLIN expressions embedded at every point where dynamic behavior occurs. Static attributes are strings. Dynamic attributes contain expressions. Event handlers contain expressions. Child content can be static text, nested elements, interpolated expressions, or conditional/loop constructs.

This design means the type checker can validate view expressions the same way it validates code expressions: count++ in an event handler is type-checked exactly like count++ in a code block. The code generator handles the view-specific aspects (creating DOM elements, binding handlers, setting up reactivity) while delegating expression emission to the same methods it uses for code.

The Type Representation

Every expression in the AST will eventually have a type, determined by the type checker. But the AST itself does not carry type information -- it represents the program as the developer wrote it, before any analysis. Types appear in the AST only when the developer explicitly wrote them:

rustpub enum FlinType {
    Text,
    Int,
    Float,
    Bool,
    Time,
    File,
    Money,
    Semantic(Box<FlinType>),  // semantic text
    List(Box<FlinType>),       // [text]
    Optional(Box<FlinType>),   // text?
    Entity(String),            // User, Todo
    Any,                       // Used internally
    TypeVar(u32),              // Used by type inference
}

The Semantic, List, and Optional variants are type constructors -- they take an inner type and produce a new type. semantic text is a Semantic(Box(Text)). [int] is a List(Box(Int)). text? is an Optional(Box(Text)).

The TypeVar(u32) variant does not appear in the parsed AST. It is created by the type inference algorithm (Hindley-Milner, covered in the next article) to represent unknown types that will be resolved through unification. The fact that it lives in the same enum as concrete types means the type checker can mix known and unknown types freely, without wrapper types or special cases.

Design Decisions and Trade-offs

Untyped AST vs. typed AST. Some compilers use a single AST type that accumulates type information as analysis proceeds. Others use two separate types: an untyped AST produced by the parser and a typed AST produced by the type checker. We chose the single-AST approach, where the type checker annotates the existing AST rather than producing a new one. This reduces code duplication (no parallel type hierarchies) at the cost of type safety (the type checker must trust that types are populated before the code generator reads them). For a two-person team, less code wins.

Spans on every node. Carrying a Span on every statement and expression node costs 16 bytes per node (two Position values of 12 bytes each, minus padding). For a typical FLIN program with a few hundred AST nodes, that is a few kilobytes -- completely negligible. The benefit is that every compiler error, every type error, every code generation warning can point to the exact source location. We never had to debug a "where did this error come from?" problem.

Flat children vs. recursive nesting. View children are stored as Vec<ViewChild> rather than as a recursive tree. This means a <div> with three children is a flat list, not a nested binary tree. The flat representation is easier to iterate over in the code generator and produces cleaner bytecode. It also matches how HTML developers think about their markup: a parent with a list of children, not a linked list.

No desugaring. Some compilers desugar complex syntax into simpler forms at the AST level -- for example, transforming a for loop into a while loop with an iterator. FLIN does not desugar. Each syntactic form has its own AST variant and its own code generation path. This makes the compiler more verbose but also more predictable: the bytecode generated for a for loop looks like a for loop, not like a while loop that happens to iterate.

Walking the Tree

The type checker and code generator both need to traverse the AST. Rather than implementing the visitor pattern (which is idiomatic in Java but awkward in Rust), we use simple match expressions:

rustfn emit_stmt(&mut self, stmt: &Stmt) -> Result<(), CodeGenError> {
    match stmt {
        Stmt::VarDecl { name, initializer, .. } => {
            self.emit_expr(initializer)?;
            self.define_variable(name);
            Ok(())
        }
        Stmt::EntityDecl { name, fields, .. } => {
            self.register_entity(name, fields);
            Ok(())
        }
        Stmt::If { condition, then_body, else_body, .. } => {
            self.emit_if(condition, then_body, else_body.as_deref())
        }
        Stmt::For { variable, index, iterable, body, .. } => {
            self.emit_for(variable, index.as_deref(), iterable, body)
        }
        Stmt::Save { expr, .. } => {
            self.emit_expr(expr)?;
            self.emit_opcode(OpCode::Save);
            Ok(())
        }
        // ... other variants
    }
}

Rust's exhaustive match is the key advantage here. If we add a new statement variant to the Stmt enum, the compiler forces us to handle it in every match expression that operates on statements. We cannot forget to handle a new node type -- the code will not compile until we do. This is the same principle that makes Rust's Result type safer than exceptions: the compiler tracks what you have handled and what you have not.

From AST to Bytecode: A Complete Example

To make the AST concrete, let us trace the counter example through the tree:

flincount = 0
<button click={count++}>{count}</button>

The parser produces this AST:

Program
  Stmt::VarDecl
    name: "count"
    type_annotation: None
    initializer: Expr::Literal(Integer(0))
  Stmt::ViewElement
    ViewNode
      tag: "button"
      attributes:
        EventHandler
          event: "click"
          handler: Expr::Unary(PostIncrement, Identifier("count"))
      children:
        ViewChild::Expression(Expr::Identifier("count"))

The type checker walks this tree and determines: - count has type Int (inferred from the literal 0) - The event handler expression count++ operates on an Int, producing an Int -- valid - The view interpolation {count} references an Int, which will be converted to Text for display -- valid

The code generator walks the typed tree and emits bytecode: - LoadInt0 + StoreGlobal "count" for the variable declaration - CreateElement "button" for the view element - CreateHandler "click" + LoadGlobal "count" + Dup + Incr + StoreGlobal "count" + TriggerUpdate + EndHandler + BindHandler for the event handler - LoadGlobal "count" + BindText for the text interpolation - CloseElement to finish the button - Halt to end the program

Each phase reads the AST, does its work, and passes the result to the next phase. The AST is the lingua franca of the compiler -- the common language that all phases understand.

The AST in Numbers

Metric	Count
Statement variants	9
Expression variants	17
View node types	5
Type variants	12
Total AST-related types	10 structs + 4 enums
Lines of AST definitions	~250
Unit tests for AST	10

250 lines of type definitions. That is the entire internal representation of a programming language with imperative code, declarative data, reactive views, temporal queries, and AI intent expressions. Rust's enum types do the heavy lifting: each variant is its own "class" in object-oriented terms, but with exhaustive matching and zero runtime overhead.

What Came Next

The AST is a tree of untyped nodes. It knows the structure of the program but not whether that structure makes sense. Can you add an integer to a string? Can you call a method on a boolean? Can you save an entity that has not been defined? The AST does not know. The type checker does.

The next article covers FLIN's type inference system, built on the Hindley-Milner algorithm -- the same algorithm used by Haskell and ML. It determines the type of every expression in the AST without requiring the developer to write type annotations, catches errors at compile time rather than runtime, and supports let-polymorphism so that generic data structures work correctly.

This is Part 14 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO built a programming language from scratch.

Series Navigation: - [11] Session 1: Project Setup and 42 Keywords - [12] Building a Lexer From Scratch in Rust - [13] Pratt Parsing: How FLIN Reads Your Code - [14] The Abstract Syntax Tree: FLIN's Internal Representation (you are here) - [15] Hindley-Milner Type Inference in a Custom Language