#133 -- Semantic Auto-Conversion

Before Session 232, using semantic search in FLIN required three steps. First, declare the field. Second, call db.enable_semantic_search(). Third, call db.add_semantic_field("Entity", "field"). Miss any step and semantic search silently does nothing -- no error, no warning, just empty results from a search system that was never properly initialized.

This was a design failure. FLIN's philosophy is that common operations should require zero boilerplate. If a developer declares semantic text, they obviously want semantic search. Requiring two additional setup calls is ceremony that adds no value and creates a trap for new developers who do not read the documentation carefully enough.

Session 232 fixed this. Declaring a semantic text field now automatically enables semantic search and registers the field for embedding generation. The fix required changes to five files across the compiler pipeline -- from the type system to the bytecode emitter to the virtual machine.

The Before and After

The difference is dramatic. Before:

```flin
entity Product {
    description: semantic text
}

// Developer had to add these manually:
db.enable_semantic_search()
db.add_semantic_field("Product", "description")

product = Product.create({ description: "Ergonomic office chair" })
save product

results = search "comfortable seating" in Product by description limit 5
```

After:

```flin
entity Product {
    description: semantic text  // That is it. Nothing else needed.
}

product = Product.create({ description: "Ergonomic office chair" })
save product  // Automatically embedded

results = search "comfortable seating" in Product by description limit 5
```

One declaration. No setup calls. The runtime detects the semantic modifier during entity registration and configures everything automatically.

The Implementation Path

Making this work required changes at every level of the compiler pipeline. The semantic keyword is parsed in the frontend, carried through the type system, encoded in bytecode, and acted upon in the virtual machine.

Step 1: FieldDef Enhancement

The FieldDef struct in FlinDB needed a flag to track whether a field is semantic:

```rust
pub struct FieldDef {
    pub name: String,
    pub field_type: FieldType,
    pub is_nullable: bool,
    pub default_value: Option<Value>,
    pub is_semantic: bool,  // NEW: tracks semantic modifier
}

impl FieldDef {
    pub fn with_semantic(mut self) -> Self {
        self.is_semantic = true;
        self
    }
}
```

This flag is the signal that triggers auto-registration. When an entity schema is registered with FlinDB, the database inspects each field. If any field has is_semantic: true, the database enables semantic search (a single, database-wide switch) and registers each semantic field for embedding.

Step 2: FieldType Extension

The type system also needed a Semantic variant to distinguish semantic text from regular text at the type level:

```rust
pub enum FieldType {
    Bool,
    Int,
    Float,
    String,
    List,
    Map,
    Entity,
    Semantic,  // NEW: semantic text
    Any,
}

impl FieldType {
    pub fn type_matches(&self, value: &Value) -> bool {
        match self {
            // Semantic accepts Text values -- same storage, different behavior
            FieldType::Semantic => matches!(value, Value::Text(_)),
            // ... existing matching rules for the other variants
        }
    }
}
```

The Semantic field type accepts Text values. It is stored identically to regular text. The difference is behavioral: when a value is saved to a semantic field, the runtime generates an embedding and stores it in the vector index.
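To make "same storage, different behavior" concrete, here is a minimal sketch of a save path that treats one field as semantic. It is an illustration, not FLIN's runtime. `Value`, `VectorIndex`, `save_entity`, and the toy `embed` function are all hypothetical. The `embed` function only counts bytes into eight buckets, standing in for a real embedding model.

```rust
// Hypothetical sketch: not FLIN's runtime, names are illustrative.
use std::collections::HashMap;

/// Hypothetical runtime value; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Int(i64),
}

/// Toy stand-in for an embedding model: counts bytes into 8 buckets.
/// A real model returns a dense vector that captures meaning.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 8];
    for b in text.bytes() {
        v[(b as usize) % 8] += 1.0;
    }
    v
}

/// Hypothetical vector index keyed by (record id, field name).
#[derive(Default)]
struct VectorIndex {
    vectors: HashMap<(u64, String), Vec<f32>>,
}

/// Stores the record unchanged, and embeds only the fields marked semantic.
fn save_entity(
    storage: &mut HashMap<u64, HashMap<String, Value>>,
    index: &mut VectorIndex,
    semantic_fields: &[&str],
    id: u64,
    record: HashMap<String, Value>,
) {
    for field in semantic_fields {
        // Semantic fields hold Text, so only Text values are embedded.
        if let Some(Value::Text(text)) = record.get(*field) {
            index.vectors.insert((id, field.to_string()), embed(text));
        }
    }
    storage.insert(id, record);
}

fn main() {
    let mut storage = HashMap::new();
    let mut index = VectorIndex::default();

    let mut record = HashMap::new();
    record.insert("name".to_string(), Value::Text("Chair".to_string()));
    record.insert(
        "description".to_string(),
        Value::Text("Ergonomic office chair".to_string()),
    );
    record.insert("stock".to_string(), Value::Int(12));

    save_entity(&mut storage, &mut index, &["description"], 1, record);

    // Three fields stored, but only the semantic one gets a vector.
    println!("stored fields: {}, vectors: {}", storage[&1].len(), index.vectors.len());
}
```

The record is stored as-is, and "name" is plain text that never reaches the index. The only extra work for the semantic field happens in the vector index.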

Step 3: New Bytecode Opcode

The existing DefineEntitySchema opcode did not carry type information. A new opcode was added:

```rust
pub enum ExtendedOpCode {
    // ... existing opcodes
    DefineEntitySchemaWithTypes = 0x5A,
}

// Bytecode format:
// Extended (0xF3) + DefineEntitySchemaWithTypes (0x5A)
//   + entity_name_idx (u16 LE)
//   + field_count (u8)
//   + For each field:
//       + field_name_idx (u16 LE)
//       + field_type (u8): 0=Bool, 1=Int, 2=Float, 3=String, ...
//       + is_semantic (u8): 0 or 1
```

The is_semantic byte is the critical addition. Each field in the schema carries a single byte that tells the VM whether this field should trigger embedding generation.
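A round trip is a cheap way to check that a writer and a reader agree on this layout. The sketch below is self-contained and hypothetical. `FieldEntry`, `encode_schema`, `decode_schema`, and `Reader` are illustrative names, and string-pool indices are bare u16 values rather than interned strings. It assumes the payload follows the two-byte 0xF3 0x5A prefix exactly as listed above.

```rust
// Hypothetical sketch of the DefineEntitySchemaWithTypes wire format.
use std::convert::TryFrom;

/// One field entry in the payload.
#[derive(Debug, Clone, PartialEq)]
struct FieldEntry {
    name_idx: u16,  // index into the string pool
    field_type: u8, // 0=Bool, 1=Int, 2=Float, 3=String, ...
    is_semantic: bool,
}

const EXTENDED: u8 = 0xF3;
const DEFINE_ENTITY_SCHEMA_WITH_TYPES: u8 = 0x5A;

/// Writes the opcode prefix and payload, with u16 values in little-endian order.
fn encode_schema(entity_name_idx: u16, fields: &[FieldEntry]) -> Vec<u8> {
    let count = u8::try_from(fields.len()).expect("field_count must fit in one byte");
    let mut out = vec![EXTENDED, DEFINE_ENTITY_SCHEMA_WITH_TYPES];
    out.extend_from_slice(&entity_name_idx.to_le_bytes());
    out.push(count);
    for f in fields {
        out.extend_from_slice(&f.name_idx.to_le_bytes());
        out.push(f.field_type);
        out.push(u8::from(f.is_semantic));
    }
    out
}

/// Minimal cursor over a byte slice; every read returns None on truncation.
struct Reader<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn read_u8(&mut self) -> Option<u8> {
        let b = *self.bytes.get(self.pos)?;
        self.pos += 1;
        Some(b)
    }

    fn read_u16(&mut self) -> Option<u16> {
        let lo = self.read_u8()?;
        let hi = self.read_u8()?;
        Some(u16::from_le_bytes([lo, hi]))
    }
}

/// Reads the bytes back in exactly the order `encode_schema` wrote them.
fn decode_schema(bytes: &[u8]) -> Option<(u16, Vec<FieldEntry>)> {
    let mut r = Reader { bytes, pos: 0 };
    if r.read_u8()? != EXTENDED || r.read_u8()? != DEFINE_ENTITY_SCHEMA_WITH_TYPES {
        return None;
    }
    let entity_name_idx = r.read_u16()?;
    let count = r.read_u8()?;
    let mut fields = Vec::with_capacity(usize::from(count));
    for _ in 0..count {
        let name_idx = r.read_u16()?;
        let field_type = r.read_u8()?;
        let is_semantic = r.read_u8()? != 0;
        fields.push(FieldEntry { name_idx, field_type, is_semantic });
    }
    Some((entity_name_idx, fields))
}

fn main() {
    let fields = vec![
        FieldEntry { name_idx: 1, field_type: 3, is_semantic: true },  // description: semantic text
        FieldEntry { name_idx: 2, field_type: 1, is_semantic: false }, // stock: int
    ];
    let bytes = encode_schema(0, &fields);
    println!("encoded {} bytes", bytes.len()); // encoded 13 bytes
    assert_eq!(decode_schema(&bytes), Some((0, fields)));
}
```

The format has no field tags, so the emitter and the VM share only one contract: the order of the reads. If one side reads a byte too many or too few, every later field shifts.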

Step 4: Bytecode Emission

The emitter generates the new opcode when compiling entity declarations:

```rust
fn emit_entity_schema_with_types(
    &mut self,
    entity_name: &str,
    fields: &[FieldDeclaration],
) {
    // field_count is a single byte in the format, so refuse to emit a schema
    // whose count would silently wrap and corrupt the bytecode stream.
    let field_count = u8::try_from(fields.len())
        .expect("DefineEntitySchemaWithTypes supports at most 255 fields");

    self.emit_extended(ExtendedOpCode::DefineEntitySchemaWithTypes);
    let name_idx = self.intern_string(entity_name);
    self.emit_u16(name_idx);
    self.emit_u8(field_count);

    for field in fields {
        let field_idx = self.intern_string(&field.name);
        self.emit_u16(field_idx);
        self.emit_u8(type_to_byte(&field.field_type));
        self.emit_u8(u8::from(field.is_semantic));
    }
}

fn type_to_byte(ty: &Type) -> u8 {
    match ty {
        Type::Bool => 0,
        Type::Int => 1,
        Type::Float => 2,
        // Semantic text shares the String encoding; the is_semantic byte
        // carries the distinction.
        Type::String | Type::Semantic(_) => 3,
        Type::List(_) => 4,
        Type::Map(_, _) => 5,
        Type::Entity(_) => 6,
        _ => 7, // Any
    }
}
```

Step 5: VM Handler

The VM handles the new opcode by building a typed entity schema and registering it with FlinDB:

```rust
ExtendedOpCode::DefineEntitySchemaWithTypes => {
    let name_idx = self.read_u16();
    let entity_name = self.get_string(name_idx);
    let field_count = self.read_u8();

    let mut schema = EntitySchema::new(&entity_name);
    for _ in 0..field_count {
        // Reads mirror the emitter's order exactly:
        // name index (u16), type byte (u8), then the semantic flag (u8).
        let field_name_idx = self.read_u16();
        let field_name = self.get_string(field_name_idx);
        let field_type = byte_to_field_type(self.read_u8());
        let is_semantic = self.read_u8() == 1;

        let mut field_def = FieldDef::new(&field_name, field_type);
        if is_semantic {
            field_def = field_def.with_semantic();
        }
        schema.add_field(field_def);
    }

    // Registration is where semantic search gets switched on (Step 6).
    self.db.register_entity(schema);
}
```
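The handler calls byte_to_field_type, which is not shown. Below is one plausible inverse of type_to_byte, sketched under two assumptions. First, unknown bytes fall back to Any. Second, a hypothetical `effective_type` helper folds the semantic flag back into the type. Because the emitter writes semantic text as 3, the type byte alone never decodes to Semantic. The is_semantic byte is what recovers it.

```rust
// Hypothetical sketch: the article does not show byte_to_field_type.

/// Mirror of FlinDB's FieldType from Step 2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldType {
    Bool,
    Int,
    Float,
    String,
    List,
    Map,
    Entity,
    Semantic,
    Any,
}

/// Hypothetical inverse of type_to_byte. Byte 3 always decodes to String,
/// because the emitter writes both String and Semantic as 3.
/// Unknown bytes fall back to Any (an assumption of this sketch).
fn byte_to_field_type(byte: u8) -> FieldType {
    match byte {
        0 => FieldType::Bool,
        1 => FieldType::Int,
        2 => FieldType::Float,
        3 => FieldType::String,
        4 => FieldType::List,
        5 => FieldType::Map,
        6 => FieldType::Entity,
        _ => FieldType::Any,
    }
}

/// Hypothetical helper: folds the is_semantic byte back into the type.
/// Here the flag is honored only on text and ignored on other types.
fn effective_type(type_byte: u8, is_semantic_byte: u8) -> FieldType {
    match (byte_to_field_type(type_byte), is_semantic_byte != 0) {
        (FieldType::String, true) => FieldType::Semantic,
        (ty, _) => ty,
    }
}

fn main() {
    for &(type_byte, flag) in &[(3u8, 1u8), (3, 0), (1, 1), (9, 0)] {
        println!("({}, {}) -> {:?}", type_byte, flag, effective_type(type_byte, flag));
    }
}
```

The handler above keeps the flag on FieldDef rather than rewriting the type. Step 6 accepts either signal (is_semantic or FieldType::Semantic), so both designs lead to the same registration.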

Step 6: Auto-Registration in FlinDB

The registration function inspects the schema and automatically configures semantic search:

```rust
pub fn register_entity(&mut self, schema: EntitySchema) {
    let entity_name = schema.name.clone();

    // Check for semantic fields
    let semantic_fields: Vec<String> = schema.fields.iter()
        .filter(|f| f.is_semantic || matches!(f.field_type, FieldType::Semantic))
        .map(|f| f.name.clone())
        .collect();

    // Register the schema
    self.schemas.insert(entity_name.clone(), schema);

    // Auto-enable semantic search if any semantic fields exist
    if !semantic_fields.is_empty() {
        self.ensure_semantic_enabled();
        for field_name in semantic_fields {
            self.add_semantic_field(&entity_name, &field_name);
        }
    }
}

fn ensure_semantic_enabled(&mut self) {
    if self.semantic_search.is_none() {
        self.semantic_search = Some(SemanticSearchConfig::default());
    }
}
```

The ensure_semantic_enabled function is idempotent. Calling it multiple times has no effect after the first call. This is important because multiple entities can have semantic fields, and each entity registration triggers the check.
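An idempotent switch combined with repeated registrations can be modeled in a few lines. This covers the hot-reload case under Edge Cases too. The model is simplified and hypothetical, not FlinDB:

- it skips schema storage;
- it represents fields as (name, is_semantic) pairs;
- it records which entity first enabled semantic search;
- it assumes add_semantic_field deduplicates through a set.

```rust
// Hypothetical model of auto-registration; not FlinDB's actual types.
use std::collections::BTreeSet;

/// Placeholder config; remembers which entity first switched search on.
#[derive(Debug)]
struct SemanticSearchConfig {
    enabled_by: String,
}

#[derive(Debug, Default)]
struct Db {
    semantic_search: Option<SemanticSearchConfig>,
    // A set makes re-adding the same (entity, field) pair a no-op.
    semantic_fields: BTreeSet<(String, String)>,
}

impl Db {
    fn ensure_semantic_enabled(&mut self, entity: &str) {
        // The closure runs only while the option is None, so this is idempotent.
        self.semantic_search.get_or_insert_with(|| SemanticSearchConfig {
            enabled_by: entity.to_string(),
        });
    }

    fn add_semantic_field(&mut self, entity: &str, field: &str) {
        self.semantic_fields
            .insert((entity.to_string(), field.to_string()));
    }

    /// `fields` pairs each field name with its is_semantic flag.
    fn register_entity(&mut self, entity: &str, fields: &[(&str, bool)]) {
        let semantic: Vec<&str> = fields
            .iter()
            .filter(|(_, is_semantic)| *is_semantic)
            .map(|(name, _)| *name)
            .collect();
        if !semantic.is_empty() {
            self.ensure_semantic_enabled(entity);
            for name in semantic {
                self.add_semantic_field(entity, name);
            }
        }
    }
}

fn main() {
    let mut db = Db::default();
    db.register_entity("Product", &[("name", false), ("description", true)]);
    db.register_entity("Article", &[("title", false), ("content", true)]);
    // Simulated hot reload: registering Product again changes nothing.
    db.register_entity("Product", &[("name", false), ("description", true)]);
    // An entity without semantic fields leaves everything untouched.
    db.register_entity("User", &[("email", false)]);
    println!("{:?}", db);
}
```

Option::get_or_insert_with gives the idempotence for free. Later registrations cannot reset the configuration, and the set absorbs duplicate field entries.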

Type Coercion

The type checker also needed an update. When a regular text value is assigned to a semantic text field, the type checker must allow the coercion:

```rust
fn types_compatible(expected: &Type, actual: &Type) -> bool {
    match (expected, actual) {
        // Text is compatible with Semantic(Text)
        (Type::Semantic(inner), actual) if **inner == Type::Text => {
            types_compatible(&Type::Text, actual)
        }
        (expected, Type::Semantic(inner)) if **inner == Type::Text => {
            types_compatible(expected, &Type::Text)
        }
        // ... existing compatibility rules
    }
}
```

This coercion means that a semantic text field accepts regular string values without explicit casting. The semantic behavior is attached to the field, not to the value. A string "Ergonomic office chair" is just a string; it becomes semantic when stored in a semantic field.
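The excerpt above elides the other rules, so here is a self-contained version over a toy type enum. The three-variant `Type` and the fallback rule (plain structural equality) are assumptions for illustration. Only the two Semantic(Text) unwrapping arms come from the code above.

```rust
// Hypothetical sketch: a toy type checker showing only the coercion rule.

/// Toy compiler type with just enough variants for the example.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Text,
    Int,
    Semantic(Box<Type>),
}

fn types_compatible(expected: &Type, actual: &Type) -> bool {
    match (expected, actual) {
        // Semantic(Text) expected: accept anything plain Text accepts.
        (Type::Semantic(inner), actual) if **inner == Type::Text => {
            types_compatible(&Type::Text, actual)
        }
        // Semantic(Text) supplied: treat the value as plain Text.
        (expected, Type::Semantic(inner)) if **inner == Type::Text => {
            types_compatible(expected, &Type::Text)
        }
        // Assumed fallback for this sketch: exact structural match.
        (expected, actual) => expected == actual,
    }
}

fn main() {
    let semantic_text = Type::Semantic(Box::new(Type::Text));
    println!("{}", types_compatible(&semantic_text, &Type::Text)); // true
    println!("{}", types_compatible(&semantic_text, &Type::Int)); // false
}
```

The recursion always terminates, because each Semantic arm replaces one Semantic(Text) with plain Text before recursing.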

Edge Cases

The implementation handles several edge cases that might not be obvious:

Empty text. An empty string is still indexed. The embedding model generates a vector for the empty string, which has well-defined similarity properties. This ensures that a semantic field with an empty value does not cause errors during search.

Optional semantic fields. A field declared as semantic text? generates an embedding only when the value is not none. If the field is none, no embedding is stored, and the entity is excluded from semantic search on that field.

Schema updates. If an entity schema is re-registered (for example, after a hot reload during development), the semantic configuration is preserved. Re-registration does not duplicate semantic field entries or reset the vector index.

Backward compatibility. The manual db.add_semantic_field() API still works. Applications that use it explicitly continue to function. Auto-registration supplements manual configuration; it does not replace it.
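The first two edge cases come down to one decision per record at save time. Embed whenever a value is present, even an empty one, and skip only none. A minimal sketch follows, modeling an optional semantic field as Option<String>. `Record` and `texts_to_embed` are hypothetical names.

```rust
// Hypothetical sketch of the empty-vs-none rule for one semantic field.

/// A record's id and its value for an optional semantic field.
struct Record {
    id: u64,
    content: Option<String>,
}

/// Returns the (id, text) pairs that should be embedded.
/// None is skipped, so no vector is stored and the record never appears
/// in semantic results. Some("") is kept, so empty strings still get a vector.
fn texts_to_embed(records: &[Record]) -> Vec<(u64, &str)> {
    records
        .iter()
        .filter_map(|r| r.content.as_deref().map(|text| (r.id, text)))
        .collect()
}

fn main() {
    let records = vec![
        Record { id: 1, content: Some("Building in Abidjan".to_string()) },
        Record { id: 2, content: Some(String::new()) }, // empty: still embedded
        Record { id: 3, content: None },                // none: skipped
    ];
    println!("{:?}", texts_to_embed(&records)); // [(1, "Building in Abidjan"), (2, "")]
}
```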

What Changed for Developers

The practical impact is that semantic search in FLIN now requires exactly one annotation:

```flin
entity Article {
    title: text
    content: semantic text    // This is all you need
    author: text
}

// Save an article -- content is automatically embedded
article = Article.create({
    title: "Building in Abidjan",
    content: long_article_text,
    author: "Thales"
})
save article

// Search works immediately
results = search "startup ecosystem West Africa" in Article by content limit 5
```

No setup. No initialization. No configuration file. One keyword in a field declaration activates an entire pipeline: embedding generation on save, vector storage, index maintenance, and semantic search at query time.

This is what FLIN means by "replacing 47 technologies." In a traditional stack, enabling semantic search requires installing an embedding model, setting up a vector database, writing an indexing pipeline, configuring a search endpoint, and wiring everything together. In FLIN, it is one word: semantic.
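At query time, a search ... limit 5 amounts to ranking stored vectors by similarity to the embedded query and keeping the best few. The sketch below assumes cosine similarity and a brute-force scan over an in-memory list. The article does not say which metric or index structure FLIN uses, and `cosine` and `top_k` are hypothetical names.

```rust
// Hypothetical sketch of query-time ranking; not FLIN's search engine.

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force top-k: score every stored vector, sort descending, keep k ids.
fn top_k(query: &[f32], stored: &[(u64, Vec<f32>)], k: usize) -> Vec<u64> {
    let mut scored: Vec<(u64, f32)> = stored
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .collect();
    // total_cmp gives f32 a total order, so a NaN score cannot break the sort.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}

fn main() {
    let stored = vec![
        (1, vec![1.0, 0.0]),
        (2, vec![0.7, 0.7]),
        (3, vec![0.0, 1.0]),
    ];
    let query = [1.0, 0.1];
    println!("{:?}", top_k(&query, &stored, 2)); // [1, 2]
}
```

A brute-force scan costs time linear in the number of stored vectors. Production vector indexes typically use approximate nearest-neighbor structures instead, but they rank results the same way.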

Eight tests verify the auto-conversion behavior, including idempotent enabling, multiple semantic fields on one entity, mixed entities (some with semantic fields, some without), and backward compatibility with manual registration. The total test count after Session 232: 3,452.

In the next article, we turn from intelligence to efficiency. Zstd compression reduces storage costs, and garbage collection reclaims space from deleted files -- the two optimizations that make FLIN's file storage production-ready at scale.

This is Part 133 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.

Series Navigation:
- [132] Extracting Text From CSV, XLSX, RTF, and XML
- [133] Semantic Auto-Conversion (you are here)
- [134] Zstd Compression and Blob Garbage Collection