You can chunk text into pieces. You can generate embeddings from text. But connecting these two operations -- reliably, efficiently, and with proper metadata tracking -- is where most RAG implementations fall apart. The chunking module produces an array of text fragments. The embedding module accepts a string and returns a vector. Between them lies a gap: who manages the iteration, who tracks which chunk came from which document, who stores the vectors with the right keys, and who handles errors when chunk 47 of 200 fails to embed?
Session 222 built the integration layer that closes this gap. Nine new functions, 19 tests, and a complete end-to-end pipeline that takes raw document bytes and produces indexed, searchable vectors in a single function call.
## The Integration Types
Three new types connect the chunking and embedding worlds:
```rust
/// Tracks where a chunk came from
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ChunkSource {
    pub document_id: u64,
    pub entity_type: String,
    pub field_name: String,
}

/// A chunk with its embedding vector attached
#[derive(Debug, Clone)]
pub struct EmbeddedChunk {
    pub chunk: Chunk,
    pub embedding: Vec<f32>,
    pub source: Option<ChunkSource>,
}

/// Errors specific to the chunking-embedding pipeline
pub enum ChunkEmbedError {
    EmptyText,
    EmbeddingFailed(String),
    ExtractionFailed(String),
    StorageFailed(String),
}
```
ChunkSource is the metadata that enables source attribution. When a semantic search returns a match, the application can trace the result back to the original document, the entity that owns it, and the field that was embedded. Without this metadata, search results are orphans -- you know the text matches, but you do not know where it came from.
EmbeddedChunk combines a chunk with its vector representation. This intermediate type exists because some operations need both the text (for display) and the vector (for search) at the same time. For instance, when displaying search results with highlighted snippets, the application needs the chunk text and the similarity score, which comes from comparing the query vector against the chunk's embedding.
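As a sketch of how this metadata plays out downstream, the snippet below formats a search hit using its `ChunkSource`. The struct mirrors the definition above, but `render_result` is a hypothetical display helper, not part of FLIN's API:

```rust
// Hypothetical sketch: formatting a search result line from its ChunkSource.
// The struct mirrors the article; render_result is purely illustrative.
#[derive(Debug, Clone)]
struct ChunkSource {
    document_id: u64,
    entity_type: String,
    field_name: String,
}

fn render_result(snippet: &str, score: f32, source: &ChunkSource) -> String {
    // Combine the chunk text (for display) with its provenance (for attribution).
    format!(
        "{:.2} | {} #{} ({}): {}",
        score, source.entity_type, source.document_id, source.field_name, snippet
    )
}

fn main() {
    let source = ChunkSource {
        document_id: 42,
        entity_type: "LegalDocument".to_string(),
        field_name: "content".to_string(),
    };
    println!("{}", render_result("termination clause...", 0.87, &source));
}
```

Without the `ChunkSource` attached, the first two fields of that line would be unrecoverable: the match would be an orphan, exactly as described above.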
## Single-Chunk and Batch Embedding
The simplest integration function embeds a single chunk:
```rust
pub fn embed_chunk(chunk: &Chunk) -> Result<EmbeddedChunk, ChunkEmbedError> {
    if chunk.text.is_empty() {
        return Err(ChunkEmbedError::EmptyText);
    }

    let embedding = generate_embedding(&chunk.text)
        .map_err(|e| ChunkEmbedError::EmbeddingFailed(e.to_string()))?;

    Ok(EmbeddedChunk {
        chunk: chunk.clone(),
        embedding,
        source: None,
    })
}
```
The batch function embeds multiple chunks, which is more efficient because the embedding model can process multiple texts in a single forward pass:
```rust
pub fn embed_chunks(chunks: &[Chunk]) -> Result<Vec<EmbeddedChunk>, ChunkEmbedError> {
    if chunks.is_empty() {
        return Ok(vec![]);
    }

    chunks.iter()
        .map(|chunk| embed_chunk(chunk))
        .collect()
}
```
In the current implementation, batch embedding iterates and embeds individually because the FastEmbed provider processes one text at a time. A future optimization will batch the texts into a single model invocation, reducing overhead from model loading and memory allocation.
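A minimal sketch of what that batched variant could look like, assuming the provider grows a batch call (the `generate_embeddings_batch` stub below is hypothetical, and its placeholder vectors are not real embeddings):

```rust
// Sketch of batched embedding. Assumption: the provider exposes a single
// call over many texts, stubbed here as generate_embeddings_batch; FLIN's
// current code embeds one chunk at a time.
#[derive(Debug, Clone)]
struct Chunk {
    text: String,
    index: usize,
}

// Stub standing in for one model forward pass over many texts.
// Returns one placeholder vector per input, in input order.
fn generate_embeddings_batch(texts: &[&str]) -> Vec<Vec<f32>> {
    texts.iter().map(|t| vec![t.len() as f32]).collect()
}

fn embed_chunks_batched(chunks: &[Chunk]) -> Vec<(usize, Vec<f32>)> {
    // Collect all texts, run one invocation, then zip the results back so
    // each vector stays paired with the index of the chunk it came from.
    let texts: Vec<&str> = chunks.iter().map(|c| c.text.as_str()).collect();
    let vectors = generate_embeddings_batch(&texts);
    chunks.iter().zip(vectors).map(|(c, v)| (c.index, v)).collect()
}

fn main() {
    let chunks = vec![
        Chunk { text: "first".into(), index: 0 },
        Chunk { text: "second".into(), index: 1 },
    ];
    println!("{:?}", embed_chunks_batched(&chunks));
}
```

The key invariant a real batch path must preserve is ordering: the provider returns vectors in input order, so zipping against the chunk list keeps each embedding attached to the right chunk index.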
## The Combined Pipeline
The chunk_and_embed function combines chunking and embedding into a single operation:
```rust
pub fn chunk_and_embed(
    text: &str,
    options: Option<ChunkOptions>,
) -> Result<Vec<EmbeddedChunk>, ChunkEmbedError> {
    if text.is_empty() {
        return Err(ChunkEmbedError::EmptyText);
    }

    let opts = options.unwrap_or_else(|| ChunkOptions::new(1000, 200));
    let chunks = chunk_text(text, &opts);
    embed_chunks(&chunks)
}

pub fn chunk_and_embed_default(text: &str) -> Result<Vec<EmbeddedChunk>, ChunkEmbedError> {
    chunk_and_embed(text, None)
}
```
This is the function that most FLIN internal code calls. Document extraction produces text. chunk_and_embed produces vectors. The caller does not need to manage the intermediate steps.
## The Full Document Pipeline
The most powerful function in the integration layer takes raw document bytes and produces stored, indexed vectors:
```rust
pub fn ingest_document(
    store: &mut VectorStore,
    bytes: &[u8],
    document_id: u64,
    entity_type: &str,
    field_name: &str,
    mime_type: Option<&str>,
    extension: Option<&str>,
    chunk_options: Option<ChunkOptions>,
) -> Result<usize, ChunkEmbedError> {
    // Step 1: Extract text from document
    let text = extract_document(bytes, mime_type, extension)
        .map_err(|e| ChunkEmbedError::ExtractionFailed(e))?;

    if text.trim().is_empty() {
        return Ok(0);
    }

    // Step 2: Chunk the text
    let opts = chunk_options.unwrap_or_else(|| ChunkOptions::new(1000, 200));
    let chunks = chunk_text(&text, &opts);

    // Step 3: Embed each chunk
    let embedded = embed_chunks(&chunks)?;

    // Step 4: Store in vector store
    store_document_embeddings(store, document_id, entity_type, field_name, &embedded)?;

    Ok(embedded.len())
}
```
Five stages in one function call: extract, chunk, embed, store, and return the count. The caller provides bytes and metadata; the function handles everything else.
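To make the five stages concrete without the real extraction and embedding machinery, here is a toy version of the pipeline with a `HashMap` standing in for the vector store. Every body is a simplified stand-in (UTF-8 decode for extraction, fixed windows with no overlap for chunking, length vectors for embeddings), not FLIN's implementation:

```rust
use std::collections::HashMap;

// Toy walk-through of extract -> chunk -> embed -> store -> count.
// The name mirrors the article's ingest_document, but every step here
// is a deliberately simplified stand-in.
fn ingest_document_toy(
    store: &mut HashMap<String, Vec<f32>>,
    bytes: &[u8],
    document_id: u64,
    entity_type: &str,
    field_name: &str,
    chunk_size: usize,
) -> usize {
    // Step 1: "extract" text (here: just decode UTF-8, lossily)
    let text = String::from_utf8_lossy(bytes).to_string();
    if text.trim().is_empty() {
        return 0;
    }

    // Step 2: chunk into fixed character windows (no overlap, for brevity)
    let chars: Vec<char> = text.chars().collect();
    let chunks: Vec<String> = chars
        .chunks(chunk_size)
        .map(|c| c.iter().collect())
        .collect();

    // Steps 3 + 4: "embed" each chunk (placeholder length vector) and store
    // it under the field__chunk_N key convention described below.
    for (i, chunk) in chunks.iter().enumerate() {
        let key = format!("{}:{}:{}__chunk_{}", entity_type, document_id, field_name, i);
        store.insert(key, vec![chunk.len() as f32]);
    }

    // Step 5: return how many chunks were indexed
    chunks.len()
}

fn main() {
    let mut store = HashMap::new();
    let n = ingest_document_toy(
        &mut store, b"hello world, hello vectors", 7, "LegalDocument", "content", 10,
    );
    println!("{} chunks stored", n);
    assert!(store.contains_key("LegalDocument:7:content__chunk_0"));
}
```

Even in this toy form, the shape of the contract is visible: the caller hands over bytes and metadata, and gets back only a count, with all intermediate state hidden inside the function.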
## Storage Key Convention
When storing chunk embeddings in the vector store, the field name encodes the chunk index:
```rust
pub fn store_document_embeddings(
    store: &mut VectorStore,
    document_id: u64,
    entity_type: &str,
    field_name: &str,
    chunks: &[EmbeddedChunk],
) -> Result<(), ChunkEmbedError> {
    for chunk in chunks {
        let chunk_field = format!("{}__chunk_{}", field_name, chunk.chunk.index);
        store.store_embedding(
            entity_type,
            document_id,
            &chunk_field,
            chunk.embedding.clone(),
        ).map_err(|e| ChunkEmbedError::StorageFailed(e.to_string()))?;
    }
    Ok(())
}
```

A document with 15 chunks produces 15 entries in the vector store:
```
content__chunk_0  -> [0.12, -0.34, 0.56, ...]
content__chunk_1  -> [0.23, -0.45, 0.67, ...]
content__chunk_2  -> [0.34, -0.56, 0.78, ...]
...
content__chunk_14 -> [0.89, -0.12, 0.45, ...]
```

This naming convention allows the search system to query across all chunks of all documents in a single vector search operation. The `__chunk_` separator is chosen to be unlikely to collide with user-defined field names.
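Going the other way, a stored key can be decoded to recover the original field name and chunk index. A small sketch (`parse_chunk_field` is illustrative, not a FLIN function):

```rust
// Sketch: recovering (field_name, chunk_index) from a stored key such as
// "content__chunk_14", following the article's naming convention.
fn parse_chunk_field(stored: &str) -> Option<(&str, usize)> {
    // rsplit_once splits on the LAST occurrence, so field names that happen
    // to contain "__" on their own are still parsed correctly.
    let (field, idx) = stored.rsplit_once("__chunk_")?;
    idx.parse().ok().map(|i| (field, i))
}

fn main() {
    assert_eq!(parse_chunk_field("content__chunk_14"), Some(("content", 14)));
    assert_eq!(parse_chunk_field("body_text__chunk_0"), Some(("body_text", 0)));
    assert_eq!(parse_chunk_field("plain_field"), None);
    println!("ok");
}
```

This inverse mapping is what lets a search hit on `content__chunk_14` be reported to the user as "chunk 14 of the content field" rather than as an opaque key.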
## Model-Specific Chunk Sizing
Different embedding models have different context windows. The integration layer includes a helper that recommends chunk sizes based on the model:
```rust
pub fn chunk_size_for_model(model: &str) -> usize {
    match model {
        "all-MiniLM-L6-v2" => 1000,      // 512 token context
        "all-MiniLM-L12-v2" => 1000,     // 512 token context
        "bge-small-en-v1.5" => 1500,     // 512 token context, efficient
        "bge-base-en-v1.5" => 2000,      // 512 token context
        "bge-large-en-v1.5" => 2000,     // 512 token context
        "nomic-embed-text-v1.5" => 6000, // 8192 token context
        _ => 1000,                       // Conservative default
    }
}
```

Models with larger context windows can accept larger chunks, which means fewer chunks per document, fewer embeddings to store, and faster search. But larger chunks are also less precise -- a search result from a 6,000-character chunk points to a large section of text, not a specific passage. The right chunk size depends on the use case: precise retrieval favors smaller chunks; broad topic matching favors larger ones.
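The storage cost of a chunk-size choice can be estimated directly. Assuming fixed-size windows with the default 200-character overlap seen in `ChunkOptions::new(1000, 200)`, a document of length L > size yields roughly 1 + ceil((L - size) / (size - overlap)) chunks. The helper below is an illustrative sketch, not FLIN code:

```rust
// Sketch: estimating how many chunks (and thus stored vectors) a document
// produces for a given chunk size. Assumes fixed windows with a constant
// overlap, and requires chunk_size > overlap.
fn estimated_chunk_count(text_len: usize, chunk_size: usize, overlap: usize) -> usize {
    if text_len <= chunk_size {
        return 1;
    }
    let step = chunk_size - overlap; // each new chunk advances by this much
    // First chunk covers chunk_size chars; the rest are covered step at a time
    // (the division rounds up).
    1 + (text_len - chunk_size + step - 1) / step
}

fn main() {
    // A 10,000-character document under two of the sizes from the table above:
    let small = estimated_chunk_count(10_000, 1_000, 200); // all-MiniLM sizing
    let large = estimated_chunk_count(10_000, 6_000, 200); // nomic sizing
    println!("1000-char chunks: {}, 6000-char chunks: {}", small, large);
}
```

For that 10,000-character document the 1,000-character setting yields 13 chunks against 2 for the 6,000-character setting, which is the storage-versus-precision trade-off in numbers.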
## How FLIN Uses This Internally
When a FLIN application saves an entity with a semantic text field, the runtime triggers the full pipeline automatically:
```
entity LegalDocument {
  title: text
  content: semantic text
  jurisdiction: text
}

doc = LegalDocument.create({
  title: "Employment Contract Template",
  content: document_extract(body.file),
  jurisdiction: "OHADA"
})
save doc
```
The save operation detects that content is a semantic field. It calls chunk_and_embed on the content, stores the resulting vectors, and indexes them for search. The developer never calls a chunking function, never manages embeddings, and never interacts with the vector store directly.
This is the design goal of the integration layer: make the complex pipeline invisible. The developer declares semantic text, and the runtime handles extraction, chunking, embedding, storage, and search indexing. Nineteen tests verify that each step works correctly, from empty text handling to full end-to-end document ingestion.
## Test Coverage
The 19 tests added in Session 222 cover every function and every error path:
| Test Category | Count |
|---|---|
| Error types and display | 1 |
| ChunkSource creation and serialization | 2 |
| EmbeddedChunk creation | 1 |
| Single chunk embedding | 2 |
| Batch chunk embedding | 2 |
| Combined chunk-and-embed | 3 |
| Full document processing | 3 |
| VectorStore integration | 2 |
| End-to-end ingestion | 2 |
| Model-specific sizing | 1 |
The end-to-end test is the most important: it takes raw HTML bytes, extracts text, chunks it, embeds each chunk, stores the vectors, and verifies that the vector store contains the expected number of entries with the expected field names. If any step in the pipeline breaks, this test fails.
After Session 222, the document intelligence pipeline was operational from upload to search. The remaining work in this arc focused on expanding the pipeline's input capabilities (more document formats), optimizing storage (compression and garbage collection), and adding developer-facing features (previews and auto-conversion). The next article covers extracting text from the most challenging document formats: CSV spreadsheets, Excel workbooks, RTF documents, and XML files.
---
This is Part 131 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.
Series Navigation:
- [130] Text Chunking Strategies
- [131] Chunk-Embedding Integration (you are here)
- [132] Extracting Text From CSV, XLSX, RTF, and XML