You can chunk text into pieces. You can generate embeddings from text. But connecting these two operations -- reliably, efficiently, and with proper metadata tracking -- is where most RAG implementations fall apart. The chunking module produces an array of text fragments. The embedding module accepts a string and returns a vector. Between them lies a gap: who manages the iteration, who tracks which chunk came from which document, who stores the vectors with the right keys, and who handles errors when chunk 47 of 200 fails to embed?
Session 222 built the integration layer that closes this gap. Nine new functions, 19 tests, and a complete end-to-end pipeline that takes raw document bytes and produces indexed, searchable vectors in a single function call.
## The Integration Types
Three new types connect the chunking and embedding worlds:
```rust
/// Tracks where a chunk came from
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ChunkSource {
    pub document_id: u64,
    pub entity_type: String,
    pub field_name: String,
}

/// A chunk with its embedding vector attached
#[derive(Debug, Clone)]
pub struct EmbeddedChunk {
    pub chunk: Chunk,
    pub embedding: Vec<f32>,
    pub source: Option<ChunkSource>,
}

/// Errors specific to the chunking-embedding pipeline
pub enum ChunkEmbedError {
    EmptyText,
    EmbeddingFailed(String),
    ExtractionFailed(String),
    StorageFailed(String),
}
```
ChunkSource is the metadata that enables source attribution. When a semantic search returns a match, the application can trace the result back to the original document, the entity that owns it, and the field that was embedded. Without this metadata, search results are orphans -- you know the text matches, but you do not know where it came from.
EmbeddedChunk combines a chunk with its vector representation. This intermediate type exists because some operations need both the text (for display) and the vector (for search) at the same time. For instance, when displaying search results with highlighted snippets, the application needs the chunk text and the similarity score, which comes from comparing the query vector against the chunk's embedding.
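As a sketch of how this metadata plays out downstream, the snippet below formats a search hit using its `ChunkSource`. The struct mirrors the definition above, but `render_result` is a hypothetical display helper, not part of FLIN's API:

```rust
// Hypothetical sketch: formatting a search result line from its ChunkSource.
// The struct mirrors the article; render_result is purely illustrative.
#[derive(Debug, Clone)]
struct ChunkSource {
    document_id: u64,
    entity_type: String,
    field_name: String,
}

fn render_result(snippet: &str, score: f32, source: &ChunkSource) -> String {
    // Combine the chunk text (for display) with its provenance (for attribution).
    format!(
        "{:.2} | {} #{} ({}): {}",
        score, source.entity_type, source.document_id, source.field_name, snippet
    )
}

fn main() {
    let source = ChunkSource {
        document_id: 42,
        entity_type: "LegalDocument".to_string(),
        field_name: "content".to_string(),
    };
    println!("{}", render_result("termination clause...", 0.87, &source));
}
```

Without the `ChunkSource` attached, the first two fields of that line would be unrecoverable: the match would be an orphan, exactly as described above.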
## Single-Chunk and Batch Embedding
The simplest integration function embeds a single chunk:
```rust
pub fn embed_chunk(chunk: &Chunk) -> Result<EmbeddedChunk, ChunkEmbedError> {
    if chunk.text.is_empty() {
        return Err(ChunkEmbedError::EmptyText);
    }

    let embedding = generate_embedding(&chunk.text)
        .map_err(|e| ChunkEmbedError::EmbeddingFailed(e.to_string()))?;

    Ok(EmbeddedChunk {
        chunk: chunk.clone(),
        embedding,
        source: None,
    })
}
```
The batch function embeds multiple chunks, which is more efficient because the embedding model can process multiple texts in a single forward pass:
```rust
pub fn embed_chunks(chunks: &[Chunk]) -> Result<Vec<EmbeddedChunk>, ChunkEmbedError> {
    if chunks.is_empty() {
        return Ok(vec![]);
    }

    chunks.iter()
        .map(|chunk| embed_chunk(chunk))
        .collect()
}
```
In the current implementation, batch embedding iterates and embeds individually because the FastEmbed provider processes one text at a time. A future optimization will batch the texts into a single model invocation, reducing overhead from model loading and memory allocation.
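A minimal sketch of what that batched variant could look like, assuming the provider grows a batch call (the `generate_embeddings_batch` stub below is hypothetical, and its placeholder vectors are not real embeddings):

```rust
// Sketch of batched embedding. Assumption: the provider exposes a single
// call over many texts, stubbed here as generate_embeddings_batch; FLIN's
// current code embeds one chunk at a time.
#[derive(Debug, Clone)]
struct Chunk {
    text: String,
    index: usize,
}

// Stub standing in for one model forward pass over many texts.
// Returns one placeholder vector per input, in input order.
fn generate_embeddings_batch(texts: &[&str]) -> Vec<Vec<f32>> {
    texts.iter().map(|t| vec![t.len() as f32]).collect()
}

fn embed_chunks_batched(chunks: &[Chunk]) -> Vec<(usize, Vec<f32>)> {
    // Collect all texts, run one invocation, then zip the results back so
    // each vector stays paired with the index of the chunk it came from.
    let texts: Vec<&str> = chunks.iter().map(|c| c.text.as_str()).collect();
    let vectors = generate_embeddings_batch(&texts);
    chunks.iter().zip(vectors).map(|(c, v)| (c.index, v)).collect()
}

fn main() {
    let chunks = vec![
        Chunk { text: "first".into(), index: 0 },
        Chunk { text: "second".into(), index: 1 },
    ];
    println!("{:?}", embed_chunks_batched(&chunks));
}
```

The key invariant a real batch path must preserve is ordering: the provider returns vectors in input order, so zipping against the chunk list keeps each embedding attached to the right chunk index.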
## The Combined Pipeline
The chunk_and_embed function combines chunking and embedding into a single operation:
```rust
pub fn chunk_and_embed(
    text: &str,
    options: Option<ChunkOptions>,
) -> Result<Vec<EmbeddedChunk>, ChunkEmbedError> {
    if text.is_empty() {
        return Err(ChunkEmbedError::EmptyText);
    }

    let opts = options.unwrap_or_else(|| ChunkOptions::new(1000, 200));
    let chunks = chunk_text(text, &opts);
    embed_chunks(&chunks)
}

pub fn chunk_and_embed_default(text: &str) -> Result<Vec<EmbeddedChunk>, ChunkEmbedError> {
    chunk_and_embed(text, None)
}
```
This is the function that most FLIN internal code calls. Document extraction produces text. chunk_and_embed produces vectors. The caller does not need to manage the intermediate steps.
## The Full Document Pipeline
The most powerful function in the integration layer takes raw document bytes and produces stored, indexed vectors:
```rust
pub fn ingest_document(
    store: &mut VectorStore,
    bytes: &[u8],
    document_id: u64,
    entity_type: &str,
    field_name: &str,
    mime_type: Option<&str>,
    extension: Option<&str>,
    chunk_options: Option<ChunkOptions>,
) -> Result<usize, ChunkEmbedError> {
    // Step 1: Extract text from document
    let text = extract_document(bytes, mime_type, extension)
        .map_err(|e| ChunkEmbedError::ExtractionFailed(e))?;

    if text.trim().is_empty() {
        return Ok(0);
    }

    // Step 2: Chunk the text
    let opts = chunk_options.unwrap_or_else(|| ChunkOptions::new(1000, 200));
    let chunks = chunk_text(&text, &opts);

    // Step 3: Embed each chunk
    let embedded = embed_chunks(&chunks)?;

    // Step 4: Store in vector store
    store_document_embeddings(store, document_id, entity_type, field_name, &embedded)?;

    Ok(embedded.len())
}
```
Five stages in one function call: extract, chunk, embed, store, and return the count. The caller provides bytes and metadata; the function handles everything else.
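To make the five stages concrete without the real extraction and embedding machinery, here is a toy version of the pipeline with a `HashMap` standing in for the vector store. Every body is a simplified stand-in (UTF-8 decode for extraction, fixed windows with no overlap for chunking, length vectors for embeddings), not FLIN's implementation:

```rust
use std::collections::HashMap;

// Toy walk-through of extract -> chunk -> embed -> store -> count.
// The name mirrors the article's ingest_document, but every step here
// is a deliberately simplified stand-in.
fn ingest_document_toy(
    store: &mut HashMap<String, Vec<f32>>,
    bytes: &[u8],
    document_id: u64,
    entity_type: &str,
    field_name: &str,
    chunk_size: usize,
) -> usize {
    // Step 1: "extract" text (here: just decode UTF-8, lossily)
    let text = String::from_utf8_lossy(bytes).to_string();
    if text.trim().is_empty() {
        return 0;
    }

    // Step 2: chunk into fixed character windows (no overlap, for brevity)
    let chars: Vec<char> = text.chars().collect();
    let chunks: Vec<String> = chars
        .chunks(chunk_size)
        .map(|c| c.iter().collect())
        .collect();

    // Steps 3 + 4: "embed" each chunk (placeholder length vector) and store
    // it under the field__chunk_N key convention described below.
    for (i, chunk) in chunks.iter().enumerate() {
        let key = format!("{}:{}:{}__chunk_{}", entity_type, document_id, field_name, i);
        store.insert(key, vec![chunk.len() as f32]);
    }

    // Step 5: return how many chunks were indexed
    chunks.len()
}

fn main() {
    let mut store = HashMap::new();
    let n = ingest_document_toy(
        &mut store, b"hello world, hello vectors", 7, "LegalDocument", "content", 10,
    );
    println!("{} chunks stored", n);
    assert!(store.contains_key("LegalDocument:7:content__chunk_0"));
}
```

Even in this toy form, the shape of the contract is visible: the caller hands over bytes and metadata, and gets back only a count, with all intermediate state hidden inside the function.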
## Storage Key Convention
When storing chunk embeddings in the vector store, the field name encodes the chunk index:
```rust
pub fn store_document_embeddings(
    store: &mut VectorStore,
    document_id: u64,
    entity_type: &str,
    field_name: &str,
    chunks: &[EmbeddedChunk],
) -> Result<(), ChunkEmbedError> {
    for chunk in chunks {
        let chunk_field = format!("{}__chunk_{}", field_name, chunk.chunk.index);
        store.store_embedding(
            entity_type,
            document_id,
            &chunk_field,
            chunk.embedding.clone(),
        ).map_err(|e| ChunkEmbedError::StorageFailed(e.to_string()))?;
    }
    Ok(())
}
```

A document with 15 chunks produces 15 entries in the vector store:
```
content__chunk_0  -> [0.12, -0.34, 0.56, ...]
content__chunk_1  -> [0.23, -0.45, 0.67, ...]
content__chunk_2  -> [0.34, -0.56, 0.78, ...]
...
content__chunk_14 -> [0.89, -0.12, 0.45, ...]
```

This naming convention allows the search system to query across all chunks of all documents in a single vector search operation. The `__chunk_` separator is chosen to be unlikely to collide with user-defined field names.
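Going the other way, a stored key can be decoded to recover the original field name and chunk index. A small sketch (`parse_chunk_field` is illustrative, not a FLIN function):

```rust
// Sketch: recovering (field_name, chunk_index) from a stored key such as
// "content__chunk_14", following the article's naming convention.
fn parse_chunk_field(stored: &str) -> Option<(&str, usize)> {
    // rsplit_once splits on the LAST occurrence, so field names that happen
    // to contain "__" on their own are still parsed correctly.
    let (field, idx) = stored.rsplit_once("__chunk_")?;
    idx.parse().ok().map(|i| (field, i))
}

fn main() {
    assert_eq!(parse_chunk_field("content__chunk_14"), Some(("content", 14)));
    assert_eq!(parse_chunk_field("body_text__chunk_0"), Some(("body_text", 0)));
    assert_eq!(parse_chunk_field("plain_field"), None);
    println!("ok");
}
```

This inverse mapping is what lets a search hit on `content__chunk_14` be reported to the user as "chunk 14 of the content field" rather than as an opaque key.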
## Model-Specific Chunk Sizing
Different embedding models have different context windows. The integration layer includes a helper that recommends chunk sizes based on the model:
```rust
pub fn chunk_size_for_model(model: &str) -> usize {
    match model {
        "all-MiniLM-L6-v2" => 1000,      // 512 token context
        "all-MiniLM-L12-v2" => 1000,     // 512 token context
        "bge-small-en-v1.5" => 1500,     // 512 token context, efficient
        "bge-base-en-v1.5" => 2000,      // 512 token context
        "bge-large-en-v1.5" => 2000,     // 512 token context
        "nomic-embed-text-v1.5" => 6000, // 8192 token context
        _ => 1000,                       // Conservative default
    }
}
```

Models with larger context windows can accept larger chunks, which means fewer chunks per document, fewer embeddings to store, and faster search. But larger chunks are also less precise -- a search result from a 6,000-character chunk points to a large section of text, not a specific passage. The right chunk size depends on the use case: precise retrieval favors smaller chunks; broad topic matching favors larger ones.
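The storage cost of a chunk-size choice can be estimated directly. Assuming fixed-size windows with the default 200-character overlap seen in `ChunkOptions::new(1000, 200)`, a document of length L > size yields roughly 1 + ceil((L - size) / (size - overlap)) chunks. The helper below is an illustrative sketch, not FLIN code:

```rust
// Sketch: estimating how many chunks (and thus stored vectors) a document
// produces for a given chunk size. Assumes fixed windows with a constant
// overlap, and requires chunk_size > overlap.
fn estimated_chunk_count(text_len: usize, chunk_size: usize, overlap: usize) -> usize {
    if text_len <= chunk_size {
        return 1;
    }
    let step = chunk_size - overlap; // each new chunk advances by this much
    // First chunk covers chunk_size chars; the rest are covered step at a time
    // (the division rounds up).
    1 + (text_len - chunk_size + step - 1) / step
}

fn main() {
    // A 10,000-character document under two of the sizes from the table above:
    let small = estimated_chunk_count(10_000, 1_000, 200); // all-MiniLM sizing
    let large = estimated_chunk_count(10_000, 6_000, 200); // nomic sizing
    println!("1000-char chunks: {}, 6000-char chunks: {}", small, large);
}
```

For that 10,000-character document the 1,000-character setting yields 13 chunks against 2 for the 6,000-character setting, which is the storage-versus-precision trade-off in numbers.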
## How FLIN Uses This Internally
When a FLIN application saves an entity with a semantic text field, the runtime triggers the full pipeline automatically:
```
entity LegalDocument {
  title: text
  content: semantic text
  jurisdiction: text
}

doc = LegalDocument.create({
  title: "Employment Contract Template",
  content: document_extract(body.file),
  jurisdiction: "OHADA"
})
save doc
```
The save operation detects that content is a semantic field. It calls chunk_and_embed on the content, stores the resulting vectors, and indexes them for search. The developer never calls a chunking function, never manages embeddings, and never interacts with the vector store directly.
This is the design goal of the integration layer: make the complex pipeline invisible. The developer declares semantic text, and the runtime handles extraction, chunking, embedding, storage, and search indexing. Nineteen tests verify that each step works correctly, from empty text handling to full end-to-end document ingestion.
## Test Coverage
The 19 tests added in Session 222 cover every function and every error path:
| Test Category | Count |
|---|---|
| Error types and display | 1 |
| ChunkSource creation and serialization | 2 |
| EmbeddedChunk creation | 1 |
| Single chunk embedding | 2 |
| Batch chunk embedding | 2 |
| Combined chunk-and-embed | 3 |
| Full document processing | 3 |
| VectorStore integration | 2 |
| End-to-end ingestion | 2 |
| Model-specific sizing | 1 |
The end-to-end test is the most important: it takes raw HTML bytes, extracts text, chunks it, embeds each chunk, stores the vectors, and verifies that the vector store contains the expected number of entries with the expected field names. If any step in the pipeline breaks, this test fails.
After Session 222, the document intelligence pipeline was operational from upload to search. The remaining work in this arc focused on expanding the pipeline's input capabilities (more document formats), optimizing storage (compression and garbage collection), and adding developer-facing features (previews and auto-conversion). The next article covers extracting text from the most challenging document formats: CSV spreadsheets, Excel workbooks, RTF documents, and XML files.
---
This is Part 131 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.
Series Navigation:
- [130] Text Chunking Strategies
- [131] Chunk-Embedding Integration (you are here)
- [132] Extracting Text From CSV, XLSX, RTF, and XML