Semantic search understands meaning but misses exact terms. If a user searches for "error 504" in a knowledge base, semantic search might return articles about gateway timeouts and network errors (correct meaning) but rank them below articles about HTTP errors in general (broader meaning). The exact term "504" is not captured in the embedding -- it is a specific number, not a concept.
BM25 keyword search matches exact terms but misses meaning. Searching for "how to fix slow database" with BM25 finds articles containing those exact words but misses an article titled "Optimizing Query Performance in PostgreSQL" -- which is exactly what the user needs but does not contain the words "fix," "slow," or "database."
Hybrid search combines both approaches: BM25 for precision on exact terms, semantic search for recall on related concepts. FLIN implements this as a built-in hybrid_search() function that merges results from both search methods using Reciprocal Rank Fusion.
The Hybrid Search Function
```
results = hybrid_search("error 504 gateway timeout", {
  entity: DocumentChunk,
  text_field: "content",
  semantic_field: "content",
  limit: 10,
  bm25_weight: 0.4,
  semantic_weight: 0.6
})
```

The function performs two searches in parallel:
1. BM25 search on the text_field for keyword matching.
2. Semantic search on the semantic_field for meaning matching.
The results are merged using weighted Reciprocal Rank Fusion (RRF):
```rust
pub fn reciprocal_rank_fusion(
    bm25_results: &[SearchResult],
    semantic_results: &[SearchResult],
    bm25_weight: f32,
    semantic_weight: f32,
    k: f32, // RRF constant, typically 60
) -> Vec<SearchResult> {
    let mut scores: HashMap<EntityId, f32> = HashMap::new();

    // Score BM25 results
    for (rank, result) in bm25_results.iter().enumerate() {
        let rrf_score = bm25_weight / (k + rank as f32 + 1.0);
        *scores.entry(result.id).or_insert(0.0) += rrf_score;
    }

    // Score semantic results
    for (rank, result) in semantic_results.iter().enumerate() {
        let rrf_score = semantic_weight / (k + rank as f32 + 1.0);
        *scores.entry(result.id).or_insert(0.0) += rrf_score;
    }

    // Sort by combined score, descending
    let mut merged: Vec<_> = scores
        .into_iter()
        .map(|(id, score)| SearchResult { id, score })
        .collect();
    merged.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());

    merged
}
```
Why Reciprocal Rank Fusion
RRF is preferred over simple score averaging for a critical reason: BM25 scores and cosine similarity scores are on completely different scales. A BM25 score of 15.7 and a cosine similarity of 0.89 cannot be meaningfully averaged. RRF normalizes both rankings to a common scale by using rank positions rather than raw scores.
A document ranked #1 in BM25 and #3 in semantic search gets:
- BM25 RRF: 0.4 / (60 + 1) = 0.00656
- Semantic RRF: 0.6 / (60 + 3) = 0.00952
- Total: 0.01608

A document ranked #10 in BM25 and #1 in semantic search gets:
- BM25 RRF: 0.4 / (60 + 10) = 0.00571
- Semantic RRF: 0.6 / (60 + 1) = 0.00984
- Total: 0.01555
The document that ranks well in both methods scores highest. A document that ranks first in one method but is absent from the other still receives a reasonable score.
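The worked numbers above can be checked with a small, self-contained sketch. Here `EntityId` is simplified to a `u64` and `rrf_merge` takes plain ranked lists of ids; both simplifications are assumptions for illustration, not FLIN's actual types:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the engine's types.
type EntityId = u64;

#[derive(Debug, Clone)]
struct SearchResult {
    id: EntityId,
    score: f32,
}

// Weighted RRF over two ranked id lists, as described in the article.
fn rrf_merge(
    bm25: &[EntityId],
    semantic: &[EntityId],
    bm25_weight: f32,
    semantic_weight: f32,
    k: f32,
) -> Vec<SearchResult> {
    let mut scores: HashMap<EntityId, f32> = HashMap::new();
    for (rank, id) in bm25.iter().enumerate() {
        *scores.entry(*id).or_insert(0.0) += bm25_weight / (k + rank as f32 + 1.0);
    }
    for (rank, id) in semantic.iter().enumerate() {
        *scores.entry(*id).or_insert(0.0) += semantic_weight / (k + rank as f32 + 1.0);
    }
    let mut merged: Vec<SearchResult> = scores
        .into_iter()
        .map(|(id, score)| SearchResult { id, score })
        .collect();
    merged.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    merged
}

fn main() {
    // Doc 1 is #1 in BM25 and #3 in semantic; docs 2 and 3 appear only
    // in the semantic ranking.
    let bm25 = [1u64];
    let semantic = [2u64, 3, 1];
    for r in rrf_merge(&bm25, &semantic, 0.4, 0.6, 60.0) {
        println!("doc {} -> {:.5}", r.id, r.score);
    }
}
```

Running this puts doc 1 first: appearing in both rankings beats a top position in only one.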
BM25 Implementation
BM25 (Best Matching 25) is a probabilistic ranking function that scores documents based on term frequency, inverse document frequency, and document length normalization:
```rust
pub struct Bm25Index {
    // Inverted index: term -> [(doc_id, term_frequency)]
    inverted: HashMap<String, Vec<(EntityId, u32)>>,
    // Document lengths
    doc_lengths: HashMap<EntityId, u32>,
    // Average document length
    avg_dl: f32,
    // Total number of documents
    num_docs: u32,
    // BM25 parameters
    k1: f32, // Term frequency saturation (default: 1.2)
    b: f32,  // Length normalization (default: 0.75)
}

impl Bm25Index {
    pub fn search(&self, query: &str, limit: usize) -> Vec<(EntityId, f32)> {
        let terms = tokenize(query);
        let mut scores: HashMap<EntityId, f32> = HashMap::new();

        for term in &terms {
            if let Some(postings) = self.inverted.get(term) {
                let df = postings.len() as f32;
                let idf = ((self.num_docs as f32 - df + 0.5) / (df + 0.5) + 1.0).ln();

                for (doc_id, tf) in postings {
                    let dl = self.doc_lengths[doc_id] as f32;
                    let tf_norm = (*tf as f32 * (self.k1 + 1.0))
                        / (*tf as f32 + self.k1 * (1.0 - self.b + self.b * dl / self.avg_dl));
                    *scores.entry(*doc_id).or_insert(0.0) += idf * tf_norm;
                }
            }
        }

        let mut results: Vec<_> = scores.into_iter().collect();
        results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        results.truncate(limit);
        results
    }
}
```
The BM25 index is maintained automatically alongside the HNSW semantic index. When a semantic text field is saved, both the BM25 inverted index and the HNSW vector index are updated.
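The search side is shown above; the insertion side can be sketched as follows. This is a simplified illustration of how such an inverted index might be maintained on save: `add_document` and the whitespace `tokenize` are assumptions, not FLIN's actual internals (a real tokenizer would also strip punctuation and possibly stem):

```rust
use std::collections::HashMap;

type EntityId = u64;

// Simplified BM25 index maintenance; field names mirror the article,
// but this insertion logic is an illustrative sketch.
#[derive(Default)]
struct Bm25Index {
    inverted: HashMap<String, Vec<(EntityId, u32)>>,
    doc_lengths: HashMap<EntityId, u32>,
    avg_dl: f32,
    num_docs: u32,
}

fn tokenize(text: &str) -> Vec<String> {
    // Naive whitespace tokenizer, for illustration only.
    text.to_lowercase().split_whitespace().map(String::from).collect()
}

impl Bm25Index {
    fn add_document(&mut self, id: EntityId, content: &str) {
        let terms = tokenize(content);
        self.doc_lengths.insert(id, terms.len() as u32);
        self.num_docs += 1;

        // Count term frequencies for this document.
        let mut tf: HashMap<String, u32> = HashMap::new();
        for t in terms {
            *tf.entry(t).or_insert(0) += 1;
        }
        // Append this document's postings to the inverted index.
        for (term, freq) in tf {
            self.inverted.entry(term).or_default().push((id, freq));
        }
        // Recompute the average document length.
        let total: u32 = self.doc_lengths.values().sum();
        self.avg_dl = total as f32 / self.num_docs as f32;
    }
}

fn main() {
    let mut index = Bm25Index::default();
    index.add_document(1, "gateway timeout error 504");
    index.add_document(2, "database query performance");
    println!("docs: {}, avg_dl: {}", index.num_docs, index.avg_dl);
}
```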
When Hybrid Search Wins
Hybrid search consistently outperforms either method alone in three scenarios:
Exact Term Queries
User searches for "error ERR_CONNECTION_REFUSED":
- BM25 alone: Finds documents containing that exact error code. Score: High.
- Semantic alone: Finds documents about connection errors in general. May miss the exact code. Score: Medium.
- Hybrid: BM25 identifies the exact match; semantic adds related troubleshooting articles. Score: Highest.
Conceptual Queries
User searches for "how to make my app faster":
- BM25 alone: Finds documents containing "faster" and "app." Misses articles about "performance optimization." Score: Low.
- Semantic alone: Finds articles about performance, caching, and optimization. Score: High.
- Hybrid: Semantic provides the primary results; BM25 boosts any that also contain exact terms. Score: Highest.
Mixed Queries
User searches for "configure nginx reverse proxy for websockets":
- BM25 alone: Good for "nginx" and "websockets" (specific terms). Misses synonymous configurations.
- Semantic alone: Good for the "reverse proxy" concept. May miss nginx-specific articles.
- Hybrid: Both methods contribute. Documents mentioning "nginx" AND discussing proxy concepts score highest. Score: Highest.
Using Hybrid Search in Practice
Knowledge Base Search
```
// app/api/search.flin
guard auth

route GET {
  q = query.q || ""
  if q.len < 2 {
    return error(400, "Query too short")
  }

  results = hybrid_search(q, {
    entity: DocumentChunk,
    text_field: "content",
    semantic_field: "content",
    limit: 20,
    bm25_weight: 0.3,
    semantic_weight: 0.7
  })

  // Group chunks by document
  doc_ids = results.map(r => r.document_id).unique
  documents = doc_ids.map(id => {
    doc = Document.find(id)
    relevant_chunks = results.where(document_id == id)
    {
      id: doc.id,
      title: doc.title,
      score: relevant_chunks[0].score,
      preview: relevant_chunks[0].content.slice(0, 200) + "...",
      chunk_count: relevant_chunks.len
    }
  })

  {
    query: q,
    results: documents,
    total: documents.len
  }
}
```
E-Commerce Product Search
```
results = hybrid_search(search_query, {
  entity: Product,
  text_field: "name",
  semantic_field: "description",
  limit: 20,
  bm25_weight: 0.5,
  semantic_weight: 0.5
})
```

For e-commerce, a 50/50 weight works well because users often search for specific product names (a BM25 strength) as well as general descriptions (a semantic strength).
Tuning the Weights
The bm25_weight and semantic_weight parameters control the balance between methods. Optimal weights depend on the application:
| Application | BM25 Weight | Semantic Weight | Rationale |
|---|---|---|---|
| Code search | 0.6 | 0.4 | Exact identifiers matter |
| Documentation | 0.3 | 0.7 | Concepts matter more than exact words |
| E-commerce | 0.5 | 0.5 | Both product names and descriptions |
| Legal search | 0.4 | 0.6 | Concepts with specific terms |
| Support tickets | 0.3 | 0.7 | Users describe problems rather than typing keywords |
The weights can be tuned empirically by running a set of test queries and measuring which weight combination produces the most relevant results.
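That tuning loop can be sketched as a grid search over weight pairs, scoring each pair by mean reciprocal rank (MRR) of a known-relevant document across test queries. Everything here is illustrative: the precomputed per-method rankings stand in for real search results, and the RRF merge repeats the formula from earlier in the article:

```rust
use std::collections::HashMap;

type DocId = u64;

// RRF merge of two precomputed rankings (same formula as in the article).
fn rrf(bm25: &[DocId], sem: &[DocId], wb: f32, ws: f32, k: f32) -> Vec<DocId> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();
    for (r, id) in bm25.iter().enumerate() {
        *scores.entry(*id).or_insert(0.0) += wb / (k + r as f32 + 1.0);
    }
    for (r, id) in sem.iter().enumerate() {
        *scores.entry(*id).or_insert(0.0) += ws / (k + r as f32 + 1.0);
    }
    let mut v: Vec<_> = scores.into_iter().collect();
    v.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    v.into_iter().map(|(id, _)| id).collect()
}

// Mean reciprocal rank of the known-relevant doc across test cases.
// Each case: (bm25 ranking, semantic ranking, relevant doc id).
fn mrr(cases: &[(Vec<DocId>, Vec<DocId>, DocId)], wb: f32, ws: f32) -> f32 {
    let total: f32 = cases
        .iter()
        .map(|(b, s, rel)| {
            let merged = rrf(b, s, wb, ws, 60.0);
            match merged.iter().position(|id| id == rel) {
                Some(p) => 1.0 / (p as f32 + 1.0),
                None => 0.0,
            }
        })
        .sum();
    total / cases.len() as f32
}

fn main() {
    let cases: Vec<(Vec<DocId>, Vec<DocId>, DocId)> = vec![
        (vec![1, 2, 3], vec![3, 1, 2], 3),
        (vec![5, 4], vec![4, 5], 4),
    ];
    // Grid-search the weight split in 0.1 steps, keeping the best MRR.
    let mut best = (0.0f32, 0.0f32, -1.0f32);
    for step in 0..=10 {
        let wb = step as f32 / 10.0;
        let ws = 1.0 - wb;
        let score = mrr(&cases, wb, ws);
        if score > best.2 {
            best = (wb, ws, score);
        }
    }
    println!("best weights: bm25={} semantic={} (MRR {:.3})", best.0, best.1, best.2);
}
```

In production the test cases would come from logged queries with human relevance judgments, and the searches would be re-run per query rather than precomputed.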
Performance
| Component | Latency | Notes |
|---|---|---|
| BM25 search (100K docs) | 1-3 ms | Inverted index lookup |
| Semantic search (100K docs) | 3-5 ms | HNSW approximate nearest neighbor |
| RRF merge | < 1 ms | Score computation |
| Total hybrid search | 5-10 ms | Both searches run in parallel |
Both searches run concurrently using Rust's async runtime. The total latency is approximately the maximum of the two individual latencies, not the sum.
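The max-not-sum claim is easy to demonstrate. The sketch below simulates the two searches with sleeps; the real engine uses an async runtime, but plain threads keep this stdlib-only (the timings and result ids are made up):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Runs two simulated searches concurrently and reports the wall time.
fn run_parallel() -> (Vec<u64>, Vec<u64>, Duration) {
    let start = Instant::now();
    let bm25 = thread::spawn(|| {
        thread::sleep(Duration::from_millis(3)); // simulated BM25 lookup
        vec![1u64, 2, 3]
    });
    let semantic = thread::spawn(|| {
        thread::sleep(Duration::from_millis(5)); // simulated HNSW search
        vec![3u64, 4, 1]
    });
    let b = bm25.join().unwrap();
    let s = semantic.join().unwrap();
    (b, s, start.elapsed())
}

fn main() {
    let (b, s, elapsed) = run_parallel();
    // Wall time is close to the slower search (~5 ms), not the 8 ms sum.
    println!("bm25={:?} semantic={:?} elapsed={:?}", b, s, elapsed);
}
```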
Index Maintenance
Both indices (BM25 inverted index and HNSW vector index) are updated automatically when entities are saved, updated, or deleted:
```
// This single save updates both indices
save DocumentChunk {
  document_id: doc.id,
  content: "New chunk content..." // semantic text
}
```

The BM25 index tokenizes the content and updates the inverted index. The HNSW index generates an embedding and inserts it into the vector index. Both operations are part of the same save transaction.
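A rough sketch of that dual-index update, using simplified in-memory stand-ins for both indices. The `embed` stub and the index types are illustrative assumptions, not FLIN's actual internals:

```rust
use std::collections::HashMap;

type EntityId = u64;

// Simplified in-memory stand-ins for the two indices.
#[derive(Default)]
struct Indices {
    // term -> doc ids containing it (inverted index, minus term frequencies)
    bm25: HashMap<String, Vec<EntityId>>,
    // doc id -> embedding vector (standing in for the HNSW graph)
    vectors: HashMap<EntityId, Vec<f32>>,
}

// Illustrative stub: a real system calls an embedding model here.
fn embed(text: &str) -> Vec<f32> {
    vec![text.len() as f32] // placeholder, not a meaningful embedding
}

impl Indices {
    // One save updates both indices together, mirroring the single
    // transaction described in the article.
    fn save_chunk(&mut self, id: EntityId, content: &str) {
        for term in content.to_lowercase().split_whitespace() {
            self.bm25.entry(term.to_string()).or_default().push(id);
        }
        self.vectors.insert(id, embed(content));
    }
}

fn main() {
    let mut idx = Indices::default();
    idx.save_chunk(1, "New chunk content");
    println!("terms indexed: {}, vectors: {}", idx.bm25.len(), idx.vectors.len());
}
```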
Hybrid search gives FLIN applications the best of both worlds: the precision of keyword matching for specific terms and the intelligence of semantic understanding for conceptual queries. It is the search method used internally by the knowledge base example in the RAG article, and it is available to every FLIN application through a single function call.
In the next article, we step back from specific features to examine FLIN's AI-first language design -- the philosophical and practical decisions that make FLIN uniquely suited for AI-assisted development.
---
This is Part 123 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.
Series Navigation: - [122] Code-Aware Chunking for RAG - [123] Hybrid Document Search: BM25 + Semantic (you are here) - [124] AI-First Language Design - [125] Search Analytics and Result Caching