Semantic search understands meaning but misses exact terms. If a user searches for "error 504" in a knowledge base, semantic search might return articles about gateway timeouts and network errors (correct meaning) but rank them below articles about HTTP errors in general (broader meaning). The exact term "504" is not captured in the embedding -- it is a specific number, not a concept.
BM25 keyword search matches exact terms but misses meaning. Searching for "how to fix slow database" with BM25 finds articles containing those exact words but misses an article titled "Optimizing Query Performance in PostgreSQL" -- which is exactly what the user needs but does not contain the words "fix," "slow," or "database."
Hybrid search combines both approaches: BM25 for precision on exact terms, semantic search for recall on related concepts. FLIN implements this as a built-in hybrid_search() function that merges results from both search methods using Reciprocal Rank Fusion.
The Hybrid Search Function
```
results = hybrid_search("error 504 gateway timeout", {
  entity: DocumentChunk,
  text_field: "content",
  semantic_field: "content",
  limit: 10,
  bm25_weight: 0.4,
  semantic_weight: 0.6
})
```

The function performs two searches in parallel:
1. BM25 search on the text_field for keyword matching.
2. Semantic search on the semantic_field for meaning matching.
The results are merged using weighted Reciprocal Rank Fusion (RRF):
```rust
pub fn reciprocal_rank_fusion(
    bm25_results: &[SearchResult],
    semantic_results: &[SearchResult],
    bm25_weight: f32,
    semantic_weight: f32,
    k: f32, // RRF constant, typically 60
) -> Vec<SearchResult> {
    let mut scores: HashMap<EntityId, f32> = HashMap::new();

    // Score BM25 results
    for (rank, result) in bm25_results.iter().enumerate() {
        let rrf_score = bm25_weight / (k + rank as f32 + 1.0);
        *scores.entry(result.id).or_insert(0.0) += rrf_score;
    }

    // Score semantic results
    for (rank, result) in semantic_results.iter().enumerate() {
        let rrf_score = semantic_weight / (k + rank as f32 + 1.0);
        *scores.entry(result.id).or_insert(0.0) += rrf_score;
    }

    // Sort by combined score, descending
    let mut merged: Vec<_> = scores
        .into_iter()
        .map(|(id, score)| SearchResult { id, score })
        .collect();
    merged.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());

    merged
}
```
Why Reciprocal Rank Fusion
RRF is preferred over simple score averaging for a critical reason: BM25 scores and cosine similarity scores are on completely different scales. A BM25 score of 15.7 and a cosine similarity of 0.89 cannot be meaningfully averaged. RRF normalizes both rankings to a common scale by using rank positions rather than raw scores.
A document ranked #1 in BM25 and #3 in semantic search gets:
- BM25 RRF: 0.4 / (60 + 1) = 0.00656
- Semantic RRF: 0.6 / (60 + 3) = 0.00952
- Total: 0.01608

A document ranked #10 in BM25 and #1 in semantic search gets:
- BM25 RRF: 0.4 / (60 + 10) = 0.00571
- Semantic RRF: 0.6 / (60 + 1) = 0.00984
- Total: 0.01555
The document that ranks well in both methods scores highest. A document that ranks first in one method but is absent from the other still receives a reasonable score.
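The worked numbers above can be checked with a small, self-contained sketch. Here `EntityId` is simplified to a `u64` and `rrf_merge` takes plain ranked lists of ids; both simplifications are assumptions for illustration, not FLIN's actual types:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the engine's types.
type EntityId = u64;

#[derive(Debug, Clone)]
struct SearchResult {
    id: EntityId,
    score: f32,
}

// Weighted RRF over two ranked id lists, as described in the article.
fn rrf_merge(
    bm25: &[EntityId],
    semantic: &[EntityId],
    bm25_weight: f32,
    semantic_weight: f32,
    k: f32,
) -> Vec<SearchResult> {
    let mut scores: HashMap<EntityId, f32> = HashMap::new();
    for (rank, id) in bm25.iter().enumerate() {
        *scores.entry(*id).or_insert(0.0) += bm25_weight / (k + rank as f32 + 1.0);
    }
    for (rank, id) in semantic.iter().enumerate() {
        *scores.entry(*id).or_insert(0.0) += semantic_weight / (k + rank as f32 + 1.0);
    }
    let mut merged: Vec<SearchResult> = scores
        .into_iter()
        .map(|(id, score)| SearchResult { id, score })
        .collect();
    merged.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    merged
}

fn main() {
    // Doc 1 is #1 in BM25 and #3 in semantic; docs 2 and 3 appear only
    // in the semantic ranking.
    let bm25 = [1u64];
    let semantic = [2u64, 3, 1];
    for r in rrf_merge(&bm25, &semantic, 0.4, 0.6, 60.0) {
        println!("doc {} -> {:.5}", r.id, r.score);
    }
}
```

Running this puts doc 1 first: appearing in both rankings beats a top position in only one.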
BM25 Implementation
BM25 (Best Matching 25) is a probabilistic ranking function that scores documents based on term frequency, inverse document frequency, and document length normalization:
```rust
pub struct Bm25Index {
    // Inverted index: term -> [(doc_id, term_frequency)]
    inverted: HashMap<String, Vec<(EntityId, u32)>>,
    // Document lengths
    doc_lengths: HashMap<EntityId, u32>,
    // Average document length
    avg_dl: f32,
    // Total number of documents
    num_docs: u32,
    // BM25 parameters
    k1: f32, // Term frequency saturation (default: 1.2)
    b: f32,  // Length normalization (default: 0.75)
}

impl Bm25Index {
    pub fn search(&self, query: &str, limit: usize) -> Vec<(EntityId, f32)> {
        let terms = tokenize(query);
        let mut scores: HashMap<EntityId, f32> = HashMap::new();

        for term in &terms {
            if let Some(postings) = self.inverted.get(term) {
                let df = postings.len() as f32;
                let idf = ((self.num_docs as f32 - df + 0.5) / (df + 0.5) + 1.0).ln();

                for (doc_id, tf) in postings {
                    let dl = self.doc_lengths[doc_id] as f32;
                    let tf_norm = (*tf as f32 * (self.k1 + 1.0))
                        / (*tf as f32 + self.k1 * (1.0 - self.b + self.b * dl / self.avg_dl));
                    *scores.entry(*doc_id).or_insert(0.0) += idf * tf_norm;
                }
            }
        }

        let mut results: Vec<_> = scores.into_iter().collect();
        results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        results.truncate(limit);
        results
    }
}
```
The BM25 index is maintained automatically alongside the HNSW semantic index. When a semantic text field is saved, both the BM25 inverted index and the HNSW vector index are updated.
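The search side is shown above; the insertion side can be sketched as follows. This is a simplified illustration of how such an inverted index might be maintained on save: `add_document` and the whitespace `tokenize` are assumptions, not FLIN's actual internals (a real tokenizer would also strip punctuation and possibly stem):

```rust
use std::collections::HashMap;

type EntityId = u64;

// Simplified BM25 index maintenance; field names mirror the article,
// but this insertion logic is an illustrative sketch.
#[derive(Default)]
struct Bm25Index {
    inverted: HashMap<String, Vec<(EntityId, u32)>>,
    doc_lengths: HashMap<EntityId, u32>,
    avg_dl: f32,
    num_docs: u32,
}

fn tokenize(text: &str) -> Vec<String> {
    // Naive whitespace tokenizer, for illustration only.
    text.to_lowercase().split_whitespace().map(String::from).collect()
}

impl Bm25Index {
    fn add_document(&mut self, id: EntityId, content: &str) {
        let terms = tokenize(content);
        self.doc_lengths.insert(id, terms.len() as u32);
        self.num_docs += 1;

        // Count term frequencies for this document.
        let mut tf: HashMap<String, u32> = HashMap::new();
        for t in terms {
            *tf.entry(t).or_insert(0) += 1;
        }
        // Append this document's postings to the inverted index.
        for (term, freq) in tf {
            self.inverted.entry(term).or_default().push((id, freq));
        }
        // Recompute the average document length.
        let total: u32 = self.doc_lengths.values().sum();
        self.avg_dl = total as f32 / self.num_docs as f32;
    }
}

fn main() {
    let mut index = Bm25Index::default();
    index.add_document(1, "gateway timeout error 504");
    index.add_document(2, "database query performance");
    println!("docs: {}, avg_dl: {}", index.num_docs, index.avg_dl);
}
```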
When Hybrid Search Wins
Hybrid search consistently outperforms either method alone in three scenarios:
Exact Term Queries
User searches for "error ERR_CONNECTION_REFUSED":
- BM25 alone: Finds documents containing that exact error code. Score: High.
- Semantic alone: Finds documents about connection errors in general. May miss the exact code. Score: Medium.
- Hybrid: BM25 identifies the exact match; semantic adds related troubleshooting articles. Score: Highest.
Conceptual Queries
User searches for "how to make my app faster":
- BM25 alone: Finds documents containing "faster" and "app." Misses articles about "performance optimization." Score: Low.
- Semantic alone: Finds articles about performance, caching, and optimization. Score: High.
- Hybrid: Semantic provides the primary results; BM25 boosts any that also contain exact terms. Score: Highest.
Mixed Queries
User searches for "configure nginx reverse proxy for websockets":
- BM25 alone: Good for "nginx" and "websockets" (specific terms). Misses synonymous configurations.
- Semantic alone: Good for the "reverse proxy" concept. May miss nginx-specific articles.
- Hybrid: Both methods contribute. Documents mentioning "nginx" AND discussing proxy concepts score highest. Score: Highest.
Using Hybrid Search in Practice
Knowledge Base Search
```
// app/api/search.flin
guard auth

route GET {
  q = query.q || ""
  if q.len < 2 {
    return error(400, "Query too short")
  }

  results = hybrid_search(q, {
    entity: DocumentChunk,
    text_field: "content",
    semantic_field: "content",
    limit: 20,
    bm25_weight: 0.3,
    semantic_weight: 0.7
  })

  // Group chunks by document
  doc_ids = results.map(r => r.document_id).unique
  documents = doc_ids.map(id => {
    doc = Document.find(id)
    relevant_chunks = results.where(document_id == id)
    {
      id: doc.id,
      title: doc.title,
      score: relevant_chunks[0].score,
      preview: relevant_chunks[0].content.slice(0, 200) + "...",
      chunk_count: relevant_chunks.len
    }
  })

  {
    query: q,
    results: documents,
    total: documents.len
  }
}
```
E-Commerce Product Search
```
results = hybrid_search(search_query, {
  entity: Product,
  text_field: "name",
  semantic_field: "description",
  limit: 20,
  bm25_weight: 0.5,
  semantic_weight: 0.5
})
```

For e-commerce, a 50/50 weight works well because users often search for specific product names (a BM25 strength) as well as general descriptions (a semantic strength).
Tuning the Weights
The bm25_weight and semantic_weight parameters control the balance between methods. Optimal weights depend on the application:
| Application | BM25 Weight | Semantic Weight | Rationale |
|---|---|---|---|
| Code search | 0.6 | 0.4 | Exact identifiers matter |
| Documentation | 0.3 | 0.7 | Concepts matter more than exact words |
| E-commerce | 0.5 | 0.5 | Both product names and descriptions |
| Legal search | 0.4 | 0.6 | Concepts with specific terms |
| Support tickets | 0.3 | 0.7 | Users describe problems rather than typing keywords |
The weights can be tuned empirically by running a set of test queries and measuring which weight combination produces the most relevant results.
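That tuning loop can be sketched as a grid search over weight pairs, scoring each pair by mean reciprocal rank (MRR) of a known-relevant document across test queries. Everything here is illustrative: the precomputed per-method rankings stand in for real search results, and the RRF merge repeats the formula from earlier in the article:

```rust
use std::collections::HashMap;

type DocId = u64;

// RRF merge of two precomputed rankings (same formula as in the article).
fn rrf(bm25: &[DocId], sem: &[DocId], wb: f32, ws: f32, k: f32) -> Vec<DocId> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();
    for (r, id) in bm25.iter().enumerate() {
        *scores.entry(*id).or_insert(0.0) += wb / (k + r as f32 + 1.0);
    }
    for (r, id) in sem.iter().enumerate() {
        *scores.entry(*id).or_insert(0.0) += ws / (k + r as f32 + 1.0);
    }
    let mut v: Vec<_> = scores.into_iter().collect();
    v.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    v.into_iter().map(|(id, _)| id).collect()
}

// Mean reciprocal rank of the known-relevant doc across test cases.
// Each case: (bm25 ranking, semantic ranking, relevant doc id).
fn mrr(cases: &[(Vec<DocId>, Vec<DocId>, DocId)], wb: f32, ws: f32) -> f32 {
    let total: f32 = cases
        .iter()
        .map(|(b, s, rel)| {
            let merged = rrf(b, s, wb, ws, 60.0);
            match merged.iter().position(|id| id == rel) {
                Some(p) => 1.0 / (p as f32 + 1.0),
                None => 0.0,
            }
        })
        .sum();
    total / cases.len() as f32
}

fn main() {
    let cases: Vec<(Vec<DocId>, Vec<DocId>, DocId)> = vec![
        (vec![1, 2, 3], vec![3, 1, 2], 3),
        (vec![5, 4], vec![4, 5], 4),
    ];
    // Grid-search the weight split in 0.1 steps, keeping the best MRR.
    let mut best = (0.0f32, 0.0f32, -1.0f32);
    for step in 0..=10 {
        let wb = step as f32 / 10.0;
        let ws = 1.0 - wb;
        let score = mrr(&cases, wb, ws);
        if score > best.2 {
            best = (wb, ws, score);
        }
    }
    println!("best weights: bm25={} semantic={} (MRR {:.3})", best.0, best.1, best.2);
}
```

In production the test cases would come from logged queries with human relevance judgments, and the searches would be re-run per query rather than precomputed.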
Performance
| Component | Latency | Notes |
|---|---|---|
| BM25 search (100K docs) | 1-3 ms | Inverted index lookup |
| Semantic search (100K docs) | 3-5 ms | HNSW approximate nearest neighbor |
| RRF merge | < 1 ms | Score computation |
| Total hybrid search | 5-10 ms | Both searches run in parallel |
Both searches run concurrently using Rust's async runtime. The total latency is approximately the maximum of the two individual latencies, not the sum.
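The max-not-sum claim is easy to demonstrate. The sketch below simulates the two searches with sleeps; the real engine uses an async runtime, but plain threads keep this stdlib-only (the timings and result ids are made up):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Runs two simulated searches concurrently and reports the wall time.
fn run_parallel() -> (Vec<u64>, Vec<u64>, Duration) {
    let start = Instant::now();
    let bm25 = thread::spawn(|| {
        thread::sleep(Duration::from_millis(3)); // simulated BM25 lookup
        vec![1u64, 2, 3]
    });
    let semantic = thread::spawn(|| {
        thread::sleep(Duration::from_millis(5)); // simulated HNSW search
        vec![3u64, 4, 1]
    });
    let b = bm25.join().unwrap();
    let s = semantic.join().unwrap();
    (b, s, start.elapsed())
}

fn main() {
    let (b, s, elapsed) = run_parallel();
    // Wall time is close to the slower search (~5 ms), not the 8 ms sum.
    println!("bm25={:?} semantic={:?} elapsed={:?}", b, s, elapsed);
}
```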
Index Maintenance
Both indices (BM25 inverted index and HNSW vector index) are updated automatically when entities are saved, updated, or deleted:
```
// This single save updates both indices
save DocumentChunk {
  document_id: doc.id,
  content: "New chunk content..." // semantic text
}
```

The BM25 index tokenizes the content and updates the inverted index. The HNSW index generates an embedding and inserts it into the vector index. Both operations are part of the same save transaction.
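A rough sketch of that dual-index update, using simplified in-memory stand-ins for both indices. The `embed` stub and the index types are illustrative assumptions, not FLIN's actual internals:

```rust
use std::collections::HashMap;

type EntityId = u64;

// Simplified in-memory stand-ins for the two indices.
#[derive(Default)]
struct Indices {
    // term -> doc ids containing it (inverted index, minus term frequencies)
    bm25: HashMap<String, Vec<EntityId>>,
    // doc id -> embedding vector (standing in for the HNSW graph)
    vectors: HashMap<EntityId, Vec<f32>>,
}

// Illustrative stub: a real system calls an embedding model here.
fn embed(text: &str) -> Vec<f32> {
    vec![text.len() as f32] // placeholder, not a meaningful embedding
}

impl Indices {
    // One save updates both indices together, mirroring the single
    // transaction described in the article.
    fn save_chunk(&mut self, id: EntityId, content: &str) {
        for term in content.to_lowercase().split_whitespace() {
            self.bm25.entry(term.to_string()).or_default().push(id);
        }
        self.vectors.insert(id, embed(content));
    }
}

fn main() {
    let mut idx = Indices::default();
    idx.save_chunk(1, "New chunk content");
    println!("terms indexed: {}, vectors: {}", idx.bm25.len(), idx.vectors.len());
}
```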
Hybrid search gives FLIN applications the best of both worlds: the precision of keyword matching for specific terms and the intelligence of semantic understanding for conceptual queries. It is the search method used internally by the knowledge base example in the RAG article, and it is available to every FLIN application through a single function call.
In the next article, we step back from specific features to examine FLIN's AI-first language design -- the philosophical and practical decisions that make FLIN uniquely suited for AI-assisted development.
---
This is Part 123 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.
Series Navigation: - [122] Code-Aware Chunking for RAG - [123] Hybrid Document Search: BM25 + Semantic (you are here) - [124] AI-First Language Design - [125] Search Analytics and Result Caching