The AI landscape in 2026 is fragmented. OpenAI has GPT-4o. Anthropic has Claude. Google has Gemini. Mistral has its open-weight models. Cohere specializes in embeddings. Groq offers inference at extraordinary speed. DeepInfra hosts dozens of open-source models. Each provider has its own API format, its own authentication scheme, its own pricing model, and its own SDK.
A FLIN application that uses AI should not be locked into a single provider. If OpenAI raises prices, you should be able to switch to DeepInfra. If Anthropic adds a feature you need, you should be able to try it without rewriting your code. If you want to run locally for privacy, you should be able to use a local model.
FLIN's AI Gateway provides a unified interface to eight providers. Your FLIN code calls ai_complete(), ai_embed(), and ai_chat(). The gateway routes the request to the configured provider, translates the API format, and returns a normalized response. Switching providers is one line in flin.config.
The Unified API
Three functions cover the most common AI operations:
```
// Text completion
response = ai_complete("Summarize this article: " + article.content, {
  max_tokens: 200,
  temperature: 0.3
})

// Chat completion
response = ai_chat([
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: user_message }
])

// Embeddings
vector = ai_embed("comfortable office chair for long work sessions")
```
These functions work regardless of which provider is configured. The API is the same whether you are using GPT-4o, Claude, Gemini, or a local Llama model.
Provider Configuration
The AI provider is configured in flin.config:
```
// flin.config
ai {
  provider = "openai"
  model = "gpt-4o-mini"
  embedding_model = "text-embedding-3-small"
  api_key = env("OPENAI_API_KEY")
}
```

Switching to another provider:
```
// Anthropic
ai {
  provider = "anthropic"
  model = "claude-3-haiku"
  api_key = env("ANTHROPIC_API_KEY")
}

// DeepInfra
ai {
  provider = "deepinfra"
  model = "meta-llama/Meta-Llama-3-8B-Instruct"
  api_key = env("DEEPINFRA_API_KEY")
}

// Groq (for speed)
ai {
  provider = "groq"
  model = "llama3-70b-8192"
  api_key = env("GROQ_API_KEY")
}

// Local (no API key needed)
ai {
  provider = "local"
  model = "llama3"
  endpoint = "http://localhost:11434"
}
```
The application code does not change. The same ai_complete() call works with every provider.
The Eight Supported Providers
| Provider | Models | Best For |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o Mini | General-purpose, vision |
| Anthropic | Claude 3 Opus, Sonnet, Haiku | Long context, reasoning |
| Google | Gemini Pro, Gemini Flash | Multimodal, speed |
| Mistral | Mistral Large, Medium, Small | European data residency |
| Cohere | Command R+, Embed v3 | Embeddings, RAG |
| Groq | Llama 3, Mixtral | Ultra-low latency |
| DeepInfra | 50+ open models | Cost optimization |
| Local | Ollama, llama.cpp | Privacy, offline |
Gateway Implementation
The gateway translates between FLIN's unified format and each provider's specific API:
```
pub struct AiGateway {
    provider: Box<dyn AiProvider>,
    config: AiConfig,
}

pub trait AiProvider: Send + Sync {
    async fn complete(&self, prompt: &str, opts: &CompletionOptions) -> Result<String, AiError>;
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError>;
    async fn embed(&self, text: &str) -> Result<Vec<f32>, AiError>;
}

impl AiGateway {
    pub fn new(config: &AiConfig) -> Result<Self, AiError> {
        let provider: Box<dyn AiProvider> = match config.provider.as_str() {
            "openai" => Box::new(OpenAiProvider::new(config)?),
            "anthropic" => Box::new(AnthropicProvider::new(config)?),
            // ... one arm per supported provider
            other => return Err(AiError::UnknownProvider(other.to_string())),
        };
        Ok(Self { provider, config: config.clone() })
    }
}
```
Each provider implementation translates the unified request format to the provider's specific API:
```
pub struct OpenAiProvider {
    api_key: String,
    model: String,
    base_url: String,
}

impl AiProvider for OpenAiProvider {
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError> {
        let body = json!({
            "model": self.model,
            "messages": messages,
            "max_tokens": opts.max_tokens,
            "temperature": opts.temperature,
        });

        let response = reqwest::Client::new()
            .post(format!("{}/chat/completions", self.base_url))
            .bearer_auth(&self.api_key)
            .json(&body)
            .send()
            .await?;

        let data: OpenAiResponse = response.json().await?;
        Ok(data.choices[0].message.content.clone())
    }
}
```
The Anthropic provider translates to Anthropic's format (which takes `system` as a separate top-level parameter, not a message):
```
impl AiProvider for AnthropicProvider {
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError> {
        let system = messages.iter()
            .find(|m| m.role == "system")
            .map(|m| m.content.clone());

        let user_messages: Vec<_> = messages.iter()
            .filter(|m| m.role != "system")
            .map(|m| json!({ "role": m.role, "content": m.content }))
            .collect();

        let mut body = json!({
            "model": self.model,
            "messages": user_messages,
            "max_tokens": opts.max_tokens.unwrap_or(1024),
        });

        if let Some(sys) = system {
            body["system"] = json!(sys);
        }

        // ... send request to Anthropic API
    }
}
```
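The system-message extraction above can be isolated as a pure function, which makes it easy to test independently of any HTTP call. Here is a minimal sketch; the `Message` struct is a stand-in for the gateway's own type:

```rust
// Stand-in for the gateway's Message type (illustrative only).
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Split a unified message list into Anthropic's shape:
/// an optional top-level system prompt plus the remaining turns.
pub fn split_system(messages: &[Message]) -> (Option<String>, Vec<Message>) {
    let system = messages
        .iter()
        .find(|m| m.role == "system")
        .map(|m| m.content.clone());
    let rest = messages
        .iter()
        .filter(|m| m.role != "system")
        .cloned()
        .collect();
    (system, rest)
}
```

Keeping the translation pure means each provider adapter only differs in how it serializes these pieces into its request body.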
Fallback Chains
FLIN supports fallback configuration for high availability:
```
ai {
  provider = "openai"
  model = "gpt-4o-mini"
  api_key = env("OPENAI_API_KEY")

  fallback {
    provider = "deepinfra"
    model = "meta-llama/Meta-Llama-3-8B-Instruct"
    api_key = env("DEEPINFRA_API_KEY")
  }
}
```
If the primary provider fails (rate limit, API error, timeout), the gateway automatically retries with the fallback provider. The application code is unaware of the failover.
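The failover logic itself is simple. The following is a simplified synchronous sketch (the real gateway is async, and `AiError` here is a stand-in type); providers are modeled as plain closures for illustration:

```rust
#[derive(Debug, PartialEq)]
pub enum AiError {
    RateLimited,
    Api(String),
    Timeout,
}

/// Try the primary provider; on any error, retry once with the fallback.
pub fn complete_with_fallback<P, F>(
    primary: P,
    fallback: F,
    prompt: &str,
) -> Result<String, AiError>
where
    P: Fn(&str) -> Result<String, AiError>,
    F: Fn(&str) -> Result<String, AiError>,
{
    match primary(prompt) {
        // Primary succeeded: return its response directly.
        Ok(text) => Ok(text),
        // Any failure (rate limit, API error, timeout): try the fallback.
        Err(_) => fallback(prompt),
    }
}
```

A production version would likely distinguish retryable errors (rate limits, timeouts) from permanent ones (invalid API key), but the application-facing contract is the same either way.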
Cost Optimization
Different providers have dramatically different pricing:
| Provider | Model | Cost per 1M tokens |
|---|---|---|
| OpenAI | GPT-4o Mini | $0.15 input / $0.60 output |
| Anthropic | Claude 3 Haiku | $0.25 input / $1.25 output |
| DeepInfra | Llama 3 8B | $0.06 input / $0.06 output |
| Groq | Llama 3 70B | $0.59 input / $0.79 output |
For the Intent Engine's query translation, where the task is relatively simple, a smaller model like Llama 3 8B on DeepInfra can be 10x cheaper than GPT-4o with comparable accuracy. FLIN's gateway makes this switch trivial.
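The savings are easy to quantify from per-million-token rates. A small sketch (prices hard-coded from the table above; token counts are hypothetical):

```rust
/// Cost in dollars for one request, given per-million-token prices.
pub fn request_cost(
    input_tokens: u64,
    output_tokens: u64,
    input_price_per_m: f64,
    output_price_per_m: f64,
) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * input_price_per_m
        + (output_tokens as f64 / 1_000_000.0) * output_price_per_m
}
```

For a typical translation request of roughly 500 input and 100 output tokens, GPT-4o Mini at $0.15/$0.60 and Llama 3 8B on DeepInfra at $0.06/$0.06 differ by several times per call, and the gap compounds quickly at high request volumes.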
Using AI in FLIN Applications
Beyond the Intent Engine and semantic search, FLIN developers can use AI directly in their applications:
```
// Summarize content
fn summarize(article) {
  ai_complete("Summarize this article in 2 sentences: " + article.content, {
    max_tokens: 100,
    temperature: 0.3
  })
}

// Classify support tickets
fn classify_ticket(ticket) {
  ai_chat([
    { role: "system", content: "Classify the ticket into: billing, technical, feature_request, other. Reply with just the category." },
    { role: "user", content: ticket.subject + "\n" + ticket.description }
  ])
}

// Generate product descriptions
fn generate_description(product) {
  ai_complete("Write a compelling product description for: " + product.name + ". Features: " + product.features, {
    max_tokens: 200,
    temperature: 0.7
  })
}
```
These are regular FLIN function calls. They work with any configured provider. They can be called from route handlers, scheduled tasks, or interactive views.
Rate Limiting and Caching
The gateway includes built-in rate limiting to respect provider limits:
```
pub struct ProviderRateLimiter {
    requests_per_minute: u32,
    tokens_per_minute: u32,
    current_requests: AtomicU32,
    current_tokens: AtomicU32,
    window_start: AtomicU64,
}
```

And response caching for repeated queries:

```
// First call: API request (200ms)
summary = ai_complete("Summarize: " + article.content)

// Same input later: cached response (< 1ms)
summary = ai_complete("Summarize: " + article.content)
```
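Returning to the limiter struct above: it only declares the counters. As a rough illustration, here is a simplified fixed-window limiter for the request count alone, with the caller supplying the clock so the window logic is testable (the real limiter also tracks tokens, and this sketch is not fully race-free under concurrent resets):

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

pub struct ProviderRateLimiter {
    requests_per_minute: u32,
    current_requests: AtomicU32,
    window_start: AtomicU64, // seconds since epoch
}

impl ProviderRateLimiter {
    pub fn new(requests_per_minute: u32) -> Self {
        Self {
            requests_per_minute,
            current_requests: AtomicU32::new(0),
            window_start: AtomicU64::new(0),
        }
    }

    /// Returns true if a request is allowed at time `now` (in seconds),
    /// counting it against the current one-minute window.
    pub fn try_acquire(&self, now: u64) -> bool {
        let start = self.window_start.load(Ordering::SeqCst);
        if now.saturating_sub(start) >= 60 {
            // A minute has elapsed: start a fresh window.
            self.window_start.store(now, Ordering::SeqCst);
            self.current_requests.store(0, Ordering::SeqCst);
        }
        // Claim a slot; reject if the window is already full.
        let prev = self.current_requests.fetch_add(1, Ordering::SeqCst);
        prev < self.requests_per_minute
    }
}
```

When `try_acquire` returns false, the gateway can either queue the request or fail over to another provider, depending on configuration.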
The cache key is the hash of the full request (prompt, model, temperature). Responses are cached for a configurable duration (default: 1 hour).
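Under that scheme, a minimal cache key can be derived with the standard library hasher. A sketch (a production gateway would likely prefer a stable, versioned hash over `DefaultHasher`):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Cache key over the full request: prompt, model, and temperature.
/// The temperature is hashed via its bit pattern, since f64 itself
/// does not implement Hash.
pub fn cache_key(prompt: &str, model: &str, temperature: f64) -> u64 {
    let mut hasher = DefaultHasher::new();
    prompt.hash(&mut hasher);
    model.hash(&mut hasher);
    temperature.to_bits().hash(&mut hasher);
    hasher.finish()
}
```

Including the model and sampling parameters in the key matters: the same prompt at temperature 0.3 and 0.7 are different requests and must not share a cache entry.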
Why a Gateway, Not a Library
The alternative to a gateway is provider-specific libraries: openai-sdk, anthropic-sdk, google-ai-sdk. Each with its own API, its own error handling, its own types. Switching providers means rewriting every AI call in your application.
FLIN's gateway makes provider selection a configuration decision, not a code decision. Your application logic expresses what it wants ("summarize this text," "classify this ticket," "embed this query"), and the gateway handles how to get it from whichever provider is configured.
This separation of concerns is especially important for the Intent Engine and semantic search, which are core language features. They should not stop working because you switched from OpenAI to Anthropic.
In the next article, we dive into FastEmbed integration -- how FLIN generates embeddings locally without any API call, enabling offline semantic search and privacy-first applications.
---
This is Part 118 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.
Series Navigation: - [117] Semantic Search and Vector Storage - [118] AI Gateway: 8 Providers, One API (you are here) - [119] FastEmbed Integration for Embeddings - [120] RAG: Retrieval, Reranking, and Source Attribution