#118 -- AI Gateway: 8 Providers, One API

How FLIN's AI Gateway provides a unified interface to OpenAI, Anthropic, DeepInfra, Google, Mistral, Cohere, Groq, and local models -- switch providers by changing one line of configuration.

Juste A. Gnimavo (Thales) & Claude | March 26, 2026 | 7 min

The AI landscape in 2026 is fragmented. OpenAI has GPT-4o. Anthropic has Claude. Google has Gemini. Mistral has its open-weight models. Cohere specializes in embeddings. Groq offers inference at extraordinary speed. DeepInfra hosts dozens of open-source models. Each provider has its own API format, its own authentication scheme, its own pricing model, and its own SDK.

A FLIN application that uses AI should not be locked into a single provider. If OpenAI raises prices, you should be able to switch to DeepInfra. If Anthropic adds a feature you need, you should be able to try it without rewriting your code. If you want to run locally for privacy, you should be able to use a local model.

FLIN's AI Gateway provides a unified interface to eight providers. Your FLIN code calls ai_complete(), ai_embed(), and ai_chat(). The gateway routes the request to the configured provider, translates the API format, and returns a normalized response. Switching providers is one line in flin.config.

The Unified API

Three functions cover the most common AI operations:

// Text completion
response = ai_complete("Summarize this article: " + article.content, {
    max_tokens: 200,
    temperature: 0.3
})

// Chat completion
response = ai_chat([
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: user_message }
])

// Embeddings
vector = ai_embed("comfortable office chair for long work sessions")

These functions work regardless of which provider is configured. The API is the same whether you are using GPT-4o, Claude, Gemini, or a local Llama model.

Provider Configuration

The AI provider is configured in flin.config:

// flin.config
ai {
    provider = "openai"
    model = "gpt-4o-mini"
    embedding_model = "text-embedding-3-small"
    api_key = env("OPENAI_API_KEY")
}

Switching to another provider:

// Anthropic
ai {
    provider = "anthropic"
    model = "claude-3-haiku"
    api_key = env("ANTHROPIC_API_KEY")
}

// DeepInfra
ai {
    provider = "deepinfra"
    model = "meta-llama/Meta-Llama-3-8B-Instruct"
    api_key = env("DEEPINFRA_API_KEY")
}

// Groq (for speed)
ai {
    provider = "groq"
    model = "llama3-70b-8192"
    api_key = env("GROQ_API_KEY")
}

// Local (no API key needed)
ai {
    provider = "local"
    model = "llama3"
    endpoint = "http://localhost:11434"
}

The application code does not change. The same ai_complete() call works with every provider.

The Eight Supported Providers

| Provider | Models | Best For |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o Mini | General-purpose, vision |
| Anthropic | Claude 3 Opus, Sonnet, Haiku | Long context, reasoning |
| Google | Gemini Pro, Gemini Flash | Multimodal, speed |
| Mistral | Mistral Large, Medium, Small | European data residency |
| Cohere | Command R+, Embed v3 | Embeddings, RAG |
| Groq | Llama 3, Mixtral | Ultra-low latency |
| DeepInfra | 50+ open models | Cost optimization |
| Local | Ollama, llama.cpp | Privacy, offline |

Gateway Implementation

The gateway translates between FLIN's unified format and each provider's specific API:

use async_trait::async_trait;

pub struct AiGateway {
    provider: Box<dyn AiProvider>,
    config: AiConfig,
}

// A trait with plain `async fn` methods cannot be used as `dyn AiProvider`.
// The async_trait crate boxes the returned futures so trait objects work.
#[async_trait]
pub trait AiProvider: Send + Sync {
    async fn complete(&self, prompt: &str, opts: &CompletionOptions) -> Result<String, AiError>;
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError>;
    async fn embed(&self, text: &str) -> Result<Vec<f32>, AiError>;
}

impl AiGateway {
    pub fn new(config: &AiConfig) -> Result<Self, AiError> {
        // Select the implementation from the `provider` key in flin.config.
        let provider: Box<dyn AiProvider> = match config.provider.as_str() {
            "openai" => Box::new(OpenAiProvider::new(config)?),
            "anthropic" => Box::new(AnthropicProvider::new(config)?),
            "google" => Box::new(GoogleProvider::new(config)?),
            "mistral" => Box::new(MistralProvider::new(config)?),
            "cohere" => Box::new(CohereProvider::new(config)?),
            "groq" => Box::new(GroqProvider::new(config)?),
            "deepinfra" => Box::new(DeepInfraProvider::new(config)?),
            "local" => Box::new(LocalProvider::new(config)?),
            other => return Err(AiError::UnknownProvider(other.into())),
        };

        // The gateway owns its config, so clone the borrowed one
        // (this assumes AiConfig implements Clone).
        Ok(Self { provider, config: config.clone() })
    }
}

Each provider implementation translates the unified request format to the provider's specific API:

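pub struct OpenAiProvider {
    api_key: String,
    model: String,
    base_url: String,
}

#[async_trait] // required on each impl when the trait is used as Box<dyn AiProvider>
impl AiProvider for OpenAiProvider {
    // complete() and embed() are not shown here.
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError> {
        let body = json!({
            "model": self.model,
            "messages": messages.iter().map(|m| json!({
                "role": m.role,
                "content": m.content
            })).collect::<Vec<_>>(),
            "max_tokens": opts.max_tokens.unwrap_or(1024),
            "temperature": opts.temperature.unwrap_or(0.7),
        });

        let response = reqwest::Client::new()
            .post(format!("{}/chat/completions", self.base_url))
            .bearer_auth(&self.api_key)
            .json(&body)
            .send()
            .await?
            .error_for_status()?; // surface 4xx/5xx as errors instead of a JSON parse failure

        let data: OpenAiResponse = response.json().await?;
        // Avoid panicking when `choices` is empty.
        // `AiError::EmptyResponse` is a hypothetical variant used for illustration.
        data.choices
            .first()
            .map(|choice| choice.message.content.clone())
            .ok_or(AiError::EmptyResponse)
    }
}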

The Anthropic provider translates to Anthropic's format (which uses system as a separate parameter, not a message):

#[async_trait] // required on each impl when the trait is used as Box<dyn AiProvider>
impl AiProvider for AnthropicProvider {
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError> {
        let system = messages.iter()
            .find(|m| m.role == "system")
            .map(|m| m.content.clone());

        let user_messages: Vec<_> = messages.iter()
            .filter(|m| m.role != "system")
            .map(|m| json!({ "role": m.role, "content": m.content }))
            .collect();

        let mut body = json!({
            "model": self.model,
            "messages": user_messages,
            "max_tokens": opts.max_tokens.unwrap_or(1024),
        });

        if let Some(sys) = system {
            body["system"] = json!(sys);
        }

        // ... send request to Anthropic API
    }
}
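
The system-prompt split can be isolated into a small pure function. The sketch below is an illustration under assumptions, not the gateway's actual code: `Message` is a stand-in for the gateway's own message type, `split_system` is a hypothetical helper, and joining several system messages with a blank line is a design choice made here (the `find` call above keeps only the first one).

```rust
/// Hypothetical stand-in for the gateway's unified message type.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Splits a unified message list into the shape Anthropic expects:
/// an optional top-level system prompt plus the remaining turns.
/// All system messages are joined in order, separated by a blank line.
pub fn split_system(messages: &[Message]) -> (Option<String>, Vec<Message>) {
    let system_parts: Vec<&str> = messages
        .iter()
        .filter(|m| m.role == "system")
        .map(|m| m.content.as_str())
        .collect();

    let system = if system_parts.is_empty() {
        None
    } else {
        Some(system_parts.join("\n\n"))
    };

    // Everything that is not a system message stays in the conversation.
    let turns: Vec<Message> = messages
        .iter()
        .filter(|m| m.role != "system")
        .cloned()
        .collect();

    (system, turns)
}
```

For a list with two system messages and one user turn, `split_system` returns one joined system string and a single-element turn list, so no system text is silently dropped.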

Fallback Chains

FLIN supports fallback configuration for high availability:

ai {
    provider = "openai"
    model = "gpt-4o-mini"
    api_key = env("OPENAI_API_KEY")

    fallback {
        provider = "deepinfra"
        model = "meta-llama/Meta-Llama-3-8B-Instruct"
        api_key = env("DEEPINFRA_API_KEY")
    }
}

If the primary provider fails (rate limit, API error, timeout), the gateway automatically retries with the fallback provider. The application code is unaware of the failover.
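
One way to picture the failover is a chain that tries each provider in order and returns the first success. This is a minimal sketch under assumptions, not FLIN's implementation: the synchronous `Provider` trait, the `AiError` variants, `FallbackChain`, and the two test doubles are all hypothetical, and the sketch falls back on any error rather than distinguishing retryable failures.

```rust
/// Hypothetical error type; the variants mirror the failure causes named above.
#[derive(Debug, Clone, PartialEq)]
pub enum AiError {
    RateLimited,
    Timeout,
    Api(String),
}

/// Simplified synchronous provider interface (an assumption for this sketch).
pub trait Provider {
    fn complete(&self, prompt: &str) -> Result<String, AiError>;
}

/// Tries providers in order: primary first, then each fallback.
pub struct FallbackChain {
    providers: Vec<Box<dyn Provider>>,
}

impl FallbackChain {
    pub fn new(providers: Vec<Box<dyn Provider>>) -> Self {
        Self { providers }
    }

    /// Returns the first successful response, or the last error seen.
    pub fn complete(&self, prompt: &str) -> Result<String, AiError> {
        let mut last_err = AiError::Api("no providers configured".to_string());
        for provider in &self.providers {
            match provider.complete(prompt) {
                Ok(text) => return Ok(text),
                Err(err) => last_err = err, // remember the error, try the next one
            }
        }
        Err(last_err)
    }
}

/// Test double that always fails with the given error.
pub struct AlwaysFails(pub AiError);

impl Provider for AlwaysFails {
    fn complete(&self, _prompt: &str) -> Result<String, AiError> {
        Err(self.0.clone())
    }
}

/// Test double that echoes the prompt, tagged with a provider name.
pub struct Echo(pub &'static str);

impl Provider for Echo {
    fn complete(&self, prompt: &str) -> Result<String, AiError> {
        Ok(format!("[{}] {}", self.0, prompt))
    }
}
```

With `AlwaysFails(AiError::RateLimited)` as the primary and `Echo("deepinfra")` as the fallback, the caller receives the fallback's response and never sees the rate-limit error. If every provider fails, the caller gets the last error in the chain.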

Cost Optimization

Different providers have dramatically different pricing:

| Provider | Model | Input cost per 1M tokens | Output cost per 1M tokens |
|---|---|---|---|
| OpenAI | GPT-4o Mini | $0.15 | $0.60 |
| Anthropic | Claude 3 Haiku | $0.25 | $1.25 |
| DeepInfra | Llama 3 8B | $0.06 | $0.06 |
| Groq | Llama 3 70B | $0.59 | $0.79 |

For the Intent Engine's query translation, where the task is relatively simple, a smaller model like Llama 3 8B on DeepInfra can deliver comparable accuracy at a fraction of the cost: at the prices above, it is 2.5x cheaper than GPT-4o Mini on input tokens and 10x cheaper on output tokens. FLIN's gateway makes this switch trivial.
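
The comparison can be worked through with the per-token prices from the table. The sketch below is illustrative: `Pricing` and `cost_usd` are hypothetical names, and the workload of 10M input and 2M output tokens is an assumed figure chosen only to make the arithmetic concrete.

```rust
/// Price in US dollars per one million tokens (values taken from the table above).
pub struct Pricing {
    pub input_per_million: f64,
    pub output_per_million: f64,
}

pub const GPT_4O_MINI: Pricing = Pricing {
    input_per_million: 0.15,
    output_per_million: 0.60,
};

pub const LLAMA_3_8B_DEEPINFRA: Pricing = Pricing {
    input_per_million: 0.06,
    output_per_million: 0.06,
};

/// Total cost in dollars for a workload measured in raw token counts.
pub fn cost_usd(pricing: &Pricing, input_tokens: u64, output_tokens: u64) -> f64 {
    let input_millions = input_tokens as f64 / 1_000_000.0;
    let output_millions = output_tokens as f64 / 1_000_000.0;
    input_millions * pricing.input_per_million + output_millions * pricing.output_per_million
}
```

For that assumed workload, GPT-4o Mini comes to 10 × $0.15 + 2 × $0.60 = $2.70 and Llama 3 8B on DeepInfra to 10 × $0.06 + 2 × $0.06 = $0.72, a ratio of 3.75. The ratio moves toward 10x as the share of output tokens grows and toward 2.5x as input tokens dominate.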

Using AI in FLIN Applications

Beyond the Intent Engine and semantic search, FLIN developers can use AI directly in their applications:

// Summarize content
fn summarize(article) {
    ai_complete("Summarize this article in 2 sentences: " + article.content, {
        max_tokens: 100,
        temperature: 0.3
    })
}

// Classify support tickets
fn classify_ticket(ticket) {
    ai_chat([
        { role: "system", content: "Classify the ticket into: billing, technical, feature_request, other. Reply with just the category." },
        { role: "user", content: ticket.subject + "\n" + ticket.description }
    ])
}

// Generate product descriptions
fn generate_description(product) {
    ai_complete("Write a compelling product description for: " + product.name + ". Features: " + product.features, {
        max_tokens: 200,
        temperature: 0.7
    })
}

These are regular FLIN function calls. They work with any configured provider. They can be called from route handlers, scheduled tasks, or interactive views.

Rate Limiting and Caching

The gateway includes built-in rate limiting to respect provider limits:

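use std::sync::atomic::{AtomicU32, AtomicU64};

pub struct ProviderRateLimiter {
    requests_per_minute: u32,    // request quota per window
    tokens_per_minute: u32,      // token quota per window
    current_requests: AtomicU32, // requests counted in the current window
    current_tokens: AtomicU32,   // tokens counted in the current window
    window_start: AtomicU64,     // start of the current window
}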
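
A fixed-window check over those fields might look like the sketch below. It is an assumption-laden illustration rather than the gateway's code: `FixedWindowLimiter` and `try_acquire` are hypothetical names, the clock is passed in as seconds so the logic is easy to test, and a `Mutex` replaces the atomics so that the window reset and both counters change together.

```rust
use std::sync::Mutex;

/// Counters for the current one-minute window.
struct Window {
    start_secs: u64,
    requests: u32,
    tokens: u32,
}

/// Hypothetical fixed-window limiter with a request quota and a token quota.
pub struct FixedWindowLimiter {
    requests_per_minute: u32,
    tokens_per_minute: u32,
    window: Mutex<Window>,
}

impl FixedWindowLimiter {
    pub fn new(requests_per_minute: u32, tokens_per_minute: u32) -> Self {
        Self {
            requests_per_minute,
            tokens_per_minute,
            window: Mutex::new(Window { start_secs: 0, requests: 0, tokens: 0 }),
        }
    }

    /// Returns true and records the usage if a request costing `tokens`
    /// fits in the current window; returns false without recording otherwise.
    pub fn try_acquire(&self, now_secs: u64, tokens: u32) -> bool {
        let mut w = self.window.lock().unwrap();

        // Start a fresh window once 60 seconds have passed.
        if now_secs.saturating_sub(w.start_secs) >= 60 {
            w.start_secs = now_secs;
            w.requests = 0;
            w.tokens = 0;
        }

        let requests = w.requests.saturating_add(1);
        let used = w.tokens.saturating_add(tokens);
        if requests > self.requests_per_minute || used > self.tokens_per_minute {
            return false;
        }

        w.requests = requests;
        w.tokens = used;
        true
    }
}
```

With a quota of 2 requests per minute, the third call inside the same window is rejected and a call 60 seconds after the window started is accepted again. A rejected call does not consume quota, so a smaller request can still fit afterwards.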

And response caching for repeated queries:

// First call: API request (200ms)
summary = ai_complete("Summarize: " + article.content)

// Same input later: cached response (< 1ms)
summary = ai_complete("Summarize: " + article.content)

The cache key is the hash of the full request (prompt, model, temperature). Responses are cached for a configurable duration (default: 1 hour).
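
A cache along those lines can be sketched with the standard library alone. Everything named here is hypothetical (`cache_key`, `ResponseCache`), and two choices are assumptions: the temperature is hashed through its bit pattern because `f32` does not implement `Hash`, and `DefaultHasher` is used for brevity even though its output is not guaranteed stable across Rust releases, so a persistent cache would want a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Hashes the request fields named above into a single cache key.
pub fn cache_key(prompt: &str, model: &str, temperature: f32) -> u64 {
    let mut hasher = DefaultHasher::new();
    prompt.hash(&mut hasher);
    model.hash(&mut hasher);
    temperature.to_bits().hash(&mut hasher); // f32 has no Hash impl
    hasher.finish()
}

/// In-memory response cache with a fixed time-to-live per entry.
pub struct ResponseCache {
    ttl: Duration,
    entries: HashMap<u64, (Instant, String)>,
}

impl ResponseCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Stores a response together with the time it was cached.
    pub fn put(&mut self, key: u64, response: String, now: Instant) {
        self.entries.insert(key, (now, response));
    }

    /// Returns the cached response if it exists and is younger than the TTL.
    pub fn get(&self, key: u64, now: Instant) -> Option<&str> {
        let (stored_at, response) = self.entries.get(&key)?;
        if now.duration_since(*stored_at) < self.ttl {
            Some(response.as_str())
        } else {
            None
        }
    }
}
```

Two calls with the same prompt, model, and temperature produce the same key, while changing the temperature from 0.3 to 0.7 produces a different one. With a one-hour TTL, an entry is served ten seconds after it was stored and treated as a miss once a full hour has elapsed.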

Why a Gateway, Not a Library

The alternative to a gateway is provider-specific libraries (openai-sdk, anthropic-sdk, google-ai-sdk), each with its own API, its own error handling, and its own types. Switching providers means rewriting every AI call in your application.

FLIN's gateway makes provider selection a configuration decision, not a code decision. Your application logic expresses what it wants ("summarize this text," "classify this ticket," "embed this query"), and the gateway handles how to get it from whichever provider is configured.

This separation of concerns is especially important for the Intent Engine and semantic search, which are core language features. They should not stop working because you switched from OpenAI to Anthropic.

In the next article, we dive into FastEmbed integration -- how FLIN generates embeddings locally without any API call, enabling offline semantic search and privacy-first applications.


This is Part 118 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.

Series Navigation:
- [117] Semantic Search and Vector Storage
- [118] AI Gateway: 8 Providers, One API (you are here)
- [119] FastEmbed Integration for Embeddings
- [120] RAG: Retrieval, Reranking, and Source Attribution
