By session 100, Deblo.ai had grown from a single-page chat demo into a full education platform with 24+ database tables, 18 API route modules, 60+ frontend components, a React Native mobile app, SSE streaming with 20+ event types, background generation jobs, voice calls, and a payment system spanning six countries. This is what that architecture looks like from the inside.
## The Backend: FastAPI All The Way Down
We chose FastAPI for one reason: async. An AI education platform is fundamentally an I/O-bound system. Every chat request involves at least one HTTP call to an LLM provider, often followed by tool executions (web searches, file generation, email sending) that each involve their own network calls. A synchronous framework would spend most of its time waiting.
FastAPI with async SQLAlchemy gives us cooperative multitasking at every layer. The database queries are async. The HTTP calls to OpenRouter are async. The SSE streaming is async. The background job polling is async. A single worker process can handle hundreds of concurrent chat sessions because none of them are blocking a thread.
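The payoff of cooperative multitasking can be sketched with plain asyncio. The function below is a stand-in, not Deblo.ai's code: each `asyncio.sleep` simulates a network wait (database read, LLM stream, tool call) rather than CPU work, and one event loop runs a hundred of them side by side.

```python
import asyncio

# Hypothetical stand-ins for the I/O steps in a chat request, each
# simulated with asyncio.sleep (network wait, not CPU work).
async def handle_chat(session_id: int) -> str:
    await asyncio.sleep(0.05)  # load conversation from Postgres
    await asyncio.sleep(0.10)  # stream tokens from the LLM provider
    await asyncio.sleep(0.05)  # run a tool call (web search, file gen)
    return f"session-{session_id}: done"

async def main() -> list[str]:
    # 100 concurrent sessions on one event loop: wall time stays close
    # to a single request (~0.2s), not 100 x 0.2s.
    return await asyncio.gather(*(handle_chat(i) for i in range(100)))

results = asyncio.run(main())
```

The same shape holds at every layer of the real stack: while one session awaits an LLM token, the loop serves every other session's database reads and SSE writes.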
The application structure is deliberately flat:
```
backend/app/
  main.py       # FastAPI app, CORS, router assembly, lifespan
  config.py     # Settings from environment variables
  database.py   # Async SQLAlchemy engine, session factory, Redis pool
  models/       # 24 SQLAlchemy models
  routes/       # 18 API route modules
  services/     # 40+ service modules (LLM, tools, payments, email, etc.)
  prompts/      # System prompt components (root, classes, subjects, categories, pro)
  seed.py       # Database seeding for curriculum data
```

No abstract base classes. No dependency injection framework. No "clean architecture" hexagonal ports-and-adapters. Every route file imports the services it needs directly. Every service file imports the models it needs directly. The call graph is obvious from reading the imports.
## The Database: 24 Tables, One JSONB Column That Matters
PostgreSQL 17 is the only database. No MongoDB for "flexible schemas." No DynamoDB for "scale." PostgreSQL does everything we need, and it does it with ACID guarantees that matter when you are tracking financial transactions (credit ledger) and educational progress (exercise results).
The most important model is User:
```python
class User(Base):
    __tablename__ = "users"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    phone = Column(String(20), unique=True, nullable=True, index=True)
    email = Column(String(255), unique=True, nullable=True, index=True)
    google_id = Column(String(255), unique=True, nullable=True, index=True)
    auth_provider = Column(String(20), nullable=True)
    name = Column(String(100), nullable=True)
    preferred_class = Column(String(20), nullable=True)
    user_type = Column(String(20), nullable=True)  # 'child' | 'parent' | 'professional'
    credit_balance = Column(Integer, default=0)
    free_credits_today = Column(Integer, default=5)
    free_credits_date = Column(Date, nullable=True)
    country = Column(String(5), nullable=True)
    preferred_language = Column(String(5), default="fr")
    referral_code = Column(String(5), unique=True, nullable=True)
    access_code = Column(String(12), unique=True, nullable=True)
    push_token = Column(String(500), nullable=True)
    # ... timestamps, admin flags, etc.
```
Three authentication paths converge on this single model: phone (WhatsApp OTP), email (Google OAuth), and access code (for organization members who have neither phone nor email). The user_type field determines which mode the user sees by default. The preferred_class field locks grade-level adaptation. The credit_balance and free_credits_today fields are the hot path for every single chat request.
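The daily free-credit reset implied by `free_credits_today` and `free_credits_date` can be sketched as a lazy refill on the first request of a new day. This is an illustrative reconstruction, not the actual service code — the field names come from the model above, the refill amount from its `default=5`, and the function and its dict-based user are assumptions:

```python
from datetime import date

DAILY_FREE_CREDITS = 5  # matches free_credits_today's column default

def available_credits(user: dict, today: date) -> int:
    # Lazily reset the free allowance when the stored date is stale,
    # so no scheduled job is needed for the refill.
    if user["free_credits_date"] != today:
        user["free_credits_today"] = DAILY_FREE_CREDITS
        user["free_credits_date"] = today
    return user["free_credits_today"] + user["credit_balance"]

user = {"credit_balance": 150, "free_credits_today": 0,
        "free_credits_date": date(2025, 1, 1)}
total = available_credits(user, date(2025, 1, 2))  # refill fires: 5 + 150
```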
The second most important model is Conversation, and it contains the design decision that most shaped our architecture:
```python
class Conversation(Base):
    __tablename__ = "conversations"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    user_id = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=True)
    class_id = Column(String(20), nullable=True)
    subject = Column(String(50), nullable=True)
    mode = Column(String(10), nullable=True)      # "child" | "pro"
    domain = Column(String(50), nullable=True)    # "syscohada" | "fiscalite" | etc.
    category = Column(String(20), nullable=True)  # "question" | "exercice" | "devoir" | "examen"
    agent = Column(String(50), nullable=True)     # pro agent id
    project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id"), nullable=True)
    messages = Column(JSONB, default=[])
    message_count = Column(Integer, default=0)
    # ... timestamps, archive/favorite flags
```
The messages column is JSONB. Every message in a conversation -- user, assistant, tool calls, tool results -- lives in a single JSON array on the conversation row. No messages table. No joins.
This was a deliberate trade-off. The alternative was a normalized messages table with foreign keys back to conversations. That would be the "correct" relational design. But consider the access pattern: every chat request loads the full conversation history to send it to the LLM as context. With a normalized design, that is a join query for every message. With JSONB, it is a single row fetch. On a platform where the median conversation has 10-20 messages and the hot path is latency-sensitive SSE streaming, eliminating that join matters.
The downside is that updating a single message requires rewriting the entire JSONB array. We accepted this because message updates are rare (only title auto-generation and archival operations), while full-history reads happen on every single chat request.
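The write pattern this design implies is read-modify-write over the whole array. A minimal sketch, with illustrative names rather than Deblo.ai's actual code — the conversation is a plain dict here, and the SQL in the comment shows what the single-statement write would look like with asyncpg:

```python
import json
from datetime import datetime, timezone

def append_message(conversation: dict, role: str, content: str) -> dict:
    # The full messages array is already in memory (it was loaded to
    # build LLM context), so appending is a local list operation.
    conversation["messages"].append({
        "role": role,
        "content": content,
        "ts": datetime.now(timezone.utc).isoformat(),
    })
    conversation["message_count"] = len(conversation["messages"])
    # Persisting is one UPDATE rewriting the whole column, e.g.:
    #   UPDATE conversations
    #   SET messages = $1::jsonb, message_count = $2
    #   WHERE id = $3
    return conversation

convo = {"messages": [], "message_count": 0}
append_message(convo, "user", "Combien font 3 + 4 ?")
append_message(convo, "assistant", "Essayons ensemble...")
```

The read path is the mirror image: one row fetch yields the entire history, ready to serialize into the LLM request.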
The full table list spans the platform's domains:
- Auth: users, otp_codes
- Chat: conversations
- Credits: credit_purchases, credit_usages, credit_ledger, coupons
- Files: uploaded_files, file_folders, document_chunks
- AI: ai_logs, ai_memories, generation_jobs
- Pedagogy: exercise_results, daily_suggestions
- Organization: organizations, org_memberships
- Tasks: tasks
- Curriculum: curriculum (seed data)
- Communication: notification_templates, user_notifications
- Voice: voice_sessions
- Projects: projects
- Referral: referrals
- Settings: system_settings
## The Frontend: SvelteKit 2 + Svelte 5 Runes
The frontend is a SvelteKit 2 application using Svelte 5 runes throughout. No legacy let reactivity. No Svelte 4 stores in new code. Everything uses $state, $derived, $props, and $effect.
The component count is above 60, organized by feature:
- Chat components: Message bubbles, input bar, file upload, voice recording, quiz widgets, tool progress indicators, code blocks with syntax highlighting
- Layout components: Header, sidebar, navigation, theme toggle, mobile drawer
- Credit components: Balance display, recharge modal, package cards, transaction history
- Admin components: Dashboard, user management, conversation viewer, settings editor, analytics
- Auth components: Phone input, OTP verification, Google sign-in, country selector
State management uses Svelte writable stores with localStorage persistence for client-side state (theme, language, sidebar collapsed) and server-side API calls for everything else. There is no global state management library. The stores are simple:
```typescript
import { writable } from 'svelte/store';
import { browser } from '$app/environment';

function createPersistedStore<T>(key: string, initial: T) {
  // Hydrate from localStorage on the client, fall back to the default on the server.
  const stored = browser ? localStorage.getItem(key) : null;
  const store = writable<T>(stored ? JSON.parse(stored) : initial);
  if (browser) {
    store.subscribe(v => localStorage.setItem(key, JSON.stringify(v)));
  }
  return store;
}

export const theme = createPersistedStore<'light' | 'dark'>('deblo-theme', 'light');
export const sidebarCollapsed = createPersistedStore('deblo-sidebar', false);
export const preferredLanguage = createPersistedStore('deblo-lang', 'fr');
```
## The SSE Protocol: 20+ Event Types
The chat endpoint does not return a JSON response. It returns a Server-Sent Events stream. The frontend opens an EventSource-like connection (actually a fetch with ReadableStream parsing) and receives a sequence of typed events:
```
event: content
data: {"text": "Bonjour ! "}

event: content
data: {"text": "Qu'est-ce que "}

event: tool_start
data: {"name": "interactive_quiz", "id": "call_abc123"}

event: quiz
data: {"question": "Combien font 3 + 4 ?", "options": ["5", "6", "7", "8"], ...}

event: tool_end
data: {"name": "interactive_quiz", "id": "call_abc123"}

event: credits
data: {"free": 28, "recharge": 150, "total": 178}

event: done
data: {"conversation_id": "uuid-here", "title": "Addition"}
```
The event types include:
- content -- streamed text tokens from the LLM
- reasoning -- thinking/reasoning tokens (for advanced models)
- tool_start / tool_progress / tool_end -- tool execution lifecycle
- quiz / true_false_quiz -- interactive quiz widgets
- file -- generated file (Excel, PDF, PowerPoint, Word, HTML, Markdown)
- credits -- updated credit balance after deduction
- bonus_credits -- AI-awarded bonus credits for student effort
- error -- error messages
- done -- stream completion with conversation metadata
- annotations -- source citations from the LLM
- draft_email -- email draft for user review before sending
- exercise_result -- silently tracked exercise outcome
- notification -- push notification triggered by AI action
Each event type maps to a specific frontend handler. The quiz event renders an interactive QCM widget. The file event triggers a download button with preview. The tool_start event shows a progress indicator ("Searching the web...", "Generating Excel file...").
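On the wire, each of these reduces to an `event:`/`data:` pair terminated by a blank line. A minimal formatter sketch — the helper name is an assumption, the event payloads match the stream shown above:

```python
import json

def sse_event(event: str, data: dict) -> str:
    # SSE wire format: "event: <name>\ndata: <json>\n\n".
    # ensure_ascii=False keeps French text readable in the payload.
    return f"event: {event}\ndata: {json.dumps(data, ensure_ascii=False)}\n\n"

chunk = sse_event("content", {"text": "Bonjour ! "})
done = sse_event("done", {"conversation_id": "uuid-here", "title": "Addition"})
```

In FastAPI, an async generator yielding strings like these would be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, which is how the typed events reach the frontend parsers.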
## The Mobile App: Expo React Native Monorepo
The mobile app arrived at session 86 -- roughly three weeks into the build. It is an Expo React Native application structured as a monorepo with shared packages:
```
deblo-mobile/
  apps/
    k12/               # Student-facing app (Expo Router)
  packages/
    @deblo/api/        # HTTP client, endpoints, types
    @deblo/stores/     # Zustand stores, shared state
    @deblo/streaming/  # SSE parsing, event handlers
    @deblo/i18n/       # Internationalization (fr, en)
```

The @deblo/api package wraps every backend endpoint with typed functions. The @deblo/stores package uses Zustand (the React state management library) with AsyncStorage persistence. The @deblo/streaming package handles SSE parsing -- the same event protocol as the web frontend, just with a different transport layer (fetch + ReadableStream instead of browser EventSource).
The monorepo structure was chosen for one reason: the Pro app is planned as a separate Expo app (apps/pro/) that shares all four packages with the K12 app. Different UI, different navigation, same API client, same streaming logic, same i18n strings.
## Docker Compose: Four Services
The production deployment is a Docker Compose stack with four services:
```yaml
services:
  frontend:
    build: ./frontend
    ports: ["5173:5173"]
    environment:
      - PUBLIC_API_URL=https://api.deblo.ai
    depends_on: [backend]

  backend:
    build: ./backend
    ports: ["8000:8000"]
    environment:
      - DATABASE_URL=postgresql+asyncpg://...
      - REDIS_URL=redis://redis:6379
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
    depends_on: [postgres, redis]

  postgres:
    image: pgvector/pgvector:pg17
    volumes: ["pgdata:/var/lib/postgresql/data"]
    environment:
      - POSTGRES_DB=deblo
      - POSTGRES_USER=deblo
      - POSTGRES_PASSWORD=${DB_PASSWORD}

  redis:
    image: redis:7-alpine
    volumes: ["redisdata:/data"]
```
We use pgvector/pgvector:pg17 instead of the standard PostgreSQL image because the RAG pipeline needs vector similarity search for document embeddings. Redis serves triple duty: SSE progress tracking for background jobs, quiz state storage (with TTL expiration), and payment polling coordination.
The backend runs with Uvicorn, 4 workers, behind a reverse proxy. The frontend is a Node.js SvelteKit server. No static export -- we need server-side rendering for SEO and the API proxy layer that handles cookie-based auth token forwarding.
## The Service Layer: 40+ Modules
The services/ directory is where most of the complexity lives. Each service module owns a specific capability:
- llm.py -- OpenRouter streaming, agentic tool loop, context management
- tool_executor.py -- dispatches tool calls to their implementations
- tools.py -- 24 tool definitions (OpenRouter-compatible JSON schemas)
- credits.py -- balance checks, deductions, ledger logging, daily refill
- background_generation.py -- detached asyncio tasks for long-running generations
- file_generator.py -- Excel, PDF, PowerPoint, Word, HTML, Markdown generation
- web_tools.py -- Tavily web search, Jina Reader URL browsing
- sandbox.py -- bash execution in a sandboxed subprocess
- quiz.py -- quiz state management in Redis
- memory.py -- AI memory persistence (save/recall user preferences)
- payment.py -- payment initiation across multiple providers
- payment_poller.py -- background polling for payment confirmation
- zerofee.py / xpaye.py / stripe_service.py -- provider-specific integrations
- twilio_client.py -- WhatsApp OTP delivery
- email.py -- transactional email via Resend
- firebase_service.py -- push notifications via FCM
- ultravox.py -- voice call management
- preprocessing.py -- document text extraction (PDF, DOCX, images)
- embedding.py -- text embedding for RAG pipeline
- rag_service.py -- retrieval-augmented generation from user documents
- notification_service.py -- in-app notification dispatch
- daily_suggestions.py -- AI-generated daily study suggestions
No service depends on another service's internal state. They communicate through the database and Redis. This means any service can be tested in isolation with a database fixture and a Redis mock.
## The Authentication Triple
Authentication was one of the most iterated-upon systems. The final design supports three paths:
1. WhatsApp OTP: User enters phone number, receives a 6-digit code via WhatsApp (Twilio), verifies, receives JWT. This is the primary path because WhatsApp delivery rates in Africa are dramatically higher than SMS.
2. Google OAuth: User taps "Sign in with Google," completes the OAuth flow, backend creates or links account by email/Google ID.
3. Access Code: Organization admins generate 12-character access codes for members who have neither phone nor email (common in corporate training scenarios). The code is entered once and linked to an account.
All three paths converge on the same User model and the same JWT token format. The token has a 30-day expiry -- long enough that students do not have to re-authenticate every week, short enough that lost phones do not create indefinite access.
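The shared token format can be illustrated with a from-scratch HS256 signer. This is a teaching sketch of the JWT shape described (30-day expiry, one format for all three paths), not Deblo.ai's implementation — in production a library like PyJWT does this, and the secret and claim names below are assumptions:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64 for all three segments.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(user_id: str, secret: bytes, days: int = 30) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({
        "sub": user_id,                          # same claim for all auth paths
        "exp": int(time.time()) + days * 86400,  # 30-day expiry
    }).encode())
    signature = b64url(hmac.new(secret, f"{header}.{payload}".encode(),
                                hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

token = issue_token("user-uuid", b"dev-secret")
```

Because every path issues the same token, every downstream route checks exactly one thing: a valid signature and an unexpired `exp` claim, regardless of whether the account began with a phone number, a Google ID, or an access code.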
## What 100 Sessions Taught Us
The architecture was not designed upfront. It was discovered through building. Session 1 had a single chat endpoint. Session 15 added the credit system. Session 27 added the credit ledger. Session 35 added background generation. Session 50 added tool calling. Session 67 shipped to production. Session 86 started the mobile app.
Each session added one capability, and the architecture grew to accommodate it. The flat file structure, the JSONB messages column, the service-per-capability organization -- these were all decisions made under the constraint of "we need to ship this today and extend it tomorrow."
That constraint, paradoxically, produced a cleaner architecture than most planning exercises would have. Every abstraction exists because a concrete need demanded it. No speculative generalization. No "we might need this later" layers. Just the code that the product requires, organized so that the next session can extend it without rewriting what came before.
---
This is Part 2 of a 12-part series on building Deblo.ai.
1. AI Tutoring for 250 Million African Students
2. 100 Sessions Later: The Architecture of an AI Education Platform (you are here)
3. The Agentic Loop: 24 AI Tools in a Single Chat
4. System Prompts That Teach: Anti-Cheating, Socratic Method, and Grade-Level Adaptation
5. WhatsApp OTP and the African Authentication Problem
6. Credits, FCFA, and 6 African Payment Gateways
7. SSE Streaming: Real-Time AI Responses in SvelteKit
8. Voice Calls With AI: Ultravox, LiveKit, and WebRTC
9. Building a React Native K12 App in 7 Days
10. 101 AI Advisors: Professional Intelligence for Africa
11. Background Jobs: When AI Takes 30 Minutes to Think
12. From Abidjan to 250 Million: The Deblo.ai Story