By session 100, Deblo.ai had grown from a single-page chat demo into a full education platform with 24+ database tables, 18 API route modules, 60+ frontend components, a React Native mobile app, SSE streaming with 20+ event types, background generation jobs, voice calls, and a payment system spanning six countries. This is what that architecture looks like from the inside.
## The Backend: FastAPI All The Way Down
We chose FastAPI for one reason: async. An AI education platform is fundamentally an I/O-bound system. Every chat request involves at least one HTTP call to an LLM provider, often followed by tool executions (web searches, file generation, email sending) that each involve their own network calls. A synchronous framework would spend most of its time waiting.
FastAPI with async SQLAlchemy gives us cooperative multitasking at every layer. The database queries are async. The HTTP calls to OpenRouter are async. The SSE streaming is async. The background job polling is async. A single worker process can handle hundreds of concurrent chat sessions because none of them are blocking a thread.
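The payoff of cooperative multitasking can be sketched with plain asyncio. The function below is a stand-in, not Deblo.ai's code: each `asyncio.sleep` simulates a network wait (database read, LLM stream, tool call) rather than CPU work, and one event loop runs a hundred of them side by side.

```python
import asyncio

# Hypothetical stand-ins for the I/O steps in a chat request, each
# simulated with asyncio.sleep (network wait, not CPU work).
async def handle_chat(session_id: int) -> str:
    await asyncio.sleep(0.05)  # load conversation from Postgres
    await asyncio.sleep(0.10)  # stream tokens from the LLM provider
    await asyncio.sleep(0.05)  # run a tool call (web search, file gen)
    return f"session-{session_id}: done"

async def main() -> list[str]:
    # 100 concurrent sessions on one event loop: wall time stays close
    # to a single request (~0.2s), not 100 x 0.2s.
    return await asyncio.gather(*(handle_chat(i) for i in range(100)))

results = asyncio.run(main())
```

The same shape holds at every layer of the real stack: while one session awaits an LLM token, the loop serves every other session's database reads and SSE writes.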
The application structure is deliberately flat:
```
backend/app/
  main.py       # FastAPI app, CORS, router assembly, lifespan
  config.py     # Settings from environment variables
  database.py   # Async SQLAlchemy engine, session factory, Redis pool
  models/       # 24 SQLAlchemy models
  routes/       # 18 API route modules
  services/     # 40+ service modules (LLM, tools, payments, email, etc.)
  prompts/      # System prompt components (root, classes, subjects, categories, pro)
  seed.py       # Database seeding for curriculum data
```

No abstract base classes. No dependency injection framework. No "clean architecture" hexagonal ports-and-adapters. Every route file imports the services it needs directly. Every service file imports the models it needs directly. The call graph is obvious from reading the imports.
## The Database: 24 Tables, One JSONB Column That Matters
PostgreSQL 17 is the only database. No MongoDB for "flexible schemas." No DynamoDB for "scale." PostgreSQL does everything we need, and it does it with ACID guarantees that matter when you are tracking financial transactions (credit ledger) and educational progress (exercise results).
The most important model is User:
```python
class User(Base):
    __tablename__ = "users"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    phone = Column(String(20), unique=True, nullable=True, index=True)
    email = Column(String(255), unique=True, nullable=True, index=True)
    google_id = Column(String(255), unique=True, nullable=True, index=True)
    auth_provider = Column(String(20), nullable=True)
    name = Column(String(100), nullable=True)
    preferred_class = Column(String(20), nullable=True)
    user_type = Column(String(20), nullable=True)  # 'child' | 'parent' | 'professional'
    credit_balance = Column(Integer, default=0)
    free_credits_today = Column(Integer, default=5)
    free_credits_date = Column(Date, nullable=True)
    country = Column(String(5), nullable=True)
    preferred_language = Column(String(5), default="fr")
    referral_code = Column(String(5), unique=True, nullable=True)
    access_code = Column(String(12), unique=True, nullable=True)
    push_token = Column(String(500), nullable=True)
    # ... timestamps, admin flags, etc.
```
Three authentication paths converge on this single model: phone (WhatsApp OTP), email (Google OAuth), and access code (for organization members who have neither phone nor email). The user_type field determines which mode the user sees by default. The preferred_class field locks grade-level adaptation. The credit_balance and free_credits_today fields are the hot path for every single chat request.
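The daily free-credit reset implied by `free_credits_today` and `free_credits_date` can be sketched as a lazy refill on the first request of a new day. This is an illustrative reconstruction, not the actual service code — the field names come from the model above, the refill amount from its `default=5`, and the function and its dict-based user are assumptions:

```python
from datetime import date

DAILY_FREE_CREDITS = 5  # matches free_credits_today's column default

def available_credits(user: dict, today: date) -> int:
    # Lazily reset the free allowance when the stored date is stale,
    # so no scheduled job is needed for the refill.
    if user["free_credits_date"] != today:
        user["free_credits_today"] = DAILY_FREE_CREDITS
        user["free_credits_date"] = today
    return user["free_credits_today"] + user["credit_balance"]

user = {"credit_balance": 150, "free_credits_today": 0,
        "free_credits_date": date(2025, 1, 1)}
total = available_credits(user, date(2025, 1, 2))  # refill fires: 5 + 150
```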
The second most important model is Conversation, and it contains the design decision that most shaped our architecture:
```python
class Conversation(Base):
    __tablename__ = "conversations"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    user_id = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=True)
    class_id = Column(String(20), nullable=True)
    subject = Column(String(50), nullable=True)
    mode = Column(String(10), nullable=True)      # "child" | "pro"
    domain = Column(String(50), nullable=True)    # "syscohada" | "fiscalite" | etc.
    category = Column(String(20), nullable=True)  # "question" | "exercice" | "devoir" | "examen"
    agent = Column(String(50), nullable=True)     # pro agent id
    project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id"), nullable=True)
    messages = Column(JSONB, default=[])
    message_count = Column(Integer, default=0)
    # ... timestamps, archive/favorite flags
```
The messages column is JSONB. Every message in a conversation -- user, assistant, tool calls, tool results -- lives in a single JSON array on the conversation row. No messages table. No joins.
This was a deliberate trade-off. The alternative was a normalized messages table with foreign keys back to conversations. That would be the "correct" relational design. But consider the access pattern: every chat request loads the full conversation history to send it to the LLM as context. With a normalized design, that is a join query for every message. With JSONB, it is a single row fetch. On a platform where the median conversation has 10-20 messages and the hot path is latency-sensitive SSE streaming, eliminating that join matters.
The downside is that updating a single message requires rewriting the entire JSONB array. We accepted this because message updates are rare (only title auto-generation and archival operations), while full-history reads happen on every single chat request.
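The write pattern this design implies is read-modify-write over the whole array. A minimal sketch, with illustrative names rather than Deblo.ai's actual code — the conversation is a plain dict here, and the SQL in the comment shows what the single-statement write would look like with asyncpg:

```python
import json
from datetime import datetime, timezone

def append_message(conversation: dict, role: str, content: str) -> dict:
    # The full messages array is already in memory (it was loaded to
    # build LLM context), so appending is a local list operation.
    conversation["messages"].append({
        "role": role,
        "content": content,
        "ts": datetime.now(timezone.utc).isoformat(),
    })
    conversation["message_count"] = len(conversation["messages"])
    # Persisting is one UPDATE rewriting the whole column, e.g.:
    #   UPDATE conversations
    #   SET messages = $1::jsonb, message_count = $2
    #   WHERE id = $3
    return conversation

convo = {"messages": [], "message_count": 0}
append_message(convo, "user", "Combien font 3 + 4 ?")
append_message(convo, "assistant", "Essayons ensemble...")
```

The read path is the mirror image: one row fetch yields the entire history, ready to serialize into the LLM request.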
The full table list spans the platform's domains:
- Auth: users, otp_codes
- Chat: conversations
- Credits: credit_purchases, credit_usages, credit_ledger, coupons
- Files: uploaded_files, file_folders, document_chunks
- AI: ai_logs, ai_memories, generation_jobs
- Pedagogy: exercise_results, daily_suggestions
- Organization: organizations, org_memberships
- Tasks: tasks
- Curriculum: curriculum (seed data)
- Communication: notification_templates, user_notifications
- Voice: voice_sessions
- Projects: projects
- Referral: referrals
- Settings: system_settings
## The Frontend: SvelteKit 2 + Svelte 5 Runes
The frontend is a SvelteKit 2 application using Svelte 5 runes throughout. No legacy let reactivity. No Svelte 4 stores in new code. Everything uses $state, $derived, $props, and $effect.
The component count is above 60, organized by feature:
- Chat components: Message bubbles, input bar, file upload, voice recording, quiz widgets, tool progress indicators, code blocks with syntax highlighting
- Layout components: Header, sidebar, navigation, theme toggle, mobile drawer
- Credit components: Balance display, recharge modal, package cards, transaction history
- Admin components: Dashboard, user management, conversation viewer, settings editor, analytics
- Auth components: Phone input, OTP verification, Google sign-in, country selector
State management uses Svelte writable stores with localStorage persistence for client-side state (theme, language, sidebar collapsed) and server-side API calls for everything else. There is no global state management library. The stores are simple:
```typescript
import { writable } from 'svelte/store';
import { browser } from '$app/environment';

function createPersistedStore<T>(key: string, initial: T) {
  // Hydrate from localStorage on the client, fall back to the default on the server.
  const stored = browser ? localStorage.getItem(key) : null;
  const store = writable<T>(stored ? JSON.parse(stored) : initial);
  if (browser) {
    store.subscribe(v => localStorage.setItem(key, JSON.stringify(v)));
  }
  return store;
}

export const theme = createPersistedStore<'light' | 'dark'>('deblo-theme', 'light');
export const sidebarCollapsed = createPersistedStore('deblo-sidebar', false);
export const preferredLanguage = createPersistedStore('deblo-lang', 'fr');
```
## The SSE Protocol: 20+ Event Types
The chat endpoint does not return a JSON response. It returns a Server-Sent Events stream. The frontend opens an EventSource-like connection (actually a fetch with ReadableStream parsing) and receives a sequence of typed events:
```
event: content
data: {"text": "Bonjour ! "}

event: content
data: {"text": "Qu'est-ce que "}

event: tool_start
data: {"name": "interactive_quiz", "id": "call_abc123"}

event: quiz
data: {"question": "Combien font 3 + 4 ?", "options": ["5", "6", "7", "8"], ...}

event: tool_end
data: {"name": "interactive_quiz", "id": "call_abc123"}

event: credits
data: {"free": 28, "recharge": 150, "total": 178}

event: done
data: {"conversation_id": "uuid-here", "title": "Addition"}
```
The event types include:
- content -- streamed text tokens from the LLM
- reasoning -- thinking/reasoning tokens (for advanced models)
- tool_start / tool_progress / tool_end -- tool execution lifecycle
- quiz / true_false_quiz -- interactive quiz widgets
- file -- generated file (Excel, PDF, PowerPoint, Word, HTML, Markdown)
- credits -- updated credit balance after deduction
- bonus_credits -- AI-awarded bonus credits for student effort
- error -- error messages
- done -- stream completion with conversation metadata
- annotations -- source citations from the LLM
- draft_email -- email draft for user review before sending
- exercise_result -- silently tracked exercise outcome
- notification -- push notification triggered by AI action
Each event type maps to a specific frontend handler. The quiz event renders an interactive QCM widget. The file event triggers a download button with preview. The tool_start event shows a progress indicator ("Searching the web...", "Generating Excel file...").
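On the wire, each of these reduces to an `event:`/`data:` pair terminated by a blank line. A minimal formatter sketch — the helper name is an assumption, the event payloads match the stream shown above:

```python
import json

def sse_event(event: str, data: dict) -> str:
    # SSE wire format: "event: <name>\ndata: <json>\n\n".
    # ensure_ascii=False keeps French text readable in the payload.
    return f"event: {event}\ndata: {json.dumps(data, ensure_ascii=False)}\n\n"

chunk = sse_event("content", {"text": "Bonjour ! "})
done = sse_event("done", {"conversation_id": "uuid-here", "title": "Addition"})
```

In FastAPI, an async generator yielding strings like these would be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, which is how the typed events reach the frontend parsers.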
## The Mobile App: Expo React Native Monorepo
The mobile app arrived at session 86 -- roughly three weeks into the build. It is an Expo React Native application structured as a monorepo with shared packages:
```
deblo-mobile/
  apps/
    k12/               # Student-facing app (Expo Router)
  packages/
    @deblo/api/        # HTTP client, endpoints, types
    @deblo/stores/     # Zustand stores, shared state
    @deblo/streaming/  # SSE parsing, event handlers
    @deblo/i18n/       # Internationalization (fr, en)
```

The @deblo/api package wraps every backend endpoint with typed functions. The @deblo/stores package uses Zustand (the React state management library) with AsyncStorage persistence. The @deblo/streaming package handles SSE parsing -- the same event protocol as the web frontend, just with a different transport layer (fetch + ReadableStream instead of browser EventSource).
The monorepo structure was chosen for one reason: the Pro app is planned as a separate Expo app (apps/pro/) that shares all four packages with the K12 app. Different UI, different navigation, same API client, same streaming logic, same i18n strings.
## Docker Compose: Four Services
The production deployment is a Docker Compose stack with four services:
```yaml
services:
  frontend:
    build: ./frontend
    ports: ["5173:5173"]
    environment:
      - PUBLIC_API_URL=https://api.deblo.ai
    depends_on: [backend]

  backend:
    build: ./backend
    ports: ["8000:8000"]
    environment:
      - DATABASE_URL=postgresql+asyncpg://...
      - REDIS_URL=redis://redis:6379
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
    depends_on: [postgres, redis]

  postgres:
    image: pgvector/pgvector:pg17
    volumes: ["pgdata:/var/lib/postgresql/data"]
    environment:
      - POSTGRES_DB=deblo
      - POSTGRES_USER=deblo
      - POSTGRES_PASSWORD=${DB_PASSWORD}

  redis:
    image: redis:7-alpine
    volumes: ["redisdata:/data"]
```
We use pgvector/pgvector:pg17 instead of the standard PostgreSQL image because the RAG pipeline needs vector similarity search for document embeddings. Redis serves triple duty: SSE progress tracking for background jobs, quiz state storage (with TTL expiration), and payment polling coordination.
The backend runs with Uvicorn, 4 workers, behind a reverse proxy. The frontend is a Node.js SvelteKit server. No static export -- we need server-side rendering for SEO and the API proxy layer that handles cookie-based auth token forwarding.
## The Service Layer: 40+ Modules
The services/ directory is where most of the complexity lives. Each service module owns a specific capability:
- llm.py -- OpenRouter streaming, agentic tool loop, context management
- tool_executor.py -- dispatches tool calls to their implementations
- tools.py -- 24 tool definitions (OpenRouter-compatible JSON schemas)
- credits.py -- balance checks, deductions, ledger logging, daily refill
- background_generation.py -- detached asyncio tasks for long-running generations
- file_generator.py -- Excel, PDF, PowerPoint, Word, HTML, Markdown generation
- web_tools.py -- Tavily web search, Jina Reader URL browsing
- sandbox.py -- bash execution in a sandboxed subprocess
- quiz.py -- quiz state management in Redis
- memory.py -- AI memory persistence (save/recall user preferences)
- payment.py -- payment initiation across multiple providers
- payment_poller.py -- background polling for payment confirmation
- zerofee.py / xpaye.py / stripe_service.py -- provider-specific integrations
- twilio_client.py -- WhatsApp OTP delivery
- email.py -- transactional email via Resend
- firebase_service.py -- push notifications via FCM
- ultravox.py -- voice call management
- preprocessing.py -- document text extraction (PDF, DOCX, images)
- embedding.py -- text embedding for RAG pipeline
- rag_service.py -- retrieval-augmented generation from user documents
- notification_service.py -- in-app notification dispatch
- daily_suggestions.py -- AI-generated daily study suggestions
No service depends on another service's internal state. They communicate through the database and Redis. This means any service can be tested in isolation with a database fixture and a Redis mock.
## The Authentication Triple
Authentication was one of the most iterated-upon systems. The final design supports three paths:
1. WhatsApp OTP: User enters phone number, receives a 6-digit code via WhatsApp (Twilio), verifies, receives JWT. This is the primary path because WhatsApp delivery rates in Africa are dramatically higher than SMS.
2. Google OAuth: User taps "Sign in with Google," completes the OAuth flow, backend creates or links account by email/Google ID.
3. Access Code: Organization admins generate 12-character access codes for members who have neither phone nor email (common in corporate training scenarios). The code is entered once and linked to an account.
All three paths converge on the same User model and the same JWT token format. The token has a 30-day expiry -- long enough that students do not have to re-authenticate every week, short enough that lost phones do not create indefinite access.
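The shared token format can be illustrated with a from-scratch HS256 signer. This is a teaching sketch of the JWT shape described (30-day expiry, one format for all three paths), not Deblo.ai's implementation — in production a library like PyJWT does this, and the secret and claim names below are assumptions:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64 for all three segments.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(user_id: str, secret: bytes, days: int = 30) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({
        "sub": user_id,                          # same claim for all auth paths
        "exp": int(time.time()) + days * 86400,  # 30-day expiry
    }).encode())
    signature = b64url(hmac.new(secret, f"{header}.{payload}".encode(),
                                hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

token = issue_token("user-uuid", b"dev-secret")
```

Because every path issues the same token, every downstream route checks exactly one thing: a valid signature and an unexpired `exp` claim, regardless of whether the account began with a phone number, a Google ID, or an access code.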
## What 100 Sessions Taught Us
The architecture was not designed upfront. It was discovered through building. Session 1 had a single chat endpoint. Session 15 added the credit system. Session 27 added the credit ledger. Session 35 added background generation. Session 50 added tool calling. Session 67 shipped to production. Session 86 started the mobile app.
Each session added one capability, and the architecture grew to accommodate it. The flat file structure, the JSONB messages column, the service-per-capability organization -- these were all decisions made under the constraint of "we need to ship this today and extend it tomorrow."
That constraint, paradoxically, produced a cleaner architecture than most planning exercises would have. Every abstraction exists because a concrete need demanded it. No speculative generalization. No "we might need this later" layers. Just the code that the product requires, organized so that the next session can extend it without rewriting what came before.
---
This is Part 2 of a 12-part series on building Deblo.ai.
1. AI Tutoring for 250 Million African Students
2. 100 Sessions Later: The Architecture of an AI Education Platform (you are here)
3. The Agentic Loop: 24 AI Tools in a Single Chat
4. System Prompts That Teach: Anti-Cheating, Socratic Method, and Grade-Level Adaptation
5. WhatsApp OTP and the African Authentication Problem
6. Credits, FCFA, and 6 African Payment Gateways
7. SSE Streaming: Real-Time AI Responses in SvelteKit
8. Voice Calls With AI: Ultravox, LiveKit, and WebRTC
9. Building a React Native K12 App in 7 Days
10. 101 AI Advisors: Professional Intelligence for Africa
11. Background Jobs: When AI Takes 30 Minutes to Think
12. From Abidjan to 250 Million: The Deblo.ai Story