
The Agentic Loop: 24 AI Tools in a Single Chat

Up to 10 iterations of LLM calls, 24 tools from file generation to code execution, background jobs for 30-minute tasks. The agentic heart of Deblo.ai.

Thales & Claude | March 25, 2026 | 14 min read

Tags: deblo, agentic, tools, llm, function-calling, background-jobs

A chatbot answers questions. An agent takes actions. Deblo is an agent.

When a student sends "Aide-moi à préparer mon devoir de maths sur les fractions" ("Help me prepare my math homework on fractions"), the AI does not just generate text. It might generate an interactive quiz to test understanding, award bonus credits for correct answers, track exercise results for the student's progress dashboard, and generate a PDF summary of the lesson. When a professional accountant says "Génère-moi le bilan SYSCOHADA pour cette entreprise et envoie-le par e-mail" ("Generate the SYSCOHADA balance sheet for this company and email it to me"), the AI searches the web for current SYSCOHADA standards, generates an Excel spreadsheet with proper accounting entries, converts it to PDF, and sends both files via email -- all in a single conversation turn.

This is the agentic loop. The LLM thinks, decides which tools to call, executes them, reads the results, thinks again, and repeats -- up to 10 iterations per user message. It is the most complex subsystem in Deblo, and the one that makes the platform fundamentally different from a chat wrapper around an API.

The Loop

The core streaming function in llm.py implements the agentic loop as a simple for loop with a hard ceiling:

```python
async def stream_chat_response(
    messages: list[dict],
    model: str,
    tools: list[dict] | None = None,
    tool_executor: ToolExecutor | None = None,
    total_timeout: int | None = None,
    # ... other params
) -> AsyncGenerator[dict | str, None]:
    MAX_TOOL_ITERATIONS = 10
    TOOL_TIMEOUT_SECONDS = 60
    TOTAL_TIMEOUT_SECONDS = total_timeout or 180  # up to 1,800 for background jobs
    overall_start = time.monotonic()

    full_messages = list(messages)

    for iteration in range(MAX_TOOL_ITERATIONS):
        # Global timeout check
        if time.monotonic() - overall_start > TOTAL_TIMEOUT_SECONDS:
            yield "\n\nTemps d'execution maximal atteint.\n"
            break

        # Build request for OpenRouter
        current_request = {
            "model": model,
            "messages": full_messages,
            "stream": True,
        }
        if tools and iteration < MAX_TOOL_ITERATIONS - 1:
            current_request["tools"] = tools
        elif iteration >= MAX_TOOL_ITERATIONS - 1:
            current_request["tool_choice"] = "none"  # Force text on last iteration

        # Stream LLM response, accumulate content and tool calls
        collected_content = ""
        tool_calls_acc: dict[int, dict] = {}

        async for data in _raw_stream(current_request):
            ...  # yield content tokens, accumulate tool call fragments

        # No tool calls? We are done.
        if not tool_calls_acc or not tool_executor:
            break

        # Execute each tool, append results, loop back to LLM
        for tc in tool_calls_list:
            result = await asyncio.wait_for(
                tool_executor(func_name, func_args, tool_call_id),
                timeout=TOOL_TIMEOUT_SECONDS,
            )
            # Truncate verbose results to prevent context overflow
            result = _truncate_tool_result(func_name, result)
            # Append tool result to message history
            full_messages.append({
                "role": "tool",
                "tool_call_id": tool_call_id,
                "content": json.dumps(result),
            })
```

The key design decisions embedded in this loop:

1. Hard ceiling of 10 iterations. On the 10th iteration, we set tool_choice: "none" to force the LLM to produce a text response instead of calling more tools. Without this, a confused model could loop indefinitely.

2. Per-tool timeout of 60 seconds. Each tool execution is wrapped in asyncio.wait_for. If a web search hangs, we do not block the entire stream.

3. Global timeout of 180 seconds for direct streaming (or up to 1,800 seconds for background jobs). The global timeout catches cases where the LLM produces many fast iterations that individually pass the per-tool check.

4. Result truncation. After each tool returns, we truncate its result before appending it to the message history. This is critical. A web search might return 50KB of page content. A file read might return an entire document. Without truncation, the context window fills up after 2-3 iterations and the LLM starts hallucinating or producing garbage.
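The per-tool timeout in the loop can be wrapped so that a hang degrades into an error result the model can read, rather than killing the whole stream. A minimal sketch of that idea -- the helper name and the error payload shape here are assumptions, not Deblo's actual code:

```python
import asyncio

TOOL_TIMEOUT_SECONDS = 60

async def run_tool_safely(tool_executor, name, args, call_id,
                          timeout=TOOL_TIMEOUT_SECONDS):
    """Wrap one tool call so a hang becomes an error result, not a dead stream."""
    try:
        return await asyncio.wait_for(
            tool_executor(name, args, call_id), timeout=timeout
        )
    except asyncio.TimeoutError:
        # The loop appends this like any other tool result; the LLM sees
        # the failure and can decide what to do on the next iteration.
        return {"error": f"Tool '{name}' timed out after {timeout}s"}
```

Returning a structured error instead of raising is what lets a slow web search become a recoverable event rather than a broken conversation.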

The 24 Tools

The tools are organized by category. Each tool is defined as an OpenRouter-compatible JSON schema -- a type: "function" object with a name, description, and parameters schema. The LLM sees these schemas and decides when to call each tool.
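For concreteness, one of those schemas could look roughly like this -- the structure matches the `type: "function"` shape described above, but the description wording is illustrative rather than the production definition:

```python
# Illustrative schema for the web_search tool (shape per the article;
# descriptions are assumptions, not the production text):
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for current information. Returns up to 5 results.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"},
            },
            "required": ["query"],
        },
    },
}
```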

File Generation (6 tools):
- generate_xlsx -- Excel spreadsheets (accounting entries, budgets, balance sheets)
- generate_pdf -- PDF documents (reports, memorandums, audit notes)
- generate_pptx -- PowerPoint presentations (training decks, pitches)
- generate_docx -- Word documents (contracts, letters, formal correspondence)
- generate_html -- Rich HTML documents (newsletters, formatted content)
- generate_md -- Markdown documents (notes, checklists, structured text)

Communication (4 tools):
- send_email_to_user -- Send email to the current user (HTML, with attachments)
- draft_email -- Create an editable email draft the user can review and send to anyone
- send_sms_to_user -- SMS to the current user (urgent reminders)
- send_whatsapp_to_user -- WhatsApp message to the current user (documents, recaps)

File and Memory (4 tools):
- list_user_files -- Browse the user's uploaded and generated files
- read_user_file -- Read a specific file's content by ID
- search_user_files -- Semantic search across the user's file library
- save_memory -- Persist a fact about the user for future conversations

Code Execution (1 tool):
- bash_execute -- Run shell commands in a sandboxed subprocess (30-second timeout, 4KB output cap)

Web Access (2 tools):
- web_search -- Search the web via Tavily (capped at 5 results, 1.5KB each)
- browse_url -- Fetch and parse a URL via Jina Reader (capped at 8KB)

Pedagogy (2 tools):
- interactive_quiz -- Generate an interactive multiple-choice quiz widget
- true_false_quiz -- Generate a true/false quiz statement

Rewards (2 tools):
- award_bonus_credits -- Give the student 1-5 bonus credits for effort
- report_exercise_result -- Silently log whether the student answered correctly

Task Management (1 tool):
- create_task -- Create a task with title, priority, due date, and tags

Billing (1 tool):
- buy_credits -- Trigger the credit purchase flow from within the chat

Reporting (1 tool):
- report_bug -- Report a bug (sent to the development team via email and WhatsApp)

Not all tools are available in all modes. The child mode gets 22 tools (no bash_execute, no draft_email). The pro mode swaps in bash_execute and draft_email and drops award_bonus_credits. Guest users get no tools at all -- just text chat.
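That gating can be expressed as a small selector between the two schema lists plus the guest fallback. A sketch with a hypothetical signature (the real code presumably derives this from the authenticated user and conversation mode):

```python
def tools_for(user, effective_mode, child_tools, pro_tools):
    """Pick the tool list for one request; guests fall back to plain text chat.
    Signature is an assumption for illustration, not Deblo's actual function."""
    if user is None:
        return None  # no tools at all for guests
    return pro_tools if effective_mode == "pro" else child_tools
```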

The Interactive Quiz Tool

The quiz system deserves its own explanation because it shows how a tool can produce a rich interactive UI element, not just text.

When the LLM decides to quiz a student, it calls interactive_quiz with structured parameters:

```python
INTERACTIVE_QUIZ_TOOL = {
    "type": "function",
    "function": {
        "name": "interactive_quiz",
        "description": (
            "Generate an interactive multiple-choice quiz question for the student. "
            "Use after explaining a concept, to consolidate understanding, or to break "
            "monotony. 1-3 quizzes per conversation max. Never on the first message."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "question": {
                    "type": "string",
                    "description": "The quiz question text",
                },
                "options": {
                    "type": "array",
                    "items": {"type": "string"},
                    "minItems": 2,
                    "maxItems": 4,
                    "description": "Answer options (2-4 choices)",
                },
                "correct_index": {
                    "type": "integer",
                    "description": "Zero-based index of the correct answer",
                },
                "explanation": {
                    "type": "string",
                    "description": "Pedagogical explanation shown after the student answers",
                },
            },
            "required": ["question", "options", "correct_index", "explanation"],
        },
    },
}
```

The tool executor does not "execute" this tool in the traditional sense. It stores the quiz state in Redis with a TTL and returns a sanitized version (without the correct answer) to the frontend:

```python
if func_name in ("interactive_quiz", "true_false_quiz"):
    from app.services.quiz import store_quiz
    sanitized = await store_quiz(
        redis, conversation.id, tool_call_id, func_name, func_args,
    )
    return sanitized
```

The frontend receives a quiz SSE event and renders an interactive widget with clickable answer buttons. When the student taps an answer, the frontend sends a separate API call to check the answer against the Redis-stored correct answer. This two-phase design means the correct answer never leaves the server until the student commits to an answer -- preventing inspection via browser dev tools.
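The two-phase flow can be sketched end to end with a plain dict standing in for Redis. Function names and payload shapes below are assumptions based on the description above, not Deblo's actual service code:

```python
# Dict stands in for Redis (the real store uses a TTL per quiz key).
_quiz_store: dict[str, dict] = {}

def store_quiz(conversation_id: str, tool_call_id: str, args: dict) -> dict:
    """Phase 1: keep the full quiz server-side, hand the frontend a stripped copy."""
    _quiz_store[f"quiz:{conversation_id}:{tool_call_id}"] = args
    return {k: v for k, v in args.items()
            if k not in ("correct_index", "explanation")}

def check_answer(conversation_id: str, tool_call_id: str, answer_index: int) -> dict:
    """Phase 2: the verdict and explanation leave the server only on commit."""
    quiz = _quiz_store[f"quiz:{conversation_id}:{tool_call_id}"]
    return {"correct": answer_index == quiz["correct_index"],
            "explanation": quiz["explanation"]}
```

The key property: nothing the frontend receives before the student commits contains the correct answer, so dev-tools inspection yields nothing.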

Tool Result Truncation

Context overflow is the silent killer of agentic systems. Every tool result gets appended to the message history before the next LLM iteration. Without truncation, a single browse_url call can inject 50KB of HTML content into the context window. Two web searches and a file read later, you have consumed 100K+ tokens of context on tool results alone, leaving no room for the actual conversation.

The truncation strategy is tool-specific:

```python
def _truncate_tool_result(name: str, result: dict) -> dict:
    _MAX_BROWSE = 8_000       # ~2,000 tokens
    _MAX_SEARCH_ITEM = 1_500  # ~375 tokens per result x 5 results max
    _MAX_RESULTS = 5          # cap Tavily results
    _MAX_BASH = 4_000         # ~1,000 tokens
    _MAX_FILE = 8_000         # ~2,000 tokens

    if name == "browse_url" and "content" in result:
        if len(result["content"]) > _MAX_BROWSE:
            result["content"] = result["content"][:_MAX_BROWSE] + "\n[...truncated]"

    elif name == "web_search" and "results" in result:
        result["results"] = result["results"][:_MAX_RESULTS]
        for r in result["results"]:
            if "content" in r and len(r["content"]) > _MAX_SEARCH_ITEM:
                r["content"] = r["content"][:_MAX_SEARCH_ITEM] + "..."

    elif name == "bash_execute" and "stdout" in result:
        if len(result["stdout"]) > _MAX_BASH:
            result["stdout"] = result["stdout"][:_MAX_BASH] + "\n[...truncated]"

    # ... similar for read_user_file
    return result
```

These limits were tuned empirically. 8KB of browsed content is enough for the LLM to understand a web page's structure and extract relevant information. 1.5KB per search result is enough for a summary and key facts. 4KB of bash output is enough for command results without dumping entire log files into the context.

Background Jobs: When 180 Seconds Is Not Enough

Some tool chains take longer than 180 seconds. A professional asks Deblo to research a SYSCOHADA topic, compile findings into a 20-page PDF report, generate supporting Excel tables, and email everything. That might involve 4-5 web searches, 3 file generations, and an email send -- easily 5-10 minutes of wall clock time.

The direct SSE stream has a 180-second timeout. Beyond that, browsers and reverse proxies start closing connections. So Deblo has a background generation system.

When the frontend sends background: true in the chat request, the backend creates a GenerationJob row in the database, spawns a detached asyncio.Task, and immediately returns a job ID. The task runs the same agentic loop but writes progress events to Redis instead of an SSE stream:

```python
# In background_generation.py
async def run_background_generation(job_id: UUID, ...):
    redis = Redis(connection_pool=redis_pool)
    try:
        # Publish progress events to Redis
        async def publish_progress(event_type: str, data: dict):
            await redis.publish(
                f"job:{job_id}:progress",
                json.dumps({"type": event_type, **data}),
            )

        # Run the same agentic loop with progress callbacks
        async for event in stream_chat_response(...):
            if isinstance(event, dict):
                await publish_progress(event["type"], event.get("data", {}))
            # ... handle text content, tool events

        # Mark job complete
        job.status = "completed"
        job.result_messages = full_messages
        await db.commit()
    except Exception as e:
        job.status = "failed"
        job.error = str(e)
        await db.commit()
```

The frontend polls the job status every 2 seconds and displays a progress timeline showing which tools are running, which have completed, and what the current step is. The SSE-like event types (tool_start, tool_progress, tool_end) are reused in the Redis pubsub channel, so the frontend uses the same rendering logic for both direct and background generation.

Background jobs have a 30-minute timeout -- 10x the direct streaming limit. This is enough for the most complex professional tasks we have seen in production.
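Because both transports carry the same event vocabulary, they differ only in framing. A sketch of what the two encoders could look like (the helper names are hypothetical; the SSE framing and the `job:{id}:progress` channel shape follow the descriptions above):

```python
import json

def encode_sse(event_type: str, data: dict) -> str:
    """Wire format for the direct streaming path."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"

def encode_pubsub(event_type: str, data: dict) -> str:
    """Same payload, published on the job:{id}:progress channel for background jobs."""
    return json.dumps({"type": event_type, **data})
```

Keeping the payloads identical is what lets the frontend reuse one rendering path for both modes.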

The Tool Executor: A Single Dispatch Function

All 24 tools are dispatched through a single function. No tool registry pattern. No plugin architecture. Just a function with a long if/elif chain:

```python
async def execute_tool(
    func_name: str,
    func_args: dict,
    tool_call_id: str,
    *,
    db: AsyncSession,
    redis: Redis,
    user,
    conversation,
    effective_mode: str,
) -> dict:
    if func_name == "report_exercise_result" and user:
        er = ExerciseResult(
            user_id=user.id,
            conversation_id=conversation.id,
            subject=func_args.get("subject", ""),
            correct=bool(func_args.get("correct", False)),
            # ...
        )
        db.add(er)
        return {"success": True}

    if func_name == "award_bonus_credits" and user:
        credits = min(max(int(func_args.get("credits", 1)), 1), 5)
        user.credit_balance += credits
        await log_credit_event(user, "credit", credits, "ai_bonus", ...)
        return {"success": True, "credits_awarded": credits}

    if func_name in ("interactive_quiz", "true_false_quiz"):
        return await store_quiz(redis, conversation.id, ...)

    if func_name == "web_search":
        return await tavily_search(func_args.get("query", ""))

    if func_name == "bash_execute":
        return await sandbox_execute(func_args.get("command", ""))

    # ... 19 more tools
```

This is not elegant. It is readable. When a tool fails in production, we open this file, find the if block, and read exactly what happens. No indirection. No dependency injection. No abstract base class hierarchy to navigate.

The function was extracted from chat.py into tool_executor.py for one reason: the background generation service needs to call the same tools. Before extraction, the tool dispatch was inline in the SSE streaming endpoint. After extraction, both the streaming path and the background path call the same execute_tool function with the same signature.
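One way to reconcile the keyword-heavy `execute_tool` signature with the three-argument `tool_executor` callable the loop expects is partial application. A sketch with a stand-in dispatcher body (`bind_executor` is a hypothetical name, not necessarily how Deblo wires it):

```python
from functools import partial

async def execute_tool(func_name, func_args, tool_call_id, *, db, redis,
                       user, conversation, effective_mode):
    """Stand-in body for the real dispatcher described above."""
    return {"tool": func_name, "mode": effective_mode}

def bind_executor(db, redis, user, conversation, effective_mode):
    """Bind request-scoped context once; the result matches the
    (func_name, func_args, tool_call_id) callable the loop expects."""
    return partial(execute_tool, db=db, redis=redis, user=user,
                   conversation=conversation, effective_mode=effective_mode)
```

Both the SSE endpoint and the background worker can then pass their own bound executor to `stream_chat_response` without the loop knowing anything about databases or Redis.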

SSE Events for Tool Progress

The frontend needs to show users what the AI is doing during tool execution. "Searching the web..." "Generating Excel file..." "Sending email..." This feedback comes through SSE events in the chat stream:

```
event: tool_start
data: {"name": "web_search", "id": "call_xyz", "detail": "SYSCOHADA bilan"}

event: content
data: {"text": "I found relevant information about SYSCOHADA..."}

event: tool_start
data: {"name": "generate_xlsx", "id": "call_abc", "detail": "Bilan comptable"}

event: tool_progress
data: {"name": "generate_xlsx", "delta": "{\"filename\": \"bilan_syscohada\", ..."}

event: file
data: {"id": "file-uuid", "filename": "bilan_syscohada.xlsx", "url": "/api/files/..."}

event: tool_end
data: {"name": "generate_xlsx", "id": "call_abc"}
```

The tool_progress event is special. For file generation tools, the LLM streams the JSON arguments chunk by chunk (because the file content is embedded in the tool call arguments). The streaming function intercepts these chunks and forwards them as tool_progress events. The frontend uses these to show a live preview of the file being generated -- the user sees the spreadsheet headers appearing in real time before the file is fully created.
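Before anything can be executed, the backend also has to stitch those fragments back together, because each streamed chunk carries only a slice of the arguments. A minimal sketch of that accumulation, matching the role of the `tool_calls_acc` dict in the loop (the helper name is an assumption; the chunk shape follows the OpenAI-style streaming format OpenRouter relays):

```python
def accumulate_tool_calls(acc: dict, delta_tool_calls: list) -> None:
    """Merge one streamed delta into the accumulator, keyed by call index."""
    for tc in delta_tool_calls:
        entry = acc.setdefault(tc["index"],
                               {"id": "", "name": "", "arguments": ""})
        if tc.get("id"):
            entry["id"] = tc["id"]        # id arrives once, on the first chunk
        fn = tc.get("function", {})
        if fn.get("name"):
            entry["name"] = fn["name"]    # name also arrives once
        entry["arguments"] += fn.get("arguments", "")  # JSON arrives in fragments
```

Only once the stream ends is each `arguments` string parsed as JSON and handed to the tool executor.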

Child vs. Pro: Two Tool Sets, One Loop

The tool selection depends on the conversation mode. The child mode gets:

```python
ALL_TOOLS_CHILD = [
    REPORT_EXERCISE_RESULT_TOOL,
    BONUS_CREDITS_TOOL,
    INTERACTIVE_QUIZ_TOOL,
    TRUE_FALSE_QUIZ_TOOL,
    BUY_CREDITS_TOOL,
    BROWSE_URL_TOOL,
    WEB_SEARCH_TOOL,
    GENERATE_XLSX_TOOL, GENERATE_PDF_TOOL, GENERATE_PPTX_TOOL,
    GENERATE_DOCX_TOOL, GENERATE_HTML_TOOL, GENERATE_MD_TOOL,
    CREATE_TASK_TOOL,
    SEND_EMAIL_TOOL, SEND_SMS_TOOL, SEND_WHATSAPP_TOOL,
    REPORT_BUG_TOOL,
    *FILE_MEMORY_TOOLS,
]
```

The pro mode adds BASH_EXECUTE_TOOL and DRAFT_EMAIL_TOOL, while removing BONUS_CREDITS_TOOL (professionals do not earn bonus credits for correct answers). The bash_execute tool is restricted to pro mode because giving a sandboxed shell to children introduces risk without pedagogical value. The draft_email tool is pro-only because it allows sending emails to arbitrary recipients -- a capability that requires authentication (no guest access) and professional context.

The same agentic loop handles both modes. The only difference is the tools list passed to stream_chat_response. The LLM sees different tools and adapts its behavior accordingly. A student asking about fractions will see quiz tools used. A professional asking about SYSCOHADA entries will see file generation and web search used.
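The pro list could be derived from the child list rather than maintained separately. A hypothetical derivation using the article's constant names, with placeholder schema objects standing in for the real tool definitions:

```python
# Placeholder schema objects standing in for the real tool definitions:
BONUS_CREDITS_TOOL = {"name": "award_bonus_credits"}
BASH_EXECUTE_TOOL = {"name": "bash_execute"}
DRAFT_EMAIL_TOOL = {"name": "draft_email"}
ALL_TOOLS_CHILD = [BONUS_CREDITS_TOOL, {"name": "interactive_quiz"}]  # really 22 entries

ALL_TOOLS_PRO = [
    t for t in ALL_TOOLS_CHILD if t is not BONUS_CREDITS_TOOL  # pros earn no bonus credits
] + [BASH_EXECUTE_TOOL, DRAFT_EMAIL_TOOL]
```

Deriving one list from the other keeps a new tool from silently appearing in only one mode.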

What We Learned

Building an agentic system taught us three things that no tutorial prepared us for:

First, truncation is more important than generation. The LLM's ability to generate useful output depends entirely on the quality of its context. Dumping raw tool results into the context poisons everything that follows. The truncation limits in _truncate_tool_result were tuned over dozens of sessions, each time debugging cases where the LLM "forgot" the original question because the context was flooded with web search noise.

Second, the last iteration must be forced to text. Without tool_choice: "none" on the final iteration, the LLM sometimes enters a tool-calling loop where it calls a tool, gets a result, decides it needs another tool, calls that, and so on until it hits the ceiling -- and then returns nothing because it wanted to call another tool but was not allowed to. Forcing text on the last iteration guarantees the user always gets a response.

Third, 60 seconds per tool is generous. Most tools complete in under 5 seconds. The 60-second timeout exists for web browsing (some pages are slow) and bash execution (some commands take time). But the timeout also serves as a safety valve against hanging tool calls. In production, we have seen exactly two cases where a tool hit the 60-second limit: a Tavily search during a Tavily outage, and a file generation that triggered an out-of-memory error in the PDF library. Both were caught by the timeout instead of blocking the entire stream indefinitely.

---

This is Part 3 of a 12-part series on building Deblo.ai.

1. AI Tutoring for 250 Million African Students
2. 100 Sessions Later: The Architecture of an AI Education Platform
3. The Agentic Loop: 24 AI Tools in a Single Chat (you are here)
4. System Prompts That Teach: Anti-Cheating, Socratic Method, and Grade-Level Adaptation
5. WhatsApp OTP and the African Authentication Problem
6. Credits, FCFA, and 6 African Payment Gateways
7. SSE Streaming: Real-Time AI Responses in SvelteKit
8. Voice Calls With AI: Ultravox, LiveKit, and WebRTC
9. Building a React Native K12 App in 7 Days
10. 101 AI Advisors: Professional Intelligence for Africa
11. Background Jobs: When AI Takes 30 Minutes to Think
12. From Abidjan to 250 Million: The Deblo.ai Story
