Imagine typing "Why is my Node app crashing?" into a chat box, and getting an answer that is not a generic Stack Overflow summary -- but a real diagnosis, drawn from your actual server metrics, your actual deployment logs, and your actual container status. That is what we built in a single day.
On March 23, 2026, we added an AI gateway to sh0 that turns Claude into a DevOps engineer with root-level knowledge of your infrastructure. Not a chatbot that guesses. An agent that checks. The architecture is unusual: the Anthropic API lives on our website backend, but the tools execute in the user's browser, because only the browser has access to the user's local sh0 server. This split -- server-side LLM, client-side execution -- is the key insight that makes the whole system work.
This is the story of how we built it: the gateway, the tool definitions, the agentic loop, the billing model, and the system prompt that keeps Claude focused.
The Architecture: Three Layers, Two Codebases
The AI system spans two separate codebases:
1. sh0-website (SvelteKit on sh0.dev) -- the gateway. Holds the Anthropic API key, handles authentication, streams responses, deducts wallet credits.
2. sh0-core/dashboard (Svelte 5 SPA embedded in the Rust binary) -- the client. Renders the chat UI, executes tool calls against the local sh0 API, sends results back.
The user never talks to Anthropic directly. Every message flows through our gateway, which acts as an authenticated, metered, auditable proxy.
```
User types message
  --> Dashboard POSTs to sh0.dev/api/ai/chat (Bearer auth)
  --> Gateway validates key, checks wallet balance
  --> Gateway calls Anthropic Messages API (streaming)
  --> Gateway emits SSE events back to dashboard
  --> If Claude returns tool_use blocks:
        --> Gateway emits tool_call SSE events
        --> Dashboard executes tools against local sh0 API
        --> Dashboard POSTs tool results back to gateway
        --> Gateway sends results to Claude, streams next response
        --> Repeat (max 10 iterations)
  --> Gateway deducts tokens from wallet
```
This design has a non-obvious advantage: the gateway never needs access to the user's server. It only needs the Anthropic API key. The dashboard handles all the privileged operations locally. This means we never route user infrastructure data through our servers -- Claude sees it during the conversation, but we only store the final text response, not the raw tool outputs.
The Prepaid Wallet
Before any message is sent, the gateway checks the user's wallet. The billing model is prepaid credits with a 20% markup over Anthropic's list price:
| Model | Input (Anthropic / sh0) | Output (Anthropic / sh0) |
|---|---|---|
| Haiku 4.5 | $1.00 / $1.20 per MTok | $5.00 / $6.00 per MTok |
| Sonnet 4.6 | $3.00 / $3.60 per MTok | $15.00 / $18.00 per MTok |
| Opus 4.6 | $5.00 / $6.00 per MTok | $25.00 / $30.00 per MTok |
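The markup arithmetic is simple enough to express as a pure function. A sketch -- the rates mirror the table above, but `ANTHROPIC_RATES`, `costCents`, and the cent-based units are our illustrative choices, not sh0's actual billing code:

```typescript
// Per-million-token rates in cents, at Anthropic list price.
// (Illustrative sketch -- not sh0's actual billing code.)
const ANTHROPIC_RATES: Record<string, { input: number; output: number }> = {
  haiku: { input: 100, output: 500 },
  sonnet: { input: 300, output: 1500 },
  opus: { input: 500, output: 2500 },
};

const MARKUP = 1.2; // sh0 charges 20% over list price

// Cost in cents for one request at sh0 rates.
function costCents(model: string, inputTokens: number, outputTokens: number): number {
  const rate = ANTHROPIC_RATES[model];
  if (!rate) throw new Error(`unknown model: ${model}`);
  const anthropicCost =
    (inputTokens / 1_000_000) * rate.input +
    (outputTokens / 1_000_000) * rate.output;
  return anthropicCost * MARKUP;
}
```

At these rates a short query costs a fraction of a cent, which is why the wallet footer later in this post can estimate "remaining messages" from the Haiku rate.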
Users buy credit packs ($5, $20, $50, $100) with volume bonuses on larger packs. Business plan users can bring their own key (BYOK) -- either an Anthropic sk-ant- key or an OpenRouter sk-or- key, encrypted with AES-256-GCM and stored server-side. BYOK users bypass the wallet entirely.
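The post does not show sh0's actual encryption code, but the AES-256-GCM pattern for encrypting a BYOK key at rest is standard in Node.js. A sketch, with `MASTER_KEY` handling and the nonce+tag+ciphertext layout as our assumptions:

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from 'node:crypto';

// Illustrative only: in production the master key would come from a
// secret manager, not be generated at startup.
const MASTER_KEY = randomBytes(32);

function encryptApiKey(plaintext: string): string {
  const iv = randomBytes(12); // 96-bit nonce, the standard size for GCM
  const cipher = createCipheriv('aes-256-gcm', MASTER_KEY, iv);
  const ct = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  // Store nonce + auth tag + ciphertext as one opaque blob
  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString('base64');
}

function decryptApiKey(blob: string): string {
  const buf = Buffer.from(blob, 'base64');
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ct = buf.subarray(28);
  const decipher = createDecipheriv('aes-256-gcm', MASTER_KEY, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString('utf8');
}
```

GCM's auth tag means a tampered blob fails loudly at decrypt time rather than yielding a corrupted key.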
The wallet deduction happens once, when the final response completes, not on each intermediate tool iteration. If Claude makes five tool calls before giving a final answer, the user pays for the total token count across all iterations, but the deduction is atomic -- one transaction, one usage log entry.
The 10 Tool Definitions
We defined 10 tools in Anthropic's function calling format, split into three categories:
Read tools (executed by the dashboard against the local sh0 API):
- list_apps -- all apps with status, domains, and resource usage
- get_app_details -- full app info including environment variable count, domains, and resource limits
- get_deployment_logs -- recent deployments with build logs
- get_server_status -- CPU, memory, disk, uptime
- list_cron_jobs -- scheduled jobs and their last run status
- list_backups -- backup schedules and recent backup history
- list_databases -- database instances and their sizes
Action tools (executed by the dashboard):
- restart_app -- restart a container by app name
Gateway-handled tools (executed server-side, no client round-trip needed):
- generate_config_file -- produce a sh0.yaml, docker-compose.yml, or Dockerfile based on conversation context
- suggest_actions -- generate follow-up action chips for the UI
The gateway-handled tools are an important optimisation. When Claude decides to suggest follow-up actions, there is no reason to round-trip to the browser. The gateway intercepts the tool result and emits it directly as an SSE event.
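The routing decision boils down to a set-membership check. A sketch -- the post does not show the gateway's internals, so `routeToolCall` and the `ToolUse` shape are our names:

```typescript
// Tools the gateway answers itself, with no browser round-trip.
// (Sketch: names are illustrative, not sh0's actual gateway code.)
const GATEWAY_TOOLS = new Set(['generate_config_file', 'suggest_actions']);

interface ToolUse {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// Per tool call: handle server-side, or emit a tool_call SSE event
// for the dashboard to execute locally.
function routeToolCall(tc: ToolUse): 'gateway' | 'client' {
  return GATEWAY_TOOLS.has(tc.name) ? 'gateway' : 'client';
}
```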
```typescript
// ai-tools.ts -- Anthropic-format tool definitions
export const SH0_TOOLS: Tool[] = [
  {
    name: 'list_apps',
    description: 'List all deployed applications with their current status, domains, and resource usage.',
    input_schema: {
      type: 'object',
      properties: {},
      required: []
    }
  },
  {
    name: 'get_app_details',
    description: 'Get detailed information about a specific application including domains, environment variable count, resource limits, and deployment history.',
    input_schema: {
      type: 'object',
      properties: {
        app_name: {
          type: 'string',
          description: 'The name of the application'
        }
      },
      required: ['app_name']
    }
  },
  // ... 8 more tools
];
```

The SSE Streaming Protocol
The gateway uses Server-Sent Events to stream responses back to the dashboard. We extended the standard SSE event types to handle tool calling:
```
event: start
data: {"model":"claude-sonnet-4-6","conversation_id":"..."}

event: delta
data: {"text":"Let me check your server status..."}

event: tool_call
data: {"id":"toolu_01...","name":"get_server_status","input":{}}

event: tool_call_done
data: {"tool_calls":[{"id":"toolu_01...","name":"get_server_status","input":{}}]}

event: suggestions
data: {"suggestions":["Check app logs","Restart the app","View resource usage"]}

event: file
data: {"filename":"sh0.yaml","language":"yaml","content":"..."}

event: usage
data: {"input_tokens":417,"output_tokens":53,"cost_cents":0.12}

event: done
data: {}
```
The tool_call event streams incrementally as Claude decides to call a tool. The tool_call_done event fires when Claude's turn is complete and all tool calls are ready for execution. This distinction matters for the UI: we show a spinner during tool_call, and switch to execution mode on tool_call_done.
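The event/data framing above takes only a few lines of string handling to parse. A minimal sketch -- the real dashboard reads a fetch `ReadableStream` and must handle chunks split mid-event, which this pure function (`parseSseChunk`, our name) ignores:

```typescript
// Parse a complete SSE chunk into typed events.
// (Sketch: assumes whole events arrive in one chunk.)
interface SseEvent {
  event: string;
  data: unknown;
}

function parseSseChunk(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line
  for (const block of chunk.split('\n\n')) {
    let event = 'message'; // SSE default event name
    let data = '';
    for (const line of block.split('\n')) {
      if (line.startsWith('event:')) event = line.slice(6).trim();
      else if (line.startsWith('data:')) data += line.slice(5).trim();
    }
    if (data) events.push({ event, data: JSON.parse(data) });
  }
  return events;
}
```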
The Agentic Loop: runStreamLoop
The most interesting piece of client-side code is the recursive runStreamLoop function in the AI store. It handles the full cycle of streaming, tool execution, and continuation:
```typescript
async function runStreamLoop(
  messages: ChatMessage[],
  iteration: number = 0
): Promise<void> {
  if (iteration >= 10) {
    // Safety valve: prevent infinite tool loops
    return;
  }

  const pendingToolCalls: ToolCall[] = [];
  const toolResults: ToolResult[] = [];

  await streamChat({
    messages,
    model: selectedModel,
    server_context: getServerContext(),
    tool_results: iteration > 0 ? lastToolResults : undefined,
    iteration,
    onDelta: (text) => { currentResponse += text; },
    onToolCall: (tc) => {
      pendingToolCalls.push(tc);
      addProcessingStep(tc.name, 'loading');
    },
    onToolCallDone: async () => {
      // Execute all tool calls locally
      for (const tc of pendingToolCalls) {
        try {
          const result = await executeToolLocally(tc);
          updateProcessingStep(tc.id, 'completed');
          toolResults.push({ tool_use_id: tc.id, content: result });
        } catch (e) {
          updateProcessingStep(tc.id, 'error');
          toolResults.push({ tool_use_id: tc.id, content: `Error: ${e}`, is_error: true });
        }
      }
      lastToolResults = toolResults;
      // Recurse: send results back to Claude
      await runStreamLoop(messages, iteration + 1);
    },
    onSuggestions: (s) => { suggestions = s; },
    onFile: (f) => { generatedFiles.push(f); }
  });
}
```
The recursion depth is capped at 10 iterations. In practice, most queries complete in 1-3 iterations. A "list my apps" query takes one iteration (one tool call, one response). A "why is my app crashing?" query might take three: check app status, fetch deployment logs, read container logs, then synthesize a diagnosis.
Client-Side Tool Execution
The dashboard maps tool names to local API calls. The key challenge is name-to-ID resolution: Claude thinks in app names ("my-api"), but the sh0 REST API uses UUIDs. We maintain a 30-second cache of app name-to-ID mappings:
```typescript
// ai-tools.ts -- Dashboard tool executor
// (api.* calls stand in for the local sh0 REST client)
const appNameCache = new Map<string, string>();
let cacheTimestamp = 0;

async function resolveAppId(name: string): Promise<string> {
  if (Date.now() - cacheTimestamp > 30_000) {
    // Refresh the 30-second name-to-UUID cache from the local sh0 API
    for (const app of await api.listApps()) appNameCache.set(app.name, app.id);
    cacheTimestamp = Date.now();
  }
  const id = appNameCache.get(name);
  if (!id) throw new Error(`App "${name}" not found`);
  return id;
}

export async function executeToolLocally(toolCall: ToolCall): Promise<string> {
  switch (toolCall.name) {
    case 'restart_app':
      await api.restartApp(await resolveAppId(toolCall.input.app_name));
      return `App "${toolCall.input.app_name}" restarted successfully.`;
    // ... other tools
  }
}
```
Results are truncated to 4,000 characters to manage context window usage. This is aggressive but practical -- if Claude needs more detail, it can make a targeted follow-up query.
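The truncation itself is a one-liner. A sketch, with the `[truncated]` marker as our assumption:

```typescript
// Cap tool results to protect the context window.
// (Sketch: the marker text is illustrative.)
const MAX_TOOL_RESULT_CHARS = 4_000;

function truncateResult(text: string): string {
  if (text.length <= MAX_TOOL_RESULT_CHARS) return text;
  return text.slice(0, MAX_TOOL_RESULT_CHARS) + '\n[truncated]';
}
```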
The ProcessingSteps Component
When Claude calls tools, the user sees a vertical timeline showing what is happening:
```svelte
<!-- ProcessingSteps.svelte -->
{#each steps as step, i}
  <div class="flex items-start gap-3" style="animation-delay: {i * 100}ms">
    <div class="flex-shrink-0 w-8 h-8 rounded-full flex items-center justify-center
                {step.status === 'loading' ? 'bg-cyan-500/20 animate-pulse' :
                 step.status === 'completed' ? 'bg-green-500/20' : 'bg-red-500/20'}">
      {#if step.status === 'loading'}
        <Loader2 class="w-4 h-4 text-cyan-400 animate-spin" />
      {:else if step.status === 'completed'}
        <Check class="w-4 h-4 text-green-400" />
      {:else}
        <X class="w-4 h-4 text-red-400" />
      {/if}
    </div>
    <div>
      <p class="text-sm font-medium text-dark-100">{step.label}</p>
    </div>
  </div>
{/each}
```

Each step fades in with a staggered animation. The timeline collapses into a summary line when all steps are complete. This gives users confidence that something real is happening -- Claude is not just thinking, it is querying their actual infrastructure.
The System Prompt: Making Claude a DevOps Engineer
The system prompt is the most carefully engineered piece of the whole system. It is structured as XML sections and built dynamically with server context:
```typescript
export function buildSystemPrompt(context?: ServerContext): string {
  // Only the first section is shown here; the full prompt appends
  // several more XML sections (tool policy, context, output rules).
  return `<identity>
You are the sh0 AI Assistant -- an expert DevOps engineer embedded in the sh0
deployment platform. You help users manage their servers, debug deployments,
and optimize their infrastructure.
</identity>`;
}
```
The tool-policy section is critical. Without it, Claude would sometimes guess at server status based on general knowledge instead of calling the tool. The explicit instruction "Never guess server data. If you cannot retrieve it, say so" eliminated this class of hallucination.
The dashboard silently injects server context on the first message of each conversation: [sh0 context: server v1.0.0, 5 apps, plan: pro]. This gives Claude awareness of the user's environment without requiring the user to provide it manually.
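The injection can be sketched as a pure function over the first message -- `injectContext` and the `ServerContext` field names are our illustrative choices; the bracketed format is the one quoted above:

```typescript
// Prefix the first user message with a one-line context banner.
// (Sketch: field names are illustrative.)
interface ServerContext {
  version: string;
  appCount: number;
  plan: string;
}

function injectContext(firstMessage: string, ctx: ServerContext): string {
  const prefix = `[sh0 context: server v${ctx.version}, ${ctx.appCount} apps, plan: ${ctx.plan}]`;
  return `${prefix}\n${firstMessage}`;
}
```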
The Chat UI: Feels Like Home
The chat interface follows patterns users already know from ChatGPT and Claude.ai, adapted for a DevOps context:
- Model selector: three pill buttons -- Haiku (green, fast, cheap), Sonnet (blue, balanced), Opus (purple, powerful) -- with per-model pricing shown on hover.
- Conversation sidebar: grouped by date (Today, Yesterday, Older) with favorite, archive, rename, and delete actions. All persisted to localStorage.
- Welcome state: quick action buttons for common tasks -- "List my apps", "Check server status", "Write a sh0.yaml", "Review my Dockerfile".
- Wallet footer: remaining balance and estimated messages at the Haiku rate, with a link to buy more credits.
- No-key state: a setup card with three steps guiding the user to sh0.dev/account/ai to create an API key.
The entire conversation history lives in localStorage (sh0_ai_conversations). We deliberately avoided server-side storage for dashboard conversations -- the user's infrastructure queries should stay on their machine.
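The Today/Yesterday/Older grouping in the sidebar is a simple bucketing over timestamps. A sketch -- the `sh0_ai_conversations` key is from the post, but `groupByDate` and the `Conversation` shape are our names:

```typescript
// Bucket conversations for the sidebar by local calendar day.
// (Sketch: not sh0's actual sidebar code.)
interface Conversation {
  id: string;
  updatedAt: number; // epoch millis
}

function groupByDate(
  convs: Conversation[],
  now: number = Date.now()
): Record<'Today' | 'Yesterday' | 'Older', Conversation[]> {
  const startOfToday = new Date(now);
  startOfToday.setHours(0, 0, 0, 0);
  const today = startOfToday.getTime();
  const yesterday = today - 86_400_000;
  const groups = {
    Today: [] as Conversation[],
    Yesterday: [] as Conversation[],
    Older: [] as Conversation[],
  };
  for (const c of convs) {
    if (c.updatedAt >= today) groups.Today.push(c);
    else if (c.updatedAt >= yesterday) groups.Yesterday.push(c);
    else groups.Older.push(c);
  }
  return groups;
}
```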
Testing: Three Models, One curl
We validated all three models with direct curl requests against the live gateway:
```shell
curl -X POST https://sh0.dev/api/ai/chat \
  -H "Authorization: Bearer sh0_ai_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What models are you?"}],"model":"haiku"}'
```

Haiku: 416 input tokens, 36 output. Sonnet: 417 input, 46 output. Opus: 417 input, 53 output. SSE streaming confirmed working across all three. Usage tracking confirmed in /api/ai/usage. The 1-token variance in input counts is due to model-specific tokenisation differences on the system prompt.
What We Learned
Building an AI gateway taught us three things:
1. Client-side execution is the right architecture for infrastructure tools. Routing server data through a central gateway would be a security and privacy nightmare. Let the LLM live in the cloud; let the tools live on the user's machine.
2. The agentic loop needs a hard cap. Without the 10-iteration limit, a confused Claude could loop indefinitely, burning credits and producing nothing useful. In practice, 3 iterations handles 95% of queries.
3. The system prompt is the product. The difference between "Claude with sh0 tools" and "a DevOps engineer who happens to be AI" is entirely in the prompt. The XML structure, the tool policy, the injected context -- that is where the user experience lives.
---
Next in the series: Building an MCP Server: 25 Tools, 3-Tier Safety, OpenAPI-Driven -- how we extended from 10 gateway tools to 25 MCP tools with scoped API keys, confirmation tokens, and OpenAPI-driven generation.