Imagine typing "Why is my Node app crashing?" into a chat box, and getting an answer that is not a generic Stack Overflow summary -- but a real diagnosis, drawn from your actual server metrics, your actual deployment logs, and your actual container status. That is what we built in a single day.
On March 23, 2026, we added an AI gateway to sh0 that turns Claude into a DevOps engineer with root-level knowledge of your infrastructure. Not a chatbot that guesses. An agent that checks. The architecture is unusual: the Anthropic API lives on our website backend, but the tools execute in the user's browser, because only the browser has access to the user's local sh0 server. This split -- server-side LLM, client-side execution -- is the key insight that makes the whole system work.
This is the story of how we built it: the gateway, the tool definitions, the agentic loop, the billing model, and the system prompt that keeps Claude focused.
The Architecture: Three Layers, Two Codebases
The AI system spans two separate codebases:
1. sh0-website (SvelteKit on sh0.dev) -- the gateway. Holds the Anthropic API key, handles authentication, streams responses, deducts wallet credits.
2. sh0-core/dashboard (Svelte 5 SPA embedded in the Rust binary) -- the client. Renders the chat UI, executes tool calls against the local sh0 API, sends results back.
The user never talks to Anthropic directly. Every message flows through our gateway, which acts as an authenticated, metered, auditable proxy.
```
User types message
  --> Dashboard POSTs to sh0.dev/api/ai/chat (Bearer auth)
  --> Gateway validates key, checks wallet balance
  --> Gateway calls Anthropic Messages API (streaming)
  --> Gateway emits SSE events back to dashboard
  --> If Claude returns tool_use blocks:
        --> Gateway emits tool_call SSE events
        --> Dashboard executes tools against local sh0 API
        --> Dashboard POSTs tool results back to gateway
        --> Gateway sends results to Claude, streams next response
        --> Repeat (max 10 iterations)
  --> Gateway deducts tokens from wallet
```
This design has a non-obvious advantage: the gateway never needs access to the user's server. It only needs the Anthropic API key. The dashboard handles all the privileged operations locally. This means we never route user infrastructure data through our servers -- Claude sees it during the conversation, but we only store the final text response, not the raw tool outputs.
The Prepaid Wallet
Before any message is sent, the gateway checks the user's wallet. The billing model is prepaid credits with a 20% markup over Anthropic's list price:
| Model | Input (Anthropic / sh0) | Output (Anthropic / sh0) |
|---|---|---|
| Haiku 4.5 | $1.00 / $1.20 per MTok | $5.00 / $6.00 per MTok |
| Sonnet 4.6 | $3.00 / $3.60 per MTok | $15.00 / $18.00 per MTok |
| Opus 4.6 | $5.00 / $6.00 per MTok | $25.00 / $30.00 per MTok |
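The markup arithmetic is simple enough to express as a pure function. A sketch -- the rates mirror the table above, but `ANTHROPIC_RATES`, `costCents`, and the cent-based units are our illustrative choices, not sh0's actual billing code:

```typescript
// Per-million-token rates in cents, at Anthropic list price.
// (Illustrative sketch -- not sh0's actual billing code.)
const ANTHROPIC_RATES: Record<string, { input: number; output: number }> = {
  haiku: { input: 100, output: 500 },
  sonnet: { input: 300, output: 1500 },
  opus: { input: 500, output: 2500 },
};

const MARKUP = 1.2; // sh0 charges 20% over list price

// Cost in cents for one request at sh0 rates.
function costCents(model: string, inputTokens: number, outputTokens: number): number {
  const rate = ANTHROPIC_RATES[model];
  if (!rate) throw new Error(`unknown model: ${model}`);
  const anthropicCost =
    (inputTokens / 1_000_000) * rate.input +
    (outputTokens / 1_000_000) * rate.output;
  return anthropicCost * MARKUP;
}
```

At these rates a short query costs a fraction of a cent, which is why the wallet footer later in this post can estimate "remaining messages" from the Haiku rate.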
Users buy credit packs ($5, $20, $50, $100) with volume bonuses on larger packs. Business plan users can bring their own key (BYOK) -- either an Anthropic sk-ant- key or an OpenRouter sk-or- key, encrypted with AES-256-GCM and stored server-side. BYOK users bypass the wallet entirely.
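The post does not show sh0's actual encryption code, but the AES-256-GCM pattern for encrypting a BYOK key at rest is standard in Node.js. A sketch, with `MASTER_KEY` handling and the nonce+tag+ciphertext layout as our assumptions:

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from 'node:crypto';

// Illustrative only: in production the master key would come from a
// secret manager, not be generated at startup.
const MASTER_KEY = randomBytes(32);

function encryptApiKey(plaintext: string): string {
  const iv = randomBytes(12); // 96-bit nonce, the standard size for GCM
  const cipher = createCipheriv('aes-256-gcm', MASTER_KEY, iv);
  const ct = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  // Store nonce + auth tag + ciphertext as one opaque blob
  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString('base64');
}

function decryptApiKey(blob: string): string {
  const buf = Buffer.from(blob, 'base64');
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ct = buf.subarray(28);
  const decipher = createDecipheriv('aes-256-gcm', MASTER_KEY, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString('utf8');
}
```

GCM's auth tag means a tampered blob fails loudly at decrypt time rather than yielding a corrupted key.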
The wallet deduction happens once, when the final response completes, not on each intermediate tool iteration. If Claude makes five tool calls before giving a final answer, the user pays for the total token count across all iterations, but the deduction is atomic -- one transaction, one usage log entry.
The 10 Tool Definitions
We defined 10 tools in Anthropic's function calling format, split into three categories:
Read tools (executed by the dashboard against the local sh0 API):
- list_apps -- all apps with status, domains, and resource usage
- get_app_details -- full app info including environment variable count, domains, and resource limits
- get_deployment_logs -- recent deployments with build logs
- get_server_status -- CPU, memory, disk, uptime
- list_cron_jobs -- scheduled jobs and their last run status
- list_backups -- backup schedules and recent backup history
- list_databases -- database instances and their sizes
Action tools (executed by the dashboard):
- restart_app -- restart a container by app name
Gateway-handled tools (executed server-side, no client round-trip needed):
- generate_config_file -- produce a sh0.yaml, docker-compose.yml, or Dockerfile based on conversation context
- suggest_actions -- generate follow-up action chips for the UI
The gateway-handled tools are an important optimisation. When Claude decides to suggest follow-up actions, there is no reason to round-trip to the browser. The gateway intercepts the tool result and emits it directly as an SSE event.
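The routing decision boils down to a set-membership check. A sketch -- the post does not show the gateway's internals, so `routeToolCall` and the `ToolUse` shape are our names:

```typescript
// Tools the gateway answers itself, with no browser round-trip.
// (Sketch: names are illustrative, not sh0's actual gateway code.)
const GATEWAY_TOOLS = new Set(['generate_config_file', 'suggest_actions']);

interface ToolUse {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// Per tool call: handle server-side, or emit a tool_call SSE event
// for the dashboard to execute locally.
function routeToolCall(tc: ToolUse): 'gateway' | 'client' {
  return GATEWAY_TOOLS.has(tc.name) ? 'gateway' : 'client';
}
```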
```typescript
// ai-tools.ts -- Anthropic-format tool definitions
export const SH0_TOOLS: Tool[] = [
  {
    name: 'list_apps',
    description: 'List all deployed applications with their current status, domains, and resource usage.',
    input_schema: {
      type: 'object',
      properties: {},
      required: []
    }
  },
  {
    name: 'get_app_details',
    description: 'Get detailed information about a specific application including domains, environment variable count, resource limits, and deployment history.',
    input_schema: {
      type: 'object',
      properties: {
        app_name: {
          type: 'string',
          description: 'The name of the application'
        }
      },
      required: ['app_name']
    }
  },
  // ... 8 more tools
];
```

The SSE Streaming Protocol
The gateway uses Server-Sent Events to stream responses back to the dashboard. We extended the standard SSE event types to handle tool calling:
```
event: start
data: {"model":"claude-sonnet-4-6","conversation_id":"..."}

event: delta
data: {"text":"Let me check your server status..."}

event: tool_call
data: {"id":"toolu_01...","name":"get_server_status","input":{}}

event: tool_call_done
data: {"tool_calls":[{"id":"toolu_01...","name":"get_server_status","input":{}}]}

event: suggestions
data: {"suggestions":["Check app logs","Restart the app","View resource usage"]}

event: file
data: {"filename":"sh0.yaml","language":"yaml","content":"..."}

event: usage
data: {"input_tokens":417,"output_tokens":53,"cost_cents":0.12}

event: done
data: {}
```
The tool_call event streams incrementally as Claude decides to call a tool. The tool_call_done event fires when Claude's turn is complete and all tool calls are ready for execution. This distinction matters for the UI: we show a spinner during tool_call, and switch to execution mode on tool_call_done.
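The event/data framing above takes only a few lines of string handling to parse. A minimal sketch -- the real dashboard reads a fetch `ReadableStream` and must handle chunks split mid-event, which this pure function (`parseSseChunk`, our name) ignores:

```typescript
// Parse a complete SSE chunk into typed events.
// (Sketch: assumes whole events arrive in one chunk.)
interface SseEvent {
  event: string;
  data: unknown;
}

function parseSseChunk(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line
  for (const block of chunk.split('\n\n')) {
    let event = 'message'; // SSE default event name
    let data = '';
    for (const line of block.split('\n')) {
      if (line.startsWith('event:')) event = line.slice(6).trim();
      else if (line.startsWith('data:')) data += line.slice(5).trim();
    }
    if (data) events.push({ event, data: JSON.parse(data) });
  }
  return events;
}
```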
The Agentic Loop: runStreamLoop
The most interesting piece of client-side code is the recursive runStreamLoop function in the AI store. It handles the full cycle of streaming, tool execution, and continuation:
```typescript
async function runStreamLoop(
  messages: ChatMessage[],
  iteration: number = 0
): Promise<void> {
  if (iteration >= 10) {
    // Safety valve: prevent infinite tool loops
    return;
  }

  const pendingToolCalls: ToolCall[] = [];
  const toolResults: ToolResult[] = [];

  await streamChat({
    messages,
    model: selectedModel,
    server_context: getServerContext(),
    tool_results: iteration > 0 ? lastToolResults : undefined,
    iteration,
    onDelta: (text) => { currentResponse += text; },
    onToolCall: (tc) => {
      pendingToolCalls.push(tc);
      addProcessingStep(tc.name, 'loading');
    },
    onToolCallDone: async () => {
      // Execute all tool calls locally
      for (const tc of pendingToolCalls) {
        try {
          const result = await executeToolLocally(tc);
          updateProcessingStep(tc.id, 'completed');
          toolResults.push({ tool_use_id: tc.id, content: result });
        } catch (e) {
          updateProcessingStep(tc.id, 'error');
          toolResults.push({ tool_use_id: tc.id, content: `Error: ${e}`, is_error: true });
        }
      }
      lastToolResults = toolResults;
      // Recurse: send results back to Claude
      await runStreamLoop(messages, iteration + 1);
    },
    onSuggestions: (s) => { suggestions = s; },
    onFile: (f) => { generatedFiles.push(f); }
  });
}
```
The recursion depth is capped at 10 iterations. In practice, most queries complete in 1-3 iterations. A "list my apps" query takes one iteration (one tool call, one response). A "why is my app crashing?" query might take three: check app status, fetch deployment logs, read container logs, then synthesize a diagnosis.
Client-Side Tool Execution
The dashboard maps tool names to local API calls. The key challenge is name-to-ID resolution: Claude thinks in app names ("my-api"), but the sh0 REST API uses UUIDs. We maintain a 30-second cache of app name-to-ID mappings:
```typescript
// ai-tools.ts -- Dashboard tool executor
// (api.* calls stand in for the local sh0 REST client)
const appNameCache = new Map<string, string>();
let cacheTimestamp = 0;

async function resolveAppId(name: string): Promise<string> {
  if (Date.now() - cacheTimestamp > 30_000) {
    // Refresh the 30-second name-to-UUID cache from the local sh0 API
    for (const app of await api.listApps()) appNameCache.set(app.name, app.id);
    cacheTimestamp = Date.now();
  }
  const id = appNameCache.get(name);
  if (!id) throw new Error(`App "${name}" not found`);
  return id;
}

export async function executeToolLocally(toolCall: ToolCall): Promise<string> {
  switch (toolCall.name) {
    case 'restart_app':
      await api.restartApp(await resolveAppId(toolCall.input.app_name));
      return `App "${toolCall.input.app_name}" restarted successfully.`;
    // ... other tools
  }
}
```
Results are truncated to 4,000 characters to manage context window usage. This is aggressive but practical -- if Claude needs more detail, it can make a targeted follow-up query.
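The truncation itself is a one-liner. A sketch, with the `[truncated]` marker as our assumption:

```typescript
// Cap tool results to protect the context window.
// (Sketch: the marker text is illustrative.)
const MAX_TOOL_RESULT_CHARS = 4_000;

function truncateResult(text: string): string {
  if (text.length <= MAX_TOOL_RESULT_CHARS) return text;
  return text.slice(0, MAX_TOOL_RESULT_CHARS) + '\n[truncated]';
}
```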
The ProcessingSteps Component
When Claude calls tools, the user sees a vertical timeline showing what is happening:
```svelte
<!-- ProcessingSteps.svelte -->
{#each steps as step, i}
  <div class="flex items-start gap-3" style="animation-delay: {i * 100}ms">
    <div class="flex-shrink-0 w-8 h-8 rounded-full flex items-center justify-center
                {step.status === 'loading' ? 'bg-cyan-500/20 animate-pulse' :
                 step.status === 'completed' ? 'bg-green-500/20' : 'bg-red-500/20'}">
      {#if step.status === 'loading'}
        <Loader2 class="w-4 h-4 text-cyan-400 animate-spin" />
      {:else if step.status === 'completed'}
        <Check class="w-4 h-4 text-green-400" />
      {:else}
        <X class="w-4 h-4 text-red-400" />
      {/if}
    </div>
    <div>
      <p class="text-sm font-medium text-dark-100">{step.label}</p>
    </div>
  </div>
{/each}
```

Each step fades in with a staggered animation. The timeline collapses into a summary line when all steps are complete. This gives users confidence that something real is happening -- Claude is not just thinking, it is querying their actual infrastructure.
The System Prompt: Making Claude a DevOps Engineer
The system prompt is the most carefully engineered piece of the whole system. It is structured as XML sections and built dynamically with server context:
```typescript
export function buildSystemPrompt(context?: ServerContext): string {
  // Only the first section is shown here; the full prompt appends
  // several more XML sections (tool policy, context, output rules).
  return `<identity>
You are the sh0 AI Assistant -- an expert DevOps engineer embedded in the sh0
deployment platform. You help users manage their servers, debug deployments,
and optimize their infrastructure.
</identity>`;
}
```
The tool-policy section is critical. Without it, Claude would sometimes guess at server status based on general knowledge instead of calling the tool. The explicit instruction "Never guess server data. If you cannot retrieve it, say so" eliminated this class of hallucination.
The dashboard silently injects server context on the first message of each conversation: [sh0 context: server v1.0.0, 5 apps, plan: pro]. This gives Claude awareness of the user's environment without requiring the user to provide it manually.
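The injection can be sketched as a pure function over the first message -- `injectContext` and the `ServerContext` field names are our illustrative choices; the bracketed format is the one quoted above:

```typescript
// Prefix the first user message with a one-line context banner.
// (Sketch: field names are illustrative.)
interface ServerContext {
  version: string;
  appCount: number;
  plan: string;
}

function injectContext(firstMessage: string, ctx: ServerContext): string {
  const prefix = `[sh0 context: server v${ctx.version}, ${ctx.appCount} apps, plan: ${ctx.plan}]`;
  return `${prefix}\n${firstMessage}`;
}
```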
The Chat UI: Feels Like Home
The chat interface follows patterns users already know from ChatGPT and Claude.ai, adapted for a DevOps context:
- Model selector: three pill buttons -- Haiku (green, fast, cheap), Sonnet (blue, balanced), Opus (purple, powerful) -- with per-model pricing shown on hover.
- Conversation sidebar: grouped by date (Today, Yesterday, Older) with favorite, archive, rename, and delete actions. All persisted to localStorage.
- Welcome state: quick action buttons for common tasks -- "List my apps", "Check server status", "Write a sh0.yaml", "Review my Dockerfile".
- Wallet footer: remaining balance and estimated messages at the Haiku rate, with a link to buy more credits.
- No-key state: a setup card with three steps guiding the user to sh0.dev/account/ai to create an API key.
The entire conversation history lives in localStorage (sh0_ai_conversations). We deliberately avoided server-side storage for dashboard conversations -- the user's infrastructure queries should stay on their machine.
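The Today/Yesterday/Older grouping in the sidebar is a simple bucketing over timestamps. A sketch -- the `sh0_ai_conversations` key is from the post, but `groupByDate` and the `Conversation` shape are our names:

```typescript
// Bucket conversations for the sidebar by local calendar day.
// (Sketch: not sh0's actual sidebar code.)
interface Conversation {
  id: string;
  updatedAt: number; // epoch millis
}

function groupByDate(
  convs: Conversation[],
  now: number = Date.now()
): Record<'Today' | 'Yesterday' | 'Older', Conversation[]> {
  const startOfToday = new Date(now);
  startOfToday.setHours(0, 0, 0, 0);
  const today = startOfToday.getTime();
  const yesterday = today - 86_400_000;
  const groups = {
    Today: [] as Conversation[],
    Yesterday: [] as Conversation[],
    Older: [] as Conversation[],
  };
  for (const c of convs) {
    if (c.updatedAt >= today) groups.Today.push(c);
    else if (c.updatedAt >= yesterday) groups.Yesterday.push(c);
    else groups.Older.push(c);
  }
  return groups;
}
```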
Testing: Three Models, One curl
We validated all three models with direct curl requests against the live gateway:
```shell
curl -X POST https://sh0.dev/api/ai/chat \
  -H "Authorization: Bearer sh0_ai_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What models are you?"}],"model":"haiku"}'
```

Haiku: 416 input tokens, 36 output. Sonnet: 417 input, 46 output. Opus: 417 input, 53 output. SSE streaming confirmed working across all three. Usage tracking confirmed in /api/ai/usage. The 1-token variance in input counts is due to model-specific tokenisation differences on the system prompt.
What We Learned
Building an AI gateway taught us three things:
1. Client-side execution is the right architecture for infrastructure tools. Routing server data through a central gateway would be a security and privacy nightmare. Let the LLM live in the cloud; let the tools live on the user's machine.
2. The agentic loop needs a hard cap. Without the 10-iteration limit, a confused Claude could loop indefinitely, burning credits and producing nothing useful. In practice, 3 iterations handles 95% of queries.
3. The system prompt is the product. The difference between "Claude with sh0 tools" and "a DevOps engineer who happens to be AI" is entirely in the prompt. The XML structure, the tool policy, the injected context -- that is where the user experience lives.
---
Next in the series: Building an MCP Server: 25 Tools, 3-Tier Safety, OpenAPI-Driven -- how we extended from 10 gateway tools to 25 MCP tools with scoped API keys, confirmation tokens, and OpenAPI-driven generation.