20 Bugs, One Session: How We QA'd sh0 v1.6.0 with AI

sh0 v1.6.0 shipped with every planned feature complete. 230+ API endpoints, 103 MCP tools, 170 deploy templates, mail servers, auth, realtime, functions -- the whole platform. Then the CEO sat down to actually use it. What happened next was the most productive QA session in sh0's history.

The Setup

The CEO tested sh0 as a developer would: deploy stacks, create mail domains, set up cron jobs, provision auth servers. Every bug was reported in real time with a screenshot and an error log, then fixed immediately in the same conversation.

No staging environment. No Jira tickets. No sprint planning. Just: "this is broken" -> investigate -> fix -> "next issue."

The Bugs (and What They Reveal)

The String Mismatch ($0.05 bug, $500 impact)

The SSL badge on the settings page showed "SSL pending" even though HTTPS was fully working. Root cause: Caddy's API returned "managed" but the dashboard checked for "active". One string comparison against the wrong literal, and a working certificate looked broken.

This is the kind of bug that makes users distrust a product. If the SSL indicator is wrong, what else is wrong? A one-line fix that preserves credibility.
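One way to keep this class of bug from recurring is to normalize the status string in a single place and map it to a typed badge state. The sketch below is illustrative only: the `SslBadge` enum and `ssl_badge` function are hypothetical names, and treating both "managed" and "active" as "certificate is live" is an assumption drawn from the two strings mentioned above, not a statement about what Caddy can return.

```rust
/// Badge states shown on the settings page (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SslBadge {
    Active,
    Pending,
}

/// Map a raw status string to a badge state.
/// ASSUMPTION: "managed" and "active" both mean the certificate is live;
/// any other value falls back to Pending rather than claiming success.
pub fn ssl_badge(status: &str) -> SslBadge {
    match status.trim().to_ascii_lowercase().as_str() {
        "managed" | "active" => SslBadge::Active,
        _ => SslBadge::Pending,
    }
}
```

With a mapping like this, `ssl_badge("managed")` and `ssl_badge("active")` both yield `SslBadge::Active`, so the badge no longer depends on one exact literal.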

The Infinite Loop (Svelte 5 reactivity trap)

The mail Queue tab caused an infinite browser refresh. The culprit: a $state variable holding a setInterval handle, read and written inside a $effect. Svelte 5's fine-grained reactivity means every read creates a dependency, every write triggers re-evaluation. The timer handle doesn't need reactivity -- it's an implementation detail, not UI state.

Lesson: In Svelte 5, not everything should be $state. Timer handles, WebSocket references, and internal bookkeeping should be plain let variables.

The Ghost Image (Docker registry archaeology)

The Auth server (Logto) failed to create because the Docker image logto/logto:1.22.0 doesn't exist. Never did. The correct image is svhd/logto -- published by Silverhand, the company behind Logto, using their old company abbreviation as the Docker Hub namespace.

But even after fixing the image name, the container crash-looped with npm error Missing script: "node". The svhd/logto image uses ENTRYPOINT ["npm", "run"] with CMD ["start"]. Our override CMD ["node", ".", "--env", "production"] got appended to the entrypoint, resulting in npm run node . --env production -- and there's no npm script called "node."

The fix: override the entrypoint entirely with the official docker-compose pattern: sh -c "npm run cli db seed -- --swe && npm start".
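The entrypoint and CMD interaction described above can be modeled in a few lines. This is a toy model, not sh0's code: `effective_argv` is a hypothetical helper that follows Docker's documented rule that the container's final command is ENTRYPOINT followed by CMD. It also assumes that overriding the entrypoint discards the image's default CMD, which matches `docker run --entrypoint`.

```rust
/// Compute the argv a container would start with (illustrative model only).
/// Rule: final command = ENTRYPOINT followed by CMD.
pub fn effective_argv(
    image_entrypoint: &[&str],
    image_cmd: &[&str],
    entrypoint_override: Option<&[&str]>,
    cmd_override: Option<&[&str]>,
) -> Vec<String> {
    // An explicit entrypoint replaces the image's entrypoint.
    let entrypoint = entrypoint_override.unwrap_or(image_entrypoint);
    let cmd: &[&str] = match (entrypoint_override, cmd_override) {
        // An explicit CMD always replaces the image's CMD.
        (_, Some(cmd)) => cmd,
        // ASSUMPTION: overriding only the entrypoint clears the image's CMD.
        (Some(_), None) => &[],
        // No overrides: use the image defaults.
        (None, None) => image_cmd,
    };
    entrypoint
        .iter()
        .chain(cmd.iter())
        .map(|s| s.to_string())
        .collect()
}
```

Calling it with the image defaults `["npm", "run"]` and `["start"]` plus a CMD override of `["node", ".", "--env", "production"]` gives `npm run node . --env production`, the failing command from the logs. Passing an entrypoint override of `["sh", "-c", "npm run cli db seed -- --swe && npm start"]` instead yields exactly those three arguments, with nothing appended.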

Lesson: When integrating third-party Docker images, always check the actual Dockerfile, not documentation that might reference a different version.

The cPanel Muscle Memory

A developer tried to create a cron job with: curl -s "https://api.example.com/cron" > /dev/null 2>&1

This is standard crontab syntax that every Linux admin has written hundreds of times. But sh0 executes curl commands natively (no shell), so the > and & characters were rejected by the command validator.

Instead of telling users to change their habits, we added a sanitize_command() step that strips shell redirections before validation. Paste your crontab line as-is; sh0 handles the rest.
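A minimal sketch of what such a sanitizer could look like follows. Only the name `sanitize_command` comes from the text above; the body, the helper functions, and the exact set of redirections handled are assumptions for illustration, not sh0's actual implementation. It splits on whitespace while keeping quoted spans intact, so a `&` or `>` inside a quoted URL survives. It then drops redirection tokens such as `>`, `>>`, `2>`, `<`, and `2>&1`, together with their targets.

```rust
/// Split on whitespace, keeping single- or double-quoted spans in one token.
fn split_tokens(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut quote: Option<char> = None;
    for ch in input.chars() {
        match (quote, ch) {
            // Inside quotes: copy everything, close on the matching quote.
            (Some(q), c) => {
                current.push(c);
                if c == q {
                    quote = None;
                }
            }
            (None, '"') | (None, '\'') => {
                quote = Some(ch);
                current.push(ch);
            }
            (None, c) if c.is_whitespace() => {
                if !current.is_empty() {
                    tokens.push(std::mem::take(&mut current));
                }
            }
            (None, c) => current.push(c),
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Some(true): redirection operator whose target is the next token (`>`, `2>`).
/// Some(false): redirection with its target attached (`2>&1`, `>/dev/null`).
/// None: an ordinary argument.
fn redirection(token: &str) -> Option<bool> {
    let rest = token.trim_start_matches(|c: char| c.is_ascii_digit());
    let rest = rest
        .strip_prefix(">>")
        .or_else(|| rest.strip_prefix('>'))
        .or_else(|| rest.strip_prefix('<'))?;
    Some(rest.is_empty())
}

/// Remove shell redirections so the remaining argv can be validated and run
/// without a shell. Pipes and background `&` are deliberately not handled here.
pub fn sanitize_command(input: &str) -> String {
    let mut kept = Vec::new();
    let mut skip_next = false;
    for token in split_tokens(input) {
        if skip_next {
            skip_next = false; // this token was a redirection target
            continue;
        }
        match redirection(&token) {
            Some(needs_target) => skip_next = needs_target,
            None => kept.push(token),
        }
    }
    kept.join(" ")
}
```

Under these assumptions, the crontab line from the example reduces to `curl -s "https://api.example.com/cron"`, which can then go through the normal command validator.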

The Silent Admin Password

Stalwart mail server auto-generates an admin password on first boot and prints it to the container logs. Nobody reads container logs. Users created mailboxes, then couldn't log into the webmail because they never knew the admin password existed.

The fix: decrypt the stored admin password and display it on the mail Overview tab with a show/hide toggle. The password was always there in the database -- it just wasn't surfaced.

The Method

Every fix followed the same pattern:

1. CEO reports -- screenshot + error log, no interpretation
2. AI investigates -- reads source code, traces the issue to root cause
3. AI fixes -- minimal change, no scope creep
4. CEO continues testing -- reports the next issue

No context switching. No ticket grooming. No "we'll get to it next sprint." The feedback loop was minutes, not days.

After all fixes, an audit agent reviewed every change for correctness, consistency, and regressions. It found three more issues: a missing struct field that would break compilation, a timer leak, and a pattern ordering bug. All were fixed before release.

The Numbers

Metric	Value
Bugs found	20+
Services affected	Mail, Auth, Cron, Realtime, Functions, Settings
Files changed	72
Lines changed	+4,856 / -3,508
Tests passing	155/155
Time to fix all	~4 hours
Version	v1.6.0 -> v1.6.1

What This Means

Traditional QA for a platform this size would take a team of testers a week to find these bugs, and a team of developers another week to fix them. We did it in one sitting because the AI holds the entire codebase in context -- every Rust handler, every Svelte component, every i18n key across 5 languages.

The AI doesn't get tired of reading error logs. It doesn't forget what the mail handler does when it switches to fixing the cron scheduler. It doesn't need to re-learn the project structure for each bug.

This is what "AI as CTO" looks like in practice: not replacing human judgment about what to test, but amplifying the speed at which issues are found, diagnosed, and fixed. The CEO's time is the bottleneck. Every minute spent waiting for a build or searching for a bug is a minute not spent deciding what the product should be.

v1.6.1 is cleaner than v1.6.0. Not because we planned it that way, but because we tested it like users would.
