The feature request was simple: after upgrading sh0 via the dashboard, the user was told to run systemctl restart sh0 on the host. But they were already in the dashboard. Why should they need SSH?
sh0 already had a container terminal -- xterm.js in the browser, WebSocket to the backend, Docker exec on the container. The question was: can we do the same thing for the host itself?
The Architecture Gap
Container terminals are easy. Docker's exec API gives you a bidirectional stream with TTY support. You create an exec instance, start it with Attach=true, Tty=true, and you get an HTTP-upgraded connection that speaks raw PTY bytes. The backend just bridges WebSocket frames to Docker's stream.
Host terminals are different. There's no Docker API. You need to spawn a real PTY on the host OS. This means:
- openpty() to create a PTY master/slave pair
- fork() + setsid() + TIOCSCTTY to set up a controlling terminal
- exec() the shell with the slave fd as stdin/stdout/stderr
- Non-blocking I/O on the master fd for async read/write
- Proper cleanup: SIGHUP (from closing the PTY), then SIGKILL, then waitpid()
All inside a Rust async runtime (tokio), where blocking the event loop is forbidden.
What We Built
Backend (host_access.rs, ~550 lines of Rust):
- PTY spawned inside spawn_blocking to avoid blocking tokio
- Master fd wrapped in AsyncFd<OwnedFd> for non-blocking I/O
- A pty_write_all() helper that handles EAGAIN and partial writes
- Child process reaping via Child::wait() (not raw waitpid)
- File browser using tokio::fs with path canonicalization
Frontend: Same xterm.js setup as the container terminal, but connecting to /api/v1/host/terminal instead of /api/v1/apps/:id/terminal. The file browser uses the same two-panel layout and reuses the existing FileTree component via a new browseFn prop.
What Two Auditors Found
We ran two independent audit agents simultaneously. They agreed on the critical issues and each found unique problems the other missed.
Both auditors caught: Symlink traversal bypass
The path protection system checked the requested path string (say, /tmp/file) against a blocklist of protected locations: /proc, /sys, /bin, and so on. But an admin could create the symlink /tmp/evil -> /etc/shadow and then write to /tmp/evil. The string /tmp/evil passes all checks. The filesystem follows the symlink.
Fix: tokio::fs::canonicalize() before checking protections. If /tmp/evil resolves to /etc/shadow, the write is blocked.
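A minimal sketch of that resolve-then-check order, assuming a blocklist of directory prefixes. It uses the synchronous std::fs::canonicalize so it runs without tokio; the names is_protected and check_write_target are hypothetical and not taken from host_access.rs.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true if `resolved` is a protected prefix or lives under one.
/// `Path::starts_with` compares whole components, so "/etcetera" does not
/// match "/etc".
fn is_protected(resolved: &Path, protected: &[PathBuf]) -> bool {
    protected.iter().any(|prefix| resolved.starts_with(prefix))
}

/// Resolve symlinks first, then apply the blocklist to the real target.
/// (Hypothetical helper: the real check in sh0 is not shown in this post.)
fn check_write_target(requested: &Path, protected: &[PathBuf]) -> io::Result<PathBuf> {
    // canonicalize() follows every symlink and removes `..` components.
    let resolved = fs::canonicalize(requested)?;
    if is_protected(&resolved, protected) {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("{} resolves to a protected path", requested.display()),
        ));
    }
    // Callers should operate on the resolved path, not the requested one.
    Ok(resolved)
}
```

One caveat of this sketch: canonicalize() returns an error for paths that do not exist yet, so creating a new file would need the parent directory resolved instead.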
Auditor 1 found: Double-close of slave fd
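Stdio::from_raw_fd(slave_fd) takes ownership of the fd and closes it when the Stdio is dropped. The parent then also called libc::close(slave_fd), closing the same descriptor number a second time. If another thread opens an fd between the two closes and is handed that number, we'd close the wrong fd.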
Fix: dup() the slave fd for each of stdin/stdout/stderr. Close the original once.
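The ownership pattern can be sketched with std's OwnedFd, whose try_clone() duplicates the descriptor and whose Drop closes it exactly once. This is an illustration under assumptions, not the code from host_access.rs: spawn_on_slave is a hypothetical name, and the setsid() and TIOCSCTTY steps a real PTY child needs are left out.

```rust
use std::io;
use std::os::fd::OwnedFd;
use std::process::{Child, Command, Stdio};

/// Spawn `program` with all three standard streams attached to `slave`.
/// Each stream gets its own duplicate, so every descriptor has exactly one
/// owner and is closed exactly once.
fn spawn_on_slave(program: &str, args: &[&str], slave: OwnedFd) -> io::Result<Child> {
    let stdin = slave.try_clone()?; // duplicate for stdin
    let stdout = slave.try_clone()?; // duplicate for stdout
    let stderr = slave.try_clone()?; // duplicate for stderr

    let child = Command::new(program)
        .args(args)
        .stdin(Stdio::from(stdin))
        .stdout(Stdio::from(stdout))
        .stderr(Stdio::from(stderr))
        .spawn();

    // The original descriptor is closed once, here, by OwnedFd's Drop impl.
    // There is no manual libc::close() that could hit a reused fd number.
    drop(slave);
    child
}
```

Because no raw integer fd is ever closed by hand, the compiler's ownership rules rule out the double-close rather than relying on code review to catch it.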
Auditor 2 found: Blocking writes on async runtime
The read side correctly used AsyncFd::readable() with try_io(). But the write side used raw libc::write() -- blocking. If the PTY buffer fills (shell isn't reading), the entire tokio worker thread blocks.
Fix: pty_write_all() using AsyncFd::writable() + try_io(), with proper EAGAIN retry and partial write handling.
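The retry logic at the heart of such a helper can be shown without tokio. In the sketch below, write_all_nonblocking is a hypothetical synchronous analogue of pty_write_all(), and the wait_writable callback stands in for awaiting AsyncFd::writable(); the real helper is async and is not reproduced here.

```rust
use std::io::{self, ErrorKind, Write};

/// Write all of `buf`, tolerating short writes and EAGAIN (`WouldBlock`).
/// `wait_writable` is an assumed stand-in for awaiting readiness: it should
/// return once the descriptor may accept more bytes.
fn write_all_nonblocking<W, F>(
    writer: &mut W,
    mut buf: &[u8],
    mut wait_writable: F,
) -> io::Result<()>
where
    W: Write,
    F: FnMut() -> io::Result<()>,
{
    while !buf.is_empty() {
        match writer.write(buf) {
            // Zero bytes accepted with data pending: the other side is gone.
            Ok(0) => return Err(io::Error::new(ErrorKind::WriteZero, "pty closed")),
            // Partial write: skip what was accepted and retry with the rest.
            Ok(n) => buf = &buf[n..],
            // EAGAIN: the buffer is full, so wait for readiness and retry.
            Err(e) if e.kind() == ErrorKind::WouldBlock => wait_writable()?,
            // EINTR: interrupted by a signal, retry immediately.
            Err(e) if e.kind() == ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

The important property is that the loop never spins or blocks on a full buffer: every WouldBlock hands control back to whatever wait_writable does, which in the async version is a yield to the runtime.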
Auditor 2 found: /dev/zero causes OOM
metadata.len() returns 0 for device files. The size check passes. tokio::fs::read() reads forever. Memory exhausted.
Fix: Check metadata.is_file() to reject device files. Use AsyncReadExt::take(MAX_READ_SIZE) instead of trusting metadata.
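Both checks fit in a few lines. The sketch below uses the synchronous std::io::Read::take, which has the same shape as AsyncReadExt::take; read_capped is a hypothetical name, and the limit is passed as a parameter because the post does not give the value of MAX_READ_SIZE.

```rust
use std::fs::File;
use std::io::{self, ErrorKind, Read};
use std::path::Path;

/// Read a regular file into memory, refusing anything larger than `max` bytes.
/// (Hypothetical helper; in the post's terms `max` would be MAX_READ_SIZE.)
fn read_capped(path: &Path, max: u64) -> io::Result<Vec<u8>> {
    let file = File::open(path)?;

    // Query the open handle rather than the path, so the check applies to
    // the object we will actually read. Devices such as /dev/zero fail here.
    if !file.metadata()?.is_file() {
        return Err(io::Error::new(ErrorKind::InvalidInput, "not a regular file"));
    }

    // Never trust the reported length: read at most max + 1 bytes, where the
    // one extra byte reveals that the file is over the limit.
    let mut buf = Vec::new();
    file.take(max.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > max {
        return Err(io::Error::new(ErrorKind::InvalidData, "file exceeds size limit"));
    }
    Ok(buf)
}
```

With this shape, memory use is bounded by the limit even if the metadata is wrong or the file grows mid-read. One gap in the sketch: opening a FIFO for reading can block until a writer appears, so a production version would likely need a non-blocking open or a type check before opening.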
The Methodology
Build, then audit, then audit again, then decide. Each session optimizes locally. The first auditor sees the symlink issue but might miss the EAGAIN race. The second auditor, coming fresh, catches the blocking writes but might not notice the double-close. Together, they found 15 issues across Critical/Important categories.
The implementation took ~30 minutes. The dual audit took ~5 minutes (parallel agents). The fixes took ~15 minutes. Total: under an hour for a production-ready host terminal with comprehensive security review.
The Result
The Settings page now has a "Host Access" tab with a red warning banner and two sub-views: Terminal and Files. Admins can run commands on the host, browse the filesystem, edit config files, and restart services -- all from the browser. The feature that prompted it (restarting sh0 after upgrade) is now a 5-second task instead of "open a separate SSH session."