Some features are best built alone. You focus on one problem, reason about its edge cases, write the code, write the tests, and ship it. Other features are best built in parallel -- when they touch entirely different domains and share zero files, running them concurrently cuts the calendar time in half without introducing merge conflicts.
Phases 20 and 21 of sh0's development were a parallel build. The export system generated deployment configurations for seven platforms (Vercel, AWS, GCP, Kubernetes, Railway, Render, Docker Compose). The cron scheduler executed recurring jobs inside application containers with timeout enforcement. They shared no code, no database tables, no API routes. We ran them as parallel agent teams in isolated git worktrees.
This article focuses on the cron scheduler and preview environments -- two features that gave sh0 users the tools they needed to automate recurring tasks and test pull requests before merging.
## The Cron Scheduler
Every non-trivial application has recurring tasks: database cleanup, report generation, cache warming, email digests, sitemap rebuilds. The standard approach is a system crontab, but on a PaaS, users do not have SSH access to edit crontabs. They need a managed cron service with a UI, execution history, and failure notifications.
### The Data Model
Cron jobs lived alongside apps in the database. Each job belonged to an app and specified a cron expression, a command to execute, and operational parameters:
```sql
-- Migration 008: cron_runs table
CREATE TABLE cron_runs (
    id TEXT PRIMARY KEY,
    cron_job_id TEXT NOT NULL REFERENCES cron_jobs(id),
    status TEXT NOT NULL DEFAULT 'pending',
    started_at TEXT,
    finished_at TEXT,
    exit_code INTEGER,
    stdout TEXT,
    stderr TEXT,
    error TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
```

The `cron_jobs` table (created in an earlier migration) stored the schedule definition. The `cron_runs` table stored execution history -- one row per invocation, with captured stdout, stderr, exit code, and timing. This separation meant that editing a cron schedule did not lose its execution history.
### The Scheduler Loop
The `CronScheduler` was a background task that ticked every 60 seconds. On each tick, it loaded all enabled cron jobs, checked which ones were due based on their cron expression and last execution time, and spawned execution tasks for those that needed to run:
```rust
pub struct CronScheduler {
    db: Arc<DbPool>,
    docker: Arc<DockerClient>,
    // Shared so spawned tasks can clear the guard when they finish
    processing: Arc<DashMap<String, bool>>,
}

impl CronScheduler {
    pub async fn tick(&self) -> Result<()> {
        let jobs = CronJob::list_enabled(&self.db).await?;

        for job in jobs {
            // Guard: skip if already processing
            if self.processing.contains_key(&job.id) {
                continue;
            }

            if self.is_due(&job)? {
                self.processing.insert(job.id.clone(), true);
                let db = self.db.clone();
                let docker = self.docker.clone();
                let processing = self.processing.clone();

                tokio::spawn(async move {
                    let result = execute_job(&db, &docker, &job).await;
                    processing.remove(&job.id);
                    if let Err(e) = result {
                        tracing::error!(job_id = %job.id, "Cron execution failed: {e}");
                    }
                });
            }
        }

        Ok(())
    }
}
```
The `DashMap` processing guard was essential. Without it, a job that took longer than 60 seconds to execute would be spawned again on the next tick, leading to overlapping executions. The guard ensured that each job had at most one active execution at any time.
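The guard pattern itself can be sketched with std-only types -- a `Mutex<HashSet<String>>` standing in for `DashMap`, and names that are illustrative rather than sh0's actual code:

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Std-only analogue of the DashMap guard: claim a job id before
// spawning its execution; a second claim fails until release.
struct ProcessingGuard {
    active: Mutex<HashSet<String>>,
}

impl ProcessingGuard {
    fn new() -> Self {
        Self { active: Mutex::new(HashSet::new()) }
    }

    // Returns true only if no execution for this job is in flight.
    fn try_claim(&self, job_id: &str) -> bool {
        self.active.lock().unwrap().insert(job_id.to_string())
    }

    fn release(&self, job_id: &str) {
        self.active.lock().unwrap().remove(job_id);
    }
}

fn main() {
    let guard = Arc::new(ProcessingGuard::new());
    assert!(guard.try_claim("job-1"));  // first tick claims the job
    assert!(!guard.try_claim("job-1")); // overlapping tick is skipped
    guard.release("job-1");
    assert!(guard.try_claim("job-1")); // next tick can run again
}
```

The same shape works for any "at most one in flight per key" invariant; `DashMap` just avoids the single global lock.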
### Cron Expression Normalization
The `cron` crate in Rust expects 7-field cron expressions (seconds, minutes, hours, day-of-month, month, day-of-week, year), but users write 5-field expressions (minutes, hours, day-of-month, month, day-of-week) -- the standard format used by every crontab in existence.
We normalized user input by prepending `0` (seconds) and appending `*` (year):
```rust
fn normalize_cron(expr: &str) -> String {
    let fields: Vec<&str> = expr.trim().split_whitespace().collect();
    match fields.len() {
        5 => format!("0 {} *", expr), // 5-field -> 7-field
        6 => format!("0 {}", expr),   // 6-field -> 7-field
        7 => expr.to_string(),        // Already 7-field
        _ => expr.to_string(),        // Let the parser catch the error
    }
}
```

This normalization meant that `*/5 * * * *` (every 5 minutes) became `0 */5 * * * * *`, which the cron parser accepted without complaint.
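To sanity-check the mapping, the function can be exercised directly (reproduced here so the block compiles on its own):

```rust
fn normalize_cron(expr: &str) -> String {
    let fields: Vec<&str> = expr.trim().split_whitespace().collect();
    match fields.len() {
        5 => format!("0 {} *", expr),
        6 => format!("0 {}", expr),
        7 => expr.to_string(),
        _ => expr.to_string(),
    }
}

fn main() {
    // 5-field user input gains a seconds field and a year field
    assert_eq!(normalize_cron("*/5 * * * *"), "0 */5 * * * * *");
    // 7-field input passes through untouched
    assert_eq!(normalize_cron("0 0 2 * * * *"), "0 0 2 * * * *");
    // Anything else is handed to the parser unchanged, to fail there
    assert_eq!(normalize_cron("not a cron"), "not a cron");
}
```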
### Timeout Enforcement
Long-running cron jobs are a common source of resource exhaustion. A backup script that hangs, a report generator that enters an infinite loop, a cleanup task that locks a table indefinitely -- any of these can consume resources that the application needs for serving traffic.
Each cron job had a configurable timeout. The execution function used `tokio::time::timeout` to enforce it:
```rust
async fn execute_job(
    db: &DbPool,
    docker: &DockerClient,
    job: &CronJob,
) -> Result<()> {
    let run = CronRun::create(db, &job.id).await?;
    CronJob::update_run_status(db, &job.id, "running").await?;

    let timeout_duration = Duration::from_secs(job.timeout_seconds as u64);
    let result = tokio::time::timeout(
        timeout_duration,
        docker.exec_in_container(&job.container_id, &job.command),
    ).await;

    match result {
        Ok(Ok(output)) => {
            // Truncate stdout/stderr to 64KB to prevent DB bloat
            let stdout = truncate(&output.stdout, 64 * 1024);
            let stderr = truncate(&output.stderr, 64 * 1024);
            CronRun::complete(db, &run.id, output.exit_code, &stdout, &stderr).await?;
        }
        Ok(Err(e)) => {
            CronRun::fail(db, &run.id, &format!("Execution error: {e}")).await?;
        }
        Err(_) => {
            CronRun::fail(db, &run.id, "Timeout exceeded").await?;
        }
    }

    // Prune old runs (keep last 100)
    CronRun::prune(db, &job.id, 100).await?;

    Ok(())
}
```
The 64 KB truncation on stdout and stderr prevented a runaway log from bloating the database. The pruning step kept only the last 100 runs per job, ensuring that execution history was useful but bounded.
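The `truncate` helper is not shown in the source; a plausible sketch -- hypothetical, cutting on a byte budget while respecting UTF-8 character boundaries -- looks like this:

```rust
/// Truncate `s` to at most `max_bytes` bytes, backing up to a valid
/// UTF-8 char boundary and appending a marker when anything was cut.
fn truncate(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}\n... [truncated]", &s[..end])
}

fn main() {
    // Short output is stored verbatim
    assert_eq!(truncate("short", 64), "short");

    // Long output is cut at the budget and marked
    let long = "x".repeat(100);
    let cut = truncate(&long, 10);
    assert!(cut.starts_with("xxxxxxxxxx"));
    assert!(cut.ends_with("[truncated]"));
}
```

The char-boundary loop matters: slicing a `&str` mid-codepoint panics in Rust, and cron output is arbitrary bytes-turned-text.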
### The Cron API and Dashboard
The cron system exposed a full CRUD API:
- `POST /api/v1/cron-jobs` -- Create a cron job
- `GET /api/v1/cron-jobs` -- List all cron jobs
- `GET /api/v1/apps/:id/cron-jobs` -- List cron jobs for an app
- `GET /api/v1/cron-jobs/:id` -- Get cron job details
- `PATCH /api/v1/cron-jobs/:id` -- Update schedule/command/timeout
- `DELETE /api/v1/cron-jobs/:id` -- Delete cron job
- `POST /api/v1/cron-jobs/:id/trigger` -- Manually trigger execution
- `GET /api/v1/cron-jobs/:id/runs` -- Get execution history

The CLI mirrored the API:
```bash
# List cron jobs for an app
sh0 cron list --app my-app

# Create a new cron job
sh0 cron create --app my-app \
  --schedule "0 2 * * *" \
  --command "python manage.py cleanup" \
  --timeout 300

# Trigger a manual run
sh0 cron trigger

# View recent runs
sh0 cron runs
```
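The create command above maps onto the `POST /api/v1/cron-jobs` endpoint. A hypothetical request body -- field names inferred from the schema and CLI flags, not confirmed by the source -- might look like:

```json
{
  "app_id": "my-app",
  "schedule": "0 2 * * *",
  "command": "python manage.py cleanup",
  "timeout_seconds": 300
}
```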
The dashboard added a dedicated Cron Jobs page accessible from the sidebar. Each job was displayed as a card showing the schedule, command, last run status, and next scheduled execution. A "Run Now" button triggered immediate execution. Expanding a job revealed its execution history with timestamps, exit codes, and truncated stdout/stderr output.
## Preview Environments
Preview environments solve a workflow problem that every team encounters: you want to test a pull request in a realistic environment before merging it, but spinning up a test server for every PR is expensive and manual.
sh0's preview environments worked through webhook integration. When a pull request was opened or updated on GitHub or GitLab, the webhook handler:
1. Detected the event type (`pull_request.opened` or `pull_request.synchronize`)
2. Created a new app with a unique subdomain: `pr-{number}-{app-name}.{domain}`
3. Cloned the PR branch and deployed it through the standard build pipeline
4. Configured Caddy routing for the preview subdomain
5. Posted a comment on the PR with the preview URL

When the PR was closed or merged, a `pull_request.closed` webhook triggered cleanup: the preview app was stopped, its containers removed, its volumes deleted, and its Caddy route removed.

The preview URL was deterministic. PR #42 for an app named `frontend` would always deploy to `pr-42-frontend.sh0.dev`. Opening the same PR again would update the existing preview rather than creating a duplicate.
### Isolation Without Overhead
Preview environments reused the existing deployment pipeline entirely. They were regular sh0 apps with a few special properties:
- Their name was auto-generated from the PR number and parent app name
- Their environment variables were inherited from the parent app with overrides
- Their lifecycle was tied to the PR state (open = deployed, closed = destroyed)
- They were excluded from autoscaling and backup schedules
This meant that preview environments had full feature parity with production deployments: custom domains (if configured), environment variables, database connections, volume mounts, and health checks. The preview was not a simplified mock -- it was the real application running real code.
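Environment inheritance with overrides (the second property above) is essentially a map merge in which the preview's own values win. A std-only sketch, with names that are ours rather than sh0's:

```rust
use std::collections::HashMap;

// Merge the parent app's env with preview-specific overrides;
// on key collision, the override wins.
fn inherit_env(
    parent: &HashMap<String, String>,
    overrides: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = parent.clone();
    for (key, value) in overrides {
        merged.insert(key.clone(), value.clone());
    }
    merged
}

fn main() {
    let mut parent = HashMap::new();
    parent.insert("DATABASE_URL".to_string(), "postgres://prod".to_string());
    parent.insert("LOG_LEVEL".to_string(), "info".to_string());

    let mut overrides = HashMap::new();
    overrides.insert("DATABASE_URL".to_string(), "postgres://preview".to_string());

    let merged = inherit_env(&parent, &overrides);
    assert_eq!(merged["DATABASE_URL"], "postgres://preview"); // overridden
    assert_eq!(merged["LOG_LEVEL"], "info");                  // inherited
}
```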
## The Export System
Running in parallel with the cron scheduler, the export system was Phase 20. It generated deployment configurations for seven platforms, letting users who outgrew sh0 take their configuration with them:
- Docker Compose -- full `docker-compose.yml` with services, volumes, and networks
- Vercel -- `vercel.json` with framework detection
- AWS -- ECS task definition JSON
- GCP -- Cloud Run `service.yaml`
- Kubernetes -- Deployment + Service + Ingress manifests
- Railway -- `railway.json` configuration
- Render -- `render.yaml` service definition
Each generator read the app's configuration from the database -- image, environment variables, ports, volumes, resource limits -- and produced a platform-specific configuration file. Environment variable values were masked as `${VAR_NAME}` placeholders so that exported files did not contain secrets.
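The masking step can be sketched as follows (illustrative, not the export system's actual code; `BTreeMap` is used only to make the output order deterministic):

```rust
use std::collections::BTreeMap;

// Replace each env var's value with a ${NAME} placeholder so the
// exported file references the variable without embedding the secret.
fn mask_env(vars: &BTreeMap<String, String>) -> Vec<String> {
    vars.keys()
        .map(|name| format!("{}=${{{}}}", name, name))
        .collect()
}

fn main() {
    let mut vars = BTreeMap::new();
    vars.insert("API_KEY".to_string(), "s3cr3t".to_string());
    vars.insert("PORT".to_string(), "8080".to_string());

    // The secret value never appears in the exported lines
    assert_eq!(mask_env(&vars), vec!["API_KEY=${API_KEY}", "PORT=${PORT}"]);
}
```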
The export API accepted a `format` parameter and returned the generated configuration:
```bash
# Export as Kubernetes manifests
sh0 export kubernetes --app my-app --output k8s-manifests.yaml

# Export as Docker Compose
sh0 export docker-compose --app my-app
```
The philosophical statement was intentional: sh0 would not hold your data hostage. If you wanted to leave, the export system gave you a head start on your destination platform.
## Parallel Development and the Merge
Both phases were developed by parallel agent teams working in isolated git worktrees. The isolation was possible because the features had zero file overlap: different database tables, different API handler files, different CLI command files, different dashboard pages.
The merge into the main working directory required manual fixes for a few shared files:
- `types.rs` needed CronJob DTOs that one agent assumed the other would create
- A reference pattern borrow issue in the cron handler needed a `.clone()` fix
- The router needed both sets of routes added to the same function
After the merge, 351 tests passed. The dashboard build succeeded. Both features worked independently and did not interfere with each other.
The parallel development pattern was becoming a reliable tool in our workflow. Features with zero file overlap could be developed simultaneously, cutting wall-clock time in half. The key was identifying the overlap boundary before starting -- and being disciplined about not crossing it.
---
Next in the series: Monitoring and Alerts: Email, Slack, Discord, Telegram, Webhooks -- how we built a monitoring system with periodic Docker stats collection, threshold-based alert evaluation, and multi-channel dispatch.