Some features are best built alone. You focus on one problem, reason about its edge cases, write the code, write the tests, and ship it. Other features are best built in parallel -- when they touch entirely different domains and share zero files, running them concurrently cuts the calendar time in half without introducing merge conflicts.
Phases 20 and 21 of sh0's development were a parallel build. The export system generated deployment configurations for seven platforms (Vercel, AWS, GCP, Kubernetes, Railway, Render, Docker Compose). The cron scheduler executed recurring jobs inside application containers with timeout enforcement. They shared no code, no database tables, no API routes. We ran them as parallel agent teams in isolated git worktrees.
This article focuses on the cron scheduler and preview environments -- two features that gave sh0 users the tools they needed to automate recurring tasks and test pull requests before merging.
## The Cron Scheduler
Every non-trivial application has recurring tasks: database cleanup, report generation, cache warming, email digests, sitemap rebuilds. The standard approach is a system crontab, but on a PaaS, users do not have SSH access to edit crontabs. They need a managed cron service with a UI, execution history, and failure notifications.
### The Data Model
Cron jobs lived alongside apps in the database. Each job belonged to an app and specified a cron expression, a command to execute, and operational parameters:
```sql
-- Migration 008: cron_runs table
CREATE TABLE cron_runs (
    id TEXT PRIMARY KEY,
    cron_job_id TEXT NOT NULL REFERENCES cron_jobs(id),
    status TEXT NOT NULL DEFAULT 'pending',
    started_at TEXT,
    finished_at TEXT,
    exit_code INTEGER,
    stdout TEXT,
    stderr TEXT,
    error TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
```

The `cron_jobs` table (created in an earlier migration) stored the schedule definition. The `cron_runs` table stored execution history -- one row per invocation, with captured stdout, stderr, exit code, and timing. This separation meant that editing a cron schedule did not lose its execution history.
### The Scheduler Loop
The `CronScheduler` was a background task that ticked every 60 seconds. On each tick, it loaded all enabled cron jobs, checked which ones were due based on their cron expression and last execution time, and spawned execution tasks for those that needed to run:
```rust
pub struct CronScheduler {
    db: Arc<DbPool>,
    docker: Arc<DockerClient>,
    // Shared so spawned tasks can clear the guard when they finish
    processing: Arc<DashMap<String, bool>>,
}

impl CronScheduler {
    pub async fn tick(&self) -> Result<()> {
        let jobs = CronJob::list_enabled(&self.db).await?;

        for job in jobs {
            // Guard: skip if already processing
            if self.processing.contains_key(&job.id) {
                continue;
            }

            if self.is_due(&job)? {
                self.processing.insert(job.id.clone(), true);
                let db = self.db.clone();
                let docker = self.docker.clone();
                let processing = self.processing.clone();

                tokio::spawn(async move {
                    let result = execute_job(&db, &docker, &job).await;
                    processing.remove(&job.id);
                    if let Err(e) = result {
                        tracing::error!(job_id = %job.id, "Cron execution failed: {e}");
                    }
                });
            }
        }

        Ok(())
    }
}
```
The `DashMap` processing guard was essential. Without it, a job that took longer than 60 seconds to execute would be spawned again on the next tick, leading to overlapping executions. The guard ensured that each job had at most one active execution at any time.
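The guard pattern itself can be sketched with std-only types -- a `Mutex<HashSet<String>>` standing in for `DashMap`, and names that are illustrative rather than sh0's actual code:

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Std-only analogue of the DashMap guard: claim a job id before
// spawning its execution; a second claim fails until release.
struct ProcessingGuard {
    active: Mutex<HashSet<String>>,
}

impl ProcessingGuard {
    fn new() -> Self {
        Self { active: Mutex::new(HashSet::new()) }
    }

    // Returns true only if no execution for this job is in flight.
    fn try_claim(&self, job_id: &str) -> bool {
        self.active.lock().unwrap().insert(job_id.to_string())
    }

    fn release(&self, job_id: &str) {
        self.active.lock().unwrap().remove(job_id);
    }
}

fn main() {
    let guard = Arc::new(ProcessingGuard::new());
    assert!(guard.try_claim("job-1"));  // first tick claims the job
    assert!(!guard.try_claim("job-1")); // overlapping tick is skipped
    guard.release("job-1");
    assert!(guard.try_claim("job-1")); // next tick can run again
}
```

The same shape works for any "at most one in flight per key" invariant; `DashMap` just avoids the single global lock.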
### Cron Expression Normalization
The `cron` crate in Rust expects 7-field cron expressions (seconds, minutes, hours, day-of-month, month, day-of-week, year), but users write 5-field expressions (minutes, hours, day-of-month, month, day-of-week) -- the standard format used by every crontab in existence.
We normalized user input by prepending `0` (seconds) and appending `*` (year):
```rust
fn normalize_cron(expr: &str) -> String {
    let fields: Vec<&str> = expr.trim().split_whitespace().collect();
    match fields.len() {
        5 => format!("0 {} *", expr), // 5-field -> 7-field
        6 => format!("0 {}", expr),   // 6-field -> 7-field
        7 => expr.to_string(),        // Already 7-field
        _ => expr.to_string(),        // Let the parser catch the error
    }
}
```

This normalization meant that `*/5 * * * *` (every 5 minutes) became `0 */5 * * * * *`, which the cron parser accepted without complaint.
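To sanity-check the mapping, the function can be exercised directly (reproduced here so the block compiles on its own):

```rust
fn normalize_cron(expr: &str) -> String {
    let fields: Vec<&str> = expr.trim().split_whitespace().collect();
    match fields.len() {
        5 => format!("0 {} *", expr),
        6 => format!("0 {}", expr),
        7 => expr.to_string(),
        _ => expr.to_string(),
    }
}

fn main() {
    // 5-field user input gains a seconds field and a year field
    assert_eq!(normalize_cron("*/5 * * * *"), "0 */5 * * * * *");
    // 7-field input passes through untouched
    assert_eq!(normalize_cron("0 0 2 * * * *"), "0 0 2 * * * *");
    // Anything else is handed to the parser unchanged, to fail there
    assert_eq!(normalize_cron("not a cron"), "not a cron");
}
```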
### Timeout Enforcement
Long-running cron jobs are a common source of resource exhaustion. A backup script that hangs, a report generator that enters an infinite loop, a cleanup task that locks a table indefinitely -- any of these can consume resources that the application needs for serving traffic.
Each cron job had a configurable timeout. The execution function used `tokio::time::timeout` to enforce it:
```rust
async fn execute_job(
    db: &DbPool,
    docker: &DockerClient,
    job: &CronJob,
) -> Result<()> {
    let run = CronRun::create(db, &job.id).await?;
    CronJob::update_run_status(db, &job.id, "running").await?;

    let timeout_duration = Duration::from_secs(job.timeout_seconds as u64);
    let result = tokio::time::timeout(
        timeout_duration,
        docker.exec_in_container(&job.container_id, &job.command),
    ).await;

    match result {
        Ok(Ok(output)) => {
            // Truncate stdout/stderr to 64KB to prevent DB bloat
            let stdout = truncate(&output.stdout, 64 * 1024);
            let stderr = truncate(&output.stderr, 64 * 1024);
            CronRun::complete(db, &run.id, output.exit_code, &stdout, &stderr).await?;
        }
        Ok(Err(e)) => {
            CronRun::fail(db, &run.id, &format!("Execution error: {e}")).await?;
        }
        Err(_) => {
            CronRun::fail(db, &run.id, "Timeout exceeded").await?;
        }
    }

    // Prune old runs (keep last 100)
    CronRun::prune(db, &job.id, 100).await?;

    Ok(())
}
```
The 64 KB truncation on stdout and stderr prevented a runaway log from bloating the database. The pruning step kept only the last 100 runs per job, ensuring that execution history was useful but bounded.
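The `truncate` helper is not shown in the source; a plausible sketch -- hypothetical, cutting on a byte budget while respecting UTF-8 character boundaries -- looks like this:

```rust
/// Truncate `s` to at most `max_bytes` bytes, backing up to a valid
/// UTF-8 char boundary and appending a marker when anything was cut.
fn truncate(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}\n... [truncated]", &s[..end])
}

fn main() {
    // Short output is stored verbatim
    assert_eq!(truncate("short", 64), "short");

    // Long output is cut at the budget and marked
    let long = "x".repeat(100);
    let cut = truncate(&long, 10);
    assert!(cut.starts_with("xxxxxxxxxx"));
    assert!(cut.ends_with("[truncated]"));
}
```

The char-boundary loop matters: slicing a `&str` mid-codepoint panics in Rust, and cron output is arbitrary bytes-turned-text.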
### The Cron API and Dashboard
The cron system exposed a full CRUD API:
- `POST /api/v1/cron-jobs` -- Create a cron job
- `GET /api/v1/cron-jobs` -- List all cron jobs
- `GET /api/v1/apps/:id/cron-jobs` -- List cron jobs for an app
- `GET /api/v1/cron-jobs/:id` -- Get cron job details
- `PATCH /api/v1/cron-jobs/:id` -- Update schedule/command/timeout
- `DELETE /api/v1/cron-jobs/:id` -- Delete cron job
- `POST /api/v1/cron-jobs/:id/trigger` -- Manually trigger execution
- `GET /api/v1/cron-jobs/:id/runs` -- Get execution history

The CLI mirrored the API:
```bash
# List cron jobs for an app
sh0 cron list --app my-app

# Create a new cron job
sh0 cron create --app my-app \
  --schedule "0 2 * * *" \
  --command "python manage.py cleanup" \
  --timeout 300

# Trigger a manual run
sh0 cron trigger

# View recent runs
sh0 cron runs
```
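The create command above maps onto the `POST /api/v1/cron-jobs` endpoint. A hypothetical request body -- field names inferred from the schema and CLI flags, not confirmed by the source -- might look like:

```json
{
  "app_id": "my-app",
  "schedule": "0 2 * * *",
  "command": "python manage.py cleanup",
  "timeout_seconds": 300
}
```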
The dashboard added a dedicated Cron Jobs page accessible from the sidebar. Each job was displayed as a card showing the schedule, command, last run status, and next scheduled execution. A "Run Now" button triggered immediate execution. Expanding a job revealed its execution history with timestamps, exit codes, and truncated stdout/stderr output.
## Preview Environments
Preview environments solve a workflow problem that every team encounters: you want to test a pull request in a realistic environment before merging it, but spinning up a test server for every PR is expensive and manual.
sh0's preview environments worked through webhook integration. When a pull request was opened or updated on GitHub or GitLab, the webhook handler:
1. Detected the event type (`pull_request.opened` or `pull_request.synchronize`)
2. Created a new app with a unique subdomain: `pr-{number}-{app-name}.{domain}`
3. Cloned the PR branch and deployed it through the standard build pipeline
4. Configured Caddy routing for the preview subdomain
5. Posted a comment on the PR with the preview URL

When the PR was closed or merged, a `pull_request.closed` webhook triggered cleanup: the preview app was stopped, its containers removed, its volumes deleted, and its Caddy route removed.

The preview URL was deterministic. PR #42 for an app named `frontend` would always deploy to `pr-42-frontend.sh0.dev`. Opening the same PR again would update the existing preview rather than creating a duplicate.
### Isolation Without Overhead
Preview environments reused the existing deployment pipeline entirely. They were regular sh0 apps with a few special properties:
- Their name was auto-generated from the PR number and parent app name
- Their environment variables were inherited from the parent app with overrides
- Their lifecycle was tied to the PR state (open = deployed, closed = destroyed)
- They were excluded from autoscaling and backup schedules
This meant that preview environments had full feature parity with production deployments: custom domains (if configured), environment variables, database connections, volume mounts, and health checks. The preview was not a simplified mock -- it was the real application running real code.
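Environment inheritance with overrides (the second property above) is essentially a map merge in which the preview's own values win. A std-only sketch, with names that are ours rather than sh0's:

```rust
use std::collections::HashMap;

// Merge the parent app's env with preview-specific overrides;
// on key collision, the override wins.
fn inherit_env(
    parent: &HashMap<String, String>,
    overrides: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = parent.clone();
    for (key, value) in overrides {
        merged.insert(key.clone(), value.clone());
    }
    merged
}

fn main() {
    let mut parent = HashMap::new();
    parent.insert("DATABASE_URL".to_string(), "postgres://prod".to_string());
    parent.insert("LOG_LEVEL".to_string(), "info".to_string());

    let mut overrides = HashMap::new();
    overrides.insert("DATABASE_URL".to_string(), "postgres://preview".to_string());

    let merged = inherit_env(&parent, &overrides);
    assert_eq!(merged["DATABASE_URL"], "postgres://preview"); // overridden
    assert_eq!(merged["LOG_LEVEL"], "info");                  // inherited
}
```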
## The Export System
Running in parallel with the cron scheduler, the export system was Phase 20. It generated deployment configurations for seven platforms, letting users who outgrew sh0 take their configuration with them:
- Docker Compose -- full `docker-compose.yml` with services, volumes, and networks
- Vercel -- `vercel.json` with framework detection
- AWS -- ECS task definition JSON
- GCP -- Cloud Run `service.yaml`
- Kubernetes -- Deployment + Service + Ingress manifests
- Railway -- `railway.json` configuration
- Render -- `render.yaml` service definition
Each generator read the app's configuration from the database -- image, environment variables, ports, volumes, resource limits -- and produced a platform-specific configuration file. Environment variable values were masked as `${VAR_NAME}` placeholders so that exported files did not contain secrets.
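The masking step can be sketched as follows (illustrative, not the export system's actual code; `BTreeMap` is used only to make the output order deterministic):

```rust
use std::collections::BTreeMap;

// Replace each env var's value with a ${NAME} placeholder so the
// exported file references the variable without embedding the secret.
fn mask_env(vars: &BTreeMap<String, String>) -> Vec<String> {
    vars.keys()
        .map(|name| format!("{}=${{{}}}", name, name))
        .collect()
}

fn main() {
    let mut vars = BTreeMap::new();
    vars.insert("API_KEY".to_string(), "s3cr3t".to_string());
    vars.insert("PORT".to_string(), "8080".to_string());

    // The secret value never appears in the exported lines
    assert_eq!(mask_env(&vars), vec!["API_KEY=${API_KEY}", "PORT=${PORT}"]);
}
```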
The export API accepted a `format` parameter and returned the generated configuration:
```bash
# Export as Kubernetes manifests
sh0 export kubernetes --app my-app --output k8s-manifests.yaml

# Export as Docker Compose
sh0 export docker-compose --app my-app
```
The philosophical statement was intentional: sh0 would not hold your data hostage. If you wanted to leave, the export system gave you a head start on your destination platform.
## Parallel Development and the Merge
Both phases were developed by parallel agent teams working in isolated git worktrees. The isolation was possible because the features had zero file overlap: different database tables, different API handler files, different CLI command files, different dashboard pages.
The merge into the main working directory required manual fixes for a few shared files:
- `types.rs` needed CronJob DTOs that one agent assumed the other would create
- A reference pattern borrow issue in the cron handler needed a `.clone()` fix
- The router needed both sets of routes added to the same function
After the merge, 351 tests passed. The dashboard build succeeded. Both features worked independently and did not interfere with each other.
The parallel development pattern was becoming a reliable tool in our workflow. Features with zero file overlap could be developed simultaneously, cutting wall-clock time in half. The key was identifying the overlap boundary before starting -- and being disciplined about not crossing it.
---
Next in the series: Monitoring and Alerts: Email, Slack, Discord, Telegram, Webhooks -- how we built a monitoring system with periodic Docker stats collection, threshold-based alert evaluation, and multi-channel dispatch.