File management in a web application creates a problem that most frameworks ignore: orphaned files. A user uploads a profile photo, then changes it. The old photo is still on disk, referenced by nothing. A product listing is deleted, but its associated images remain. Over time, these orphaned files accumulate, wasting disk space and potentially exposing sensitive data that should have been deleted.
Session 237 completed FLIN's garbage collection system by integrating it with the CLI (for manual sweeps) and the HTTP server (for automatic reference tracking). This was the final piece of the FM-7 milestone -- compression and garbage collection -- bringing it to 100% completion across all 8 tasks.
The Blob Reference Problem
FLIN's file upload system stores files as blobs -- binary large objects -- in configurable storage backends (local filesystem, S3, R2, GCS). Each blob has a unique identifier (a hash of its content plus a timestamp). When a FLIN entity references a file, the entity's field stores the blob identifier.
The problem arises when references change. Consider this sequence:
1. User uploads photo.jpg -- stored as blob abc123, referenced by User#7.avatar.
2. User uploads new-photo.jpg -- stored as blob def456, User#7.avatar now references def456.
3. Blob abc123 is no longer referenced by any entity. It is orphaned.
Without garbage collection, blob abc123 lives on disk forever. With thousands of users updating their profiles, the storage cost of orphaned blobs grows continuously.
Reference Tracking
The garbage collection system maintains a reference index: a mapping from blob identifiers to the entities that reference them. When an entity is saved, its file fields are scanned and the references are recorded. When an entity is destroyed, its references are removed.
```rust
pub struct BlobRefIndex {
    refs: HashMap<String, Vec<BlobRef>>, // blob_id -> list of references
    orphans: HashMap<String, Instant>,   // blob_id -> time it became orphaned
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BlobRef {
    entity_type: String,
    entity_id: u64,
    field_name: String,
}

impl BlobRefIndex {
    pub fn add_ref(&mut self, blob_id: &str, entity_ref: BlobRef) {
        self.refs
            .entry(blob_id.to_string())
            .or_default()
            .push(entity_ref);

        // If this blob was previously orphaned, un-orphan it
        self.orphans.remove(blob_id);
    }

    pub fn remove_ref(&mut self, blob_id: &str, entity_type: &str, entity_id: u64) {
        if let Some(refs) = self.refs.get_mut(blob_id) {
            refs.retain(|r| !(r.entity_type == entity_type && r.entity_id == entity_id));

            if refs.is_empty() {
                self.refs.remove(blob_id);
                // Mark as orphaned with the current timestamp
                self.orphans.insert(blob_id.to_string(), Instant::now());
            }
        }
    }

    pub fn find_orphans(&self, grace_period: Duration) -> Vec<&str> {
        self.orphans
            .iter()
            .filter(|(_, orphaned_at)| orphaned_at.elapsed() > grace_period)
            .map(|(blob_id, _)| blob_id.as_str())
            .collect()
    }
}
```
The reference index is persisted to disk as .flindb/blob_refs.json. It is loaded when the VM starts and saved on every entity save, destroy, and checkpoint operation. This ensures that reference tracking survives server restarts.
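To make the load-on-start, save-on-mutation cycle concrete, here is a minimal persistence sketch. The real index is serialized as JSON; to stay dependency-free this sketch uses a trivial tab-separated format and tracks only reference counts. The function names `save_index` and `load_index` are illustrative, not FLIN's actual API.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

// Write one "blob_id<TAB>ref_count" pair per line.
fn save_index(refs: &HashMap<String, u32>, path: &Path) -> io::Result<()> {
    let mut out = String::new();
    for (blob_id, count) in refs {
        out.push_str(&format!("{}\t{}\n", blob_id, count));
    }
    fs::write(path, out)
}

// Parse the file back into a map; malformed lines are skipped.
fn load_index(path: &Path) -> io::Result<HashMap<String, u32>> {
    let mut refs: HashMap<String, u32> = HashMap::new();
    for line in fs::read_to_string(path)?.lines() {
        if let Some((id, count)) = line.split_once('\t') {
            if let Ok(n) = count.parse() {
                refs.insert(id.to_string(), n);
            }
        }
    }
    Ok(refs)
}
```

In production a format like JSON plus an atomic write-then-rename would be preferable, but the round-trip shape is the same: mutate in memory, flush to disk, reload on startup.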
HTTP Integration: Automatic Tracking
The VM's save and destroy opcodes now automatically update the blob reference index. When an entity is saved, the VM scans its fields for file references and adds them to the index. When an entity is destroyed, the references are removed.
```rust
// In VM::execute_save()
fn execute_save(&mut self) -> Result<(), RuntimeError> {
    let entity = self.pop_entity()?;
    let id = self.storage.save(&entity)?;

    // Track blob references
    if let Some(ref mut blob_index) = self.blob_ref_index {
        let file_fields = self.schema.get_file_fields(&entity.entity_type);

        for field in file_fields {
            if let Some(Value::Text(blob_id)) = entity.get(&field.name) {
                blob_index.add_ref(blob_id, BlobRef {
                    entity_type: entity.entity_type.clone(),
                    entity_id: id,
                    field_name: field.name.clone(),
                });
            }
        }

        blob_index.save_to_disk()?;
    }

    self.push(Value::Int(id as i64));
    Ok(())
}

// In VM::execute_destroy()
fn execute_destroy(&mut self) -> Result<(), RuntimeError> {
    let entity = self.pop_entity()?;

    // Get file paths BEFORE destroying the entity
    let file_paths = self
        .storage
        .destroy_with_cleanup(&entity.entity_type, entity.id)?;

    // Remove blob references
    if let Some(ref mut blob_index) = self.blob_ref_index {
        for blob_id in &file_paths {
            blob_index.remove_ref(blob_id, &entity.entity_type, entity.id);
        }

        blob_index.save_to_disk()?;
    }

    self.push(Value::Bool(true));
    Ok(())
}
```
The critical detail in execute_destroy is that file paths are extracted before the entity is deleted. If we extracted them after, the entity would already be gone and the file references would be unrecoverable. The destroy_with_cleanup method returns the list of blob identifiers that were referenced by the destroyed entity.
The Grace Period
Orphaned blobs are not deleted immediately. They enter a grace period (default: 1 hour) during which they can be re-referenced. This handles a common pattern: a user uploads a new avatar, the old avatar becomes orphaned, but then the user clicks "undo" and the old avatar is restored. Without the grace period, the old avatar would already be deleted.
The grace period also provides a safety net for transient failures. If the reference index fails to update during a save operation (disk full, for example), the blob is temporarily orphaned. The grace period gives the system time to recover and update the index correctly.
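The rescue behavior described above can be sketched with a few lines of std-only Rust. This is a reduced model of the orphan bookkeeping, not FLIN's actual types: the struct name `Orphans` and the methods `mark`, `rescue`, and `sweepable` are illustrative.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Minimal model of the grace-period bookkeeping.
struct Orphans {
    orphaned_at: HashMap<String, Instant>,
}

impl Orphans {
    // A blob loses its last reference: record when it became orphaned.
    fn mark(&mut self, blob_id: &str) {
        self.orphaned_at.insert(blob_id.to_string(), Instant::now());
    }

    // A blob is re-referenced within the grace period: it is rescued.
    fn rescue(&mut self, blob_id: &str) {
        self.orphaned_at.remove(blob_id);
    }

    // Only blobs orphaned for longer than the grace period are sweepable.
    fn sweepable(&self, grace: Duration) -> Vec<&str> {
        self.orphaned_at
            .iter()
            .filter(|(_, t)| t.elapsed() > grace)
            .map(|(id, _)| id.as_str())
            .collect()
    }
}
```

The "undo" scenario is just `mark` followed by `rescue` before the grace period expires: the blob never appears in `sweepable`, so it is never deleted.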
CLI Integration: The flin gc Command
Session 237 added the flin gc command for manual garbage collection. In production, operators need to inspect the state of blob storage, preview what would be deleted, and execute sweeps on their own schedule.
```shell
# Show GC status: total blobs, referenced, orphaned, reclaimable space
$ flin gc
Blob Storage Status:
  Total blobs: 1,247
  Referenced:  1,189
  Orphaned:    58
  Reclaimable: 234 MB
  Grace period: 1 hour
  Orphans past grace period: 41

# Preview what would be deleted (dry run)
$ flin gc --sweep --dry-run
Would delete 41 orphaned blobs (198 MB):
  abc123.jpg (4.2 MB, orphaned 3h ago)
  def456.pdf (12.1 MB, orphaned 2h ago)
  ghi789.png (1.8 MB, orphaned 5h ago)
  ... (38 more)

# Execute the sweep
$ flin gc --sweep
Deleted 41 orphaned blobs, reclaimed 198 MB

# Custom grace period (5 minutes instead of 1 hour)
$ flin gc --sweep --grace-period 300

# Verbose output: show each blob being deleted
$ flin gc --sweep -v
```
The --dry-run flag is essential for production use: it lets operators verify exactly what will be deleted before any data is permanently removed. The verbose flag (-v) lists each individual blob, which is useful when debugging why a specific file was -- or was not -- scheduled for deletion.
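The dry-run/execute split can be sketched as a single sweep loop that branches on the flag. This is an illustrative sketch, not FLIN's actual handler; the `sweep` function and its tuple-based orphan list are assumptions for the example.

```rust
// Sketch of a dry-run-aware sweep over a list of (blob_id, size_in_bytes)
// orphans. Returns how many blobs were (or would be) deleted and the total
// bytes reclaimed (or reclaimable).
fn sweep(orphans: &[(String, u64)], dry_run: bool) -> (usize, u64) {
    let mut count = 0usize;
    let mut bytes = 0u64;
    for (blob_id, size) in orphans {
        if dry_run {
            println!("Would delete {} ({} bytes)", blob_id, size);
        } else {
            // In the real handler, the storage backend's delete call goes here.
            println!("Deleted {} ({} bytes)", blob_id, size);
        }
        count += 1;
        bytes += *size;
    }
    (count, bytes)
}
```

Keeping one code path for both modes guarantees the dry run previews exactly the set of blobs a real sweep would remove.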
The implementation in src/main.rs added the Gc variant to the Commands enum with the following options:
```rust
Commands::Gc {
    path: PathBuf,
    sweep: bool,
    dry_run: bool,
    grace_period: u64,
    verbose: bool,
}
```

The cmd_gc() handler function (approximately 150 lines) loads the blob reference index, scans the storage backend for all blobs, computes the orphan list, and either displays the status or executes the sweep.
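The status numbers the CLI prints fall out of one pass over the blob list. Here is a hedged sketch of that computation; the `GcStatus` struct and `gc_status` function are illustrative names, not FLIN's actual internals.

```rust
use std::collections::HashSet;

// The four numbers the `flin gc` status display reports.
struct GcStatus {
    total: usize,
    referenced: usize,
    orphaned: usize,
    reclaimable_bytes: u64,
}

// Given every blob in storage (id, size) and the set of blob ids that
// appear in the reference index, derive the status counts.
fn gc_status(all_blobs: &[(String, u64)], referenced: &HashSet<String>) -> GcStatus {
    let mut orphaned = 0usize;
    let mut reclaimable = 0u64;
    for (id, size) in all_blobs {
        if !referenced.contains(id) {
            orphaned += 1;
            reclaimable += *size;
        }
    }
    GcStatus {
        total: all_blobs.len(),
        referenced: all_blobs.len() - orphaned,
        orphaned,
        reclaimable_bytes: reclaimable,
    }
}
```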
The Format Bytes Helper
A small but important detail: the CLI output formats byte sizes in human-readable units. Nobody wants to see "198,234,567 bytes" -- they want to see "198 MB."
```rust
fn format_bytes(bytes: u64) -> String {
    const KB: u64 = 1024;
    const MB: u64 = KB * 1024;
    const GB: u64 = MB * 1024;

    if bytes >= GB {
        format!("{:.1} GB", bytes as f64 / GB as f64)
    } else if bytes >= MB {
        format!("{:.1} MB", bytes as f64 / MB as f64)
    } else if bytes >= KB {
        format!("{:.1} KB", bytes as f64 / KB as f64)
    } else {
        format!("{} B", bytes)
    }
}
```
Small utility functions like this are the difference between a developer tool and a production tool. Production tools are used by operators who need to make decisions quickly, and clear output formatting enables that.
Checkpoint Integration
The blob reference index is also saved during database checkpoints. This ensures consistency between the database state and the reference index:
```rust
// In VM::checkpoint()
fn checkpoint(&mut self) -> Result<(), RuntimeError> {
    self.storage.checkpoint()?;

    if let Some(ref blob_index) = self.blob_ref_index {
        blob_index.save_to_disk()?;
    }

    Ok(())
}
```
If the server crashes between a save and a checkpoint, the WAL replay on restart will re-execute the save operations, which will re-add the blob references. The reference index might temporarily be out of date, but the grace period ensures no blobs are prematurely deleted.
Test Results
Session 237 added 10 new tests across two files:
CLI tests (6):
- Parsing of flin gc command with default options
- Parsing with --sweep flag
- Parsing with --dry-run flag
- Parsing with --grace-period custom value
- Parsing with -v verbose flag
- Parsing with all options combined
VM integration tests (4):
- VM with storage has a blob reference index
- VM without storage (in-memory) has no blob reference index
- Blob reference index persists across VM restarts
- Checkpoint saves the blob reference index
The total test count after Session 237 reached 3,537: 2,920 library tests and 617 integration tests. All passing.
FM-7 Milestone Complete
Session 237 completed the FM-7 milestone -- Compression and Garbage Collection -- at 100%:
| Task | Description | Status |
|---|---|---|
| FM7-01 | Zstd compression | Complete |
| FM7-02 | Compression CLI | Complete |
| FM7-03 | Compression statistics | Complete |
| FM7-04 | Decompression | Complete |
| FM7-05 | GC infrastructure | Complete |
| FM7-06 | GC orphan detection | Complete |
| FM7-07 | GC CLI integration | Complete |
| FM7-08 | HTTP GC integration | Complete |
Eight tasks, eight completions. The file management system -- from upload to storage to compression to garbage collection -- was fully operational. A FLIN application could accept file uploads, store them in any of four backends, compress them for efficient storage, track references automatically, and clean up orphaned files either manually via the CLI or automatically via the grace period mechanism.
This is the kind of infrastructure that most web frameworks leave to the developer. In FLIN, it is built in.
---
This is Part 188 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.
Series Navigation:
- [187] Search Result Caching
- [188] GC, CLI, and HTTP Integration Testing (you are here)
- [189] Tracking Sync and State Management