Abstractions are easy to design badly. You either make them too thin -- exposing backend-specific details that leak through every call site -- or too thick, hiding capabilities that downstream code legitimately needs. The StorageBackend trait in FLIN needed to thread this needle precisely: abstract enough that local filesystems, S3, Cloudflare R2, and Google Cloud Storage all fit behind the same interface, yet concrete enough that each backend can optimize its operations without contortions.
This article dissects the trait design, the reasoning behind each method signature, and the Rust-specific patterns that keep the abstraction safe across concurrent requests.
## The Trait Definition
The complete trait has nine methods. Each one was chosen because all four backends need it, and none could be expressed purely in terms of the others:
```rust
pub trait StorageBackend: Send + Sync {
    fn put(&self, hash: &str, data: &[u8], extension: &str) -> StorageResult<String>;
    fn put_from_path(&self, hash: &str, temp_path: &str, extension: &str) -> StorageResult<String>;
    fn get(&self, path: &str) -> StorageResult<Vec<u8>>;
    fn delete(&self, path: &str) -> StorageResult<()>;
    fn exists(&self, hash: &str, extension: &str) -> StorageResult<bool>;
    fn url(&self, hash: &str, extension: &str) -> String;
    fn signed_url(&self, hash: &str, extension: &str, duration: Duration) -> StorageResult<String>;
    fn backend_type(&self) -> &'static str;
    fn base_path(&self) -> &str;
}
```

This looks simple. It is not. Every method signature represents a design decision that took several iterations to get right.
## Why `Send + Sync`
The most important two words in the trait definition are not method names -- they are Send + Sync. These Rust marker traits guarantee that any type implementing StorageBackend can be safely shared between threads and sent across thread boundaries.
FLIN's HTTP server handles requests concurrently. When two users upload files simultaneously, the storage backend receives two put calls from different threads. Without Send + Sync, the Rust compiler would refuse to share the backend across threads, forcing you into either single-threaded file handling (a performance disaster) or unsafe code (a correctness disaster).
```rust
// This works because StorageBackend: Send + Sync
pub struct HttpServer {
    storage: Arc<dyn StorageBackend>,
    // ... other fields
}

// Multiple request handlers can access storage concurrently
async fn handle_upload(server: &HttpServer, request: Request) -> Response {
    let path = server.storage.put(&hash, &data, &ext)?;
    // ...
}
```
The Arc pattern -- an atomically reference-counted trait object -- is how FLIN shares a single backend instance across all request handlers. The Send + Sync bound on the trait makes this pattern possible at compile time. If a backend implementation were not thread-safe, the code would not compile. There is no runtime check, no mutex you might forget, no data race lurking in production.
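To make the compile-time guarantee concrete, here is a minimal, self-contained sketch of the sharing pattern. The trait is trimmed to a single method, and `NullBackend` and `share_across_threads` are hypothetical stand-ins for illustration, not FLIN's code:

```rust
use std::sync::Arc;
use std::thread;

type StorageResult<T> = Result<T, String>;

// Trimmed to one method; the Send + Sync bounds are the point.
pub trait StorageBackend: Send + Sync {
    fn put(&self, hash: &str, data: &[u8], extension: &str) -> StorageResult<String>;
}

// A no-op backend used only to demonstrate sharing.
struct NullBackend;

impl StorageBackend for NullBackend {
    fn put(&self, hash: &str, _data: &[u8], extension: &str) -> StorageResult<String> {
        Ok(format!("{hash}/data.{extension}"))
    }
}

fn share_across_threads(storage: Arc<dyn StorageBackend>) -> Vec<String> {
    let mut handles = Vec::new();
    for i in 0..4 {
        // Each "request handler" thread gets its own Arc clone; no mutex
        // is needed because the Send + Sync bounds guarantee thread safety.
        let storage = Arc::clone(&storage);
        handles.push(thread::spawn(move || {
            storage.put(&format!("hash{i}"), b"bytes", "txt").unwrap()
        }));
    }
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

If `NullBackend` held something non-thread-safe (an `Rc`, a raw pointer), the `impl StorageBackend` line itself would fail to compile.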
## Method-by-Method Design
### `put` and `put_from_path`
Two put methods exist because files arrive in two forms. Small files (under 50 MB) are held in memory as byte slices after multipart parsing. Large files are streamed to a temporary file on disk to avoid memory exhaustion. The backend needs to handle both:
```rust
// Small file: data is in memory
fn put(&self, hash: &str, data: &[u8], extension: &str) -> StorageResult<String>;

// Large file: data is in a temp file
fn put_from_path(&self, hash: &str, temp_path: &str, extension: &str) -> StorageResult<String>;
```
For the local backend, put writes bytes directly to the content-addressable path. put_from_path moves (or copies) the temp file to its final location -- a filesystem rename when possible, which is nearly instantaneous regardless of file size.
For cloud backends, both methods end up uploading bytes over HTTP. The difference is that put_from_path reads the temp file first. This distinction matters for the local backend's performance but is largely academic for cloud backends. We kept both methods in the trait because the local backend is the most commonly used during development, and fast development cycles matter.
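The local fast path described above can be sketched as follows. The helper name and the fallback policy are illustrative assumptions, not FLIN's exact code:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Try an O(1) rename first; fall back to copy + delete when the rename
// fails (e.g. EXDEV, because the temp file lives on another filesystem).
fn move_into_place(temp_path: &Path, final_path: &Path) -> io::Result<()> {
    if let Some(parent) = final_path.parent() {
        fs::create_dir_all(parent)?;
    }
    match fs::rename(temp_path, final_path) {
        Ok(()) => Ok(()), // nearly instantaneous regardless of file size
        Err(_) => {
            // Slow path: stream the bytes across, then remove the original.
            fs::copy(temp_path, final_path)?;
            fs::remove_file(temp_path)
        }
    }
}
```

The rename path is why `put_from_path` matters for the local backend: a 2 GB upload finalizes in microseconds instead of a second copy of the data.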
### `exists` Before `put`
Every backend checks whether a blob already exists before writing it. This is the deduplication check:
```rust
fn exists(&self, hash: &str, extension: &str) -> StorageResult<bool>;
```

For the local backend, this is a filesystem `Path::exists()` call. For cloud backends, it is a HEAD request -- lightweight, no data transfer, just metadata. If the blob exists, the `put` call returns immediately without uploading anything.
This design means that uploading the same file twice is nearly free. The first upload computes the hash, checks existence (miss), and writes the data. The second upload computes the hash, checks existence (hit), and returns. In a system where users might upload the same PDF multiple times or where the same avatar is used across accounts, this saves significant bandwidth and storage.
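The check-then-write flow can be illustrated with a toy in-memory store; `DedupStore` and its fields are invented for this sketch, and the `uploads` counter stands in for actual bytes transferred:

```rust
use std::collections::HashSet;

struct DedupStore {
    written: HashSet<String>,
    uploads: usize, // counts real writes, to show the second put is free
}

impl DedupStore {
    fn new() -> Self {
        Self { written: HashSet::new(), uploads: 0 }
    }

    // The "HEAD request" of the toy store.
    fn exists(&self, hash: &str, ext: &str) -> bool {
        self.written.contains(&format!("{hash}.{ext}"))
    }

    fn put(&mut self, hash: &str, _data: &[u8], ext: &str) -> String {
        let key = format!("{hash}.{ext}");
        if !self.exists(hash, ext) {
            // Only the first upload transfers any bytes.
            self.uploads += 1;
            self.written.insert(key.clone());
        }
        key
    }
}
```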
### `url` and `signed_url`
Two separate URL methods exist because they serve fundamentally different security models:
```rust
// Public URL -- anyone with the link can access the file
fn url(&self, hash: &str, extension: &str) -> String;

// Signed URL -- time-limited, cryptographically verified
fn signed_url(&self, hash: &str, extension: &str, duration: Duration) -> StorageResult<String>;
```
The public URL is deterministic and permanent. For the local backend, it looks like `/files/{shard}/{hash}/data.{ext}`. For R2, it looks like `https://{bucket}.{account_id}.r2.cloudflarestorage.com/{key}`. For GCS, it is `https://storage.googleapis.com/{bucket}/{key}`.
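These URL shapes can be reconstructed in a few lines. This is an illustrative sketch, not FLIN's implementation; in particular, deriving the shard from the first two hex characters of the hash is an assumption:

```rust
enum Backend<'a> {
    Local,
    R2 { bucket: &'a str, account_id: &'a str },
    Gcs { bucket: &'a str },
}

fn public_url(backend: &Backend, hash: &str, ext: &str) -> String {
    // Assumed shard scheme: first two hex characters of the hash.
    let shard = &hash[..2];
    let key = format!("{shard}/{hash}/data.{ext}");
    match backend {
        Backend::Local => format!("/files/{key}"),
        Backend::R2 { bucket, account_id } => {
            format!("https://{bucket}.{account_id}.r2.cloudflarestorage.com/{key}")
        }
        Backend::Gcs { bucket } => {
            format!("https://storage.googleapis.com/{bucket}/{key}")
        }
    }
}
```

The key point is that nothing here requires server state: the same hash and extension always produce the same URL.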
The signed URL includes cryptographic proof that the server authorized the access. Each backend implements signing differently:
| Backend | Signing Method | URL Lifetime |
|---|---|---|
| Local | HMAC-SHA256 with server secret | Configurable |
| S3 | AWS Signature V4 presigning | Configurable |
| R2 | S3-compatible presigning | Configurable |
| GCS | V4 RSA-SHA256 with service account key | Configurable |
The signed_url method returns a StorageResult (not a plain String) because signing can fail. The local backend needs a configured secret. GCS needs a valid private key. S3 needs valid credentials. If any of these are missing or malformed, the method returns an error rather than an invalid URL.
### `backend_type` and `base_path`
These two methods exist primarily for logging and debugging:
```rust
fn backend_type(&self) -> &'static str; // "local", "s3", "r2", "gcs"
fn base_path(&self) -> &str;            // directory or prefix
```

When a file operation fails, the error message includes the backend type and base path so the developer knows which storage system failed and where it was trying to write. Without these methods, a generic "file write failed" error would be nearly impossible to diagnose in a production system with multiple storage configurations.
## The `StorageConfig` Enum
Backend creation is driven by a configuration enum that maps to flin.config settings:
```rust
pub enum StorageConfig {
    Local {
        directory: String,
        secret: String,
    },
    S3 {
        bucket: String,
        region: String,
        access_key: String,
        secret_key: String,
        prefix: String,
    },
    R2 {
        bucket: String,
        account_id: String,
        access_key: String,
        secret_key: String,
        prefix: String,
    },
    Gcs {
        bucket: String,
        credentials_path: String,
        prefix: String,
    },
}
```

Each variant carries exactly the fields that backend needs -- no more, no less. The local backend needs a directory and a signing secret. S3 needs bucket, region, and credentials. GCS needs a path to a service account JSON file. This enum enforces at the type level that you cannot create a GCS backend without credentials or an S3 backend without a region.
## The Factory Pattern
The create_backend function is the only place in the codebase that knows about specific backend types:
```rust
pub fn create_backend(config: StorageConfig) -> Result<Box<dyn StorageBackend>, StorageError> {
    match config {
        StorageConfig::Local { directory, secret } => {
            let backend = LocalBackend::new(&directory, &secret)?;
            Ok(Box::new(backend))
        }
        StorageConfig::R2 { bucket, account_id, access_key, secret_key, prefix } => {
            let backend = R2Backend::new(&bucket, &account_id, &access_key, &secret_key, &prefix)?;
            Ok(Box::new(backend))
        }
        StorageConfig::Gcs { bucket, credentials_path, prefix } => {
            let backend = GcsBackend::new(&bucket, &credentials_path, &prefix)?;
            Ok(Box::new(backend))
        }
        StorageConfig::S3 { bucket, region, access_key, secret_key, prefix } => {
            let backend = S3Backend::new(&bucket, &region, &access_key, &secret_key, &prefix)?;
            Ok(Box::new(backend))
        }
    }
}
```

Every other part of FLIN -- the HTTP server, the VM, the garbage collector, the preview generator -- works with `Box<dyn StorageBackend>` or `Arc<dyn StorageBackend>`. They never import `LocalBackend` or `R2Backend` directly. This is not just good practice; it is enforced by the module structure. The concrete backend types are public for testing but are never referenced outside the storage module in production code.
## Security Invariants
The trait design encodes several security invariants that every backend must uphold:
Hash validation. The hash parameter in put, exists, url, and signed_url must be a valid hexadecimal string. Every backend validates this before constructing a path, preventing path traversal attacks where a malicious hash like ../../etc/passwd could escape the storage directory.
Constant-time comparison. Signed URL verification uses constant-time byte comparison to prevent timing attacks. An attacker who can measure response time differences could otherwise brute-force valid signatures one character at a time.
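The technique reads like this in miniature. Production code would typically reach for a vetted crate (such as `subtle`); this std-only version just illustrates the idea, which is to scan every byte unconditionally and accumulate differences with bitwise OR, so timing does not depend on where the first mismatch occurs:

```rust
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    // No early return inside the loop: every byte pair is always visited.
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

A naive `a == b` short-circuits at the first differing byte, which is exactly the signal a timing attacker measures.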
Extension sanitization. File extensions are stripped of path separators and validated against a character allowlist. An extension like .jpg/../../../etc/passwd is rejected before it ever reaches the filesystem.
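A sketch of the extension check, mirroring the shape of `validate_hash` below; the exact allowlist and length bound are assumptions for illustration:

```rust
fn validate_extension(ext: &str) -> Result<(), String> {
    if ext.is_empty() || ext.len() > 16 {
        return Err("invalid extension length".into());
    }
    // Allowlist, not denylist: only ASCII alphanumerics survive, so path
    // separators, dots, and null bytes are all rejected in one check.
    if !ext.chars().all(|c| c.is_ascii_alphanumeric()) {
        return Err("disallowed characters in extension".into());
    }
    Ok(())
}
```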
```rust
fn validate_hash(hash: &str) -> StorageResult<()> {
    if hash.is_empty() || hash.len() > 128 {
        return Err(StorageError::InvalidHash("invalid length".into()));
    }
    if !hash.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(StorageError::InvalidHash("non-hex characters".into()));
    }
    Ok(())
}
```

These validations happen in every backend implementation, not in a shared base. This is intentional -- each backend might have additional constraints (GCS key length limits, S3 key character restrictions), and centralizing validation would risk missing backend-specific rules.
## What the Trait Does Not Include
Equally important is what the trait does not include. There are no methods for listing all files, streaming large downloads, or managing access control lists. These are handled by separate systems (the grant manager for access control, the garbage collector for listing blobs) that compose with the storage backend rather than extending it.
This keeps the trait focused. Nine methods, each with a clear purpose, each implementable by any storage system that can store and retrieve bytes by key. The simplicity of the interface is what makes it possible to add a fifth backend in the future -- Azure Blob Storage, for instance -- without touching any code outside the storage module.
The trait pattern is one of Rust's most powerful abstractions, and FLIN's storage system demonstrates why. It enforces thread safety at compile time, enables polymorphic dispatch through trait objects, and keeps the interface stable while allowing each backend to optimize its internals independently. In the next article, we examine the two cloud backends in detail: how R2 leverages S3 compatibility and how GCS implements V4 signed URLs from scratch.
---
This is Part 127 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.
Series Navigation: - [126] File Storage With 4 Backends - [127] The Storage Backend Trait Pattern (you are here) - [128] R2 and Google Cloud Storage Backends