A database without transactions is a database that will lose data. Not might. Will. The moment two operations need to succeed or fail together -- transferring money between accounts, creating an order with its line items, registering a user and their profile -- you need atomicity. Without it, a crash between the two operations leaves the database in an impossible state: money debited but not credited, an order without items, a user without a profile.
A database without backups is a database waiting to be the subject of a post-mortem. Hardware fails. Humans make mistakes. Software has bugs. The question is not whether you will need to restore data, but when.
Session 166 was a marathon. Four major feature areas in a single session: ACID transactions, backup and restore, graph queries, and semantic search. Ninety-four tests added. This article covers the first two -- transactions and backup -- because they are the foundation that makes FlinDB production-ready.
ACID Transactions
FlinDB's transaction system provides the four ACID guarantees:
- Atomicity: All changes in a transaction succeed or none do
- Consistency: The database moves from one valid state to another
- Isolation: Concurrent transactions do not interfere
- Durability: Committed changes survive crashes
The Transaction Struct
Every transaction is a self-contained unit of work:
```
pub struct Transaction {
    id: TransactionId,
    started_at: i64,
    timeout_ms: Option<u64>,
    state: TransactionState,
    savepoints: Vec<Savepoint>,
    pending_saves: Vec<PendingSave>,
    pending_deletes: Vec<PendingDelete>,
    read_versions: HashMap<(String, u64), u64>,
}
```

The pending_saves and pending_deletes fields are the key to atomicity. During a transaction, no changes are applied to the main data store. Instead, they are accumulated in these pending lists. Only when commit() is called are all changes applied at once. If rollback() is called, the pending lists are discarded and the database remains unchanged.
The read_versions field enables optimistic concurrency control. When a transaction reads an entity, it records the entity's version number. At commit time, FlinDB checks whether any of these versions have changed. If another transaction modified an entity that this transaction read, the commit fails with a conflict error -- preventing lost updates.
Begin, Commit, Rollback
The transaction lifecycle is straightforward:
```
db.begin_transaction()

order = db.save("Order", { total: 0 })
item1 = db.save("OrderItem", { order_id: order.id, product: "Laptop" })
item2 = db.save("OrderItem", { order_id: order.id, product: "Mouse" })

db.commit()
```
If any save fails, the entire transaction can be rolled back:
```
db.begin_transaction()

try {
    db.save("Transfer", { from: account_a, to: account_b, amount: 1000 })
    db.save("Account", account_a.id, { balance: account_a.balance - 1000 })
    db.save("Account", account_b.id, { balance: account_b.balance + 1000 })
    db.commit()
} catch (e) {
    db.rollback()
}
```
The Rust implementation of commit applies all pending operations atomically:
```
pub fn commit_transaction(
    &mut self,
    txn_id: TransactionId,
) -> DatabaseResult<TransactionCommitResult> {
    let txn = self.transactions.remove(&txn_id)
        .ok_or(DatabaseError::TransactionNotFound)?;

    // Check for optimistic locking conflicts
    for ((entity_type, entity_id), read_version) in &txn.read_versions {
        let current_version = self.get_current_version(entity_type, *entity_id)?;
        if current_version != *read_version {
            return Err(DatabaseError::OptimisticLockConflict {
                entity_type: entity_type.clone(),
                entity_id: *entity_id,
            });
        }
    }

    // Record counts before the pending lists are consumed below
    let saves = txn.pending_saves.len();
    let deletes = txn.pending_deletes.len();

    // Apply all pending saves
    for save in txn.pending_saves {
        self.save(&save.entity_type, save.id, save.fields)?;
    }

    // Apply all pending deletes
    for delete in txn.pending_deletes {
        self.delete(&delete.entity_type, delete.id)?;
    }

    Ok(TransactionCommitResult { saves, deletes })
}
```
Savepoints
Savepoints allow partial rollback within a transaction. This is essential for complex workflows where you want to undo the last step without losing everything:
```
db.begin_transaction()

order = db.save("Order", { total: 0 })
db.create_savepoint("after_order")

try {
    db.save("OrderItem", { order_id: order.id, product: "Laptop" })
    db.commit()
} catch (e) {
    db.rollback_to_savepoint("after_order")
    db.save("Order", order.id, { status: "failed" })
    db.commit()
}
```
The rollback_to_savepoint() method discards all pending operations added after the savepoint was created, while keeping operations from before the savepoint.
Transaction Timeouts
Long-running transactions are dangerous. They hold resources, block other operations, and often indicate a programming error (a transaction that was begun but never committed). FlinDB supports configurable timeouts:
```
let txn = db.begin_transaction_with_timeout(5000); // 5 seconds
```

If the transaction is not committed or rolled back within the timeout period, it is automatically rolled back. This prevents resource leaks from forgotten transactions.
Backup and Restore
With transactions providing atomicity, the backup system ensures durability beyond a single process lifetime. FlinDB supports three backup strategies: full, incremental, and continuous.
Full Backup
A full backup captures the complete database state -- all schemas, all entities, all version history:
```
let options = BackupOptions::default();
Backup::full(&db, "backup.flindb.bak", options)?;
```

The backup file format uses Zstd compression:

```
.flindb.bak (Zstd compressed JSON)
+-- header: magic, version, type, timestamp, checksum
+-- metadata: schema count, entity count
+-- schemas: serialized EntitySchema[]
+-- entities: by type with full history
```

Why Zstd? We benchmarked three compression algorithms:
| Algorithm | Compression Ratio | Speed |
|---|---|---|
| Zstd (level 3) | 11% smaller than gzip | 42% faster than Brotli |
| Gzip | Baseline | Baseline |
| Brotli | 8% smaller than Zstd | 42% slower than Zstd |
Zstd offered the best trade-off: nearly the best compression ratio with significantly faster compression and decompression. For a backup system where both creation speed and restore speed matter, Zstd was the clear winner.
Every backup includes a SHA-256 checksum of the data payload. On restore, the checksum is verified before any data is applied. A corrupted backup file is rejected rather than silently loading garbled data.
Incremental Backup
Incremental backups capture only the changes since the last backup, using the WAL as the source of deltas:
```
let options = BackupOptions::incremental(last_backup_version);
Backup::incremental(&db, "backup_incr.flindb.bak", options)?;
```

Incremental backups are smaller and faster than full backups, making them suitable for frequent backup intervals. To restore, you apply the last full backup followed by all subsequent incremental backups in order.
Point-in-Time Recovery
FlinDB supports restoring to a specific timestamp:
```
let options = RestoreOptions::point_in_time(target_timestamp);
let db = Backup::restore("backup.flindb.bak", options)?;
```

Point-in-time recovery replays entity versions up to the specified timestamp, effectively rewinding the database to a past state. This is possible because FlinDB's temporal model preserves all versions -- the backup contains the complete history, and the restore process can stop at any point in that history.
Continuous Backup
Session 170 extended the backup system with continuous WAL streaming. Instead of periodic snapshots, the ContinuousBackup struct streams every WAL entry to a backup destination in real-time:
```
pub struct ContinuousBackup {
    source_wal: PathBuf,
    destination: BackupDestination,
    last_position: Arc<AtomicU64>,
    running: Arc<AtomicBool>,
    poll_interval: Duration,
}
```

The backup destination can be local or S3-compatible:
```
pub enum BackupDestination {
    Local(PathBuf),
    S3 {
        bucket: String,
        region: String,
        endpoint: Option<String>,
        prefix: String,
        access_key: String,
        secret_key: String,
    },
}
```

Continuous backup runs in a background thread, polling the WAL for new entries and streaming them to the destination:
```
let backup = ContinuousBackup::new(wal_path, BackupDestination::local(dest_path))
    .with_poll_interval(Duration::from_millis(50))
    .with_start_position(1000); // Resume from position

let handle = backup.start();
// Application runs...
backup.stop();
handle.join().unwrap();
```
The with_start_position() method enables resume capability. If the backup process is restarted, it picks up from where it left off rather than re-streaming the entire WAL. This is critical for production use where backup processes may be restarted during deployments.
For S3 destinations, entries are batched into 1 MB chunks before upload to minimize the number of S3 API calls and associated costs.
Backup Scheduling
The BackupScheduler automates periodic backups with retention policies:
```
let scheduler = BackupScheduler::new(
    Duration::from_secs(3600), // Every hour
    24,                        // Keep 24 backups
    "./backups",
)
.with_backup_type(BackupType::Full)
.with_compression(true);

let handle = scheduler.start(Arc::new(Mutex::new(db)));
```
The scheduler runs in a background thread, creating backups at the configured interval and enforcing the retention policy by deleting the oldest backups when the count exceeds the limit.
In FLIN configuration syntax, the backup setup is declarative:
```
app {
    backup: {
        enabled: true
        continuous: {
            destination: "local"
            path: "./backups/"
        }
        schedule: {
            interval: "1h"
            retention: 24
            type: "full"
            compression: true
        }
    }
}
```

The Test Suite
Transactions and backup together account for 55 tests across Sessions 166 and 170:
Transaction tests (12):

- Begin/commit/rollback lifecycle
- Savepoint creation and partial rollback
- Transaction timeout enforcement
- Optimistic locking conflict detection
- Commit result details

Backup tests (21 from Session 166):

- Full backup creation and verification
- Incremental backup creation
- Zstd compression roundtrip
- SHA-256 checksum verification
- Point-in-time recovery
- Restore configuration options

Continuous backup and scheduling tests (22 from Session 170):

- BackupDestination validation (local and S3)
- ContinuousBackup streaming and position tracking
- BackupScheduler creation, retention, and cleanup
- Resume capability after restart
Total tests after Session 170: 2,365 (1,748 library + 617 integration).
Why Both Transactions and Backups Matter
Transactions protect against application-level failures -- a crash during a multi-step operation. Backups protect against infrastructure-level failures -- disk corruption, accidental deletion, hardware failure.
Without transactions, a power loss during an order creation could leave an order without items. Without backups, a disk failure could lose all data permanently. Together, they form a complete durability story: transactions ensure consistency within a running system, and backups ensure recoverability when the system itself fails.
FlinDB provides both, with zero configuration required. Transactions are always available. The WAL provides crash recovery by default. And with a few lines of configuration, continuous backup and scheduled rotation ensure that data survives any failure.
---
This is Part 8 of the "How We Built FlinDB" series, documenting how we built a complete embedded database engine for the FLIN programming language.
Series Navigation:

- [061] Index Utilization: Making Queries Fast
- [062] Relationships and Eager/Lazy Loading
- [063] Transactions and Continuous Backup (you are here)
- [064] Graph Queries and Semantic Search
- [065] The EAVT Storage Model