How it works

You've got a SQLite database, and you want a continuous, point-in-time copy of it somewhere else. This page explains how bakelite does that. If you just want it running, head to Install & deploy instead.

Watching the file for changes

bakelite takes advantage of the fact that a SQLite database is just a couple of files: it watches the -wal file through the OS (inotify on Linux, FSEvents on macOS, kqueue on the BSDs) and only wakes when something actually changes. On an idle database the process just blocks on the watcher until the next write arrives.

A missed notification can't strand you, though. There's a cheap fallback poll (safety_poll, 30s by default — not a busy loop) as a backstop, so even if the OS drops an event a sync still happens at least that often. See When things go wrong for the full failure picture.
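The wake-up rule above can be sketched with nothing but a channel and a timeout. This is a minimal illustration, not bakelite's actual loop: the `Wake` enum, the `wait_for_wake` function, and the idea that the OS watcher feeds a `std::sync::mpsc` channel are all assumptions made for the example.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Why the replication loop woke up (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Wake {
    /// The OS watcher reported a change to the -wal file.
    Changed,
    /// No event arrived within `safety_poll`; sync anyway as a backstop.
    SafetyPoll,
    /// The watcher side went away; the caller should stop.
    Closed,
}

/// Block until a change event arrives or `safety_poll` elapses.
/// `events` is assumed to be fed by a thread that owns the OS watcher.
pub fn wait_for_wake(events: &Receiver<()>, safety_poll: Duration) -> Wake {
    match events.recv_timeout(safety_poll) {
        Ok(()) => {
            // Drain any queued events so a burst of writes triggers one sync.
            while events.try_recv().is_ok() {}
            Wake::Changed
        }
        Err(RecvTimeoutError::Timeout) => Wake::SafetyPoll,
        Err(RecvTimeoutError::Disconnected) => Wake::Closed,
    }
}
```

The thread sleeps inside `recv_timeout`, so an idle database costs no CPU, and a dropped event is noticed when the timeout expires.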

It reads SQLite's files directly

bakelite reads WAL frames and database pages straight from the filesystem; it uses the SQL engine only to control the database (set pragmas, hold locks, drive checkpoints).

WAL frames and database pages are parsed byte-for-byte from the files (the wal and dbfile modules). This includes SQLite's custom rolling WAL checksum, which is not a CRC.
The SQL library (rusqlite) is reached only through a deliberately tiny control seam, db::ControlDb — the single place the engine is touched, kept thin so the SQLite dependency stays isolated and easy to reason about.
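For reference, the rolling checksum described in SQLite's WAL format documentation processes the input as pairs of 32-bit words and keeps two running sums. The sketch below follows that published algorithm; the function name and signature are illustrative and are not taken from bakelite's wal module.

```rust
/// Continue SQLite's WAL checksum over `data`, starting from `seed`.
/// `data.len()` must be a multiple of 8. `big_endian` selects how each
/// 32-bit word is decoded; SQLite derives it from the WAL header magic.
pub fn wal_checksum(big_endian: bool, data: &[u8], seed: (u32, u32)) -> (u32, u32) {
    assert!(data.len() % 8 == 0, "WAL checksum input must be a multiple of 8 bytes");
    let word = |b: &[u8]| {
        let bytes = [b[0], b[1], b[2], b[3]];
        if big_endian {
            u32::from_be_bytes(bytes)
        } else {
            u32::from_le_bytes(bytes)
        }
    };
    let (mut s0, mut s1) = seed;
    for pair in data.chunks_exact(8) {
        // Each sum feeds the other, so word order matters (unlike a plain sum).
        s0 = s0.wrapping_add(word(&pair[0..4])).wrapping_add(s1);
        s1 = s1.wrapping_add(word(&pair[4..8])).wrapping_add(s0);
    }
    (s0, s1)
}
```

Because the function takes and returns the running pair, the checksum of one frame can seed the next, which is how the WAL chains frames together.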

Backups and change-sets

A replica is one full backup (a snapshot) plus the page-changes that came after it. Restore replays them in order; the latest change wins.

Everything else is detail. The change-sets (segments) are page images, indexed monotonically across the replica's whole lifetime — one global index space, with a re-snapshot being just another full backup at the next index — and:

Snapshots are a db-file + WAL overlay (latest committed page wins), which stays consistent even while a checkpoint runs and doesn't depend on a checkpoint succeeding.
Restore = the latest full backup + the change-sets after it, replayed in index order (latest page image wins). Restore is keyed on segment index, not WAL offset, so it spans WAL resets and incremental-checkpoint boundaries. It never reads the live database's checkpoint state.

The per-database manifest is an advisory, rebuildable cache, not a source of truth — retention and restore reconcile by listing the objects on the backend.
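The restore rule (latest full backup, then change-sets in index order, latest page image wins) fits in a few lines. This is a sketch under simplifying assumptions: pages live in memory, and `ChangeSet`, `PageNo`, and `restore` are hypothetical names, not bakelite's types.

```rust
use std::collections::BTreeMap;

pub type PageNo = u32;
pub type Page = Vec<u8>;

/// A change-set reduced to what restore needs: its index and page images.
pub struct ChangeSet {
    pub index: u64,
    pub pages: Vec<(PageNo, Page)>,
}

/// Rebuild the page map from a snapshot taken at `snapshot_index` plus
/// every change-set with a higher index, applied in index order.
pub fn restore(
    snapshot: BTreeMap<PageNo, Page>,
    snapshot_index: u64,
    mut change_sets: Vec<ChangeSet>,
) -> BTreeMap<PageNo, Page> {
    let mut db = snapshot;
    // Anything at or before the snapshot is already reflected in it.
    change_sets.retain(|cs| cs.index > snapshot_index);
    change_sets.sort_by_key(|cs| cs.index);
    for cs in change_sets {
        for (page_no, page) in cs.pages {
            // Later images overwrite earlier ones: latest page wins.
            db.insert(page_no, page);
        }
    }
    db
}
```

Note that nothing here consults a WAL offset or the live database; ordering comes from the index alone.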

Leveled compaction

A long-lived backup accumulates many small change-sets. bakelite consolidates them with leveled, time-windowed compaction: a contiguous run of level-(N−1) change-sets is promoted into one level-N file once it spans that level's window (compaction_levels = ["30s", "5m", "1h"] by default). Promoting deletes the lower-level inputs, so storage and the per-restore object count stay bounded. The most recent compaction_keep_recent change-sets are held back from promotion, so recent point-in-time restore stays fine-grained. The merge carries the lineage chain endpoints (see below) across promotion, so the chain is preserved exactly without rewriting the tail.
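One way to picture the promotion decision is a function that scans the oldest change-sets at a level and returns the run to merge once it spans the window. The sketch below is an assumption about the selection rule, not bakelite's code: the `Seg` fields, the use of whole seconds, and the `promotable_run` name are all invented for illustration.

```rust
use std::ops::Range;
use std::time::Duration;

/// Metadata for one change-set at a given level (hypothetical shape).
#[derive(Clone, Debug)]
pub struct Seg {
    pub start_index: u64,
    pub end_index: u64,
    /// Time of the first and last commit covered, in seconds.
    pub start_secs: u64,
    pub end_secs: u64,
}

/// Given `segs` sorted oldest first and contiguous, return the range of
/// positions to promote, or `None` if no run spans `window` yet.
/// The newest `keep_recent` entries are never considered.
pub fn promotable_run(segs: &[Seg], window: Duration, keep_recent: usize) -> Option<Range<usize>> {
    let eligible = segs.len().checked_sub(keep_recent)?;
    let first = segs.first()?;
    for end in 1..=eligible {
        let last = &segs[end - 1];
        if last.end_secs.saturating_sub(first.start_secs) >= window.as_secs() {
            return Some(0..end);
        }
    }
    None
}
```

With seven 10-second change-sets, a 30-second window, and two held back, the first three are promoted; the merged file would cover the first run member's `start_index` through the last one's `end_index`.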

Per-object and lineage integrity

Every stored object carries a CRC-32C envelope (catches bit-rot at decode time) and a BLAKE3 object_hash of its exact stored bytes — a much stronger check that'll catch a substituted object, not just a flipped bit. On top of that, change-sets form a BLAKE3 hash chain rooted at the base snapshot: each carries parent_hash (the chain value at its start boundary) and content_hash (at its end boundary), so a missing, reordered, or substituted change-set is detectable. bakelite verify checks all of this; bakelite verify --deep also does a full restore into a temp file and runs PRAGMA integrity_check.
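The linkage half of the chain check can be sketched without any hashing: walk the change-sets in order and require each parent_hash to equal the previous content_hash. How bakelite derives those BLAKE3 values is not modeled here; `Link`, `ChainError`, and `verify_chain` are hypothetical names.

```rust
pub type Hash = [u8; 32];

/// The chain fields of one change-set (other fields omitted).
pub struct Link {
    pub index: u64,
    pub parent_hash: Hash,
    pub content_hash: Hash,
}

#[derive(Debug, PartialEq)]
pub enum ChainError {
    /// The change-set at `at_index` does not continue the chain.
    Broken { at_index: u64 },
}

/// Walk `links` in order starting from the snapshot's `root` value.
/// Returns the chain tip on success.
pub fn verify_chain(root: Hash, links: &[Link]) -> Result<Hash, ChainError> {
    let mut tip = root;
    for link in links {
        if link.parent_hash != tip {
            return Err(ChainError::Broken { at_index: link.index });
        }
        tip = link.content_hash;
    }
    Ok(tip)
}
```

A missing or reordered change-set shows up as a parent mismatch at the first link that no longer follows its predecessor. Detecting a substituted object additionally requires recomputing the hashes from the stored bytes, which this sketch leaves out.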

The replication loop

bakelite holds a long-running read lock so the WAL is append-only with stable salts while it ships segments incrementally, which stops a checkpoint from racing the WAL it's reading.

To bound WAL growth it performs an incremental checkpoint: ship all frames, then in one tight control-connection operation release the read lock, run wal_checkpoint(TRUNCATE), and re-read PRAGMA data_version. If data_version changed (a concurrent commit slipped into the truncate window) or the WAL didn't reset, it falls back to a full overlay snapshot: a new full backup at the next index, which is always safe. Otherwise it keeps shipping change-sets from the fresh WAL.
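The decision taken right after the TRUNCATE checkpoint reduces to two checks. The sketch below isolates that rule as a pure function; the `Next` enum and `after_truncate` are hypothetical, and obtaining the inputs (the two data_version readings and whether the WAL reset) is assumed to happen elsewhere on the control connection.

```rust
/// What the replication loop does after an incremental checkpoint attempt.
#[derive(Debug, PartialEq)]
pub enum Next {
    /// The WAL was reset cleanly; keep shipping change-sets from it.
    ContinueFromFreshWal,
    /// Something raced the checkpoint; take a new full overlay snapshot.
    FullSnapshot,
}

/// `version_before` and `version_after` are PRAGMA data_version readings
/// taken around the checkpoint; `wal_reset` says whether the WAL was
/// actually truncated.
pub fn after_truncate(version_before: i64, version_after: i64, wal_reset: bool) -> Next {
    if version_after != version_before || !wal_reset {
        Next::FullSnapshot
    } else {
        Next::ContinueFromFreshWal
    }
}
```

Either failure condition alone is enough to force the snapshot, which is why the fallback is described as always safe: it never relies on the checkpoint having gone as planned.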

A full-snapshot rotation happens once per snapshot_interval, which also bounds restore-chain length and lets retention drop old objects.

bakelite is crash-safe: the replication cursor is persisted after each segment is durable, so a restarted daemon resumes from the cursor rather than re-snapshotting.

When things go wrong

Replication keeps running across the failures that actually happen in production, and none of them risk your committed data:

The backend is unreachable. A sync error — S3 down, a network blip, throttling — is never fatal to the daemon. It logs the error, marks the database backing off in bakelite status, and retries with exponential backoff (1s, doubling, capped at 5 minutes). Throughout, bakelite holds its read lock, so SQLite can't checkpoint past frames that haven't been shipped yet — committed data is never dropped to make room. The un-shipped commits wait in the database's own -wal file on disk, not in bakelite's memory, so the daemon's RSS stays flat (~tens of MB) however long the outage runs. The real cost of a long outage is a growing WAL on disk, not runaway RAM; when the backend returns, bakelite ships the backlog and checkpoints the WAL back down. Your application keeps writing the whole time — bakelite never blocks your writers.
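The retry schedule (1s, doubling, capped at 5 minutes) can be written as a small pure function. This is a sketch of that schedule only; the `backoff_delay` name and the choice to key it on a consecutive-failure count are assumptions.

```rust
use std::time::Duration;

/// Delay before the next retry after `failures` consecutive sync errors
/// beyond the first (0 gives the initial 1s delay).
pub fn backoff_delay(failures: u32) -> Duration {
    const BASE: Duration = Duration::from_secs(1);
    const CAP: Duration = Duration::from_secs(300);
    // 2^failures, saturating so a very long outage cannot overflow.
    let factor = 2u32.checked_pow(failures).unwrap_or(u32::MAX);
    BASE.checked_mul(factor).map_or(CAP, |d| d.min(CAP))
}
```

The delay reaches the cap on the tenth step (512s would exceed 300s), so during a long outage the daemon settles into one attempt every five minutes.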

The filesystem watcher misses an event. OS change notifications can be dropped or coalesced, and bakelite doesn't depend on catching every one. The safety_poll backstop (30s by default) means a missed event delays a sync by at most that interval rather than stranding it. In normal operation your RPO is set by max_batch_delay (~1s); safety_poll is only the worst-case bound.

The daemon or the host crashes. The replication cursor is persisted locally after each change-set is durable on the backend, so a restart resumes from the cursor rather than re-uploading from scratch. If a crash lands between "change-set uploaded" and "cursor persisted", the worst case is re-shipping one change-set — which is idempotent (same index, same bytes), never lost data — and anything committed but not yet shipped is still in the -wal, picked up on resume. The manifest is only an advisory cache, so a crash before it's refreshed costs nothing: restore and retention reconcile by listing the objects, and verify --deep rebuilds it.
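The upload-then-persist ordering, and why a crash between the two steps is harmless, can be simulated in memory. Everything here is a stand-in: `Backend` is a map instead of object storage, the cursor is a plain integer, and `crash_after_upload_of` exists only to inject the failure.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for the object store, keyed by change-set index.
#[derive(Default)]
pub struct Backend {
    pub objects: BTreeMap<u64, Vec<u8>>,
}

impl Backend {
    /// Writing the same index with the same bytes is a no-op in effect.
    pub fn put(&mut self, index: u64, bytes: &[u8]) {
        self.objects.insert(index, bytes.to_vec());
    }
}

/// Ship every change-set after `cursor`, advancing the cursor only once
/// the upload has happened. Returns false if the simulated crash hit.
pub fn ship_pending(
    backend: &mut Backend,
    cursor: &mut u64,
    wal: &[(u64, Vec<u8>)],
    crash_after_upload_of: Option<u64>,
) -> bool {
    for (index, bytes) in wal {
        if *index <= *cursor {
            continue; // already shipped and recorded
        }
        backend.put(*index, bytes);
        if crash_after_upload_of == Some(*index) {
            return false; // uploaded, but the cursor was never persisted
        }
        *cursor = *index;
    }
    true
}
```

Running it once with a crash at index 2 and then again without one leaves the cursor at the last index and exactly one copy of each object: the second run re-uploads index 2 with identical bytes and carries on.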

On-disk format

Per database, on the backend:

databases/<db>/manifest.json                               -> advisory, rebuildable cache
databases/<db>/snapshots/<NNNN>.snap
databases/<db>/segments/L<NN>/<start>-<end>.seg

Snapshots and segments share one global, lifetime-wide index space. Segments are partitioned by compaction level (L00 = raw, higher levels = merged), and each filename carries the inclusive raw-index range the object covers (e.g. 0000000005-0000000030.seg for an L02 merge of raw segments 5–30) — indices zero-padded so a lexical listing is already chronological. The manifest.json is an advisory cache the daemon refreshes; the objects themselves are the source of truth, reconciled by listing. Snapshots and segments are page-level, zstd-compressed (configurable), and CRC-32C checked inside a small container format (a magic + compression + CRC envelope wrapping the page images).
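Building a segment key from the layout above is a single format string. The helper name `segment_key` is hypothetical, and the 10-digit index width is taken from the example filename rather than from a stated rule.

```rust
/// Object key for a segment covering raw indices `start..=end` at `level`.
pub fn segment_key(db: &str, level: u8, start: u64, end: u64) -> String {
    // Zero-padding keeps lexical order equal to numeric order.
    format!("databases/{db}/segments/L{level:02}/{start:010}-{end:010}.seg")
}
```

Without the padding, a key for index 10 would sort before a key for index 9, and a plain object listing would no longer be chronological.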

A bug worth calling out

On Unix, close() releases all of a process's POSIX advisory locks on a file. Reading the database file through a transient File::open/drop therefore silently drops SQLite's reader lock, letting another connection checkpoint the WAL out from under us — silent data loss.

To avoid this, bakelite reads the database file only through a single long-lived file descriptor, using positioned reads (pread). A dedicated regression test (tests/wal_pinning.rs) guards against ever reintroducing a per-read open of the database file.
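In Rust, positioned reads on a held-open descriptor are available through `std::os::unix::fs::FileExt`. The sketch below shows the pattern on Unix; the `DbFile` type and its methods are illustrative and are not bakelite's dbfile module.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::path::Path;

/// Holds one descriptor for the lifetime of the reader, so no read ever
/// opens and closes the database file.
pub struct DbFile {
    file: File,
    page_size: u64,
}

impl DbFile {
    pub fn open(path: &Path, page_size: u64) -> io::Result<Self> {
        Ok(Self { file: File::open(path)?, page_size })
    }

    /// Read one page by its 1-based page number using pread semantics:
    /// the offset is passed explicitly and the file cursor is untouched.
    pub fn read_page(&self, page_no: u64) -> io::Result<Vec<u8>> {
        assert!(page_no >= 1, "SQLite page numbers start at 1");
        let mut buf = vec![0u8; self.page_size as usize];
        self.file.read_exact_at(&mut buf, (page_no - 1) * self.page_size)?;
        Ok(buf)
    }
}
```

Since `read_page` takes `&self` and never moves a shared cursor, the same `DbFile` can serve many reads, and the descriptor is closed only when the value is dropped.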