Configuration
bakelite is driven by a single TOML file. A [defaults] table applies to every
database; any field can be overridden inline inside a [[database]] block.
A minimal config
[[database]]
name = "app"
path = "/var/lib/app/app.db"
[[database.backends]]
type = "file"
path = "/var/backups/bakelite"
That's enough to run. Every tunable below has a sensible default, so leave them alone until you want to change a specific behaviour.
Defaults & tunables
[defaults]
debounce = "250ms" # coalesce write bursts into one sync
max_batch_delay = "1s" # force a flush at least this often (~RPO)
safety_poll = "30s" # cheap fallback poll; NOT a busy loop
busy_timeout = "5s"
max_wal_size = "16MiB" # trigger a checkpoint when the WAL grows past this
manifest_flush = "10s" # how often to flush the manifest index
snapshot_interval = "1d" # daily full snapshot
retention = "0s" # 0s = keep all backups; "7d" = 7 days
compaction_levels = ["30s", "5m", "1h"] # consolidation windows per level; [] disables
compaction_keep_recent = 16 # keep this many recent change-sets fine-grained for point-in-time restore
max_segments_per_snapshot = 0 # 0 = disabled; new full backup after this many change-sets to bound restore-chain length
max_total_size = 0 # 0 = no cap; e.g. "5GiB" to bound stored bytes per db
max_backups = 0 # 0 = no cap; e.g. 30 to bound retained backups
multipart_sweep_interval = "1h" # S3: how often to abort orphaned multipart uploads; "0s" disables
multipart_min_age = "1h" # S3: only abort uploads older than this (protects in-flight uploads)
mirror_interval = "60s" # how often background async mirrors reconcile (see Multi-destination)
compression = "zstd" # "zstd" | "lz4" | "none"
validate_on_restore = true # multi-destination: validate objects and fall back to a healthy copy
on_incomplete_destination = "backfill" # multi-destination: backfill | refuse | warn (see Multi-destination)
| Key | Meaning |
|---|---|
| debounce | Coalesce a burst of writes into a single sync. |
| max_batch_delay | Upper bound on how long a write waits before being shipped; effectively your RPO. |
| safety_poll | A cheap fallback poll in case a filesystem event is missed. Not a busy loop. |
| busy_timeout | SQLite busy timeout for the control connection. |
| max_wal_size | When the WAL crosses this, bakelite ships everything and TRUNCATEs the WAL (incremental checkpoint). Accepts a unit string like 16MiB / 512kB or a raw byte count. |
| manifest_flush | How often to flush the manifest index to the backend. The stored change-set objects are the source of truth, so restore/resume reconcile a lagging manifest by listing; lower = less work to recover after a crash, higher = fewer backend writes. Always flushed at snapshot, checkpoint, compaction, and graceful shutdown. |
| snapshot_interval | How often to take a fresh full backup. Shorter intervals keep point-in-time restore fast and memory-light; longer ones reduce storage overhead. |
| retention | Prune backups older than this window. "0s" keeps everything; the current backup is never pruned. |
| compaction_levels | Consolidation windows, one per level (e.g. ["30s", "5m", "1h"] = merge into 30s windows, then 5m, then 1h). bakelite automatically merges older incremental change-sets into coarser windows as they age, so storage and per-restore object count stay bounded while recent restore stays fine-grained. [] disables. Windows must strictly increase. |
| compaction_keep_recent | Keep this many recent incremental change-sets un-merged so recent point-in-time restore stays precise. |
| max_segments_per_snapshot | Take a fresh full backup once this many incremental change-sets have shipped since the last one, bounding restore-chain length even on a near-idle database whose snapshot_interval rarely fires. 0 disables (re-snapshotting stays purely time-/size-driven). |
| max_total_size | Ceiling on total stored bytes per database (the bill-shock guard). When a new backup pushes the replica past this, bakelite prunes the oldest backups to get back under, never the current one. 0 disables. Accepts a unit string like 5GiB. |
| max_backups | Cap on retained backups per database; excess oldest backups are pruned (never the current one). 0 disables. |
| multipart_sweep_interval | S3 only: how often the daemon aborts orphaned incomplete multipart uploads (snapshot uploads a crash left unfinalized). "0s" disables. No-op on non-S3 backends and on hosts without static AWS env credentials. See S3 storage overhead. |
| multipart_min_age | S3 only: only abort multipart uploads at least this old, so an in-flight snapshot upload is never aborted by the sweep. bakelite reclaim --min-age overrides per invocation. |
| mirror_interval | How often the background reconciler copies new objects to each async mirror destination (Multi-destination → Async mirrors). No-op when a database has no async destinations. |
| compression | Page compression: zstd, lz4, or none. |
| validate_on_restore | When a database has 2+ destinations, validate every object on restore/verify/list and transparently fall through to a healthy copy on another destination if one has bit-rotted. Default true; free for a single-destination database (it's only built when there's a sibling to fall back to). See Redundancy & bit-rot recovery. |
| on_incomplete_destination | Multi-destination only: what to do on resume when a destination is missing part of the current backup chain, typically a fan-out destination you added or repointed at a new location after replication began. backfill (default) copies the existing chain onto it so it's immediately restorable; refuse stops the database until you resolve it (fail-safe); warn logs and keeps going. No effect with a single destination. See Adding or moving a destination. |
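The "windows must strictly increase" rule for compaction_levels can be illustrated with a small check. This is only a sketch in Python, not bakelite's own code: the function names are hypothetical, windows are assumed to be already converted to seconds, and the idea that a change-set belongs to the window starting at `ts - ts % window` is an assumption about how a "30s window" is bucketed.

```python
def validate_compaction_levels(windows_s):
    """Check a list of window lengths (seconds, one per level).

    An empty list is valid and means compaction is disabled.
    Raises ValueError if any window is not larger than the one before it.
    """
    for prev, cur in zip(windows_s, windows_s[1:]):
        if cur <= prev:
            raise ValueError(
                f"compaction_levels must strictly increase: {cur}s follows {prev}s"
            )
    return windows_s


def window_start(timestamp_s, window_s):
    """Assumed bucketing: start of the window that contains the timestamp."""
    return timestamp_s - (timestamp_s % window_s)
```

Under this assumption, two change-sets written at t=61s and t=89s share the 30s window starting at 60s, so they would be candidates for merging at the first level.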
Duration fields accept a unit-suffixed string: ms, s (or sec/secs),
m (or min/mins), h (or hr/hrs), d (or day/days) — e.g.
"250ms", "30s", "5m", "1h", "7d". Bare integers are
deliberately rejected so a missing unit can't silently mean the wrong thing
(250 interpreted as ms vs s is a 1000× footgun).
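To make the duration rule concrete, here is a minimal Python sketch of a parser that accepts the suffixes listed above and rejects bare integers. It is an illustration rather than bakelite's parser: the function name is hypothetical, and it assumes whole-number values and lowercase units only.

```python
import re

# Unit suffix -> milliseconds, covering the spellings listed above.
_UNIT_MS = {
    "ms": 1,
    "s": 1_000, "sec": 1_000, "secs": 1_000,
    "m": 60_000, "min": 60_000, "mins": 60_000,
    "h": 3_600_000, "hr": 3_600_000, "hrs": 3_600_000,
    "d": 86_400_000, "day": 86_400_000, "days": 86_400_000,
}
# Digits followed by a mandatory unit; a bare "250" cannot match.
_DURATION_RE = re.compile(r"(\d+)([a-z]+)")


def parse_duration_ms(text):
    """Return the duration in milliseconds, or raise ValueError."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"duration needs a unit suffix: {text!r}")
    value, unit = match.groups()
    if unit not in _UNIT_MS:
        raise ValueError(f"unknown duration unit {unit!r} in {text!r}")
    return int(value) * _UNIT_MS[unit]
```

Requiring the unit in the pattern itself, instead of defaulting to seconds or milliseconds, is what turns a forgotten suffix into a load-time error.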
Byte-size fields accept either a raw integer (bytes) or a unit string: SI
units (kB, MB, GB) are 1000-based and IEC units (KiB, MiB, GiB) are
1024-based.
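The SI versus IEC distinction is easy to get wrong, so a short Python sketch may help. It is not bakelite's implementation: the function name is hypothetical, units are assumed to be case-sensitive, and the plain "B" suffix is an extra assumption not listed above.

```python
import re

# SI units are 1000-based, IEC units are 1024-based.
_SIZE_UNITS = {
    "B": 1,  # assumption: a plain byte suffix is accepted
    "kB": 1000, "MB": 1000**2, "GB": 1000**3,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
}
_SIZE_RE = re.compile(r"(\d+)\s*([A-Za-z]+)")


def parse_size_bytes(value):
    """Accept a raw integer (bytes) or a unit string such as "16MiB"."""
    if isinstance(value, int) and not isinstance(value, bool):
        if value < 0:
            raise ValueError("size must not be negative")
        return value
    match = _SIZE_RE.fullmatch(str(value).strip())
    if match is None or match.group(2) not in _SIZE_UNITS:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _SIZE_UNITS[match.group(2)]
```

With these tables, "5GB" is 5,000,000,000 bytes while "5GiB" is 5,368,709,120 bytes, a difference of roughly 7% that matters when the value is a hard cap.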
Every tunable can be set under [defaults] or overridden per [[database]]:
[[database]]
name = "analytics"
path = "/data/analytics.db"
safety_poll = "15s" # per-database override
Daemon tuning
Process-wide knobs that affect the daemon itself, not any one database. The
matching CLI flags (e.g. bakelite daemon --max-concurrent-snapshots 1)
override these for a single invocation; otherwise the daemon picks up the
configured value, otherwise its built-in default.
[daemon]
max_concurrent_snapshots = 4 # 0 (or absent) = auto: one at a time
snapshot_workers = 2 # 0 (or absent) = auto: single-threaded
metrics_addr = "127.0.0.1:9090" # omit to disable the endpoint
| Key | Meaning |
|---|---|
| max_concurrent_snapshots | Cap on databases snapshotting concurrently across the whole daemon. Defaults to one at a time for a low, steady footprint; raise it to bootstrap many databases faster on a host with spare cores (at a sharper CPU/IO spike). |
| snapshot_workers | zstd worker threads per snapshot. Defaults to single-threaded (lowest memory); larger values speed up one big database's snapshot in exchange for more RSS. |
| metrics_addr | Prometheus scrape endpoint (host:port). See bakelite daemon's --metrics-addr for the exposed series. |
Setting any of these in the config means the systemd unit can stay generic —
no ExecStart override needed when you re-tune for a new host.
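The precedence described in this section (CLI flag, then config value, then built-in default) can be sketched in a few lines of Python. The function names are hypothetical, `None` stands in for "not given", and treating 0 as "auto" follows the comments in the [daemon] example.

```python
def resolve_setting(cli_value, config_value, builtin_default):
    """Return the first value that was actually provided."""
    for candidate in (cli_value, config_value):
        if candidate is not None:
            return candidate
    return builtin_default


def effective_max_concurrent_snapshots(cli_value=None, config_value=None):
    """0 (or absent) means auto, which the text describes as one at a time."""
    value = resolve_setting(cli_value, config_value, 0)
    return 1 if value == 0 else value
```

For example, a config with max_concurrent_snapshots = 4 and a one-off `--max-concurrent-snapshots 1` resolves to 1 for that invocation only.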
Storage limits & usage
Object stores have no natural ceiling, so a backup set that keeps growing can quietly run up a bill; even cheap storage adds up as retained data accumulates. bakelite offers two limits for that, plus a way to see what's actually stored.
Limits (max_total_size, max_backups,
both opt-in). When a new backup would push a database past a cap, bakelite
prunes the oldest backups until it's back under, and it never deletes the
current backup. If a cap still can't be met (e.g. a single database is larger
than max_total_size), bakelite logs a warning and the usage command flags the
database, rather than throwing away your only restorable copy.
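The pruning rule above (drop the oldest first, never the current backup, warn if the cap still can't be met) can be sketched as follows. This is an illustrative Python model, not bakelite's code: `Backup` and `plan_prune` are hypothetical names, and the list is assumed to be ordered oldest first with the current backup last.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    name: str
    size_bytes: int


def plan_prune(backups, max_total_size=0, max_backups=0):
    """Return (backups_to_delete, still_over_cap).

    A cap of 0 is treated as disabled, matching the config defaults.
    The last element (the current backup) is never selected for deletion.
    """
    kept = list(backups)
    deleted = []

    def over_cap():
        total = sum(b.size_bytes for b in kept)
        too_big = max_total_size > 0 and total > max_total_size
        too_many = max_backups > 0 and len(kept) > max_backups
        return too_big or too_many

    while len(kept) > 1 and over_cap():
        deleted.append(kept.pop(0))  # oldest first
    return deleted, over_cap()
```

When `still_over_cap` comes back true, only the current backup is left and it alone exceeds the cap, which corresponds to the warn-and-flag case described above.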
bakelite usage reports what's actually stored:
bakelite usage # one line per database: total, backups, age
bakelite usage --db app # per-backup breakdown (full vs incremental)
bakelite usage --db app --json # machine-readable, for monitoring/alerting
When limits are configured, the output shows usage against the cap
(146 KiB / 1.00 MiB (14%)) and flags any database that is over.
Backends
Local filesystem
[[database.backends]]
type = "file"
path = "/var/backups/bakelite"
S3-compatible
Works with AWS S3, Cloudflare R2, Backblaze B2, MinIO, and others — see
Backends for the per-provider compatibility matrix and how
each is tested. Credentials come from the environment (AWS_ACCESS_KEY_ID /
AWS_SECRET_ACCESS_KEY).
bakelite loads /etc/bakelite/bakelite.env at startup automatically (the same
file systemd reads, also picked up by interactive CLI invocations) — see
Environment variables for the full list and load
order. If no credentials are reachable, the command fails immediately with a
clear "no S3 credentials found" error.
On EC2/ECS/EKS? If you rely on an instance profile, task role, or pod identity instead of static keys, set BAKELITE_AWS_USE_DEFAULT_CREDENTIAL_CHAIN=1 to opt into the AWS default credential chain (including the instance-metadata service). Without it, bakelite refuses to probe instance metadata, so a missing key on a non-AWS host fails fast instead of timing out against 169.254.169.254.
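The fail-fast behaviour can be pictured as a small decision function. This Python sketch is an assumption-laden illustration, not bakelite's logic: the function name and return labels are hypothetical, and preferring static keys when both static keys and the opt-in flag are present is a guess.

```python
def choose_s3_credentials(env):
    """Decide where S3 credentials come from, given an environment mapping."""
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "static"
    if env.get("BAKELITE_AWS_USE_DEFAULT_CREDENTIAL_CHAIN") == "1":
        # Only now is probing instance metadata acceptable.
        return "default-chain"
    # No network probe is attempted: fail immediately instead.
    raise RuntimeError("no S3 credentials found")
```

The point of the explicit opt-in is visible in the last branch: without the flag there is no path that touches the metadata endpoint, so there is nothing that can time out.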
[[database.backends]]
type = "s3"
bucket = "my-bucket"
prefix = "bakelite"
endpoint = "https://ACCOUNT.r2.cloudflarestorage.com"
region = "auto"
# force_path_style = true # default when `endpoint` is set
# request_timeout = "30s" # cap on a single request (also on gcs/azure)
# max_retries = 10 # retry attempts per request (also on gcs/azure)
With a custom endpoint, path-style addressing is the default (required by
B2 and MinIO), and region must match the service.
request_timeout and max_retries (the latter on the object-store backends —
S3, GCS, Azure) tune the underlying client. Leave them unset to keep
object_store's defaults, which already retry with exponential backoff and time
out requests; set them only to tighten or loosen behaviour for a particular link.
IAM permissions
bakelite needs to read, write, and delete objects, plus a handful of bucket-level
list/inspect actions. It never creates the bucket, changes its configuration, or
deletes object versions. This least-privilege policy (attach it to the user or
group whose credentials bakelite runs as) covers everything — replication,
multipart-upload reclaim, and bakelite usage reporting:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BakeliteObjectRW",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload"
],
"Resource": "arn:aws:s3:::my-bucket/*"
},
{
"Sid": "BakeliteBucketList",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:GetBucketVersioning",
"s3:ListBucketVersions",
"s3:GetBucketObjectLockConfiguration"
],
"Resource": "arn:aws:s3:::my-bucket"
}
]
}
| What bakelite does | S3 action(s) |
|---|---|
| Read/write/delete snapshots, segments, the manifest cache, CURRENT (large snapshots stream as multipart) | GetObject, PutObject, DeleteObject |
| List snapshots and segments | ListBucket |
| Sweep and abort orphaned multipart uploads | ListBucketMultipartUploads, AbortMultipartUpload |
| bakelite usage version/overhead reporting | GetBucketVersioning, ListBucketVersions |
| bakelite doctor Object Lock verification (expect_object_lock) | GetBucketObjectLockConfiguration |
A few things to note:
- Object vs. bucket ARNs. The object actions target my-bucket/*; the bucket-level list/inspect actions target my-bucket (no /*). They have to be split across the two resources: a bucket action on a /* resource (or the reverse) silently never matches.
- Deliberately absent. No s3:CreateBucket / s3:PutBucket*, no s3:DeleteObjectVersion, and no s3:BypassGovernanceRetention. Expiring noncurrent versions is the bucket owner's job, not bakelite's (see the versioning discussion below for why), and withholding the bypass permission means bakelite can't shorten an Object Lock retention even if its credentials are compromised.
- Tightening to a prefix. On a bucket shared with other workloads, narrow the object ARN to my-bucket/bakelite/* (matching the prefix in the TOML above) and optionally add "Condition": {"StringLike": {"s3:prefix": ["bakelite/*"]}} to the ListBucket statement. Leave the multipart and versioning actions bucket-wide; bakelite filters those client-side, so they aren't reliably prefix-scopable. See Shared backend for many databases.
- Reporting needs static env credentials. The GetBucketVersioning / ListBucketVersions calls run on static AWS env credentials (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), so on IAM-role / instance-metadata hosts they no-op and bakelite usage reports the overhead as "not inspected" regardless of this policy.
- SSE-KMS adds key permissions. If you set sse = "aws:kms" (see Server-side encryption and upload integrity), the credentials also need kms:GenerateDataKey (to write) and kms:Decrypt (to read/restore) on the configured key, granted on the KMS key's policy, not in the policy above.
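Because a mismatched action/resource pair fails silently, it can be worth linting a policy before attaching it. The Python sketch below is a hypothetical helper, not part of bakelite: it only knows the actions named in this section and treats any resource ending in `/*` as an object ARN.

```python
OBJECT_ACTIONS = {
    "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:AbortMultipartUpload",
}
BUCKET_ACTIONS = {
    "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:GetBucketVersioning",
    "s3:ListBucketVersions", "s3:GetBucketObjectLockConfiguration",
}
# Permissions the text says bakelite should never hold.
WITHHELD = {"s3:CreateBucket", "s3:DeleteObjectVersion", "s3:BypassGovernanceRetention"}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def lint_policy(policy):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for stmt in policy["Statement"]:
        for action in _as_list(stmt["Action"]):
            if action in WITHHELD or action.startswith("s3:PutBucket"):
                problems.append(f"{action}: should not be granted")
            for resource in _as_list(stmt["Resource"]):
                on_objects = resource.endswith("/*")
                if action in OBJECT_ACTIONS and not on_objects:
                    problems.append(f"{action}: needs an object ARN, got {resource}")
                if action in BUCKET_ACTIONS and on_objects:
                    problems.append(f"{action}: needs the bucket ARN, got {resource}")
    return problems
```

Loading the policy with `json.load` and passing the result to `lint_policy` should return an empty list for the policy shown above; moving s3:ListBucket into the `/*` statement produces one finding.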
S3 storage overhead a plain listing can't see
A normal object listing — and therefore the storage bakelite's limits act on — counts only the current version of each object. Two kinds of cost hide behind it on S3:
- Incomplete multipart uploads. Snapshots stream to S3 as multipart uploads; a crash mid-upload can leave the parts behind, billed but invisible to a plain listing. bakelite cleans these up: a failed upload aborts at source, and the daemon periodically sweeps any left by a hard crash (multipart_sweep_interval, touching only uploads older than multipart_min_age so an in-flight upload is never aborted). bakelite reclaim --db <name> does the same on demand (--dry-run to preview, --min-age 0s to force-abort everything; best with the daemon stopped). This needs only s3:AbortMultipartUpload / s3:ListBucketMultipartUploads.
- Noncurrent versions and delete markers. On a versioned bucket, every overwrite and delete leaves a noncurrent version (or delete marker) behind that still costs money. bakelite deliberately does not touch these; it has no way to delete an object version, by design. Versioning is a disaster-recovery boundary: you enable it so your backups can't be wiped, including by bakelite or by anything that compromises its credentials. Letting bakelite expire versions would mean handing it s3:DeleteObjectVersion, the very permission that defeats the guarantee. Expiring noncurrent versions is therefore the bucket owner's job, via a server-side lifecycle policy that runs with the bucket's own authority:
{
"Rules": [
{
"ID": "bakelite-expire-noncurrent",
"Status": "Enabled",
"Filter": { "Prefix": "" },
"NoncurrentVersionExpiration": { "NoncurrentDays": 7 },
"AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 1 }
}
]
}
aws s3api put-bucket-lifecycle-configuration \
--bucket my-bucket --lifecycle-configuration file://lifecycle.json
# MinIO: mc ilm rule add --noncurrent-expire-days 7 myalias/my-bucket
Backblaze B2 models lifecycle natively rather than through the S3 rules above:
each rule has daysFromHidingToDeleting (B2's NoncurrentDays equivalent — purge a
version this many days after it's superseded or deleted) and daysFromUploadingToHiding
(leave this null so B2 never hides a current object on you). The quickest path is
the bucket's Lifecycle Settings → "Keep only the last version of the file" in the
web console (that preset is exactly daysFromHidingToDeleting: 1); or set it via the
B2 API / b2 CLI with a rule like:
{ "fileNamePrefix": "", "daysFromUploadingToHiding": null, "daysFromHidingToDeleting": 7 }
Without this, B2 keeps every superseded version forever — and bakelite churns the
per-database manifest object frequently, so a tiny live replica can hide gigabytes of
noncurrent versions. bakelite usage flags it; the rule bounds it.
A caveat for B2 keys. The "even a compromised credential can't wipe your versions"
guarantee above holds on S3, where you grant bakelite s3:DeleteObject but withhold
s3:DeleteObjectVersion. It does not hold on Backblaze B2: B2's deleteFiles
capability — which bakelite needs for compaction and retention — also permits
permanently deleting a specific version, and B2's web console only mints broad
Read-Only / Read-Write keys (granular capabilities aren't exposed in the UI). So a
console-created B2 key, or a host that leaks it, can delete noncurrent versions —
lifecycle rule or not. Two things help:
- Mint a narrower key with the b2 CLI (the UI won't): restrict it to the one bucket and the capabilities bakelite actually uses. This drops the bucket-admin and file-sharing powers a console key bundles in:
  b2 account authorize <MASTER_KEY_ID> <MASTER_APP_KEY>
  b2 key create --bucket my-bucket bakelite \
    listFiles,readFiles,writeFiles,deleteFiles,readBuckets,readBucketRetentions,readBucketLifecycleRules
  deleteFiles still has to stay (compaction and retention delete objects), so this shrinks the blast radius but can't make versions un-deletable.
- For a boundary that survives credential compromise, use Object Lock (below): the bucket enforces it server-side, regardless of what the key is allowed to do.
The AbortIncompleteMultipartUpload rule above is still worth keeping as a
backstop: it's free, runs continuously even when the daemon is down, and covers
hosts where bakelite's own sweep can't run — multipart and version
inspection (and the daemon sweep) use static AWS env credentials
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), so on IAM-role / instance-metadata
hosts they no-op and report "not inspected".
bakelite usage --db <name> quantifies all of this — noncurrent-version bytes,
delete-marker counts, and incomplete multipart uploads — so the hidden cost stays
visible. With the lifecycle rule in place and bakelite's sweep running, bakelite usage should report the overhead trending to zero.
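The role of multipart_min_age in the sweep can be shown with a tiny selection function. This Python sketch only models the age filter; the function name and the `(upload_id, initiated_at)` tuple shape are hypothetical, and listing or aborting uploads against S3 is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def uploads_to_abort(uploads, now, min_age):
    """Pick the upload ids that are at least `min_age` old.

    `uploads` is an iterable of (upload_id, initiated_at) pairs with
    timezone-aware datetimes; younger uploads may still be in flight.
    """
    return [uid for uid, initiated in uploads if now - initiated >= min_age]
```

Passing `timedelta(0)` as `min_age` selects every upload, which mirrors why `--min-age 0s` is best run with the daemon stopped.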
Immutable backups with Object Lock (WORM)
Immutable (WORM) backups can't be deleted, not with stolen credentials and not by bakelite itself, which protects them against accidental deletion or a credential compromise. On S3 the mechanism is Object Lock: enable it on the bucket with a default retention, and every object bakelite writes is locked immutable for that window.
This works because bakelite only ever adds data objects — snapshots and
change-sets are written once and never deleted or overwritten in place (the tiny
CURRENT pointer and manifest are updated as new versions, which a versioned,
locked bucket allows) — so it runs against a locked bucket unchanged. It also
can't weaken the lock: it never deletes object versions and is never
granted s3:BypassGovernanceRetention, so even a compromised daemon can't shorten
the retention.
bakelite never configures Object Lock itself — set it on the bucket (it must be enabled at creation, alongside versioning):
aws s3api create-bucket --bucket my-bucket --object-lock-enabled-for-bucket #...
# A bucket-wide default retention then locks every new object:
aws s3api put-object-lock-configuration --bucket my-bucket \
--object-lock-configuration \
'{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":30}}}'
[[database.backends]]
type = "s3"
bucket = "my-bucket"
expect_object_lock = true # bakelite doctor fails if the bucket isn't locked
[defaults]
retention = "0s" # don't try to prune — Object Lock denies the deletes
compaction_levels = [] # don't rewrite/merge — same reason
Turn off deletion-based maintenance. Object Lock blocks deletes, so bakelite's
own pruning can't run while objects are locked. If you leave retention /
compaction enabled against a locked bucket, bakelite degrades gracefully rather
than crashing — pruning is skipped (and logged), and compaction churns (it
writes merged objects it then can't clean up). bakelite doctor warns about exactly
this. Let Object Lock plus a bucket lifecycle policy govern expiry instead.
Set checksum = true (see Server-side encryption and upload integrity):
an Object-Lock bucket with retention rejects any upload that doesn't carry a
checksum, single-part PutObject and multipart alike, so checksum = true is
required for Object Lock regardless of database size, not just for large snapshots.
expect_object_lock = true turns immutability into a checked invariant:
bakelite doctor reports the lock mode and retention window and fails if the
bucket isn't actually locked, so you can gate a deploy on it. Verification needs the
s3:GetBucketObjectLockConfiguration permission (in the policy above) and the same
static AWS env credentials as the version-overhead inspector — without either,
doctor reports it as unverifiable (a warning) rather than failing, even on a locked
bucket. It's checked against the primary (first) backend.
COMPLIANCE mode can't be shortened or removed by anyone — including the root
account — until each object's retention expires; GOVERNANCE mode allows a
privileged override. For a backup you never want weakened, COMPLIANCE is stronger,
but it's irreversible: size the retention window deliberately.
Cheaper cold storage with storage classes
Most of a replica's bytes are cold: the base snapshots and the older, compacted change-sets are rarely read — restore usually only touches the most recent ones. Parking that cold bulk in a cheaper storage class can cut the storage bill substantially, while recent change-sets stay hot for fast restore.
[[database.backends]]
type = "s3"
bucket = "my-bucket"
storage_class = "STANDARD_IA" # or GLACIER_IR, INTELLIGENT_TIERING, …
bakelite applies storage_class to the cold objects — base snapshots and
compacted (level ≥ 1) segments — and leaves hot data (recent raw change-sets, the
manifest, and the CURRENT pointer) in the bucket default. You don't choose
per-object; bakelite routes by kind.
Instant-retrieval classes only. Restore reads objects immediately, with no thaw step, so only classes that are instantly readable are allowed:
| Provider | Allowed (instant) | Rejected (needs a thaw) |
|---|---|---|
| S3 | STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER_IR | GLACIER (Flexible), DEEP_ARCHIVE |
| GCS | NEARLINE, COLDLINE, ARCHIVE | none |
| Azure | Cool, Cold | Archive |
A thaw-requiring tier — S3 GLACIER (Flexible) / DEEP_ARCHIVE, or Azure
Archive — is rejected at config load with a clear error: those need an
asynchronous restore/rehydrate (minutes to hours) before an object can be read,
which would break bakelite restore. The check is provider-aware and exact, so
GLACIER_IR (Glacier Instant Retrieval) is allowed, and so is GCS ARCHIVE —
every GCS class, Archive included, is served instantly with no rehydration.
(Automated thaw-on-restore for the archival tiers is a planned future feature.)
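The provider-aware, exact-match check and the routing by object kind can be sketched together. This Python example is illustrative only: the function names and the object-kind labels are hypothetical, and accepting any class that is not a known thaw tier is an assumption.

```python
# Classes that need an asynchronous restore/rehydrate before reads.
THAW_CLASSES = {
    "s3": {"GLACIER", "DEEP_ARCHIVE"},
    "gcs": set(),  # every GCS class, ARCHIVE included, reads instantly
    "azure": {"Archive"},
}
# Hypothetical labels for the objects the text calls cold.
COLD_KINDS = {"snapshot", "compacted_segment"}


def validate_storage_class(provider, storage_class):
    """Exact match, so GLACIER_IR is not mistaken for GLACIER."""
    if storage_class in THAW_CLASSES[provider]:
        raise ValueError(
            f"{storage_class} on {provider} needs a thaw before restore can read it"
        )
    return storage_class


def class_for_object(kind, storage_class):
    """Cold objects get the configured class; hot ones use the bucket default."""
    return storage_class if kind in COLD_KINDS else None
```

Returning `None` for hot objects stands for "send no storage-class override", which leaves recent change-sets, the manifest, and CURRENT in the bucket default.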
Cost nuance. Infrequent-access and cold tiers trade a lower per-GB storage price for per-request retrieval fees and minimum-storage-duration / early-deletion charges. They pay off when snapshots are large and restores are rare; pair with a longer snapshot_interval so you're not re-uploading cold full snapshots often. They're not free for hot, churny data, which is why bakelite keeps recent change-sets in the default class.
Server-side encryption and upload integrity
Two optional S3 data-protection knobs, independent of bakelite's client-side encryption:
[[database.backends]]
type = "s3"
bucket = "my-bucket"
checksum = true # S3 validates a SHA-256 on every upload
sse = "aws:kms" # server-side at-rest encryption (or "aes256")
sse_kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abcd-…"
sse turns on server-side at-rest encryption, managed by the provider:
"aes256" is SSE-S3 (S3-managed keys, zero extra config); "aws:kms" is
SSE-KMS with your customer-managed key (sse_kms_key_id required — and the
credentials need kms:GenerateDataKey + kms:Decrypt on that key, the latter both
at restore and for the multipart snapshot uploads bakelite performs). This is
orthogonal to bakelite's own client-side encryption, which keeps the provider
from ever seeing plaintext (stronger for confidentiality), while SSE is the
familiar compliance checkbox and protects the at-rest copy if you don't encrypt
client-side. Use either, both, or neither.
checksum = true has bakelite compute a SHA-256 for each upload and send it so
S3 validates object integrity at write time — catching client→S3 corruption
immediately, on top of bakelite's own CRC envelope (which catches corruption on
read). It's also required by S3 Object Lock:
a bucket with retention rejects any upload — single-part or multipart — that lacks
a checksum, so set checksum = true whenever you use Object Lock, whatever the
database size.
Both are S3-only; GCS and Azure encrypt at rest by default and aren't configured here.
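For readers curious what "send a SHA-256 with the upload" looks like, S3 expects the raw 32-byte digest encoded as base64 (it travels in the x-amz-checksum-sha256 header). The Python sketch below shows just that encoding; the function name is hypothetical and how bakelite wires the value into its requests is not shown here.

```python
import base64
import hashlib


def sha256_checksum_b64(payload: bytes) -> str:
    """Base64 of the raw SHA-256 digest, the form S3 validates on write."""
    digest = hashlib.sha256(payload).digest()  # 32 raw bytes, not hex
    return base64.b64encode(digest).decode("ascii")
```

A common mistake is to base64-encode the hex string instead of the raw digest, which produces a value S3 rejects as a checksum mismatch.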
Google Cloud Storage
[[database.backends]]
type = "gcs"
bucket = "my-gcs-bucket"
prefix = "bakelite"
# service_account_path = "/etc/bakelite/gcs-key.json" # off-GCP: explicit key
Credentials are resolved by object_store the way the Google tooling expects:
a service-account file or JSON key in the environment (GOOGLE_SERVICE_ACCOUNT /
GOOGLE_SERVICE_ACCOUNT_KEY / GOOGLE_APPLICATION_CREDENTIALS), the gcloud
Application Default Credentials file, or the GCE metadata server when
running on Google Cloud. Off-GCP, point service_account_path at a downloaded
service-account JSON key instead. The service account needs object read/write/list
and delete on the bucket (roughly the roles/storage.objectAdmin role, or a custom
role with storage.objects.{get,create,delete,list}).
Verified against a live GCS bucket. Beyond the shared backend conformance suite, the native GCS backend is exercised end-to-end against a real Google Cloud Storage bucket by the opt-in gcs-azure-providers.yml CI leg (the gcs_conformance entrypoint). Authenticate with a service-account key, or with Application Default Credentials (gcloud auth application-default login) for an off-GCP host.
Azure Blob Storage
[[database.backends]]
type = "azure"
account = "mystorageacct" # storage account name
container = "backups" # the bucket equivalent
prefix = "bakelite"
The credential is read from the environment so it never sits in the TOML: set
AZURE_STORAGE_ACCOUNT_KEY (or its alias AZURE_STORAGE_ACCESS_KEY) to the
account key, a SAS token, or service-principal variables
(AZURE_STORAGE_CLIENT_ID / AZURE_STORAGE_CLIENT_SECRET /
AZURE_STORAGE_TENANT_ID). bakelite loads /etc/bakelite/bakelite.env at startup
(see Environment variables), so the same file that holds
your other secrets works here. The credential needs blob read/write/list/delete on
the container (the Storage Blob Data Contributor role, or an equivalent SAS).
Only the public-cloud endpoint (<account>.blob.core.windows.net) is targeted; a
custom endpoint for the Azurite emulator or sovereign clouds isn't exposed yet.
Verified against a live Azure account. Beyond the shared backend conformance suite, the native Azure backend is exercised end-to-end against a real Azure Storage account by the opt-in gcs-azure-providers.yml CI leg (the azure_conformance entrypoint). The credential is read from the environment: account key, SAS token, or service principal.
GCS and Azure share the same storage layer as S3. The S3-only extras (bakelite usage noncurrent-version reporting and the multipart-upload reclaim sweep) don't apply to them; expire old data with the provider's own lifecycle/versioning controls. See Backends for the compatibility matrix.
SFTP
Back up to any SSH host. The transport is pure Rust (russh and russh-sftp): there's no system ssh binary and no OpenSSL/libssh2 C dependency, so the single static binary keeps working.
[[database.backends]]
type = "sftp"
host = "backup.example.com"
# port = 22
user = "bakelite"
path = "/srv/backups/bakelite" # remote base dir (relative to login dir if not absolute)
identity_file = "/etc/bakelite/.ssh/id_ed25519"
# passphrase_env = "SFTP_KEY_PASS" # env var holding the key's passphrase, if encrypted
# password_env = "SFTP_PASSWORD" # env var holding the password (keeps it out of the config)
# password = "..." # last resort: inline password (plaintext in the config)
# known_hosts = "/etc/bakelite/known_hosts"
# insecure_skip_host_key_check = false
# connect_timeout = "30s"
# request_timeout = "30s"
- Authentication is by SSH key (identity_file, recommended) or password. Secrets stay out of the config: an encrypted key's passphrase and the password both come from the environment, the same way the S3/Azure credentials do, so bakelite.toml can be checked in.
  - Key auth: point identity_file at the private key. If it's encrypted, supply the passphrase via passphrase_env (names the env var to read) or the fixed BAKELITE_SFTP_KEY_PASSPHRASE; a passphrase-less key needs neither.
  - Password auth: set password_env to the name of an env var holding the password, or rely on the fixed BAKELITE_SFTP_PASSWORD. Put the value in bakelite.env (chmod 600) like any other secret. The inline password field still works as a last resort, but it sits in plaintext in the config, so prefer the environment. (When more than one source is set, the most specific wins: password_env over inline password over BAKELITE_SFTP_PASSWORD.)
- Reconnects on drop. The session is established lazily and reused; if it drops (server restart, network blip, idle timeout), the next operation transparently reconnects and retries rather than wedging until the daemon restarts. connect_timeout (handshake + auth + subsystem) and request_timeout (a single request's response) bound how long a dead link can hang; both default to 30s.
- Host-key verification is on by default: the server's key is checked against known_hosts (default ~/.ssh/known_hosts). The host must already have an entry; add one with ssh-keyscan backup.example.com >> ~/.ssh/known_hosts, or connect once with ssh. If the key isn't found (or has changed), bakelite refuses to connect. Set insecure_skip_host_key_check = true to skip the check. That is convenient on a trusted LAN, but it disables MITM protection, so never use it over the open internet.
- It's the local-filesystem backend over a network: writes go to a temp file then rename into place, and the same databases/... tree is created under path. The S3-only extras (version inspection, multipart reclaim) don't apply.
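The password precedence stated in the authentication notes (password_env, then inline password, then BAKELITE_SFTP_PASSWORD) can be written out as a short lookup. This Python sketch is illustrative: the function name is hypothetical, the backend is modelled as a plain dict, and falling through when password_env names a variable that is unset is an assumption.

```python
def resolve_sftp_password(backend, env):
    """Return the password to use, or None if no source provides one."""
    env_name = backend.get("password_env")
    if env_name and env.get(env_name):
        return env[env_name]  # most specific: the variable this backend names
    if backend.get("password"):
        return backend["password"]  # inline, plaintext in the config
    return env.get("BAKELITE_SFTP_PASSWORD")  # fixed fallback variable
```

With `password_env = "SFTP_PASSWORD"` set in the backend and both SFTP_PASSWORD and BAKELITE_SFTP_PASSWORD present in the environment, the value of SFTP_PASSWORD is the one used.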
Conformance-tested. The SFTP backend passes the same Backend-trait suite as every other backend, against a disposable atmoz/sftp server; run it with just sftp-up && just test-sftp.
Shared backend for many databases
Backing up many databases to the same destination? Configure the backend once
as a top-level [[backends]] and list databases with just name + path — objects
are namespaced by database name automatically. A per-database [[database.backends]]
still overrides the shared one.
[[backends]]
type = "s3"
bucket = "my-bucket"
prefix = "bakelite"
endpoint = "https://s3.us-west-001.backblazeb2.com"
region = "us-west-001"
[[database]]
name = "user1"
path = "/var/lib/rnd/users/1/data.db"
[[database]]
name = "user2"
path = "/var/lib/rnd/users/2/data.db"
Environment variables
Some settings live in the environment rather than the TOML — secrets (so they don't end up in a config that gets checked in), and a few process-wide knobs that don't belong per-database. bakelite reads them itself, so they work the same way under systemd and from an interactive shell.
Where they come from
At startup, bakelite loads variables from the first available source per variable; already-set environment variables always win:
1. $BAKELITE_ENV_FILE if set (explicit; missing path is a hard error).
2. /etc/bakelite/bakelite.env (silently skipped if absent).
3. $XDG_CONFIG_HOME/bakelite/bakelite.env (or ~/.config/bakelite/bakelite.env).
Under systemd, the EnvironmentFile=-/etc/bakelite/bakelite.env line in the
unit pre-populates the daemon's environment; the in-process loader is a no-op
there because everything is already set. For interactive CLI use, the loader
removes the need to source the file yourself.
Use bakelite doctor to see which files were considered and which variables
are set (secret values are redacted).
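The lookup order described above can be modelled in a few lines of Python. This is an illustrative sketch, not bakelite's loader: the name resolve_variable and the choice to pass env files as already-parsed dictionaries are assumptions made for the example.

```python
from typing import Mapping, Optional, Sequence


def resolve_variable(
    name: str,
    process_env: Mapping[str, str],
    env_files: Sequence[Mapping[str, str]],
) -> Optional[str]:
    """Model the documented lookup order for one variable.

    `env_files` holds the env files as already-parsed dictionaries, in
    discovery order (a hypothetical representation; parsing is out of scope).
    """
    # An already-set environment variable always wins.
    if name in process_env:
        return process_env[name]
    # Otherwise the first file that defines the variable supplies it.
    for parsed in env_files:
        if name in parsed:
            return parsed[name]
    return None


process_env = {"BAKELITE_LOG": "debug"}
etc_file = {"BAKELITE_LOG": "info", "AWS_ACCESS_KEY_ID": "from-etc"}
user_file = {"AWS_ACCESS_KEY_ID": "from-home", "AWS_DEFAULT_REGION": "us-west-001"}

# The shell's value beats the file; the earlier file beats the later one.
assert resolve_variable("BAKELITE_LOG", process_env, [etc_file, user_file]) == "debug"
assert resolve_variable("AWS_ACCESS_KEY_ID", process_env, [etc_file, user_file]) == "from-etc"
```

Resolution is per variable, so one file can supply a secret while another supplies a region, which is why bakelite doctor reports both the files considered and the variables set.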
The variables
| Variable | Used by | Purpose |
|---|---|---|
| BAKELITE_CONFIG | CLI | Path to the config TOML. Overridden by --config. |
| BAKELITE_LOG | CLI/daemon | Tracing filter, e.g. info, bakelite=debug. |
| BAKELITE_ENV_FILE | CLI/daemon | Explicit env file to load before all other discovery. |
| AWS_ACCESS_KEY_ID | S3 backend | Static access key. |
| AWS_SECRET_ACCESS_KEY | S3 backend | Static secret key. |
| AWS_SESSION_TOKEN | S3 backend | Temporary STS token, if you're using one. |
| AWS_DEFAULT_REGION | S3 backend | Default region; per-backend region in TOML wins. |
| BAKELITE_AWS_USE_DEFAULT_CREDENTIAL_CHAIN | S3 backend | Set to 1 to opt into the AWS default credential chain (IMDS/ECS/EKS). Off by default so a missing key on a non-AWS host fails fast instead of timing out. |
| GOOGLE_SERVICE_ACCOUNT / GOOGLE_SERVICE_ACCOUNT_KEY / GOOGLE_APPLICATION_CREDENTIALS | GCS backend | Service-account file path, inline JSON key, or ADC file path. Or set service_account_path in the TOML. |
| AZURE_STORAGE_ACCOUNT_KEY / AZURE_STORAGE_ACCESS_KEY | Azure backend | Storage-account key (the two names are aliases). SAS / service-principal variables (AZURE_STORAGE_CLIENT_ID / _SECRET / _TENANT_ID) are also honoured. |
| BAKELITE_SFTP_PASSWORD | SFTP backend | Password for password auth. Used when a backend's password_env is unset; a backend's own password_env (naming a different variable) takes precedence. |
| BAKELITE_SFTP_KEY_PASSPHRASE | SFTP backend | Passphrase for an encrypted identity_file. Used when a backend's passphrase_env is unset. |
| BAKELITE_KEY | encryption | Inline BAKELITE-KEY-V1-… key. Overrides [encryption] in the config. |
| BAKELITE_KEY_FILE | encryption | Path to a key file. Overrides [encryption]. |
| BAKELITE_PASSPHRASE | encryption | Inline passphrase. Overrides [encryption]. |
| BAKELITE_PASSPHRASE_FILE | encryption | Path to a passphrase file. Overrides [encryption]. |
At-rest encryption
Optional. When set, every snapshot, change-set, and manifest payload is encrypted with XChaCha20-Poly1305 (an AEAD cipher, i.e. authenticated encryption with associated data) before being uploaded. Object keys, the current-backup pointer, and listings stay in the clear: they carry only the information needed to locate and route objects, never database contents.
bakelite uses a symmetric key model: one key encrypts and decrypts, held by the
daemon and by every CLI command (restore, verify, compact, list, usage).
This protects backups against a stolen disk or an S3-bucket breach, and against
inspection in transit; it does not try to protect them from a compromised daemon
host (which already has the live plaintext database).
Each object is encrypted independently and bound to its identity — its database name and position in the replica travel as the AEAD's associated data, so the backend can't relocate, swap, or roll back an object without the authentication failing on read. The cipher authenticates the writer on every object; there's no separate manifest MAC to maintain.
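To see why binding an object to its identity defeats relocation, here is a deliberately simplified Python sketch. It uses HMAC-SHA256 from the standard library as a stand-in for the AEAD's authentication tag: it does not encrypt anything and it is not XChaCha20-Poly1305. The identity string format and the function names are assumptions for illustration only.

```python
import hashlib
import hmac


def _message(identity: str, payload: bytes) -> bytes:
    # Length-prefix the identity so distinct (identity, payload) pairs
    # can never serialise to the same byte string.
    ad = identity.encode("utf-8")
    return len(ad).to_bytes(8, "big") + ad + payload


def seal_tag(key: bytes, identity: str, payload: bytes) -> bytes:
    """Compute a tag over both the payload and the object's identity."""
    return hmac.new(key, _message(identity, payload), hashlib.sha256).digest()


def authenticates(key: bytes, identity: str, payload: bytes, tag: bytes) -> bool:
    """True only if the payload is read back under the identity it was sealed with."""
    return hmac.compare_digest(seal_tag(key, identity, payload), tag)


key = b"\x01" * 32
payload = b"change-set bytes"
tag = seal_tag(key, "app/segment/7", payload)  # hypothetical identity format

assert authenticates(key, "app/segment/7", payload, tag)
# The same bytes moved to another position, or another database, fail the check.
assert not authenticates(key, "app/segment/8", payload, tag)
assert not authenticates(key, "other/segment/7", payload, tag)
```

The point of the sketch is only the binding: because the identity is part of what gets authenticated, a backend that copies valid bytes to a different key produces an object that no longer verifies.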
What encryption does and does not protect
| Threat | Protected? |
|---|---|
| Passive read of the backend (stolen disk, bucket breach, on-the-wire snoop) | Yes — payloads are confidential; only the key decrypts them. |
| Active write to the backend (compromised bucket creds, MITM, malicious provider) — content | Yes (with require_encrypted, the default). Forged/substituted objects are rejected: plaintext is refused outright, and because each object is bound to its identity (database + position) as AEAD associated data, an attacker can't forge one, relocate it, or swap one object for another without decryption failing. restore/verify also bind every object to its recorded hash. |
| Active write — rollback / deletion | Partly. An attacker can't forge data, but can still roll the replica back to a genuine earlier state (by repointing CURRENT) or delete backups (an availability attack). verify warns when CURRENT doesn't point at the newest full backup, and restore --timestamp <recent> resolves by backup time independently of CURRENT — so a repointed pointer can't fool it (it can't recover deleted backups, though). |
| Compromised daemon host | Out of scope — it already holds the live plaintext database. |
What encryption gives you is confidentiality plus, with require_encrypted,
tamper-evidence for content. The trust anchor is the AEAD itself: only a holder
of the key can produce an object that authenticates, and each object is bound to its
identity (database + position), so it can't be forged, swapped, or relocated. The CRC
envelope + BLAKE3 object_hash then also catch accidental corruption (bit-rot,
truncation — the common failure mode). What still leaks even with encryption on:
database names, object sizes/counts, and write cadence are visible in the
cleartext object keys and listings (routing metadata).
require_encrypted (default true). Reads reject any object that isn't encrypted, closing the downgrade where an attacker swaps ciphertext for attacker-chosen plaintext. Set it to false only while migrating a previously-plaintext replica (see Toggling or rotating).
Transport: an endpoint beginning with http:// disables TLS, so credentials (the access-key id and any session token) and all metadata then travel in the clear and signed requests are replayable. Use https:// in production; plain http:// is only for local emulators (MinIO/LocalStack on loopback).
# One-time: generate a key (mode 0600, refuses to overwrite).
bakelite keygen --output /etc/bakelite/key.txt
# Produces a BAKELITE-KEY-V1-… file.
# Apply to every database:
[defaults.encryption]
key_file = "/etc/bakelite/key.txt"
# …or per-database:
[[database]]
name = "secrets"
path = "/data/secrets.db"
[database.encryption]
key_file = "/etc/bakelite/secrets-only.txt"
A per-database [database.encryption] overrides the shared
[defaults.encryption]. Omitting both leaves a database unencrypted.
Runtime overrides keep the secret out of the TOML on shared hosts: set
BAKELITE_KEY_FILE (or BAKELITE_KEY / the passphrase variants) and the env value
wins over the config. See Environment variables for the
full list and how the env file is loaded.
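The layering described above (env override, then per-database block, then shared defaults, else unencrypted) can be sketched as a small resolver. This is a model for reasoning about precedence, not bakelite's code: the function names are hypothetical, the order in which the four env variables are checked is an assumption, and so is the fall-through from a per-database block that names no key source.

```python
from typing import Mapping, Optional, Tuple

# Runtime overrides listed in the variables table.
ENV_OVERRIDES = (
    "BAKELITE_KEY",
    "BAKELITE_KEY_FILE",
    "BAKELITE_PASSPHRASE",
    "BAKELITE_PASSPHRASE_FILE",
)
# Config settings that name a key source; they are mutually exclusive.
KEY_SETTINGS = ("key_file", "passphrase", "passphrase_file")


def _config_setting(block: Mapping[str, object]) -> Optional[str]:
    present = [name for name in KEY_SETTINGS if name in block]
    if len(present) > 1:
        raise ValueError(f"set exactly one of {KEY_SETTINGS}, got {present}")
    return present[0] if present else None


def encryption_source(
    env: Mapping[str, str],
    database_block: Mapping[str, object],
    defaults_block: Mapping[str, object],
) -> Optional[Tuple[str, str]]:
    """Return (layer, setting) for the winning key source, or None if unencrypted."""
    # 1. A runtime override wins over anything in the TOML.
    for name in ENV_OVERRIDES:  # relative order of the four is an assumption
        if env.get(name):
            return ("env", name)
    # 2. A per-database block overrides the shared defaults.
    for layer, block in (("database", database_block), ("defaults", defaults_block)):
        setting = _config_setting(block)
        if setting is not None:
            return (layer, setting)
    # 3. Neither block present: the database is left unencrypted.
    return None


assert encryption_source({}, {"key_file": "secrets-only.txt"}, {"key_file": "key.txt"}) == ("database", "key_file")
assert encryption_source({}, {}, {}) is None
```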
Keep an off-host copy of the key. The key is the one thing your backups don't contain: lose it and every encrypted backup is unrecoverable, however safe the bucket is. Store a copy somewhere independent of both the database host and the backup bucket (a password manager, a separate secrets store, a sealed envelope): a key kept only on the machine being backed up dies with that machine, and one kept only in the same bucket falls to the same breach. The daemon and every CLI command (restore, verify, compact, list, usage) need it.
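Rotating the key can be as simple as starting over from a fresh full backup (point
key_file at a new key and let the daemon take a new snapshot). To keep the existing
point-in-time history instead, use bakelite reencrypt (see Rotating the key).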
Toggling or rotating
Both directions — enabling encryption on a previously-plaintext database
and disabling encryption on a previously-encrypted one — work without
manual intervention: edit [database.encryption] (or remove it) and restart
the daemon. The two directions behave differently because they have to.
Enabling encryption on a database that already has plaintext backups needs a
short migration, because the secure default require_encrypted = true refuses to
read those legacy plaintext objects. Set the key and require_encrypted = false together: the wrapper then notices each on-backend object is plaintext (no
ciphertext header) and passes it through unchanged on read, so legacy snapshots and
segments stay readable while every new object the daemon ships is encrypted. A
replica can hold a mix during this window, and restore / verify walk both.
Then run bakelite reencrypt to rewrite the legacy objects under the key, and
finally remove the require_encrypted = false override (back to the strict
default) and restart. A fresh database (no prior plaintext) needs none of this —
just set the key and leave require_encrypted at its default.
Disabling encryption is the only case that can't resume in-place. The daemon has no key to read the existing encrypted manifest, so it warns and bootstraps a fresh plaintext full backup. The encrypted history is left untouched on the backend (clean it up with backend tools when no longer needed) and stays restorable only by temporarily flipping the config back to the old encrypted setup.
Wrong-key or genuinely corrupt objects are not treated as a mode mismatch — they keep their loud failure (the wrapper only passes bytes through that plainly aren't ciphertext), so a silent re-bootstrap can't mask a real incident.
# Enable encryption on a database that already has PLAINTEXT backups:
bakelite keygen --output /etc/bakelite/key.txt
# edit /etc/bakelite/bakelite.toml, add (note the migration override):
# [defaults.encryption]
# key_file = "/etc/bakelite/key.txt"
# require_encrypted = false # temporary: lets the daemon read legacy plaintext
sudo systemctl restart bakelite
bakelite verify --db <name> # mixed-mode replica reads cleanly
# Rewrite legacy plaintext objects under the new key (preserves PIT history):
sudo systemctl stop bakelite
sudo bakelite reencrypt --db <name>
# Then DROP the override (back to the strict default) and restart:
# remove the `require_encrypted = false` line
sudo systemctl start bakelite
Rotating the key
To rotate to a different encryption key while preserving the full
point-in-time history, run bakelite reencrypt with the prior key supplied
via --old-key-file. The walker tries the configured target key first
(no-op for already-rotated objects), falls back to the prior key for the
legacy ones, and re-uploads each under the target. The daemon must be stopped for
each database being rotated, and the operation is idempotent (rerun it safely after
a crash or an interrupted run).
# Rotate from key A to key B:
bakelite keygen --output /etc/bakelite/key-b.txt
# edit bakelite.toml: key_file = "/etc/bakelite/key-b.txt"
sudo systemctl stop bakelite
sudo bakelite reencrypt --db <name> --old-key-file /etc/bakelite/key-a.txt
# Rerun if interrupted — the second pass skips already-rotated objects.
sudo systemctl start bakelite
# Once the old key isn't needed for any remaining backups, archive or destroy it.
--old-key-file is repeatable, so chains like A → B → C work even if you
skipped a rotation. The prior keys are tried in declaration order, so list your
most recent prior key first.
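The trial order can be pictured with a toy model. In this sketch keys are plain labels and each object simply records which key sealed it; a real walker would only discover that by attempting decryption. The function name plan_object and the (action, attempts) return shape are assumptions for illustration.

```python
from typing import Sequence, Tuple


def plan_object(sealed_with: str, target_key: str, old_keys: Sequence[str]) -> Tuple[str, int]:
    """Return (action, attempts) for one object.

    `sealed_with` stands in for "the key that authenticates this object".
    """
    # Target first, then the prior keys in the order they were declared.
    candidates = [target_key, *old_keys]
    for attempts, candidate in enumerate(candidates, start=1):
        if candidate == sealed_with:
            # Already under the target: nothing to do. Otherwise re-upload under the target.
            action = "skip" if candidate == target_key else "rewrite"
            return action, attempts
    raise ValueError("none of the supplied keys opens this object")


# Rotating to C with priors declared as B, then A:
assert plan_object("C", "C", ["B", "A"]) == ("skip", 1)     # already rotated
assert plan_object("B", "C", ["B", "A"]) == ("rewrite", 2)  # most recent prior key, found quickly
assert plan_object("A", "C", ["B", "A"]) == ("rewrite", 3)  # older key, one more attempt
```

Listing the most recent prior key first keeps the attempt count low for the bulk of the objects, and the "skip" branch is what makes a rerun after an interruption cheap.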
Decrypting back to plaintext
The reverse flow is symmetric: remove [encryption] from the config, then run
reencrypt with --old-key-file pointing at the current key. Every object is
rewritten in the clear, so the history that was encrypted stays fully recoverable.
Running the same command with --dry-run --json first shows the size of the work
without touching the backend.
# edit bakelite.toml: remove the [encryption] block
sudo systemctl stop bakelite
sudo bakelite reencrypt --db <name> --old-key-file /etc/bakelite/key.txt --dry-run --json
sudo bakelite reencrypt --db <name> --old-key-file /etc/bakelite/key.txt
sudo systemctl start bakelite
bakelite reencrypt reuses the same advisory lock the daemon takes, so if you
forget to stop the daemon it refuses with a clear message instead of racing
the live writer.
Passphrase mode
For lower-friction setups bakelite can derive its key from a typed passphrase
instead of a key file. Under the hood bakelite runs Argon2id once at startup
to derive the symmetric key from the passphrase, then uses the same encryption path
as key-file mode. A recovery host only needs the passphrase to restore; no key file
to ship.
Set exactly one of key_file, passphrase, or passphrase_file per
[encryption] block — they are mutually exclusive (setting more than one is a
config error). The two passphrase forms are:
# Inline — handy for tests or throw-away setups.
[defaults.encryption]
passphrase = "correct horse battery staple"
# Or read from a file (first non-blank, non-`#`-comment line wins).
[defaults.encryption]
passphrase_file = "/etc/bakelite/passphrase.txt"
Setting more than one of key_file / passphrase / passphrase_file in
the same [encryption] block is an error — the loader rejects the ambiguity
up front. Env overrides also exist for the passphrase path:
BAKELITE_PASSPHRASE=<text> or BAKELITE_PASSPHRASE_FILE=<path>.
Trade-off. A generated key file is 256 bits of pure entropy. A typed
passphrase is whatever you can remember — gated by Argon2id's cost. The math:
with a strong passphrase the brute-force cost stays infeasible; with a weak
one ("password", a dictionary word), an attacker who steals your backups can
guess it. If security really matters to you, use key_file; passphrase
mode is there for the times the ergonomics matter more than the absolute strength.
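The trade-off can be made concrete with some rough arithmetic. The sketch below is illustrative only: the guess rate is a made-up figure standing in for whatever Argon2id's cost allows an attacker, the word-list sizes are assumptions, and the entropy formula only holds when each word is picked uniformly at random.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def entropy_bits(choices_per_symbol: int, symbols: int) -> float:
    """Entropy of a secret built from `symbols` independent, uniformly random picks."""
    return symbols * math.log2(choices_per_symbol)


def average_years_to_guess(bits: float, guesses_per_second: float) -> float:
    """Expected brute-force time: on average half the search space is tried."""
    expected_guesses = 2.0 ** (bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


# Hypothetical attacker rate after Argon2id's cost; substitute your own figure.
RATE = 1_000.0

cases = [
    ("one dictionary word", entropy_bits(10_000, 1)),  # assumed 10,000-word list
    ("four random words", entropy_bits(7_776, 4)),     # assumed 7,776-word list
    ("generated key file", 256.0),                     # 256 bits, as stated above
]
for label, bits in cases:
    years = average_years_to_guess(bits, RATE)
    print(f"{label}: {bits:.1f} bits, about {years:.3g} years on average")
```

Under these assumed numbers a single dictionary word falls within seconds, four randomly chosen words hold out for tens of thousands of years, and the key file is out of reach by any rate. A phrase you made up yourself is not a uniform random pick, so treat the four-word figure as an upper bound.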
Multi-destination replication
Optional. Replicate one database to more than one backend for redundancy — local NVMe + S3, two regions, anything supported by Backends. Every write fans out to all destinations in lock-step; restore picks the first reachable.
List more than one [[database.backends]] entry (TOML's array-of-tables
syntax) and every write fans out to all of them:
[[database]]
name = "critical"
path = "/data/critical.db"
[[database.backends]]
type = "file"
path = "/srv/replicas/critical" # local mirror, instant restore
[[database.backends]]
type = "s3"
bucket = "offsite-backups"
prefix = "bakelite/critical" # cloud copy, disaster recovery
The shared top level supports the same shape: a top-level [[backends]] defines a
default fan-out set for every [[database]] that doesn't override it.
Guarantees, in plain terms
- Bit-for-bit mirrors. Every destination holds the same object bytes for every key — encryption (when configured) runs once and the ciphertext is fanned out, so adding destinations doesn't multiply CPU cost.
- Strict in-sync. A sync round either lands on every destination or fails as a whole and is retried (segment puts are idempotent — re-shipping the same index/level overwrites with the same bytes). A consistently slow destination paces the whole pipeline — the trade is a much simpler "every mirror always in step" invariant. (Add or repoint a destination after replication has begun and bakelite handles it on the next start — see Adding or moving a destination.)
- Resilient reads. Restore tries destinations in order and falls through to the next on any failure — a network blip, a missing object, a wiped mirror, or a corrupt copy (see below). Listings union across destinations and dedupe, so a wiped primary doesn't hide data the secondary still holds.
Redundancy and bit-rot recovery
Replicate to two servers and a restore survives one of them silently rotting a backup — bakelite reads the next copy automatically. That's the payoff of more than one destination, and it's on by default.
When a database has 2+ destinations, restore (and verify/list) validates
every object as it reads it — the container CRC, the AEAD authentication tag
under encryption, and the recorded content hash — and on any mismatch falls
through to a healthy copy on another destination. A single SFTP/SSH server that
bit-rots one backup object no longer fails the restore; a sibling covers for it.
Both sync backends and async mirrors are in the pool of copies to fall back to.
This is safe because every destination holds byte-identical objects, so any one valid copy is authoritative. If every copy of an object is corrupt, restore fails loudly — it never hands back a silently-wrong database.
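A minimal sketch of the validate-then-fall-back read, under stated simplifications: destinations are modelled as in-memory dictionaries, only the recorded content hash is checked (the container CRC and AEAD tag are left out), and BLAKE2b from the standard library stands in for BLAKE3. All names here are hypothetical.

```python
import hashlib
from typing import Mapping, Sequence, Tuple


class AllCopiesCorrupt(Exception):
    """Raised when no destination holds a copy that matches the recorded hash."""


def content_hash(data: bytes) -> str:
    # Stand-in: BLAKE2b instead of BLAKE3, which is not in the standard library.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def read_validated(
    key: str,
    recorded_hash: str,
    destinations: Sequence[Mapping[str, bytes]],
) -> Tuple[bytes, int]:
    """Return (bytes, destination index) for the first copy that validates."""
    for index, store in enumerate(destinations):
        data = store.get(key)
        # A missing object and a corrupt one are handled the same way: try the next copy.
        if data is not None and content_hash(data) == recorded_hash:
            return data, index
    # Never hand back bytes that failed validation.
    raise AllCopiesCorrupt(key)


good = b"snapshot bytes"
recorded = content_hash(good)
rotted = {"snapshot-1": b"snapshot bytez"}
healthy = {"snapshot-1": good}

assert read_validated("snapshot-1", recorded, [rotted, healthy]) == (good, 1)
```

Returning the index alongside the bytes is what would let a caller report which destination had to be skipped.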
Two ways the corruption surfaces:
- At restore time, transparently: the restore summary notes recovered N object(s) from a healthy copy and names the rotting destination. bakelite verify reports the same, so a scheduled verify catches a degraded destination before you ever need to restore.
- Proactively, with bakelite repair: it walks every object across all destinations and rewrites a healthy copy over any that are corrupt or missing, healing the rot in place.
Set validate_on_restore = false to opt out (read from the first destination
without validation). It defaults to true and adds no cost to a
single-destination database — the validating reader is only built when there's a
sibling to fall back to. There's no effect on the replication hot path either way.
Async mirrors (off the hot path)
A sync destination must accept every write before a sync round completes, so a
consistently slow one paces the whole pipeline. Mark a destination mode = "async"
instead and bakelite reconciles it in the background — live replication only
waits on the sync destinations (at least one is required; the first is the
primary):
[[database]]
name = "critical"
path = "/data/critical.db"
[[database.backends]]
type = "file"
path = "/srv/replicas/critical" # sync primary — instant local restore
[[database.backends]]
type = "s3"
bucket = "offsite-backups"
prefix = "bakelite/critical"
mode = "async" # background mirror — never paces the hot path
A background task copies new objects from the primary to each async mirror every
mirror_interval (default 60s; a per-database tunable). Because bakelite's data
objects are write-once and content-addressed, this is a simple list-diff-copy that
copies stored bytes verbatim — so it mirrors ciphertext under encryption without
needing the key. The mirror is eventually consistent: it advances its CURRENT
pointer only to a restore point it has fully copied, so it's always a valid, restorable
prefix of the primary, at most ~mirror_interval behind. A slow or down async
mirror is logged and retried — it never fails or slows live replication.
The mirror also propagates the primary's deletions: each reconcile pass prunes
objects the primary has retired through retention or compaction. A successful full
listing gates the prune, and it never removes the run its own CURRENT restores
from — so the mirror tracks the primary rather than growing without bound, while
staying a valid restorable prefix.
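One reconcile pass can be sketched as a copy step followed by a gated prune. This is a simplified model, not the daemon's code: stores are in-memory dictionaries, the CURRENT pointer is reduced to a set of protected keys, and the function and parameter names are assumptions.

```python
from typing import Dict, Set


def reconcile(
    primary: Dict[str, bytes],
    mirror: Dict[str, bytes],
    protected: Set[str],
    listing_complete: bool = True,
) -> None:
    """Bring `mirror` in line with `primary`, in place."""
    # Copy: objects are write-once, so a key already on the mirror never needs
    # re-copying. Bytes are copied verbatim, which is why no key is required.
    for key in primary.keys() - mirror.keys():
        mirror[key] = primary[key]
    # Prune only after a successful full listing of the primary; a partial
    # listing would make live objects look retired.
    if not listing_complete:
        return
    # Drop what the primary has retired, but never the objects the mirror's
    # own CURRENT restores from.
    for key in mirror.keys() - primary.keys() - protected:
        del mirror[key]


primary = {"snap-2": b"s2", "seg-5": b"c5"}
mirror = {"snap-1": b"s1", "seg-4": b"c4", "snap-2": b"s2"}
reconcile(primary, mirror, protected={"snap-1"})

# seg-5 was copied, seg-4 was pruned, and the protected snap-1 survived.
assert mirror == {"snap-1": b"s1", "snap-2": b"s2", "seg-5": b"c5"}
```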
Adding or moving a destination
If you add a destination to an already-running database — or repoint an existing
one at a new, empty location — that destination starts out holding none of the
existing backup chain. Because sync destinations are written in lock-step, it
would otherwise receive only new change-sets from that point on, leaving it
without a base snapshot underneath them: not independently restorable, and only
bakelite verify would catch it.
Each time the daemon starts and resumes an existing database (not on a fresh
bootstrap, which already fans out a complete base to every destination), it
compares the snapshot/segment objects each sync destination holds against the
combined set across all of them. A destination missing any of those data objects
is incomplete, and bakelite applies on_incomplete_destination:
| Value | Behaviour |
|---|---|
backfill (default) | Copy the existing chain onto the incomplete destination from a destination that holds a complete copy — the same verbatim list-diff-copy the async mirror uses (ciphertext as-is, no key needed) — so it's immediately a full, independently restorable replica. If no destination holds a complete copy (they've genuinely diverged), bakelite refuses and asks you to resolve it by hand rather than copy from a partial peer. |
refuse | Stop the database and report which destination is incomplete, rather than ship onto a partial chain. Fail-safe: nothing is written until you resolve it. |
warn | Log a warning and keep going. The destination stays unrestorable until the next full snapshot rebases the chain. |
This only applies to sync destinations; async mirrors always catch up through
their own background reconcile. The check runs at daemon start on resume, so a
destination that goes incomplete mid-run isn't re-detected until the next daemon
restart or the next full snapshot rebases the chain. Set it under [defaults] or
per-database. To move a destination cleanly without a backfill, you can instead
remove the old one, let a fresh snapshot rebase, then add the new location.
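The completeness check and the choice of a backfill source reduce to set arithmetic. The sketch below models each sync destination as the set of snapshot/segment object names it holds; the function names and object names are hypothetical, and the real check runs against backend listings rather than in-memory sets.

```python
from typing import Dict, Optional, Set


def find_incomplete(holdings: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each incomplete destination to the objects it is missing."""
    combined = set().union(*holdings.values())
    return {
        name: combined - objects
        for name, objects in holdings.items()
        if combined - objects
    }


def backfill_source(holdings: Dict[str, Set[str]]) -> Optional[str]:
    """Pick a destination holding the complete combined set, if one exists."""
    combined = set().union(*holdings.values())
    for name, objects in holdings.items():
        if objects == combined:
            return name
    # Every destination is partial: they have diverged, so refuse to backfill.
    return None


# A freshly added, empty destination next to a complete one:
added = {"local": {"snap-1", "seg-1", "seg-2"}, "s3": set()}
assert find_incomplete(added) == {"s3": {"snap-1", "seg-1", "seg-2"}}
assert backfill_source(added) == "local"

# Genuinely diverged destinations: nothing safe to copy from.
diverged = {"a": {"snap-1"}, "b": {"snap-2"}}
assert backfill_source(diverged) is None
```

In the first case backfill would copy from local onto s3; in the second it matches the documented refusal, because copying from a partial peer would only spread the gap.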
Limitations
- The metrics endpoint and bakelite usage report once per database, against the primary destination's view. (Under strict in-sync the sync destinations agree; the limitation matters only after divergence.) bakelite status additionally shows a line per async mirror with its lag (objects/bytes behind) and last reconcile time.
- The single replication cursor means a slow sync destination pulls the fast ones down with it. Mark such a destination mode = "async" (above) to take it off the hot path. Per-destination sync cursors with independent backoff are not part of v1.