Operations: backup, upgrade, migrate

Looking after a running bty-web: back its state up, upgrade the software, and move it to new hardware (or a new host).

What counts as state

bty-web keeps everything in one directory, BTY_PATHS_STATE_DIR (default /var/lib/bty; the bty-data named volume in the container deploy):

| Path | What | Backup? |
|---|---|---|
| state.db | The SQLite database: machine records, MAC->image assignments, catalog metadata, server settings, sessions, and the audit log. | Yes – this is the irreplaceable bit. |
| boot/ | The netboot artifacts (BTY_PATHS_BOOT_DIR: kernel / initrd / squashfs). | Optional – re-fetchable via “Fetch netboot artifacts”. |
| catalog.toml | The active catalog manifest. | Optional – re-fetchable from the upstream. |

bty-web (v0.40+) holds no image bytes. Image bytes live in withcache under ./data/withcache/, populated on first flash of each URL and backed up independently.

A minimal backup is just state.db; a full backup is the whole /var/lib/bty tree.

Data separation and read-only-OS readiness

bty-web is built so that all mutable runtime state lives under BTY_PATHS_STATE_DIR (/var/lib/bty) – a single writable volume. In the container deploy that volume is bty-data: the container image is immutable and recovery is “pull a new image, re-attach the volume.” The rest of this section is the readiness checklist for that split.

bty-web’s runtime writes already all land under /var/lib/bty, split into three classes:

| Path | Class | Notes |
|---|---|---|
| state.db | precious | records: machines, catalog, settings, audit log |
| boot/ | ephemeral | netboot artifacts – version-coupled; refetch on a bty version bump |
| session-secret | regenerable | cookie key |

Precious = carry across a migration / back up. state.db carries the machine bindings + audit log + settings; v0.33.0+ auto-rotates it on a version mismatch. The bty-web export bundle (v3, metadata-only) carries the per-machine hardware identity (mac + hw_lshw + known_disks) so a re-imported machine shows up pre-fingerprinted; bindings reset and the operator re-binds.

Image bytes live in withcache (separate process, separate data dir, backed up independently). v0.40+ took bty-web out of the bytes plane; the live env streams from withcache or from the catalog URL’s origin, never from bty-web’s filesystem.

Ephemeral = safe to lose, re-created on demand. boot/ is the subtle one: it lives on the writable volume (so a read-only OS is possible) but is re-fetched when it no longer matches the running bty-web version, rather than preserved as precious.

The container deploy already realises this split: the bty-web container image is immutable, and /var/lib/bty is the bty-data named volume that carries everything precious. $BTY_ADMIN_PASSWORD is supplied via the container env rather than written into the image. Pulling a new image and re-attaching the volume is the whole upgrade.

Backup

state.db is a single SQLite file. The safe way to copy a live database is SQLite’s online backup (consistent even while bty-web is running):

sqlite3 /var/lib/bty/state.db ".backup '/tmp/bty-state-$(date +%F).db'"
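A copy is only useful if it opens. As a minimal sketch, the online backup can be followed by SQLite's own integrity check. The function name backup_state_db is hypothetical (it is not part of bty), and the sketch assumes the sqlite3 CLI is installed and that the destination path contains no single quotes:

```shell
# Hypothetical helper: online-backup a live SQLite file, then verify the copy.
backup_state_db() {
    local src=$1 dest=$2 result
    # .backup uses SQLite's online backup, so the copy is consistent
    # even while another process is writing to $src.
    sqlite3 "$src" ".backup '$dest'" || return 1
    # integrity_check prints the single line "ok" for a healthy database.
    result=$(sqlite3 "$dest" "PRAGMA integrity_check;") || return 1
    [ "$result" = "ok" ]
}
```

Called as backup_state_db /var/lib/bty/state.db "/tmp/bty-state-$(date +%F).db", it returns non-zero if either the copy or the check fails, which makes it easy to wire into a script that should stop on a bad backup.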

A plain cp also works if bty-web is stopped first:

sudo systemctl stop bty-web
cp -a /var/lib/bty/state.db ~/bty-state-backup.db
sudo systemctl start bty-web

For a full backup of everything bty-web manages (records + netboot artifacts + any on-disk backup bundles), copy the whole directory while bty-web is stopped:

sudo systemctl stop bty-web
sudo tar -C /var/lib -czf ~/bty-state-$(date +%F).tar.gz bty
sudo systemctl start bty-web

This does NOT include cached image bytes – since v0.40, bty-web is out of the image-bytes plane; cached blobs live in the separate withcache data dir (its own container volume). Back that up independently if you need the cache to survive (otherwise withcache re-fills on demand from the upstream catalog).

Restore by putting the file(s) back under /var/lib/bty (bty-web stopped) and starting the service.

Scheduled backups (UI-driven, since v0.25.7)

The /ui/backups page carries a Back up now trigger; the Schedule card at /ui/settings#backup-schedule sets the cadence (daily / weekly / manual) and the retention (keep N most recent). The scheduler ticks every 60s, so a change in Settings takes effect on the next tick without restarting bty-web.

Each backup is a directory written under $BTY_PATHS_BACKUP_DIR (default $BTY_PATHS_STATE_DIR/backups) named after the ISO-8601 timestamp, e.g. 2026-05-24T08-00-00Z/. The bundle layout is identical to what bty-web export produces (a single inventory.json carrying per-machine mac + hw_lshw + known_disks), so a scheduled backup is interchangeable with a manual one. Image bytes are NOT included – bty-web doesn’t have any (v0.40+); withcache holds the cached blobs independently. Retention prunes the oldest siblings after every successful run.
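To make “keep N most recent” concrete, here is a minimal sketch of the same idea done by hand. It is not bty-web's implementation; prune_backups is a hypothetical name, and the sketch relies only on the fact that timestamp-named directories such as 2026-05-24T08-00-00Z sort chronologically as plain text:

```shell
# Illustrative only: keep the N newest timestamp-named backup directories.
prune_backups() {
    local dir=$1 keep=$2 path name
    # Emit the name of every immediate subdirectory of $dir.
    for path in "$dir"/*/; do
        [ -d "$path" ] && basename "$path"
    done | sort -r | tail -n +"$((keep + 1))" | while IFS= read -r name; do
        # sort -r is newest-first; tail skipped the first $keep names,
        # so everything arriving here is older than the retention window.
        rm -rf -- "${dir:?}/$name"
        printf 'pruned %s\n' "$name"
    done
}
```

This can be handy for a backup directory that lives outside bty-web's control (an off-host copy, say), where the built-in retention does not reach.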

Two env vars tune the feature when the in-UI knobs aren’t enough:

| Variable | Default | Meaning |
|---|---|---|
| BTY_PATHS_BACKUP_DIR | $BTY_PATHS_STATE_DIR/backups | Where backup directories land. Move off the OS disk if you want them to survive an OS reflash. |
| BTY_TUNING_BACKUP_MAX_PARALLEL | 1 | Max concurrent backup jobs. Concurrent exports race on dest dirs; leave at 1 unless you have a reason. |
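As a sketch, the two variables might be set like this in bty-web's environment. Where that environment is defined depends on your deploy, and the /mnt/backups/bty path is a hypothetical mount point, not a bty default; in a container it would also need to be mounted into the bty-web container:

```shell
# Illustrative values only.
# Hypothetical mount on a disk that survives an OS reflash.
BTY_PATHS_BACKUP_DIR=/mnt/backups/bty
# The documented default; concurrent exports race on destination dirs.
BTY_TUNING_BACKUP_MAX_PARALLEL=1
```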

History lands in the audit log under subject_kind=backup (kinds backup.created / backup.failed / backup.pruned); the /ui/backups page also surfaces the recent rows in a card at the bottom.

Portable export / import (operator data only)

tar-copying the whole tree (above) is the verbatim option. The bty-web export / import subcommands are the slim one: they move only the expensive-to-recollect half – per-machine hardware identity (mac + lshw + known_disks from the box’s last live-env boot) – and nothing else. The catalog, machine bindings, audit log, settings, and netboot artifacts are deliberately left behind so an upgrade lands on a fresh, regenerable state and the operator re-binds. Reach for this to migrate hardware fingerprints across an upgrade or to a new host without dragging the rest along.

# On the old server (reads BTY_PATHS_STATE_DIR):
bty-web export /tmp/bty-bundle

# Copy /tmp/bty-bundle to the new server, then:
bty-web import /tmp/bty-bundle

The in-UI Back up now trigger on /ui/backups produces the same bundle shape; reach for the CLI when scripting (cron / a podman exec into the container / packaging into an archive pipeline) and the UI when you want an ad-hoc snapshot without leaving the browser.
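For the scripted case, a cron-style wrapper could look roughly like the sketch below. Only bty-web export comes from this page; the function name export_and_archive and the BTY_WEB override (useful for pointing at a podman exec wrapper script) are assumptions made for illustration:

```shell
# Illustrative wrapper: export a bundle under a UTC timestamp, then archive it.
# BTY_WEB is a hypothetical override, not a variable bty itself reads.
export_and_archive() {
    local dest_root=$1 stamp bundle
    stamp=$(date -u +%Y-%m-%dT%H-%M-%SZ)
    bundle="$dest_root/$stamp"
    mkdir -p "$dest_root" || return 1
    "${BTY_WEB:-bty-web}" export "$bundle" || return 1
    # Archive the bundle directory next to it and print the archive path.
    tar -C "$dest_root" -czf "$bundle.tar.gz" "$stamp" || return 1
    printf '%s\n' "$bundle.tar.gz"
}
```

The function prints the archive path on success, so a caller can hand it straight to whatever ships files off the host.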

What a bundle carries, and what it deliberately leaves behind:

| Travels | Stays behind (fresh on the destination) |
|---|---|
| Machine mac | Boot mode (every machine imports as bty-inventory) |
| lshw hardware tree (CPU / RAM / NICs) | Image binding + target_disk_serial + sanboot_drive + labels |
| known_disks (lsblk inventory + serials) | The saw_flasher_boot state bit + last_flashed_at |
| | The image catalog (catalog_entries) – re-import on the new host |
| | The netboot artifacts (re-fetch to match the new version) |
| | Server settings + the audit log |

Resetting the boot mode is the point: a freshly-migrated machine shouldn’t auto-flash against netboot artifacts you haven’t refreshed yet. Each box arrives as a re-discovered bty-inventory box with its hardware identity pre-filled and its binding cleared; you re-bind it and re-enable a flash mode once the new server is verified and its netboot artifacts re-fetched.

A bundle is a plain directory holding a single inventory.json, so tar it – or just cp -r it – for archival.

Upgrade

bty pre-1.0 has no database migration framework. The DB carries the exact bty.__version__ that created it in a bty_version table. When the running release doesn’t match, bty-web automatically rotates the old state.db to state.db.<from>.<ts>.bak and creates a fresh one in its place. Every release is therefore breaking for state, by design – but the operator does nothing.

Auto-rotate on schema mismatch (v0.33.0+)

On bty-web startup, if the stored bty_version disagrees with the running release (or the DB is pre-versioning – data tables present without the marker), init_db does:

  1. Renames state.db to state.db.<from-version>.<UTC-iso>.bak (e.g. state.db.0.27.4.20260525T101530Z.bak). The old DB is preserved on disk for forensics.

  2. Unlinks the SQLite sidecar files (state.db-journal / -wal / -shm) so the fresh DB doesn’t pick up stale pages.

  3. Creates a fresh state.db with the running release’s schema, stamped with bty.__version__.

  4. Records a system.schema.reset event with details {from_version, to_version, archived_at}. The event surfaces as an unacknowledged tripwire on /ui/dashboard; acknowledge it from /ui/events.
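To make the on-disk effect of steps 1 and 2 concrete, here is a shell sketch of the same rename-and-unlink sequence. It is an illustration of the behaviour described above, not bty-web's init_db code, and there is no need to run it by hand since bty-web does this itself; rotate_state_db is a hypothetical name. Steps 3 and 4 happen inside bty-web and are not reproduced:

```shell
# Illustration of rotation steps 1 and 2; not bty-web's actual code.
rotate_state_db() {
    local state_dir=$1 from_version=$2 ts bak sidecar
    ts=$(date -u +%Y%m%dT%H%M%SZ)    # same shape as 20260525T101530Z
    bak="$state_dir/state.db.$from_version.$ts.bak"
    # Step 1: keep the old database on disk under a versioned name.
    mv -- "$state_dir/state.db" "$bak" || return 1
    # Step 2: drop SQLite sidecar files so a fresh DB cannot read stale pages.
    for sidecar in journal wal shm; do
        rm -f -- "$state_dir/state.db-$sidecar"
    done
    printf '%s\n' "$bak"
}
```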

Rotation touches only state.db. State that lives outside it is left alone:

  • Netboot artifacts under BTY_PATHS_BOOT_DIR – not touched.

  • Backup bundles under ${BTY_PATHS_STATE_DIR}/backups/ – not touched.

  • Withcache blobs under the separate withcache data dir – not touched (different process).

What rotation removes from the live database: machine bindings, the audit log, operator-overridden settings, and the catalog cache index (all still readable in the .bak). Machines are re-discovered on their next PXE contact; the operator then re-binds them.

Preserve hardware inventory across an upgrade

If you want MAC + lshw + known_disks to survive the rotation, export before upgrading and import after:

# Before upgrade: snapshot to a portable bundle. Fix the path once so the
# import still finds it if the upgrade runs past midnight.
BUNDLE=/var/lib/bty/backups/pre-$(date +%Y%m%d)
sudo bty-web export "$BUNDLE"

# Upgrade bty-web (pip / pipx / container image pull), then, in the same shell:
sudo bty-web import "$BUNDLE"

The slim bundle carries a minimal per-machine record (mac + hw_lshw + known_disks) and nothing else: bindings (boot_mode, bty_image_ref, target_disk_serial, sanboot_drive, labels) reset on import and the operator re-binds; the image catalog (catalog_entries) does not travel either – re-import the catalog on the new appliance via the Settings page’s “Fetch latest catalog” button (or upload a catalog.toml directly). See “Backup”.

Recovering an old .bak

The rotated DB is a normal SQLite file. Read it with the sqlite3 CLI to recover specific rows:

sqlite3 /var/lib/bty/state.db.0.27.4.20260525T101530Z.bak \
    "SELECT mac, bty_image_ref, boot_mode FROM machines"

Once you no longer need it, rm it like any other file.
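When several rotations have accumulated, the release that wrote each one can be read back out of the file name. A small sketch, assuming the state.db.<from-version>.<UTC-iso>.bak pattern described above; bak_from_version is a hypothetical helper, not a bty command:

```shell
# Hypothetical helper: print the from-version encoded in a rotated DB name.
bak_from_version() {
    local name=${1##*/}         # strip any leading directory
    name=${name#state.db.}      # drop the "state.db." prefix
    name=${name%.bak}           # drop the ".bak" suffix
    printf '%s\n' "${name%.*}"  # drop the trailing ".<timestamp>"
}
```

For the example file above, bak_from_version /var/lib/bty/state.db.0.27.4.20260525T101530Z.bak prints 0.27.4.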

Upgrade in place (pip / pipx install)

If you installed bty-lab directly:

pipx upgrade bty-lab            # or: pip install -U bty-lab
sudo systemctl restart bty-web

Re-fetch the netboot artifacts after upgrading. The live-env artifacts in BTY_PATHS_BOOT_DIR (kernel / initrd / squashfs) are versioned and fetched separately from bty-web – the package upgrade does NOT touch them. So a freshly-upgraded server keeps serving the previous live env until you refresh it: open /ui/netboot and click Fetch latest artifacts (or pin a tag under Settings -> Upstream sources first). Skip this and PXE clients boot the old live env against the new server – a confusing version split.

Upgrade the container deploy

In the container deploy the upgrade is a single bty-lab upgrade call. It regenerates compose against the CLI’s bty version (image-tag pin moves forward), preserves envvars + data/, pulls new images, and restarts the stack – auto-detecting whether to drive that via podman compose up -d (plain) or systemctl restart (Quadlet-managed):

uvx bty-lab upgrade /opt/bty        # the dir you bootstrapped with `init` / `deploy`

For step-by-step control, run the pieces manually (re-emit, then pull + restart):

cd /opt/bty
uvx bty-lab init --force .          # regenerates compose.yml against newer bty
podman compose --env-file envvars --profile tftp pull
podman compose --env-file envvars --profile tftp up -d

AutoUpdate=registry plus podman-auto-update.timer automate the pull step for the Quadlet variant (init --systemd). After the pull, re-fetch the netboot artifacts (open /ui/netboot -> Fetch latest artifacts) so PXE clients boot a live env matching the new bty-web version. See deploy/README.md.

Migrate to a new host

Stop bty-web on the old host, copy the deploy directory’s data/ tree (or /var/lib/bty for a host install) to the new host, and start bty-web there. Provided the new host runs the same bty version, the MAC->image assignments and audit log come with it and only the host’s own IP changes; a different version triggers the auto-rotate described under “Upgrade”. Re-point your LAN DHCP at the new host’s IP and re-fetch the netboot artifacts on the new instance.
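The copy step is where a migration can be damaged without anyone noticing. One way to guard it, sketched here with hypothetical function names and assuming GNU tar and sha256sum on both hosts, is to record a digest when packing and check it before unpacking:

```shell
# Illustrative only: pack a state tree with a checksum on the old host,
# then verify the archive before unpacking it on the new one.
pack_state() {
    local src_dir=$1 archive=$2
    # -C keeps archive paths relative to the parent of the state dir.
    tar -C "$(dirname "$src_dir")" -czf "$archive" "$(basename "$src_dir")" || return 1
    # Record the digest next to the archive, keyed by bare file name.
    ( cd "$(dirname "$archive")" &&
      sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256" )
}

unpack_state() {
    local archive=$1 dest_parent=$2
    # Refuse to unpack an archive that no longer matches its digest.
    ( cd "$(dirname "$archive")" &&
      sha256sum -c "$(basename "$archive").sha256" >/dev/null ) || return 1
    mkdir -p "$dest_parent" && tar -C "$dest_parent" -xzf "$archive"
}
```

On the old host this would be pack_state /var/lib/bty /tmp/bty.tar.gz with bty-web stopped; after copying both the archive and its .sha256 file across, unpack_state /tmp/bty.tar.gz /var/lib on the new host restores the tree only if the digest still matches.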

Recovering from a failed or interrupted flash

A flash writes directly to the target disk; bty has no rollback. If a flash fails partway - network drop, integrity mismatch, operator Ctrl+C, a wedged disk - assume the target holds partial, unbootable data and re-flash it from a trusted source. There is nothing to clean up first: the next flash overwrites from byte 0.

  • Integrity mismatch (FlashIntegrityError): the streamed bytes did not match the source’s digest. The disk was already written (a stream can’t be checked before it’s written), so it is suspect. Re-flash from a source you trust; if it recurs with the same source, the upstream artifact or its published digest is wrong.

  • Interrupted download / cancel: re-run the flash. For a server-driven PXE box, just let it boot again - the plan re-flashes.

  • Stuck on the live env after a crash mid-flash: if a machine fetched its boot artifacts but never POSTed /pxe/{mac}/done, its saw_flasher_boot bit stays set and it keeps booting the flasher. Re-save the machine record in /ui/machines (or fix boot_mode) to clear the state.