Changelog¶
The release-by-release operator-facing summary lives in CHANGELOG.md
at the repo root. Embedded here so the rendered docs site
carries the same content; the source of truth is the root file.
This file follows Keep a Changelog.
The format reflects what actually matters to an operator running bty
(the bty-lab PyPI package + bty-web container) – behaviour the
operator perceives, defaults that survived a pip install -U, and
gates that landed in CI.
Per-release commit history lives in git log; this file captures the
operator-facing summary.
[0.60.0] - 2026-06-25¶
Removed¶
/images/{key}[/{name}]route + the_stream_remote_imageoras stream-proxy. The route’s historical role was “let bty-web do the OCI manifest dance for the live env on cold withcache”. Both backstops are gone:withcache 0.6.0 (v0.59.0’s hard dep) is oras-aware on the cache-host side, so a withcache deploy absorbs the OCI work transparently;
The live env’s bty TUI handles
oras://itself viawithcache.oras(resolve + bearer + curl) when the plan endpoint hands it the raw URL.
pxe_plantherefore ships the originalsrc(oras:// or https://) on cold-cache or no-withcache paths now; no more bty-web in the bytes path for oras. Operators on the default withcache deploy see no change. Operators running bty-web without withcache who had oras catalog entries: the live env now needs egress to the OCI registry directly (previously the bty-web host did the registry talk, the live env stayed LAN-only). Configure withcache (the default deploy stack) to keep the LAN-only property.
Changed¶
cache_decision.served_fromno longer carries"bty-web-proxy"as a value (the proxy is gone). The two surviving values are"withcache"and"origin".
[0.59.0] - 2026-06-25¶
Changed¶
OCI handling moved upstream to
withcache.oras.bty.oraswas deleted. The OCI registry adapter (parse_ref, fetch_anonymous_token, fetch_manifest, pick_image_layer, resolve_ref, sidecar + mediaType filters) now lives in withcache 0.6.0+ and bty imports it as a library. Same surface, same semantics, just one implementation across the bty ecosystem.bty-labnow declareswithcache>=0.6.0as a hard runtime dependency. withcache is stdlib-only so the install graph stays light, and the defaultbty-lab initdeploy stack has shipped withcache as a sidecar since v0.40, so most operators won’t notice. The mypy override for the un-typed import lives inpyproject.tomluntil withcache ships apy.typedmarker./catalog.tomlrewrites every remote src through withcache when configured. Previously cached entries went through<bty-web>/images/<sha>/<name>and uncached entries shipped the raw upstream URL (oras + https alike). With withcache 0.6.0+ oras-aware, the rewrite is uniform: every remote entry becomes<withcache>/b/<b64(origin)>/<basename>regardless of original scheme. The live env consuming the catalog sees only plain HTTPS URLs on the LAN cache; withcache absorbs the ghcr.io stream-cut class internally via its Range-resume loop. Falls back to the pre-existing direct-origin behaviour whenBTY_WITHCACHE_URLis not configured.Withcache HEAD probes drop bty-side bearer minting.
/pxe/{mac}/planand/ui/catalog/entries/checkused to pre-resolve an OCI bearer and pass it on the HEAD’sAuthorizationheader so withcache 0.4.0’s background fetch worker could forward it. Withcache 0.6.0+ mints its own bearer when the cache key isoras://..., so both endpoints now HEAD against the originalsrc(oras or https) with no bearer plumbing. Thehead_headers+fetch_anonymous_tokenpaths in_app.pyand_ui.pywere removed.
Fixed¶
HTTP/2 stream-resets on multi-GiB
oras://ghcr.io/...flashes now have two backstops: the surgical--http1.1from v0.58.3 (already shipped) on the bty client side, and withcache’s Range-resume on the cache-host side once the catalog routes through it (this release). Operators on the default withcache deploy get the resume path transparently.
[0.58.3] - 2026-06-25¶
Fixed¶
oras://flash failed withcurl exited 92after roughly the same N minutes on every retry. On bty-usbboot, largeoras://ghcr.io/...images aborted mid-stream with aCURLE_HTTP2_STREAMframing-layer reset. GHCR’s blob CDN (pkg-containers.githubusercontent.com) RST_STREAMs long-running HTTP/2 blob transfers once the pre-signed redirect URL’s TTL expires, and our pipeline streams directly intoddwith--retrydeliberately disabled (retrying from byte 0 would corrupt the target), so the stream death was terminal. Forcing HTTP/1.1 on every streaming fetch sidesteps the framing-layer reset; HTTP/2 multiplexing buys nothing for a single large stream-to-dd transfer, so the cost is zero. Local-pre-download isn’t an option in the live env (no writable disk, RAM too small for multi-GiB blobs), so HTTP/1.1 is the right trade-off for now. (src/bty/flash.py:_CURL_BASE.)
[0.58.2] - 2026-06-23¶
Changed¶
Docs: aligned components / flows / reference with current routes and event kinds. Several pages had drifted post-v0.57:
flows.md’s audit-log table listed a TFTP-control kind pair that was removed (netboot.tftp.controlled/.control_failed), two “kinds” that are actuallynetboot.pxe.offeredrows withdetails.reasonflags (orphan_ref/no_target_disk), and used the old underscore form on.failedkinds; new.requested/.started/.cancelledlifecycle kinds + thesystem.schema.resettripwire were missing. Operator-UI table pointed at the removed/ui/settings/tftp-controland was missing the new/ui/settings/{upstream,backup,config/edit}rows.components.md’s “Failure symmetry” listed underscore-form kinds and event kinds that the code never emits (image.upload*,image.hash*,settings.tftp.control_failed); “Filtering” described the retired multi-dropdown form instead of the?q=substring search.reference.mddocumented two routes that don’t exist (POST /ui/settings/flash,POST /ui/settings/tftp-control) and described the old single Upstream-sources card layout.
Operations docs: corrected the bty-web export description. The doc called the export tool “selective” with “image bindings, catalog, and local image files”. The actual export shape per
_portability.pyis justmac+lshw+known_disks; the catalog and bindings reset on import. The “full tar backup” example was also claiming it captured cached image bytes – bty-web exited the bytes plane in v0.40, so cached blobs live in withcache’s own data dir now. Both rewritten.Images table: cleaner explanation of the “unset” sha badge. The title-attr tooltip was promising “an opt-in v0.41 verifier will hash the stream and compare against this column when set” – the verifier shipped in v0.41 (
bty.flash._spawn_hash_tee); the tooltip now describes what actually happens (the live envcurl | tee | sha256sum | ddrefuses on a mismatch when this column is set, skips the check when unset).Stale env-var alias removed.
_app.py+ the Backup-schedule card comment claimedBTY_BACKUP_DIRwas a “legacy” alias forBTY_PATHS_BACKUP_DIR; the legacy var is not read anywhere. An operator who tried it would have seen no effect. Mention dropped so the documented surface matches the consumed surface.Audit-log docstrings + comments dropped references to event kinds the code never emits (
image.upload.failed,image.hash.failed,settings.tftp.controlled). Onlyimage.upstream.truncatedis a real image-namespaced kind; uploads + hashing left the bty-web scope when v0.40 took it out of the bytes plane, and the TFTP-control route was removed.
[0.58.1] - 2026-06-23¶
Changed¶
Docs: corrected the backup-bundle “Travels” table. The
operations.mdtable was overstating what a v3 bundle carries (claiming image binding /target_disk_serial/labels/catalog_entriestravel) – the actual export shape per_portability.pyis justmac+lshw+known_disks. Operators reading the doc would have expected their bindings + catalog to survive an export/import; the code has always reset all of that on import. Table rewritten to match the export shape.Search-input placeholder spells out what matches. The
/ui/machinesFilter input now readsMAC / label / image / IP (any field, substring)so it’s clear that typing one term hits any field of any row, including any of the machine’s labels.
[0.58.0] - 2026-06-22¶
Changed¶
Machine record:
hostnamereplaced by plurallabels. The field has always been described as “cosmetic; not consumed by the boot chain”, but the column was a singularhostnamewith an RFC-1123 validator. In practice operators want to tag a box with several non-overlapping things at once (rack location, hardware vendor, ad-hoc notes), and the DNS regex rejected reasonable inputs (underscores, spaces, mixed case). Reshaped end-to-end: themachines.hostnamecolumn is gone, replaced by amachine_labelsside table; the JSON API takeslabels: list[str]; the UI form is a comma-separated input; the row renders chip badges; each chip is a/ui/machines?q=<tag>link. Per-label shape is alnum-leading + alnum/space/-/_/., max 64 chars; cap of 16 labels per machine. Breaking, pre-1.0:state.dbauto-rotates on upgrade (the existing schema-mismatch path) so machine bindings reset; export withbty-web exportbefore upgrading if you want to preserve the hardware-inventory side.Machines table: new “Last IP” column + column reordering. The last-seen source IP was already captured (
last_seen_ip, populated on everyGET /pxe/{mac}) but only shown on the detail page. It now has its own sortable column, and the overall order is identity (MAC, Labels) -> configuration (Image, Boot) -> timeline (Last seen, Last IP, Last flashed) so the eye scans left-to-right in the same “which box / what is it set up to do / what has it done” order operators ask. Each IP cell deep-links to/ui/events?q=<ip>so the “this MAC is on .42 – what else has .42 done?” pivot is one click.
[0.57.1] - 2026-06-19¶
Changed¶
Subnav strip vertical rhythm + text uniformity. v0.57.0 bumped the strip padding to give the new page-level forms breathing room, but the result read as too generous against the rest of the chrome. Cut top / bottom padding to 0.25rem, dropped the inner container’s
min-heightto 2rem, and set every text-bearing descendant (form-control-sm/btn-sm/a/label/.small/small) tofont-size: inheritso the strip’s single 0.85rem applies to anchors, labels, inputs, and buttons alike. Before the fix three slightly-different sizes (subnav-jumps anchors at 0.82rem, Bootstrap form controls at their own 0.875rem,.smallFilter labels at 0.875em of the parent) lived in the same row.
[0.57.0] - 2026-06-19¶
Changed¶
Settings page: the upstream sources panel is now two visually distinct cards. The single mixed “Upstream sources” form was hard to scan because netboot artifacts and the image catalog share nothing operationally. They now live in
#netboot-release(netboot repo + tag) and#catalog-source(catalog URL); the Settings sub-nav has matching jump links. Both cards still submit through one POST handler so saving from either side persists all three fields in one round-trip.Catalog setting collapses to a single URL field. The
catalog_repo+catalog_tagpair (and the URL-composition logic on top of them) is gone; the operator now sets onecatalog_urlvalue, defaulting tohttps://github.com/safl/nosi/releases/latest/download/catalog.toml. TheFetch latest catalogbutton GETs whatever string the Settings page holds: pointing at a fork or a private catalog server is a single-field edit, not a repo-and-tag puzzle. Pre-1.0 break-freely: any existingupstream.catalog_repoandupstream.catalog_tagrows instate.dbare now dead keys (harmless, ignored on read).
[0.56.0] - 2026-06-19¶
Added¶
Paginated + sortable tables on /ui/machines and /ui/images, plus free-text search across both. Operators running a real lab fleet end up with hundreds of
machinesrows and a catalog of dozens of images; the old “render everything in one scroll” approach was becoming painful. Now:Every column header is a sort toggle (click once to sort, click again to flip direction). The active column shows an arrow; the others show a faded up-down glyph to advertise that they’re clickable.
A per-page selector (25 / 50 / 100) plus a Prev / 1..N / Next / Last pagination strip lives under each table. The footer also shows
Showing 26-50 of 178 machines.A free-text search box (
?q=) above each table narrows by substring across the columns the operator usually pivots on: MAC / hostname / image-ref / last-seen IP for machines, name / format / arch / source / sha256 for images. Typingfreebsdshows only the freebsd images; typing the last hex of a MAC finds that one machine.All state (sort, direction, page, per_page, q, filter) lives in the URL, so the view is bookmarkable and survives a refresh.
The
?sort=column is allowlisted per-page; an out-of-list value silently falls back to the default rather than being interpolated into SQL.SSE auto-refresh on /ui/machines stays on only for the default view (no filter / no search / default sort / page 1); operators drilling into a slice of the fleet now get a stable static view instead of having the SSE push wipe their sort.
The existing categorical
?filter=(discovered / assigned) keeps working and composes with the new search + sort + pagination.
[0.55.12] - 2026-06-19¶
Fixed¶
Auto-flash milestone markers no longer stack the Rich progress bar – the residual case from v0.55.11. The v0.55.11 fix removed the visible stderr text leak by routing milestones through
/dev/kmsgonly, but/dev/kmsgwrites go throughprintkand fan out to every registered console – including/dev/tty0, where the kernel-timestamped line landed on the framebuffer between Rich repaint cycles. The text itself was covered by Rich’s next paint within ~100 ms (so invisible to the operator’s eye), but the framebuffer cursor had already advanced one line and Rich’s internal cursor tracker was now out of sync with the screen. Each milestone shifted the next bar render one line down: three milestones, three stacked pairs. The new path writes milestones directly to/dev/console, which resolves to the LASTconsole=cmdline target (ttyS0on every bty cmdline – USB, PXE, chain test). That hits the serial UART only: SoL / IPMI observers and the chain test still capture the heartbeat through the same UART they watch for everything else, and/dev/tty0is never touched. The local HDMI screen finally stays a single clean bar pair through 25/50/75/100%.
[0.55.11] - 2026-06-18¶
Fixed¶
Auto-flash milestone markers no longer scramble the on-screen Rich progress bar.
bty-on-tty1.serviceroutesStandardError=ttyto/dev/tty1, so the v0.55.5 milestone emitter’sprint(..., file=sys.stderr)injected rawbty: download NN%lines into the same TTY Rich was painting on; each line shifted the cursor and stacked a duplicatewriting/downloadingbar pair below. The milestone path now writes only to/dev/kmsg, which still fans out viaprintkto every registered serial console (SoL / IPMI observers keep their 25/50/75/100% heartbeats, the local HDMI operator sees a single clean progress bar). Lifecycle bookends (bty: entered,bty: exiting,bty: auto-flash starting,bty: flash complete; rebooting) still write to all three sinks: they fire outside the Rich Live context and SHOULD appear on the local tty.
[0.55.10] - 2026-06-18¶
Changed¶
Internal housekeeping: dropped redundant empty-string arg from three
print("", file=sys.stderr)calls insrc/bty/deploy.py(FURB105 auto-fix). Same behaviour. Only finding from a fresh tech-debt sweep across all code added since v0.55.3; the rest of the codebase came back clean.
[0.55.9] - 2026-06-18¶
Fixed¶
Netboot wait for the missing BTY_IMAGES device is now zero, not 20 seconds. 0.55.8 capped
bty-usb-grow.service’sJobTimeoutSecat 20 s; thevar-lib-bty-images.mountunit’s own auto-generatedWants=on the same device still triggered a 20 s wait. A tiny systemd generator at/usr/lib/systemd/system-generators/bty-skip-usb-only-units-on-netbootnow masks both units whenfetch=is on the kernel command line (the reliable netboot discriminator). With nothing Wanting= the device, systemd never activates it and theExpecting device dev-disk-by-label-BTY_IMAGES.deviceline is gone from the boot log. Empirically measured on a local QEMU PXE boot:bty: enteredmoved from ~33 s (0.55.8) to ~13 s (this fix), an 87 s improvement vs 0.55.7. The generator self-traces via/dev/kmsgso an operator can verify indmesgwhether it ran and what verdict it reached. USB-boot path unchanged (nofetch=cmdline -> generator no-ops -> units run normally)..gitignoreanchoredlib/andlib64/to the repo root. Unanchored, the Python setup.py boilerplate rules also matchedbty-media/.../includes.chroot/usr/lib/and silently dropped the new systemd generator fromgit add.
[0.55.8] - 2026-06-17¶
Added¶
bty-traceboot-phase markers for empirical timing on slow boots. A tiny helper at/usr/local/sbin/bty-tracewrites one line via/dev/kmsgper invocation (same fanout as the existing_emit_console_marker), so the line shows up on every registered console – SoL on ttyS0/ttyS1, framebuffer on tty0 – carrying the kernel’s[ T ]boot-time prefix for free. Wired asExecStartPre/ExecStartPostonbty-usb-grow.service,bty-clock-from-http.service,bty-images-discover.service, andbty-on-tty1.service. Subtracting consecutive[ T ]timestamps in dmesg / SoL now tells the operator which unit actually ate a slow boot’s wall-clock budget – triangulating between the existingbty-banner-{early,mid,late}phase boundaries (sysinit / network-online / multi-user). Visible on every PXE / USB boot because bty’s cmdline deliberately omitsquiet.
Fixed¶
PXE-only boots no longer waste 67 seconds waiting for a USB stick that doesn’t exist. Empirically confirmed via the new
bty-tracemarkers on a local QEMU PXE boot: an 86.8 s gap betweenbty-clock-from-http readyandbty-images-discover starting– which is systemd’s default 90 sDefaultDeviceTimeoutSecticking down ondev-disk-by-label-BTY_IMAGES.device. On a PXE-only boot the BTY_IMAGES label never appears; the .device unit sits at “expected” until the global timeout, holding up everything ordered after it.bty-usb-grow.servicenow setsJobTimeoutSec=20sso the job aborts at 20s when the device never materialises; downstream units (which depend on bty-usb-grow’s job completion, not its success) proceed immediately. Predicted in memoryproject_ventoy_90s_boot_delay.mdbut never landed.
[0.55.7] - 2026-06-17¶
Fixed¶
Upstream truncation during oras:// image streaming now fails loudly server-side. Field failure on a Supermicro H12SSL-I flashing
nosi-fedora-44-desktop: ghcr.io closed the TCP connection 1,504 MiB into a multi-GB oras blob, bty-web’s_chunks()treated the empty read as clean EOF, the StreamingResponse finished tidily, and the live env’s curl detected the Content-Length mismatch only client-side as exit 18. The bty-web journal had no record of which blob truncated.stream_srcnow tracks emitted bytes and raisesCatalogErrorwhen the upstream stops beforeContent-Lengthis reached; the call site in_stream_remote_imagecatches it, records animage.upstream.truncatedevent against the offending src URL, and propagates so the live env still detects the failure via curl exit 18 (so a partial image never reachesdd). Same shape as safl/withcache#7’sTruncatedDownloadguard, but for the bty-web -> oras-registry hop which bypasses withcache entirely. Mitigation: the operator should pre-cache via/ui/images“Download” before flashing upstream-unstable images so the LAN-only path is taken.
[0.55.6] - 2026-06-17¶
Added¶
Lifecycle bookends on every bty run. The 0.55.5 milestone markers only fire on the auto-flash path; an operator following IPMI SoL through an interactive wizard, a USB-local run, or a hand-driven
bty --catalog Xhad no marker to follow.bty.tui:mainnow emitsbty: entered v<X>right before the wizard launches andbty: exiting v<X>in afinallyso it fires for every exit path (clean run,sys.exitdeep in the wizard, KeyboardInterrupt, post-flash reboot, unhandled exception). Same/dev/kmsg+/dev/consolefanout as the rest of the markers, so the pair lands on every registered console regardless of mode.
[0.55.5] - 2026-06-17¶
Added¶
Auto-flash now emits 25 / 50 / 75 / 100% milestone markers. Operators watching a Supermicro / Aspeed-BMC PXE flash over IPMI SoL previously only saw the
auto-flash startingandflash complete; rebootingbookends, with nothing between them while the multi-gigabyte download + write was running on/dev/tty1(the framebuffer VT, invisible to a serial console). bty now firesbty: download NN%andbty: write NN%lines via the same/dev/kmsg+/dev/consolefanout the bookends use, once per 25 / 50 / 75 / 100 crossing. At most 8 extra kmsg writes per flash. Skipped silently when the total is unknown (some write paths can’t pre-compute the decompressed size). Identical behavior for interactive flashes on tty1: the markers also land injournalctl -u bty-on-tty1so a remote operator tailing the journal gets the same heartbeat.
[0.55.4] - 2026-06-17¶
Fixed¶
Supermicro / Aspeed BMC USB Virtual NIC no longer hijacks live-boot’s network. Boards with an Aspeed BMC (Supermicro H12SSL-I, X11/X12 generations, several others) expose a “USB Virtual NIC” gadget that presents over RNDIS. The kernel claimed it via
rndis_host, udev renamed it toenxbe..., live-boot saw two connected ethernets, picked the alphabetically-first one (the BMC virtual NIC, sorts beforeeno*), tried to DHCP+fetch the squashfs over the BMC’s internal USB net, timed out, and panicked init withUnable to find a live file system on the network.modprobe.blacklist=rndis_hostnow sits next to the existingnouveaublacklist on every live-env cmdline (both iPXE templates and both live-build BOOTAPPEND lines), so the virtual NIC never registers as a network device and live-boot only sees the real wire. Real USB-Ethernet dongles (Realtek r8152, ASIX AX88179) are unaffected; they use chip-specific drivers, not the genericrndis_host.
[0.55.3] - 2026-06-17¶
Fixed¶
Respawn cap on
bty-on-tty1.serviceactually applies again.StartLimitIntervalSec=60+StartLimitBurst=5were in the[Service]section, where systemd 257 silently ignores them (visible in the CI log capture asUnknown key 'StartLimitIntervalSec' in section [Service], ignoring). Moved to[Unit]so a wedge in the first bty frame no longer respawns the splash forever and bury the actual error.
Changed¶
Internal housekeeping: dropped 19 redundant local imports that shadowed module-level imports in the test suite, switched
_suppress_oserror.__enter__/__exit__annotations toSelf+TracebackType | Noneper the textbook context-manager pattern, and addressed three ruff findings (D413 docstring spacing, PLR1730max()overif, PT022 fixturereturnvsyield). No behavior change in production code; the full pytest suite stays green.
[0.55.2] - 2026-06-16¶
Fixed¶
PXE kernel console now reaches Supermicro BMC SoL. The PXE templates only emitted to
ttyS0, which matches Dell / iLO (they bridge COM1 to SoL) but not boards that wire SoL to the BMC’s dedicated UART (Linux sees it asttyS1). On those boards the BMC web UI sometimes exposes only “COM1 or SOL” with no usable COM1 wiring, so the operator can’t redirect from the BMC side either; the cmdline is the only place to fix it.ipxe_flash.j2andipxe_tui.j2now emitconsole=ttyS1,115200 console=ttyS0,115200: the kernel writes to both UARTs regardless of order, so whichever the BMC bridges sees the boot stream, while/dev/console(which follows the last-listed console) stays on ttyS0 so the cijoe chain test’s marker scan still works. Also addsearlyprintk=ttyS0,115200so the kernel’s own console-registration messages reach the captured serial log before the printk console-list is reshuffled, leaving a diagnostic trail for any future ordering regression.
Changed¶
PXE chain test now binds QEMU COM2 explicitly to a null backend. The i440FX SuperIO emulation has two UARTs unconditionally, but a single
-serialdirective only binds the first. The second was previously left without a chardev, which made the kernel-side 8250 probe of ttyS1 behave inconsistently when the PXE templates listed bothconsole=ttyS0andconsole=ttyS1. An explicit-serial nullfor COM2 gives ttyS1 a defined “writes go to /dev/null” backend, so the chain test is deterministic regardless of how the cmdline orders the consoles.
[0.55.1] - 2026-06-16¶
Fixed¶
Netboot page no longer renders a permanently-grey “Local dnsmasq.service” row on container deploys. bty-web running in a container can’t see the host’s
dnsmasq.service(different mount namespace) and the bty-tftp sidecar’s dnsmasq lives in yet another container, so the local-unit probe was alwaysunknownthere. The network probe above the row is the canonical signal in that mode, so the subsection is now hidden when bty-web detects it is running inside a container. Bare-metal installs see the same row as before.
Changed¶
Docs landing-page title aligned with the README. The Sphinx index page carried a longer, descriptive title; the README and
pyproject.tomldescription used the shorter elevator pitch. All three surfaces now read the same one-liner so the project’s identity is consistent wherever a new operator lands first.
[0.55.0] - 2026-06-15¶
Fixed¶
Flash write progress bar no longer freezes at 100%. The write total is fundamentally unknowable in advance (gzip wraps its uncompressed-size trailer mod 2^32, qcow2 virtual_size need not equal the bytes dd ends up writing, the compressed-size fallback is smaller than the decompressed write count). Init the write task with
total=Noneso Rich’sBarColumndraws a pulsing scanner (a KITT / Knight Rider look) and add aTransferSpeedColumnso the operator still sees the running byte count, live bandwidth, and elapsed time. The download bar stays determinate becauseContent-Lengthis reliable. Hardware-spotted on v0.54.0 against a debian-13-headless.img.gz served via piKVM virt-mount.“Back up now” button on
/ui/backupsnow visibly does something. A v3 metadata-only backup completes in milliseconds, faster than the active-jobs poll cadence, so the page never registered the active-to-inactive transition that drives the auto-reload; the server-rendered “Backups on disk” card did not pick up the new bundle. The click handler now reloads the page after a successful enqueue so the new bundle and refreshed cadence indicators surface immediately.
[0.54.0] - 2026-06-15¶
Added¶
Two progress bars during a URL flash: download and write run in parallel as a streaming pipeline (curl, optional decompressor, dd to target). Before this release a single bar tracked the writer side only, so a slow upstream and a slow target disk looked identical to the operator and compressed images underreported network throughput by the compression ratio. A small dd is now interposed between curl and the rest of the pipeline; its progress feeds a new
downloading_progressevent family alongside the existingwriting_progress. The TUI lazily adds a second Rich progress task on the first download tick, so local-file flashes keep the single-bar layout while URL flashes get a download bar above the write bar.
Fixed¶
GitHub Pages docs deploy survives a job re-run. A re-attempt of the
Deploy to GitHub Pagesjob inside the same workflow run left the priorgithub-pagesartifact in place; the nextupload-pages-artifactadded a second copy anddeploy-pagesaborted with “Multiple artifacts named github-pages were unexpectedly found for this workflow run”. The deploy job now deletes any pre-existinggithub-pagesartifact for the current run before uploading, so a re-run is idempotent.
[0.53.0] - 2026-06-15¶
Added¶
Architecture column on the catalog / image listings. The bty TUI’s image table and bty-web’s
/ui/imagespage now show a best-effort arch hint (x86_64,arm64,i386,arm,riscv64, etc.) for each catalog entry. Operator-facing display only – bty never restricts or filters flash eligibility on it; the column just makes it visible at a glance which images target which platform. Sources, in order: explicitarch = "..."field in catalog manifests (publishers like nosi should populate it for accuracy), then a filename heuristic fallback that recognises the common token spellings.?/-when nothing resolves.
Changed (BREAKING)¶
Variant naming streamlined along two axes: boot source (
netbootover PXE,usbbootfrom a stick) and hardware family (pcfor generic x86 BIOS / UEFI,rpifor Raspberry Pi SBCs). Previouslynetboot-x86mixed an arch suffix with the boot source whileusb-rpiused a HW-family suffix, hiding that therpivariant is platform-specific (RPiOS + Pi firmware + per-SoC kernels) rather than just arch-specific. New variant names:Old
New
netboot-x86netboot-pcusb-x86usbboot-pcusb-rpiusbboot-rpiOperators with custom
BTY_VARIANT=...env settings (e.g. CI scripts invokingmake build VARIANT=...) must update them. The release artifact filenames change in lockstep:bty-netboot-pc-x86_64-v*.{vmlinuz,initrd,squashfs},bty-usbboot-pc-x86_64-v*.iso,bty-usbboot-rpi-arm64-v*.img.gz. Existing files inBTY_PATHS_BOOT_DIRfrom prior releases are ignored by the new bty-web (which now looks for the-pc-prefix); upgrade workflow is “fetch netboot artifacts” from the UI after bumping the container, same as any other release upgrade.Wheel asset dropped from GitHub releases.
bty-labis distributed via PyPI; the wheel on the GH release was redundant withpipx install bty-laband nothing in the tree consumed it from there. The sdist (bty_lab-X.Y.Z.tar.gz) is kept as the archival source release for distro packagers and offline mirrors.
[0.52.0] - 2026-06-14¶
Fixed¶
Five environment variables in the deploy template / docs were dead names the server never reads. Operators following
bty-lab init’senvvars(or the docs) were setting variables bty-web ignored:BTY_TRUSTED_PROXY->BTY_SERVER_TRUSTED_PROXY(real client IP in audit logs behind a proxy),BTY_MAX_UPLOAD_BYTES->BTY_TUNING_MAX_UPLOAD_BYTES(raise the upload cap; the 413 error message named the wrong var too),BTY_SESSION_SECRET->BTY_SERVER_SESSION_SECRET,BTY_BACKUP_MAX_PARALLEL->BTY_TUNING_BACKUP_MAX_PARALLEL, andBTY_TFTP_PROBE_HOST->BTY_NETBOOT_TFTP_PROBE_HOST. The session-secret one was emitted uncommented, so a pinned secret was silently dropped, breaking cookie continuity across a multi-instance / blue-green deployment.
Security¶
Redact
Bearertokens from subprocess (curl) stderr before it reaches the flash progress UI / logs, so a short-lived oras registry token in a captured stream can’t be replayed.
Documentation¶
Add an “Integrity and trust model” section (what bytes are verified vs what is trusted; the unauthenticated
/pxe/*surface and trusted-LAN assumption) and a “Recovering from a failed or interrupted flash” section.
[0.51.0] - 2026-06-14¶
Added¶
Integrity verification during flash: when an image source commits to a content digest, bty now verifies the streamed bytes against it on the wire and aborts with an error on mismatch, instead of silently writing a corrupted or tampered download. The hash is computed in the pipeline (
curl | tee | sha256sum | dd), so it adds no measurable overhead and the payload never passes through Python. Two sources are covered:oras://references (the layer digest, frozen at resolve time) and plain-HTTP catalog / bty-web images carrying a declaredsha256(the interactive catalog entry’s field and the PXE plan’s newdisk_image_sha). Sources with no declared digest keep the existing zero-copy stream unchanged.
[0.50.0] - 2026-06-13¶
Changed¶
usb-rpi: the Raspberry Pi USB flasher is now built by customizing Raspberry Pi OS Lite (arm64) in place (download + loop-mount + chroot) instead of Debian live-build. RPiOS ships every Pi kernel + every
bcm*.dtb(incl. the CM5 / CM5IO device trees) + firmware + bootloader, so the image boots Pi 4 / CM4 / Pi 5 / CM5 with no per-board branching. This fixes the previous live-build image, which shipped a boot partition with no device trees at all and could not boot a CM5 (“Device-tree file bcm2712-rpi-cm5-cm5io.dtb not found”). The bty TUI, services, and hooks are unchanged; they are grafted onto the RPiOS rootfs via the sameincludes.chroot/tree andconfig/hooks/.
[0.49.0] - 2026-06-13¶
End-to-end audit sweep covering every surface (web endpoints, TUI + flash pipeline, ORAS resolver, live-build, CI workflows, server appliance, docs). Three classes of fix landed: real data-loss guards, defensive-depth fixes against silent failures, and a pre-1.0 cruft cleanup.
Fixed¶
flash: every
ddinvocation now passesoflag=directin addition toconv=fsync. With only conv=fsync the kernel page cache shadowed the writes when the target happened to be the disk we booted from; the running OS’s binaries kept executing the OLD content until the page was evicted while the on-disk side took the new image. O_DIRECT bypasses the page cache so the in-RAM and on-disk states stay consistent through an in-place reflash.flash: gzip’s uncompressed-size trailer wraps mod-2^32. For any
.img.gz>= 4 GiB the reported number is a lie;validate_planwould have happily greenlit flashing a 5 GiB image onto a 4 GiB disk because gzip -l reported 1 GiB._parse_gzip_listingnow detects the wrap (uncompressed < compressed) and returns None so the size-fits-target check is skipped with a note rather than fooled.tui: the wizard’s flash worker now passes a cancel callback through to
execute_plan. Pre-fix, Ctrl+C during a flash interrupted t.join() in the main thread but the daemon worker’s dd / curl / zstd subprocesses kept writing – the operator’s “abort” left a partial-write in flight on the target disk with no cleanup. The wizard now SIGTERMs the pipeline and renders a dedicated “Cancelled – target holds a partial write” panel.oras:
pick_image_layerpreviously filtered sidecar layers only by title-annotation suffix, then took the largest remaining layer. A Helm OCI chart, Cosign signature, in-toto attestation, or SBOM lacks the sidecar suffix and would be returned as the “image” layer; bty.flash would then dd the chart’s tar+gzip headers or the signature bytes into the target’s MBR. Add an explicit mediaType refusal pass before the size-pick covering Helm / Cosign / in-toto / DSSE / SPDX / CycloneDX / OCI-empty.web:
ReleaseFetchManager.cancel(tag)now validates against_TAG_REsymmetrically withenqueue. Previously cancel fell through to the base manager’s plain_states.get(key), so attacker-controlled text in a DELETE /boot/releases/{tag} request returned a 404 + logged the literal string as the audit row’s subject_id; cancel now raises 422 on malformed tags.web:
fetch_release_catalogis an async route that calledurllib.request.urlopen(timeout=30)inline on the event loop. An unreachable release host stalled every other request handler (including the SSE event stream’s heartbeat) behind it for the full 30s; wrap inasyncio.to_thread.web:
_release_mgr._backfill_from_eventssoft-failed on any exception with no log line at all. Replace the barereturnwith alog.debug(..., exc_info=True)so a real schema/data problem at startup leaves a breadcrumb when debug logging is on; the soft-fail behaviour itself is unchanged.media:
bty-clock-from-http’s URL loop wascandidate_urls | while read: POSIX sh runs the right side of a pipe in a subshell, so everyexit 0inside the loop body only exited the subshell. The trailinglog "no candidate URL produced a Date: header"then ran unconditionally, including after a URL had successfully stepped the clock – an operator chasing a phantom NTP failure read this as proof the script had no effect when it had actually done its job.media:
pack_rpi_img._locate_lb_outputusedrgloband took the first match; on filesystems that listedbinary/EFI/vmlinuz.efibeforebinary/live/vmlinuzthe Pi image would have shipped the wrong kernel against the squashfs. Preferbinary/live/explicitly.media:
bty-on-tty1.servicehadRestart=on-failure / RestartSec=5with noStartLimit. A first-frame crash (corrupted catalog stick, malformed bty.toml) would have respawned the bty splash on tty1 every 5 seconds forever with no way for the operator to see the real error short of switching ttys. AddStartLimitIntervalSec=60+StartLimitBurst=5.
Changed¶
ci:
publish-tftppreviously needed only[test, build-ipxe]so a tag whose PXE chain test or USB Ventoy boot test was red still pushedghcr.io/safl/bty-tftp:vX.Y.Zand moved:latest. Gate it on the same release-blocking pipeline asattach-to-release/tag-releaseso a red test anywhere blocks the tftp container too.deploy:
bty-lab upgradeno longer migrates v0.41envvarsfiles to v0.42bty.toml. The migration code (_envvars_to_bty_toml+ helpers) is gone;upgradenow hard-errors with a pointer atbty-lab deploy --forcewhenbty.tomlis missing.tui: drop the no-op
_print_source_summary/_print_selection_so_farshims and their three call sites.
Docs¶
Replace every legacy
BTY_STATE_DIR/BTY_BOOT_DIR/BTY_BACKUP_DIR/BTY_CATALOG_FILE/BTY_WEB_*reference in doc sources with the canonical section-prefixed names (BTY_PATHS_STATE_DIRetc.,BTY_SERVER_*).quickstart.md’s “Flash a USB stick” code block was twoddinvocations spliced together. Replaced with a singlesudo dd ... oflag=direct conv=fsyncthat matches the in-code rule and copy-pastes cleanly.Catch up to v0.40-bytes-less + v0.46-catalog-source +
boot_moderenames across walkthrough-catalog, walkthrough-server-docker, tutorials/bty-netboot, README, reference, operations, concepts, and bty-media README.Add
usb-rpito dependencies.md’s build-deps table; tutorials/bty-usb-rpi.md’s QEMU smoke-test block now createsfake-emmc.qcow2before referencing it.
Tests¶
Add
tests/test_pack_rpi_img.py– regression coverage on the pure pieces of the arm64 Pi-image assembler (config.txt invariants, cmdline.txt format, partition-size constants,_locate_lb_output’s preference for binary/live/).Gzip-wrap rejection test, ORAS Helm/Cosign refusal tests, TUI cancel-key validation test.
Removed¶
vN.NN+:historical version-prefix markers stripped from in-code comments + docstrings + template comments. Per pre-1.0 break-freely the codebase doesn’t owe readers a comparison to a prior shape that no longer exists.
[0.48.0] - 2026-06-12¶
Doc + observability sweep after v0.47.0. The /reference release-
artifacts table and the bty-media README now advertise the
usb-rpi variant explicitly; a worker-loop safety net in
_BaseAsyncManager keeps a buggy _run_one from wedging a
slot; an event-log backfill that bailed silently now leaves a
debug breadcrumb; and a dead CI matrix branch is removed.
Added¶
_BaseAsyncManager._worker(used by ReleaseFetchManager + BackupManager) wraps the per-key dispatch in a worker-loop safety net: if_run_oneleaks any exception, the worker logs it, marks the statefailedwith a typederror, and stays alive to pull the next key. Production subclasses already catch their own exceptions; this is defence-in-depth for the day one doesn’t._release_mgr.ReleaseFetchManager.backfill_from_eventsnow emits alog.debug(with traceback) when it soft-fails. The soft-fail behaviour itself is unchanged (a corrupt events row or a freshly-created DB without the table is not fatal at startup); the breadcrumb just makes a real schema/data problem visible when debug logging is on.
Fixed¶
/reference’s release-artifacts table now lists thebty-usb-rpi-arm64-v*.img.gzasset that v0.47.0 added. Previously operators browsing /reference saw only the x86 artifacts and had no signal that the arm64 image existed.bty-media/README.mdre-headers as “three variants”, extends everymake build VARIANT=...usage line to includeusb-rpi, retitles the “Build prerequisites” block from “usb-x86 + netboot-x86” to “All three variants”, and corrects a stale--binary-images netbootcaption totar(v0.47.0 dot-release switched arm64 away fromnetbootto dodge lb’s x86-only tftpboot pipeline).
Removed¶
Dead
Enable KVM access for the runnerstep from.github/workflows/ci-cd.yml’sbuild-mediajob. The step gated itself onmatrix.kind == 'disk'but the matrix only carrieskind: live; the step never ran.
[0.47.0] - 2026-06-12¶
New usb-rpi arm64 image variant: a USB-bootable Raspberry-Pi
flasher (CM5 on IO-board, Pi5, Pi4) that runs the bty TUI on the
Pi itself and writes the operator’s chosen catalog image onto
local eMMC or NVMe.
Added¶
bty-usb-rpi-arm64-v<version>.img.gzrelease asset.ddit to a USB stick, plug into a CM5 / Pi5 / Pi4, USB-boot, pick a catalog image + target disk in the wizard. Headline workflow: reflashing a CM5 in a closed IO-case (eMMC) without the jumper-rpiboot-Etcher disassembly dance, or reflashing a Pi5 NVMe HAT in situ.BTY_VARIANTtri-state environment knob drivingbty-media/live-build/auto/config:netboot-x86(default, amd64 PXE trio),usb-x86(amd64 hybrid ISO), orusb-rpi(arm64 netboot trio + Pi-image packaging post-process).bty-media/scripts/pack_rpi_img.pyassembles a 3-partition raw disk image from the lb arm64 output: FAT32RPIBOOTwith theraspi-firmwareblobs + kernel + initrd +config.txt(dtparam=pciex1for Pi5 NVMe enumeration) +cmdline.txt, ext4BTY_LIVEholding the squashfs at/live/filesystem.squashfs, exFATBTY_IMAGESscratch (auto-grows on first boot viabty-usb-grow.service).New CI job
build-usb-rpion a native arm64ubuntu-24.04-armrunner, gatingtag-releaseandattach-to-releasethe same waybuild-usb-x86does.New tutorial
docs/src/tutorials/bty-usb-rpi.mdcovering download, dd-to-USB, the CM5 / Pi5 / Pi4 BOOT_ORDER prereq, the on-Pi flow, and local QEMU-virt verification for developers.
Fixed¶
TUI’s
[d] defaultcatalog shortcut no longer 404s. v0.46 stopped publishing a bty-sidecatalog.tomlmirror but the TUI’s_BTY_DEFAULT_CATALOG_URL(the wizard’s[d] default, themake tui CATALOG=defaultdeveloper shortcut, and the operator-facing reference docs) was missed and still pointed at the now-absentsafl/bty/releases/latest/download/catalog.toml. Repointed at the upstream image-builder’s catalog (safl/nosi/releases/latest/download/catalog.toml) so the TUI matches the bty-web side.
Changed¶
BTY_USB_ISO=1env knob removed (pre-1.0 break-freely; the twocijoe/scriptscallers are updated). Replaced byBTY_VARIANT=<variant>.bty-media/live-build/config/package-lists/bty-base.list.chrootsplit into arch-agnostic +bty-base.list.chroot_amd64(x86 kernel + microcode + DKMS build env) +bty-base.list.chroot_arm64(arm64 kernel +raspi-firmware+ Pi WiFi firmware). live-build only consumes a suffixed file when the matching--architecturesis active.The r8125 DKMS hook (
0600-bty-r8125-dkms.hook.chroot) early-exits on non-amd64; the bootloader-menu suppression binary hook (0500-bty-skip-bootloader-menu.hook.binary) early-exits when neitherbinary/isolinux/norbinary/boot/grub/exists (the case on every non-iso-hybrid build).make build VARIANT=usb-rpiroute added to the Makefile;make helpadvertises it.
[0.46.0] - 2026-06-11¶
bty stops publishing its own catalog.toml mirror and consumes
the upstream image-builder’s auto-generated catalog directly.
Default catalog source flips from safl/bty (a 7-variant hand-
maintained mirror) to safl/nosi (the 16-variant canonical
catalog that nosi’s CI publishes on every release).
Added¶
Settings > Upstream sourcesgains a separateCatalog repofield (defaultsafl/nosi) alongside the existingNetboot repofield (defaultsafl/bty). The two are independent: an operator can fork bty for custom netboot artifacts while still pulling the upstream catalog, and vice versa.
Changed (operator-visible)¶
Default catalog is now nosi’s, so the variants nosi publishes but bty’s stale mirror did not (Proxmox, the lxc variants, ubuntu-2604-wsl, ubuntu-2604-docker, rpios-13 headless + desktop, debian-13-desktop) appear out of the box.
Settings page: the
Release repofield is gone; in its place the two new fields above. An operator who had pinned the legacyupstream.release_repooverride needs to re-pin through the two new knobs (pre-1.0 break-freely, no migration shim).
Removed¶
scripts/generate_catalog_toml.pyandscripts/starter_catalog.toml.in. The CI step that ran them and thecatalog.tomlrelease-asset upload. Bty’s release pages drop one asset; operators pointing--catalog https://github.com/safl/bty/releases/latest/download/catalog.tomlat the bty release switch tohttps://github.com/safl/nosi/releases/latest/download/catalog.toml._settings_store.KEY_RELEASE_REPO/resolve_release_repo/default_release_repoand the_releases.DEFAULT_REPOalias. Their replacements areKEY_NETBOOT_REPO/KEY_CATALOG_REPO/resolve_netboot_repo/resolve_catalog_repo/default_netboot_repo/default_catalog_repo, withDEFAULT_NETBOOT_REPOandDEFAULT_CATALOG_REPOas the built-in constants.
[0.45.1] - 2026-06-11¶
Technical-debt sweep across six rounds: 15 separate findings, none operator-visible on its own, all aimed at trimming the surface that had accumulated since the v0.40/v0.42 cleanups.
Fixed¶
Dashboard’s “TFTP daemon running” row no longer cross-flags a container deploy where bty-web has no visibility into the bty-tftp sidecar (advisory blue
iwhenrunning_in_container()state is
unknown, kept as a warning on bare-metal).
Container deploys with
-e BTY_SERVER_PORT=...overrides now stay healthy: the Dockerfile’s HEALTHCHECK probesBTY_SERVER_PORT(the canonical name) instead of the legacyBTY_WEB_PORTit had been pinning at the stale 8080 default.
Changed (cleanup)¶
Legacy v0.42 env-alias table removed.
_LEGACY_ENV_ALIASES(mappingBTY_WEB_PORT/BTY_STATE_DIR/BTY_MAX_UPLOAD_BYTES/ … onto the canonicalBTY_<SECTION>_<KEY>convention) was a one-release migration shim; it had outlived its window (we’re now three releases past v0.42). DirectBTY_STATE_DIR/BTY_SESSION_SECRET/BTY_BACKUP_MAX_PARALLELfallback reads in early-boot bootstrap paths are also gone. Operators upgrading from a v0.41 envvars deploy still get the one-shotbty-lab upgrademigration that translates envvars → bty.toml.The container Dockerfile + tests now consistently use the canonical names (
BTY_PATHS_STATE_DIR,BTY_PATHS_BOOT_DIR,BTY_SERVER_HOST,BTY_SERVER_PORT).docs/src/dependencies.mdenv-vars table rewritten for the v0.45 bty.toml shape; the obsoleteBTY_CATALOG_MAX_PARALLEL/BTY_HASH_MAX_PARALLEL/BTY_MAX_UPLOAD_BYTES/BTY_IMAGE_ROOTrows are gone.Schema-rotation event summary in
state.db(system.schema.reset) no longer claims to preserve “images under BTY_IMAGE_ROOT”; it describes the actual v0.40+ bytes-plane layout (withcache volume / oras registry).Stale comment cross-refs to v0.40-removed endpoints (
/catalog/downloads,/catalog/hashes,/catalog/cache/{name},PUT /images/{name}) scrubbed from_events.py,_app.py,_security.py, and_backup.py.Em-dash-substitute
--removed from inline prose comments inoras.py,_portability.py,_releases.py,_withcache.py, and_reqctx.py(per the repo’s “no em-dashes, no ASCII –” prose rule).
Defensive¶
bty.deploy._detect_host_addr’s LAN-IP UDP probe gains an explicitsettimeout(5)matching the sibling probe inbty.web._config; a pathological resolver cannot hangbty-lab init.bty.oras._urlopen_retry(the OCI token / manifest fetcher) caps the response body at 10 MiB. A hostile registry returning a 500 MiB “manifest” now raisesOrasErrorinstead of consuming heap.
[0.45.0] - 2026-06-11¶
Oras catalog entries now warm withcache the same way https entries always have. The plan-endpoint and the dashboard’s Check button converge on one stored “canonical URL” per catalog row.
Added¶
New
catalog_entries.resolved_srccolumn carrying the canonical plain-HTTPS URL the row actually fetches from. For https sources this equalssrc; fororas://sources it’s the registry blob URL (https://<host>/v2/<repo>/blobs/sha256:<digest>) resolved once at catalog import viabty.oras.resolve_ref. Pre-1.0 schema change; existing DBs auto-rotate per_db.py’s schema-mismatch handler, so operators upgrading get a clean schema and re-import (or auto-import re-seeds fromcatalog.toml)._withcache.is_cachedgains an optionalheaders=kwarg matching the new sibling withcache-0.4.0 contract.
Changed¶
Dashboard “Check” button warms withcache for oras entries. Previously the Check on an
oras://row reportedwithcache n/abecause withcache spoke plain HTTP only. The Check now reads the row’sresolved_src, mints a fresh anonymous OCI bearer just-in-time, and HEADs withcache withAuthorization: Bearer .... Withcache 0.4.0+ forwards that header into its background fetch worker so a 401-gated blob actually fills the cache on the first probe.PXE plan rewrite uses
resolved_srcfor the cache key. On a cache hit the plan returns withcache’s/b/<token>/URL regardless of the original scheme; on a cold cache an https origin still flows direct, while oras stays on bty-web’s/images/{ref}proxy (the live env can’t carry an OCI bearer per fetch). Thecache_decisionrecorded for/ui/eventsnow distinguishesserved_from = withcache | origin | bty-web-proxy.The “withcache n/a” badge survives only for catalog rows with a NULL
resolved_src(file:// or a failed import).
Requires¶
The container deploy needs
ghcr.io/safl/withcache:0.4.0(or:latestonce the release rolls). The Authorization forwarding added there is what makes the oras warm-cache flow actually fetch instead of 401ing anonymously. Stockbty-lab upgradepulls the latest tag.
[0.44.3] - 2026-06-11¶
Deploy hotfix: bty-web and withcache containers now start on hosts
without the nftables package, plus a small dead-knob doc trim.
Fixed¶
bty-lab deployandupgradenow drop a/etc/containers/containers.conf.d/zz-bty-firewall.confthat pins podman/netavark to the iptables backend. Without it, a freshly imaged Debian trixie host that ships only iptables (and not the optionalnftablespackage) hitnetavark: nftables error: unable to execute nft: No such file or directoryand the bty-web / withcache containers exited 127 with no useful surface in their own logs. bty-tftp escaped because it uses host networking and doesn’t go through netavark.purgeremoves the drop-in.
Changed¶
The dead
BTY_CATALOG_MAX_PARALLELandBTY_HASH_MAX_PARALLELknobs are gone fromenvvars.exampleand the live envvarsbty-lab deployrenders. The DownloadManager / HashManager subsystems they tuned were removed back in v0.40; the env vars outlived them in the docs only.BTY_BACKUP_MAX_PARALLELstays.
[0.44.2] - 2026-06-11¶
Live-env hotfix: r8125 module-loading shape on Secure-Boot hardware, plus a small at-the-console triage tooling bump.
Fixed¶
Live ISO on Secure-Boot hardware now keeps the NIC. The previous
zz-bty-r8125-prefer.confaliased the RTL8125 modalias to the unsigned DKMS r8125 module, which the kernel rejects under[integrity]lockdown with EKEYREJECTED. With no fallback, the NIC stayed dark for the whole boot (observed on ASUS PN51-E1 stock firmware). The new shape issoftdep r8169 pre: r8125: udev still loads the signed in-tree r8169, and r8125 is attempted as a soft pre-dep so it can still win the bind on a non-Secure-Boot G10.
Added¶
pciutilsandusbutilsbaked into the live env so an operator at the console haslspci -kandlsusb -tfor “why isn’t this hardware visible?” triage without falling back to a /sys walk.
[0.44.1] - 2026-06-11¶
Maintenance release: technical-debt sweep, no behaviour change for operators beyond the items below.
Changed¶
The per-MAC boot-plan request (
pxe_plan) now opens the state DB once per flash request instead of four times (catalog binding + withcache lookup share one connection).The live-env
systemctl rebootand the host-IP autodetect socket are now bounded by timeouts, so a wedged systemd / resolver can’t hang them.
Internal¶
Deduped
_normalise_mac/_client_ipinto a shared_reqctxmodule;_safe_pathnow routes its basename check through_security.validate_basename.Dropped dead code: the unused
host_addrQuadlet param and the pre-1.0resolve_release_tag/DEFAULT_RELEASE_TAGaliases.Doc / Makefile fixups (default password is
bty-lab;.PHONYcompleteness).
[0.44.0] - 2026-06-10¶
Fixed¶
withcache was silently bypassed on container deploys. The flash path’s
resolve_withcache_urlread a DB override then$BTY_WITHCACHE_URLthen nothing – it never consultedcfg.withcache.url. v0.42 moved the URL intobty.tomland the slim compose/Quadlet stopped setting the env var, so a stock deploy resolved no URL: images streamed from origin, withcache got no HEAD and stayed empty even thoughbty.tomlhad the URL. The resolver now layers DB override >cfg.withcache.url>$BTY_WITHCACHE_URL> none.
Added¶
Per-entry “Check” action on the Images page. Each catalog row gets a Check button that probes, point-in-time, whether the origin is reachable (and how big) and whether withcache already holds it. The withcache HEAD also warms an auto-fetch withcache, so Check on a miss doubles as a one-click “start caching this” – check again shortly and it flips to cached. A dead origin is reported inline, not as an error.
The withcache decision is now observable on the flash path.
is_cachedlogs each HEAD (info on hit/miss, warning on an unreachable cache), and the per-machinenetboot.pxe.planevent recordswithcache = {configured, hit, served_from}– visible in/ui/events.
[0.43.1] - 2026-06-10¶
Changed¶
The well-known default password is now
bty-lab(wasbty) for BOTH bty-web and withcache. A stockbty-lab deploywrites it toenvvars/bty.tomland prints it; change it before exposing the host. bty-web’s runtime fallback (when nothing is configured) is the same value.
Fixed¶
bty-lab deploy --systemdbaked the wrong withcache password into the Quadlet unit. The generatedwithcache.containerhardcodedWITHCACHE_ADMIN_PASSWORD=change-mewhile the deploy summary +envvarsadvertised a different password, so operators following the printed credential were locked out of withcache. The deploy/upgrade paths now bake the chosen password into the unit; the stand-aloneinit --systemdreference unit keeps the editable placeholder.Settings-page edits failed on container deploys. Saving config rename-replaced onto the
bty.tomlbind mount, which fails withEBUSY; added an in-place-write fallback. The generated compose also mountedbty.tomlread-only, contradicting the Quadlet’s RW mount – dropped the:ro. Referencedeploy/compose.yml+deploy/quadlet/bty-web.containerrefreshed to the v0.42 bty.toml shape (they still showed the old envvars-era env lines).Documentation corrected: the env-var tables and
/ui/logintext saidBTY_ADMIN_PASSWORDunset meant “open access”; auth has been always-on since v0.41.3.
Internal¶
The two
/ui/imagesform-post paths now recordcatalog.entry.add.failedon a duplicate add, matching the JSON endpoint (audit-trail parity).Tests for the v0.41 legacy env-alias contract;
actions/checkout@v6across all CI jobs; assorted dead-code / stale-comment cleanup.
[0.43.0] - 2026-06-10¶
Added¶
bty-lab purge [DEST]– the inverse ofdeploy: stops + removes the stack (auto-detects compose vs Quadlet/systemd, likeupgrade). Keepsdata/and the deploy dir by default;--datadeletes host state,--allalso removes the deploy dir (implies--data),--imagesdrops the pulled images. Destructive flags are gated by ay/Nconfirm (skip with--yes); teardown tolerates an already-gone service / container so a half-removed deploy still purges. Completes thedeploy/upgrade/purgeoperator lifecycle.
Fixed¶
The
/ui/netbootTFTP probe no longer hard-codes127.0.0.1. It now resolves its target from config – an explicit[netboot] tftp_probe_hostif set, otherwise the host of the withcache URL (the LAN address clients reach, where thenetwork_mode: hostbty-tftpsidecar serves udp/69). Previously the probe read a separate$BTY_TFTP_PROBE_HOSTenv var that the v0.42 slim-down dropped, silently falling back to loopback and reporting an otherwise-healthy TFTP server as unreachable. The[netboot] tftp_probe_hostconfig key is now actually consulted (it was only displayed on the Settings page before), and the field default changed from"127.0.0.1"to""(= derive).The Settings-page DHCP/PXE cheatsheet suggests the configured advertised host (withcache URL host) for
Next-Serverinstead of a sniffed container-internal interface, which inside a bridge-network container pointed at the wrong address.
[0.42.0] - 2026-06-10¶
bty-web operator config moves from env vars to bty.toml.
The v0.41-era envvars shell file and the ~17 $BTY_* env vars
it plumbed through compose.yml / Quadlet were a 12-factor relic
that didn’t fit a long-running daemon. v0.42 makes bty.toml the
canonical config: one structured file the operator edits (or the
Settings page round-trips edits through), layered with per-key env
overrides for k8s Secrets / one-shot dev runs.
Added¶
src/bty/web/_config.py–Configdataclass + nested section dataclasses covering every operator-tunable knob, with a layered loader (defaults < TOML files < env vars), per-key provenance tracking, and atomlkit-based writer that preserves operator comments + ordering across round-trips.--config PATHCLI flag forbty-web(repeatable; each later one overrides earlier per-key). Plus$BTY_CONFIG_FILE/$BTY_CONFIG_DIRenv conventions.Drop-in config directories (
/etc/bty/conf.d/*.tomlloaded in lexicographic order) following the nginx / sshd convention.BTY_<SECTION>_<KEY>env-override convention. Setting one key via env doesn’t force the rest to be set.Default config search list when nothing is passed:
/etc/bty/conf.d/,/etc/bty/bty.toml,<state_dir>/bty.toml.
Changed¶
bty-lab deploywrites a populatedbty.tomlalongsideenvvars.envvarsshrinks to the compose-substitution basics (HOST_ADDR,WITHCACHE_ADMIN_PASSWORD); the BTY_* knobs move tobty.toml.compose.ymlbty-webenv block shrinks from ~10 entries to one (BTY_CONFIG_FILE=/etc/bty/bty.toml) + a bind-mount of./bty.tomlinto the container at/etc/bty/bty.toml.Quadlet
bty-web.containerlikewise: a singleEnvironment=a
Volume=of the absolute<dest>/bty.tomlpath.
Settings page rows now surface the canonical
BTY_<SECTION>_<KEY>env names (e.g.BTY_PATHS_STATE_DIRinstead ofBTY_STATE_DIR).bty-web --helpdescription rewritten: documents the layered resolution + the override convention, drops the per-knob list.
Compatibility¶
Legacy env names still work as aliases for one release. The loader’s
_LEGACY_ENV_ALIASEStable maps the v0.41 flat names (BTY_STATE_DIR->[paths] state_diretc.) onto the new schema so an unmodifiedenvvarskeeps working. Removal scheduled for v0.43; new deploys + the Settings page surface only the canonical names.bty-lab upgradepreserves bothenvvarsAND the existingbty.tomlif present.
Removed¶
The per-knob
Environment=lines in compose / Quadlet (BTY_ADMIN_PASSWORD/BTY_SESSION_SECRET/BTY_MAX_UPLOAD_BYTES/BTY_BACKUP_MAX_PARALLELetc.). Operators wanting per-key env overrides set them directly on the container (-e BTY_<SECTION>_<KEY>=...) rather than going throughenvvars.
[0.41.5] - 2026-06-09¶
TFTP probe + withcache URL now reach the right host on container deploys.
Fixed¶
BTY_TFTP_PROBE_HOSTdefaulted to127.0.0.1, which on a container deploy resolves to bty-web’s own loopback – not thebty-tftpsidecar (which usesnetwork_mode: hostand binds udp/69 on the host’s LAN address). The Netboot page’s TFTP probe always reported “unreachable”. The bty-web compose service + Quadlet unit now bakeBTY_TFTP_PROBE_HOST=${HOST_ADDR}by default; operators can still override in envvars._quadlet_bty_webpreviously leftHOST_ADDR_HEREas a literal placeholder in the emitted Quadlet, since Quadlets don’t expand env-file references the way Compose does.deploy_mainnow detectshost_addrBEFORE emitting and bakes the real LAN IP into the unit body;upgrade_mainreads it back from the preservedenvvarsfile. The stand-alonebty-lab init --systemdpath (no host_addr known) still emits the placeholdera hand-edit note in the header.
Added¶
Both
BTY_WITHCACHE_URLandBTY_TFTP_PROBE_HOSTare now override-able via envvars (compose entries use the${VAR:- default}form). envvars.example carries commented-out hints for each.bty-web --helpdocuments both.Settings -> Network card surfaces
withcache base URL+TFTP probe targetrows so operators can see where bty-web is pointing without dredging the env.
Removed¶
Settings -> Network card drops the static
TFTP systemd unit: dnsmasq.servicerow – meaningless on container deploys, and the TFTP probe (now correctly targeted) is the canonical signal.
[0.41.4] - 2026-06-09¶
Polish pass on the v0.41.3 UI work.
Added¶
make web– fast iterate-locally target. Runsuv run bty-webstraight from the source tree with state under/tmp/bty-web-dev; skips the container entirely so rootless- Docker’s uid-namespace woes don’t block the day-to-day loop. Log in with the well-known default passwordbty.
Fixed¶
The
Fetch '<tag>' catalog/Fetch '<tag>' artifactsbuttons rendered the tag inside<code>, which Bootstrap colours pink-red at ~2:1 contrast against the primary-blue button background – effectively unreadable. Use<strong>instead (plain bold, inherits the button’s white text).
Changed¶
Settings page’s compact read-only cards (Identity / Storage paths / Network / Background workers) reorder each row from
LABEL/VALUE/ENV_VARtoLABEL/ENV_VAR/VALUE– matching the naturalVAR=valueshape an operator types into a shell.
[0.41.3] - 2026-06-09¶
Always-on auth + tag/repo separation + live TFTP probe + Settings grid.
Security¶
BREAKING: auth is now always on. The prior “no
$BTY_ADMIN_PASSWORD-> open UI” behaviour was a footgun – a fresh deploy or a forgotten env var left every mutating route open.$BTY_ADMIN_PASSWORDstill overrides, but with the env unset the active password is the literal string"bty"(well-known default, logged on startup + flagged in a yellow alert on the login form). Deploys that relied on the open-access default must set$BTY_ADMIN_PASSWORDto a real value AND/OR accept that operators log in withbtyuntil they do._auth.auth_enabled()is removed (auth is always on);admin_password()now returns a non-None string;using_default_password()powers the UI warning.
Changed¶
Settings: release repo + catalog release tag + netboot release tag replace the prior single “catalog URL”
“netboot release tag” pair. The catalog URL is derived from the repo + the catalog tag; pin them independently to flow catalog updates while keeping netboot artifacts on a known release (or vice-versa). The fetch buttons reword to Fetch ‘
’ catalog and Fetch ‘’ artifacts so the active tag is visible at the click site.
Settings layout: two-column row of editable cards (Upstream sources | Backup schedule), then a four-column row of read-only diagnostic cards (Identity | Storage paths | Network | Background workers), then the full-width DHCP / Network Boot cheatsheet. Compact list-group rendering replaces the wide 4-column tables in the read-only cards.
/ui/netbootTFTP card now ships a live network probe: bty-web sends a TFTP RRQ foripxe.efito$BTY_TFTP_PROBE_HOST(default127.0.0.1:69) and reports reachable / file-present as two independent signals. Localhost is the default probe target; override the env var to point at a TFTP daemon hosted elsewhere.
Removed¶
_settings_store.KEY_CATALOG_URL+default_catalog_url(URL is derived now, not stored).KEY_CATALOG_TAG+KEY_NETBOOT_TAGtake their place.resolve_release_tagDEFAULT_RELEASE_TAGsurvive as one-release aliases forresolve_netboot_tag+DEFAULT_TAG; remove after v0.42.
[0.41.2] - 2026-06-09¶
UI consolidation + open-access nav fix. Three things ship together:
Fixed¶
logged_inin the layout context was False on every page when$BTY_ADMIN_PASSWORDwas unset – so an open-access install (the default for a fresh deploy) rendered with NO nav buttons (Machines / Images / Netboot / Settings all gone). The render helper now treats “auth disabled” as logged-in so the nav cluster renders for the open-access path. The Logout button stays hidden on auth-disabled installs (no session to clear).
Changed¶
/ui/netbootabsorbs the active release-fetch table + the Fetch artifacts trigger that used to live on/ui/downloads. The legacy/ui/downloadsroute is removed; bookmarks that pointed there now 404. The navbar drops its standalone Downloads worker pill (release-fetch progress is on /ui/netboot directly)./ui/imagesheader reorders toAdd image | Upload catalog | Fetch latest catalog– the most common operator add-path (single URL) lands first./ui/dashboardImages Summary drops theUploadedandLocal copyrows. Total / HTTP / ORAS are the surviving counts now that bty-web is out of the image-bytes plane.TFTP daemon card on
/ui/netbootis observation-only: status badge + a short triage hint pointing atsystemctl status bty-tftp(container deploys) orjournalctl -u dnsmasq.service(host installs). The Start / Stop / Restart buttons + the/ui/settings/tftp-controlPOST route + the sudo’dbty-web-tftphelper concept are removed; daemon lifecycle is a systemd / Podman concern, not an operator click target.
Removed¶
/ui/downloadsroute +downloads.htmltemplate/ui/settings/tftp-controlroute_sysconfig.control_tftp+tftp_controllable+SysConfigError+TFTP_HELPER+TFTP_ACTIONSnetboot.tftp.controlled+netboot.tftp.control.failedevent kinds
[0.41.1] - 2026-06-09¶
Hot-fix: bty-lab deploy in root mode left compose-managed
containers holding the ports the Quadlet services needed.
Symptom on the lab box: [11/15] starting stack succeeded
(compose brought up bty_withcache_1 / bty_bty-web_1 /
bty_tftp_1), then [14/15] starting systemd services
failed with Job for withcache.service failed + Job for bty-web.service failed because the compose containers were
still holding :3000 + :8080.
Fixed¶
deploy_mainin root mode no longercompose ups. Thecompose pullstays (warms the registry cache);compose up -dis replaced bycompose down(clears leftover compose- managed containers from a prior install – idempotent on a fresh host). Quadlet + systemd then own the lifecycle.Non-root mode is unchanged:
compose up -dremains the lifecycle there (no Quadlets to hand off to).upgrade_mainwas already correct – onlydeploy_mainhad the bug.
Regression test test_deploy_as_root_does_system_install asserts
compose up never appears in the root-mode runtime call sequence.
[0.41.0] - 2026-06-09¶
Cleanup release: ~2300 LoC of dead code + stale references removed in the wake of v0.40’s catalogs-not-bytes refactor. No behavioural changes for operators – everything still works exactly as v0.40 did.
Removed (code)¶
/ui/imagesFetch / Update / Cache-delete buttons + their JS handlers – the underlying/catalog/downloadsand/catalog/cache/{name}endpoints were deleted in v0.40; the buttons silently 404’d./ui/downloadsUpload-image trigger +uploadSelectedFile()JS – posted toPUT /images/{name}(deleted in v0.40).Navbar worker polling of
/catalog/downloads+/catalog/hashes– both 404 since v0.40; collapsed to the surviving/boot/releases+/workers/backupssources.bty.catalogorphans: the entirecatalog-<ref:12>-<slug>.<ext>naming machinery (_CATALOG_PREFIX,_CATALOG_REF_LEN,_slugify,local_filename_for,is_catalog_cache_filename,ref_prefix_from_cache_filename), the storage-format marker (StorageFormatMismatch,check_or_write_storage_marker,STORAGE_FORMAT_VERSION,_STORAGE_MARKER_FILENAME), the DownloadManager byte-pump (is_cached,fetch_to_cache,fetch_src_to_cache,_stream_with_digest,CatalogCancelled,ProgressCallback,CancelCheck),CatalogEntry.local_filename/.cached_pathmethods.bty.imagesorphans:merge_with_catalog,ensure_sha256,HashCancelled,HashProgressCallback,HashCancelCheck.bty.web._apporphans:_lookup_db_catalog_entry(the DownloadManager DB-only fallback), theimage_rootno-op kwarg oncreate_app.
Removed (deploy / Makefile / Dockerfile)¶
docker/Dockerfile:ENV BTY_IMAGE_ROOT=/var/lib/bty/imagesthe
install -dline that pre-created the directory.
Makefiledocker-runtarget: stopped pre-creatingbty-data/images/(now createsbty-data/boot/+bty-data/backups/).
Documentation¶
walkthrough-catalog.mdrewritten end-to-end for the v0.40 model (no dir-scan, no Hash/Fetch buttons, no DownloadManager).flows.mdaudit-log + actions + safety-gates tables trimmed for deleted endpoints + event kinds.walkthrough-image-store.md+operations.md+walkthrough-server-docker.mdhad their first-pass rewrites in PR #11; this round finishes the long-tail.reference.mdrewritten: deleted-endpoint rows out of the protected-routes table,BTY_IMAGE_ROOTscoped to thebtyCLI only,CatalogEntry.disk_image_shacomment realigned to “populated only when the publisher pinned it”.Module / function docstrings across
bty.catalog,bty.images,bty.flash,bty.web._app,_jobs,_releases,_backup,_models,_dbupdated to drop references to deleted symbols (HashManager,DownloadManager,merge_with_catalog,ensure_sha256,fetch_to_cache,fetch_src_to_cache,local_filename_for).
Test churn¶
834 (v0.40) -> 803 tests. Net -31, all dead-test deletions
(test_fetch_to_cache_*, test_is_cached_*,
test_local_filename_for_*, test_storage_marker_*,
test_recognised_filenames_*, test_merge_with_catalog_*,
test_ensure_sha256_*). Two UI-test assertions updated for the
trimmed navbar poll endpoints + the gone-Upload-image trigger.
[0.40.0] - 2026-06-09¶
bty-web is out of the image-bytes plane. Image bytes now live exclusively in withcache; bty-web holds the catalog (URL -> manifest entry), the machine inventory, the audit log, and the netboot artifacts. One rule: bty has catalogs; withcache has bytes.
Released as a ~3500-line subtraction across five refactor commits.
Removed¶
DownloadManager +
/catalog/downloadsendpoints. The Fetch button on/ui/images, the explicit-download lifecycle, thecatalog.cache.populated/catalog.fetch.*audit events – all gone. The live env’s flash request warms withcache directly via the HEAD probe in the plan endpoint.HashManager +
/catalog/hashesendpoints. The background SHA-256 worker, theimage.hashed/image.hash.*events, and the/ui/hashingpage.catalog_entries.disk_image_shastays for catalog-declared shas; no more late backfill.Image upload (
PUT /images/{name}). No drag-and-drop in the UI, no curl-PUT, noimage.uploadedevents. Ad-hoc images: host on your own nginx / GHCR / S3 and add a catalog entry pointing at it.PUT /boot/{name}survives for netboot artifacts.BTY_IMAGE_ROOT. No image-store directory. bty-web’s lifespan no longer dir-scans the filesystem;/var/lib/bty/is nowstate.db + boot/ + catalogs + session-secretonly.BTY_CATALOG_MAX_PARALLELandBTY_HASH_MAX_PARALLELenvvars retired alongside their managers./ui/imagesFetch / Hash buttons and thedata-cachedstatus badge. Catalog entries render with their src URL and the catalog metadata only.
Changed¶
Plan endpoint URL contract.
HTTPS + withcache configured + warm -> withcache’s
/b/<urlsafe-b64(origin)>/<basename>URL (unchanged).HTTPS + withcache cold or unconfigured -> the origin URL directly (was:
/images/{ref}/{name}stream-proxy). bty-web is out of the bytes path for https sources entirely.ORAS ->
/images/{ref}/{name}; bty-web proxies for the bearer-token resolve (withcache doesn’t speak oras yet; v0.41 follow-up).
GET /images/{key}shrinks to oras-only stream-proxy. Unknown keys, https-only catalog entries, and literal filename lookups all 404 now.
Migration¶
Upgrade in place with bty-lab upgrade /opt/bty then re-deploy.
Existing files under ./data/bty/images/ are no longer read or
written; rm -rf them to reclaim disk after confirming the
withcache deploy serves the same URLs (HEAD them at
http://<host>:3000/b/<urlsafe-b64(origin)>/<basename>).
Operator-uploaded images that didn’t live at a URL on the old deploy
need re-homing: drop them onto an HTTP server, push to GHCR via
oras, or any other URL-addressable store, then add a catalog
entry.
state.db survives untouched – the catalog_entries.disk_image_sha
column stays nullable; rows without it just lose the sha display.
Test impact¶
922 -> 834 (-88). Dir-scan, sha-by-filename, local-file serving, image-upload, image-store-survives-upgrade, mixed-shape-no-dupes, and the entire DownloadManager + HashManager test files gone.
[0.39.1] - 2026-06-08¶
Hot-fix: two unrelated stock-Ubuntu-host gotchas that made v0.39.0’s
bty-lab deploy unusable in real-world reproduction. Both surface
on a stock Ubuntu host with podman-compose installed but no other
podman / netavark deep-dive done.
Fixed¶
deployandupgradepre-create the bind-mount targets writable for any container UID. withcache’s image runs as USERapp, bty-web’s as USERbty. When podman auto-created./data/withcacheand./data/btythey came up root-owned mode 0o755 – not writable for those non-root container UIDs. withcache crashed onPermission denied: '/data/blobs'and bty-web stuck atCreatedviadepends_on. Fix:mkdir -p+chmod 0o777BEFOREcompose pull/up. World-writable is the right tradeoff here (single-tenant appliance host, the dir already holds operator-trusted bytes), and image USER directives stay respected.Container DNS hardcoded to sidestep the systemd-resolved / missing-aardvark-dns combo. Containers got
nameserver 10.89.0.1(podman’s bridge gateway, whereaardvark-dnsis supposed to forward from – but it isn’t installed by default on Ubuntu). Result: every outbound lookup failed:release fetch failed: <urlopen error [Errno -3] Temporary failure in name resolution>. Compose + Quadlets now emitdns: 1.1.1.1(overridable via the newBTY_DNSenvvar for internal-resolver LANs). Bty’s inter-service traffic already routes via host IP, so the earlier “aardvark-dns binary not found” warning is now genuinely cosmetic andapt remove aardvark-dnsworks without breaking the stack.Step counters bump 14 -> 15 (sudo-root path) and the equivalent in
upgradeto include the newprepared data dirsstep.
[0.39.0] - 2026-06-07¶
Polish pass on the v0.38.0 bty-lab deploy, from real-world
reproduction on a fresh nosi-built lab host. The headline fix is
the broken container-tag pin (:v0.38.0 vs the actual GHCR tag
:0.38.0) that made every deploy land in “manifest unknown”.
Bundled with that: the deploy/upgrade UX gets quieter, more
auto-detecting, and self-cleaning so the canonical install is a
single sudo uvx bty-lab deploy /opt/bty.
Fixed¶
GHCR tag pin matches what CI publishes.
compose.ymland the QuadletImage=lines previously emittedghcr.io/safl/bty-web:v{version}with a leadingv, but the publish job strips that prefix (the 201 historical container tags are all0.x.y, nov). Everydeployfailed with “manifest unknown” on bty-web + bty-tftp; onlywithcache:latestcame up. Drop thevfrom all four template positions.
Changed¶
bty-lab deployauto-detects install mode from euid (BREAKING: the previous--systemdflag is removed). Run as root: full system install (TFTP sidecar + Podman Quadlet units installed to/etc/containers/systemd/+ systemctl autostart). Run as a regular user: compose-only install – no TFTP, no autostart, with a loud “limitations” block at the end naming exactly what was skipped and thesudore-run command to promote. The privileged side is whatsudoalready implies;--systemdwas redundant.bty-lab deployhandles the deploy-dir prep itself – no moresudo mkdir -p /opt/bty && sudo chown "$USER:$USER" /opt/btypreamble. Run as root, the dest is created if missing and the whole emitted tree is chowned back to$SUDO_USERso the operator canvim envvarswithout sudo afterwards.[N/M]step numbering on everydeploy/upgradestep. Totals are computed up front from the auto-detected mode flags – 14 steps for a root+sudo deploy, 10 for a user-mode deploy, etc. Operator sees position-in-run as the install streams.upgradefollows the same root/user auto-detect. Refuses cleanly when the stack is Quadlet-managed but the upgrade was invoked without root – a plainpodman compose up -dwould race the systemd-managed containers.
Docs¶
Sidebar wordmark removed – Furo’s
sidebar_hide_name: Truedrops the redundant “bty” text next to the mascot logo. The H1 on the landing page also drops thebty -prefix; the logo + alt text identify the project on their own.Tutorials fetch the release asset directly. Ventoy / piKVM / JetKVM / BMC tutorials all referenced
~/system_imaging/disk/...(a build-host artifact path) – now use the standardrelease.tomldiscovery pattern thatquickstart.mdalready uses. Also fixes a stale.iso.gzfilename inbmc.md– current releases ship uncompressed.isoonly.Docs landing CI badge fixed –
ci.yml->ci-cd.ymlto match the actual workflow file. Was 404’ing in the sidebar.
[0.38.0] - 2026-06-07¶
bty-lab deploy + upgrade subcommands so first-boot of a bty
server is one command instead of a five-step chain. The deploy
emits compose files, auto-fills envvars (HOST_ADDR detected from
the host’s outbound-route IP; admin passwords default to bty
matching the historic PAM convention; session secret stays random),
and runs podman compose --profile tftp pull + up -d. With
--systemd, also installs Podman Quadlet units to
/etc/containers/systemd/ and starts the services (requires root).
upgrade is the in-place version-bump path: regenerates compose
against the CLI’s bty version, preserves envvars + data/,
pulls + restarts – auto-detecting Quadlet-managed vs compose-managed
stacks.
uvx bty-lab deploy /opt/bty– bring up bty-web + withcache in one command. Visual==> step: detailphase headers throughout; subprocess output (pull, compose up, systemctl) streams between boundaries.uvx bty-lab upgrade /opt/bty– in-place upgrade. Detects Quadlet-managed stacks from installed units under/etc/containers/systemd/and usessystemctl daemon-reloadrestartin that case; otherwisepodman compose pull+up -d.
bty-lab init(existing) now surfacesPermissionErrorfrom the dest mkdir with asudo mkdir + chownhint instead of a bare traceback inside theuvxwrapper. Probes for a compose backend on PATH; warns if missing instead of failing with the cryptic “looking up compose provider failed” later. RuntimeNext:block trimmed to three lines with--profile tftpbaked in, plus a pointer tobty-lab deployfor the one-shot path.Docs landing page (
docs/src/index.md) simplified nosi-style: mascot moves to the sidebar viahtml_logo, body trims to a two-paragraph tagline + toctrees.New Tutorials section:
ventoy.md,pikvm.md,jetkvm.md,bmc.md(Supermicro / iDRAC / iLO virtual media, with the license-paywall caveats up front).
Operator notes¶
The default admin password is
bty. Change it in/opt/bty/envvarsbefore exposing the host past a trusted LAN.The deploy dir needs to be writable by the operator. The docs lead with
sudo mkdir -p /opt/bty && sudo chown "$USER:$USER" /opt/btyso first-boot doesn’t trip the silent-fail mode where$EDITOR envvarsopens an empty buffer.initstays available for inspect-before-apply control: same files, no side effects.
[0.37.0] - 2026-06-06¶
Polish pass on the v0.36.0 bty-lab init bootstrap, from
real-world reproduction on a fresh nosi-built lab host. All
operator-facing; nothing breaks for hosts already running
v0.36.0 except the values-file rename below.
BREAKING (early adopters only): the values file is now
envvars, not.env.bty-lab initwritesenvvars.exampleand the rendered compose / README walk the operator throughcp envvars.example envvars && export COMPOSE_ENV_FILES=envvars && podman compose up -d. Reason:.envis a dotfile – invisible inls– and an operator scanning the deploy directory after the bootstrap couldn’t tell whether the file existed.envvars(the Apache convention) shows up in plainlsand is self-describing.Migration for an existing v0.36.0 deploy:
mv .env envvarsthen eitherexport COMPOSE_ENV_FILES=envvarsonce per shell or pass--env-file envvarson everypodman composeinvocation. Or re-runuvx bty-lab init --force .to regenerate the deploy directory with the new layout (state indata/survives).Every operator-facing env var is now documented in
envvars.exampleand plumbed through the compose env block. Previously onlyHOST_ADDR/WITHCACHE_ADMIN_PASSWORD/BTY_HOST_DATA_DIRwere surfaced; nowBTY_ADMIN_PASSWORD(gates the bty-web UI – previously you had to grep the source to find this),BTY_BOOT_RELEASE_REPO,BTY_TRUSTED_PROXY,BTY_SESSION_SECRET,BTY_MAX_UPLOAD_BYTES, and the threeBTY_*_MAX_PARALLELknobs are documented (commented-out, with default values + a one-line rationale each) and the compose references each withVAR: ${VAR:-}so uncommenting inenvvarsimmediately reaches the container. Tests pin both sides of the contract.Quickstart chain no longer dies on an unset
$EDITOR. Previously the renderedNext:hint and docs readcp envvars.example envvars && $EDITOR envvars && podman compose up -d; on a fresh shell withEDITORunset bash expanded that toenvvarsand tried to exec the values file, reporting-bash: envvars: command not found. The chain now uses"${EDITOR:-vi}"soviis the universal fallback.Operator-friction hints in the runtime
Next:line: the hint now mentions the--profile tftpvariant (BIOS-PXE clients; UEFI HTTP-Boot doesn’t need it) and thepipx install podman-composeprereq (podman composeis a wrapper that requires an external compose backend on PATH; with none installed the bootstrap errors with a seven-line “looking up compose provider failed” trace).
[0.36.0] - 2026-06-05¶
One-command container deploy: uvx bty-lab init. No more cloning
the repo to grab deploy/compose.yml; bty now ships a dedicated
bty-lab console script that emits a ready-to-run compose stack
pinned to its own version.
New
bty-lab init [DEST]console script. Writescompose.yml,.env.example, and a per-deployREADME.mdfor abty-web+withcachestack on any host that hasuv(orpipx) installed – no clone, no--fromindirection.bty-webandbty-tftpimage tags are pinned to the bty CLI version that produced the file, so the compose and the image bytes always match. Re-running with--forcerefreshes an existing deploy against a newer bty release.--systemdadditionally emits Podman Quadlet units for boot-autostart.--data-dirre-roots state onto a chosen disk; default is./data/{bty,withcache}bind-mounted next tocompose.yml.--printstreams the compose to stdout for pipeline use.bty-labis a standalone script, separate frombty. The flash wizard stays single-purpose (and a bareuvx bty-labdoes NOT pull in Rich or FastAPI); the lab-init module imports nothing from the [tui] / [web] extras. A barebty-lab(no subcommand) prints usage pointing atbtyfor the wizard, so somebody runningpipx run bty-labblind learns about the sibling commands.First-boot needs no UI configuration step. The emitted compose passes
BTY_WITHCACHE_URL=http://${HOST_ADDR}:3000to bty-web, which auto-discovers withcache on every request – the operator edits onlyHOST_ADDR+WITHCACHE_ADMIN_PASSWORDin.env.Operator-visible state directories. The deploy now uses host bind-mounts (
./data/bty/,./data/withcache/) instead of named volumes; state is where the operator put it, easy to back up and migrate.Documentation re-anchored around the new flow – the quickstart, walkthroughs, and
deploy/README.mdlead withuvx bty-lab initinstead of “clone +podman compose -f deploy/compose.yml”. The “lowest-barrier docker trial” framing is replaced by the canonical container deploy.
[0.34.0] - 2026-05-28¶
Robustness pass: clearer flash failures, sturdier disk discovery, and a hardened image bake. No behaviour change for a working appliance; the wins show up when something goes wrong.
Failed qcow2 flashes now report the real reason. A failed
qemu-img convertpreviously surfaced only a numeric exit code; bty now captures qemu-img’s diagnostic (Could not open ..., permission denied, corrupt-image) and includes it in the error, which is what an operator needs when a block-device write fails.Disk discovery degrades gracefully if
lsblkreturns unparseable JSON (a zero-exit-with-empty-output edge on cut-down busybox builds) instead of crashing the disk picker.The server-image bake fails loudly instead of shipping a broken image.
diskimage_buildnow verifies cloud-init actually completed (gated on a marker echoed only if the wholeset -euruncmd succeeded) and dumps the offending command on failure – and the r8125 DKMS build now targets the kernel whose headers are installed (the trixie-backports kernel the appliance boots) rather than a blindls | head -1, which on a kernel-version drift had silently shipped an appliance whose bty-web never started.Internal: corrected two inaccurate docstrings/comments (the audit
record()commit contract and the backup-cancel event note) and strengthened /ui/images action-button + scheduler-audit test coverage.
[0.33.30] - 2026-05-28¶
Starter catalog refreshed for nosi’s renamed variants. nosi
renamed its images from <distro>-sysdev to numbered
<distro>-<version>-<shape>, so the shipped starter catalog
pointed at repos that no longer publish. The catalog now lists the
seven flashable variants (Debian / Ubuntu / Fedora / FreeBSD
headless, plus a Fedora desktop); docs and the USB-ISO build were
updated to match. No behaviour change to bty itself – a fresh
install just gets a catalog whose entries resolve again.
[0.33.29] - 2026-05-26¶
Audit-log lifecycle for deferred operations + /ui/images UX refinements. Eleven commits split into two themes.
Audit-log: requested -> started -> terminal lifecycle¶
The audit log used to capture only terminal events
(image.hashed, backup.created, catalog.cache.populated,
netboot.artifacts.fetched). For deferred ops (worker-backed),
this meant /ui/events showed nothing between the operator’s click
and the worker’s eventual outcome – which can be minutes for
catalog fetches and release downloads.
Now each deferred op writes three lifecycle phases:
.requested– HTTP handler accepted the request and enqueued. Actor=operatorwhen the operator clicked (carries source_ip); actor=systemfor internal triggers (the backup scheduler tick)..started– Worker pulled the job off the queue and began work. Actor=system..cancelled– Operator-initiated stops now land in the audit log; pre-fix the manager flipped_states+ fired SSE but wrote nothing to /ui/events. Both the HTTP DELETE handler (actor=operator + source_ip) AND the worker (actor=system, observed the cancel flag) emit it, so the operator can distinguish “I cancelled it” from a shutdown drain.
Concrete new kinds:
catalog.fetch.requested/.started/.cancelled/.failed(also replaces the misleadingcatalog.fetch.sha_mismatch, which fired for ALL fetch failures, not just sha mismatches)image.hash.started/.cancelled(no.requested: the per-row Hash button was removed in this batch; every hash now arrives via auto-import or DownloadManager back-fill, both system-initiated)netboot.artifacts.fetch.requested/.started/.cancelledbackup.create.requested/.started/.cancelled
Plus a one-time rename pass for consistency:
image.hash_failed->image.hash.failedimage.upload_failed->image.upload.failedcatalog.entry.add_failed->catalog.entry.add.failednetboot.artifacts.fetch_failed->netboot.artifacts.fetch.failednetboot.tftp.control_failed->netboot.tftp.control.failedsystem.schema_reset->system.schema.reset
Pre-1.0 break-freely (no migration / back-compat shim). Operators with custom event-log scrapers will need to grep for the dotted names.
Additional fix: catalog.cache.populated now fires for BOTH
sha-pinned AND un-sha’d fetches. Pre-fix it only fired for
un-sha’d entries because the gate was tied to the
disk_image_sha backfill UPDATE. Sha-pinned downloads had no
terminal success in the audit log – an operator scrolling
/ui/events saw .requested + .started with no closure. The
backfill UPDATE itself remains gated on entry.sha256 is None
(sha-pinned rows already carry the sha).
/ui/images UX¶
Always-render action buttons + new Update. The per-row
action column used to hide buttons whose preconditions weren’t
met. Now all four (Fetch, Update, Cache delete, Entry delete)
always render with consistent placement; each disabled when
its applicability gate is false, with a tooltip explaining
why. The operator scanning the column sees the same shape
every row.
New “Update” button: enabled iff a catalog entry has a
remote source AND a local copy exists. Click chains
DELETE /catalog/cache/{name} + POST /catalog/downloads – the
natural workflow for rolling oras tags whose upstream changed.
Relies on the new catalog.fetch lifecycle so /ui/events
clearly shows the re-pull.
Row-busy disable: when ANY worker job (download or hash) is queued/running for a row’s name, ALL buttons on that row disable. Cancellation goes through /ui/downloads or /ui/hashing where each manager exposes its dedicated Cancel button.
Hash button dropped. Every path into image_root already auto-enqueues a hash (PUT /images upload, lifespan auto-import, DownloadManager fetch). The per-row Hash button only helped a niche “ssh + scp into image_root at runtime” workflow where “restart bty-web” is the accepted answer.
Content SHA clickable to copy. The cell renders the first 8 chars + a clipboard icon; click copies the full sha256 to the clipboard via the modern Clipboard API (with a hidden- textarea fallback for non-HTTPS contexts). A brief check-icon swap confirms the copy. Pre-fix operators had to triple-click, copy, and prune the trailing ellipsis manually.
Local copy badge clearer wording: column header renamed
Cached -> Local copy, cell values cached /
available -> yes / no. Dashboard pill renamed
the same way; the separate “Local” pill (which counts
operator-uploaded entries by source-kind, distinct from
“has local bytes”) renamed to “Uploaded” to disambiguate.
Bug fix¶
DownloadManager.enqueue re-runs when the cache file is
missing. Operator-reported (v0.33.15): deleting the cached
local copy of an oras-src image, then clicking Fetch, briefly
flipped the button to “Downloading…” but no worker actually
ran – the dedup branch returned the stale “completed” state.
Now the dedup branch verifies the cache file still exists for
completed entries; if it’s gone, fall through to a fresh
enqueue.
Suite 879 -> 884.
[0.33.28] - 2026-05-26¶
Deep-pass cleanup on PXE state machine, audit log honesty, and HashManager. Seven distinct findings from a state-changes deep audit, batched into one release.
Crashed-flasher / crashed-live-env self-healing (F1)¶
The /pxe/{mac} consume of saw_flasher_boot now gates on the
completion signal, not the arm bit alone. Pre-fix, the bit alone
gated the sanboot serve. If the live env crashed between fetching
/boot/...?mac= (which arms the bit) and posting its completion
signal (/pxe/{mac}/inventory for inventory mode, /pxe/{mac}/done
for flash modes), bty-web happily sanbooted the (empty /
half-flashed) disk. Two failure modes followed:
bty-flash-alwaysandbty-inventory: one wasted sanboot cycle per crashed live env – the box couldn’t boot the disk, power-cycled, the next/pxecleared the bit, then re-served the chain. Self-recovered, but a visible operator-facing burp.bty-flash-once: TERMINALLY stuck on the half-flashed disk. The mode’s “stop after one flash” contract made the next/pxeSTILL serve sanboot of the bad disk. Required operator intervention (re-save the machine) to re-arm the flash.
Post-fix: armed && completion_signal gates the sanboot.
Armed-without-completion treats the live env as crashed and
re-serves the chain. Self-healing without operator intervention;
bty-flash-once retries until /done lands. The retry serve is
distinguishable in the audit log via netboot.pxe.offered
details: retry_after_armed_no_done: true (flash modes) or
retry_after_armed_no_post: true (inventory).
Race-safe is_new discriminator for discovery upsert (F2)¶
Pre-fix, the discovery handler used
INSERT ... ON CONFLICT DO UPDATE RETURNING *, (created_at = ?) AS is_new.
The row race itself was safe (v0.33.6), but the is_new
discriminator did a timestamp compare – which could TIE on hosts
with low-resolution clocks (some VMs, slow virtualised guests).
Two concurrent requests whose _now_iso() produced the same
string both saw is_new=1 and both logged a
machine.discovered event for the same MAC.
Post-fix: split into INSERT ... ON CONFLICT DO NOTHING RETURNING 1 + unconditional UPDATE. The RETURNING row materialises iff
the insert actually fired (DO NOTHING suppresses it on conflict)
– timestamp-independent, the canonical race-safe “did I create
the row?” signal in SQLite. The UPDATE then refreshes
last_seen_* and COALESCEs discovered_at so PUT-created
rows still backfill on first /pxe contact.
/pxe handler: 6 sqlite connections -> 2 (F3)¶
The /pxe/{mac} handler used to open up to six separate sqlite
connections per request: discovery upsert, saw_flasher_boot
clears in two policy branches, two flash-failure events, and the
always-runs pxe.offered event. Each open_db() runs schema-
init / PRAGMA setup and creates an implicit transaction; six per
PXE hit was gratuitous on the hottest server route.
Refactored to gather any saw_flasher_boot clear via a flag set during the policy decision, then apply it alongside the offered event in one final transaction. Two connections per request.
machine.discovered event payload symmetry (F4)¶
The audit log’s machine.discovered event used to land without
a details payload. Sibling events (machine.created /
machine.upserted) carry a five-key payload (bty_image_ref,
boot_mode, sanboot_drive, hostname, target_disk_serial); an
operator pivoting on a MAC across the audit log saw a missing-keys
surprise on the discovery row. Now: the discovery emit (both
/pxe/{mac} and /pxe/{mac}/plan) carries the same five
keys. At discovery time only boot_mode has a value (the
auto-default bty-inventory); the rest are explicitly NULL.
Drop redundant flash-failure events (F5)¶
netboot.pxe.flash.orphan_ref and netboot.pxe.flash.no_target_disk
used to land as standalone events. But the always-runs
netboot.pxe.offered event already carries reason: orphan_ref / reason: no_target_disk in its details payload
– the standalone events were duplicates. Dropped both from
KNOWN_EVENT_KINDS; operators tracking flash failures should
pivot on netboot.pxe.offered events where details.reason
is set.
HashManager: skip catalog cache files on startup (F12)¶
The lifespan walks BTY_IMAGE_ROOT and queues a hash job for
every file without a .sha256 sidecar. Catalog-fetched cache
files (catalog-<ref:12>-<slug>.<ext>) are special: the
DownloadManager computes the sha while bytes flow during fetch
and writes it straight to catalog_entries.disk_image_sha (no
sidecar). They were getting re-hashed on every startup, wasting
I/O and (on a Pi-class box with multi-GiB images) blocking the
operator binding flow behind a redundant queue. Now skipped:
is_catalog_cache_filename gates the enqueue.
HashManager: backfill catalog row on manual hash (F13)¶
When an operator manually triggers a hash of a catalog-cache file
(POST /catalog/hashes/<name>), the HashManager terminal
callback’s previous UPDATE matched WHERE src = 'file://<name>'
– which wouldn’t find the owning catalog row (its src is the
upstream URL, not file://catalog-...). A second UPDATE
matches by the 12-hex bty_image_ref prefix encoded in the
cache filename when the first one returns rowcount=0. Lands the
sha on the right row even when the row’s src has nothing to do
with the local cache filename. New helper
bty.catalog.ref_prefix_from_cache_filename, mirror of
local_filename_for.
/pxe/{mac}/inventory heartbeat (F6)¶
The inventory POST used to update known_disks_at without
touching last_seen_at / last_seen_ip. A machine in
bty-inventory mode that POSTed inventory and then sat at the
wizard showed a stale last seen X minutes ago on
/ui/machines, even though the live env was clearly alive minutes
ago. Now the UPDATE refreshes the heartbeat alongside the
completion signal.
/boot/{name} heartbeat on every fetch (F7)¶
The /boot/{name}?mac= arm path’s UPDATE was gated on the
0->1 transition (which is correct for saw_flasher_boot) but
that also meant idempotent re-arms (kernel + initrd + squashfs
in one boot) didn’t touch last_seen_at. Same for machines in
ipxe-exit / bty-tui mode whose policy filter blocks the
bit UPDATE entirely – their /boot fetches were heartbeat-
invisible. Now split into an unconditional last_seen
UPDATE + the existing bit-gated transition UPDATE so every
fetch refreshes the operator’s view.
Operator rebind clears completion signals (F8)¶
PUT /machines/{mac} (and the matching UI form upsert) already
reset saw_flasher_boot on policy-affecting changes
(boot_mode, bty_image_ref, target_disk_serial) via a CASE WHEN
expression. The completion signals (last_flashed_at,
known_disks_at) used to survive the same edit, which re-opened
the failure mode F1 closed: stale last_flashed_at + a future
crashed flasher cycle = the /pxe consume gate saw armed=True AND
has_flashed=True (from the OLD cycle) and sanbooted a
half-flashed disk. Now the same CASE WHEN clears both completion
signals on policy change; hostname / sanboot_drive remain
cosmetic edits that preserve the signals.
Audit events for orphan /done and /inventory (F9)¶
A live env POSTing /done or /inventory for a MAC bty-web has no
row for (operator deleted mid-cycle, MAC from a foreign live env,
direct endpoint poke) used to 404 silently with no audit trail.
Now both 404 paths log a pxe.client.orphan event with
details.signal in {"done", "inventory"} so the operator
can correlate “this MAC tried to report; we have no row” on
/ui/events.
Round 3 (uncovered ground)¶
DownloadManager backfill keys on src not name (F10)¶
The catalog.cache.populated backfill UPDATE matched by
WHERE name = ?. name is a free-text display label with
NO UNIQUE constraint in catalog_entries – two operator-curated
rows for different upstream URLs that happened to share a
display name (debian.iso from different mirrors) would BOTH
have their disk_image_sha clobbered by a single completed
fetch. Now keyed on src (the immutable source URL the
CatalogEntry was built from). Regression test pins the
name-collision scenario.
settings.upstream.updated captures old + new (F11)¶
The settings.upstream.updated event used to record only the
post-change values (in the summary string, not details). An
operator auditing “what was the catalog URL before?” or a
drift-tracking script comparing successive events had no
before/after visibility. Now the event’s details dict carries
{release_repo, catalog_url, release_tag} -> {old, new}
for every save.
Lifespan teardown order: drain workers before closing the bus (F14)¶
The lifespan finally block used to call event_bus.close()
FIRST, then await stop() on each manager. Final worker
state-changes (a hash that completes 100ms before SIGTERM)
saw loop.is_running() still True and
call_soon_threadsafe succeeded, but the loop was already
past the point where SSE subscribers would drain it – the
event got dropped silently. Reordered: stop the four managers
(download / hash / release-fetch / backup) FIRST, THEN close
the bus. The backup scheduler task still wakes first via its
event so the loop body exits before SIGKILL window.
Suite 861 -> 879.
[0.33.23] - 2026-05-26¶
Audit-log event for the saw_flasher_boot 0->1 transition. The last silent state transition gets an event.
Pre-fix, every state change in the machine lifecycle landed an
audit-log event EXCEPT the /boot/{name}?mac=X arm of
saw_flasher_boot. Operators following a machine’s timeline on
/ui/events had to correlate raw /boot artifact fetches (not
machine events) with the next /pxe contact’s offer_kind to
deduce when the live env actually ran.
/boot/{name}?mac=X now logs netboot.flasher.armed – ONLY on
the 0->1 transition. The UPDATE WHERE clause restricts the write
to saw_flasher_boot = 0 so idempotent re-arms (kernel + initrd
squashfs all hit the route in one live-env boot) are no-ops on the bit AND don’t spam the event log. Combined with the v0.33.22 state-label honesty fix, the timeline now reads:
machine.discovered(first /pxe contact)netboot.pxe.offered(per /pxe hit, with offer_kind)netboot.flasher.armed(live env booted into the box)machine.inventory(live env POSTed disks) ORmachine.flashed(live env POSTed /done)netboot.pxe.offered(next /pxe, served sanboot)
Two new regression tests in test_web.py:
one /pxe arm cycle that hits /boot three times must emit exactly ONE armed event (not three)
machines in
ipxe-exit/bty-tuimodes that don’t consume the bit MUST NOT log an armed event (the arm WHERE clause skips them)
Suite 859 -> 861.
[0.33.22] - 2026-05-26¶
Two state-machine fixes operator-pointed in.
State-label honesty (the operator’s report)¶
Pre-fix: bty-inventory machines showed “inventoried; booting
disk” the moment saw_flasher_boot flipped to 1 – i.e. when
the iPXE chain pulled /boot/kernel?mac=X, BEFORE the live env
had a chance to run bty or POST /pxe/{mac}/inventory. The
label lied for the seconds-to-minutes between iPXE chainload
and the actual inventory POST.
Same bug shape for flash modes: “flashed; booting disk” fired
on the iPXE arm, not on /pxe/{mac}/done.
bty.web._app._boot_state() now requires the matching
COMPLETION signal:
bty-inventory + armed +
known_disks_at IS NOT NULL->inventoried; booting diskbty-flash-* + armed +
last_flashed_at IS NOT NULL->flashed; booting diskarmed but no completion signal yet -> new honest label
live env running; awaiting <inventory|flash>not armed ->
pending <flash|inventory>/ready to flash
Selective saw_flasher_boot reset on upsert¶
Pre-fix: ANY PUT /machines/{mac} (or the UI form upsert) reset
saw_flasher_boot to 0 – including pure-cosmetic changes like
hostname or sanboot_drive. An operator renaming a box mid-flash
silently interrupted the in-flight cycle; next /pxe served the
flash chain instead of the post-flash sanboot.
Post-fix: the reset is gated by a CASE that fires only when
a CYCLE-INVALIDATING field changes:
boot_mode(intent changed)bty_image_ref(bound image changed; sanbooting the disk holding the OLD image is wrong)target_disk_serial(target changed; sanbooting an unflashed-by-this-cycle disk is wrong)
hostname and sanboot_drive are display / boot modifiers that
don’t invalidate the cycle. The same fix applies to the JSON
PUT /machines/{mac} and the UI form POST /ui/machines/{mac}
paths – both must stay in lockstep.
Tests¶
Five new tests in test_web.py covering the upsert matrix
(boot_mode change, ref change, target change all reset; hostname
change + sanboot_drive change preserve), plus two new tests in
test_web_ui.py pinning the three-state label transitions
(pending -> live env running -> done) for both inventory and
flash modes.
Suite 854 -> 859.
[0.33.21] - 2026-05-26¶
Add storage-marker assertion to the QEMU PXE chain test. The v0.33.19 storage-format marker is written by bty-web’s lifespan; unit tests verify the helper, but no QEMU-level test confirmed the marker actually lands in a real server VM after a real boot.
The cijoe pxe_run_chain_test.py now SSHes back into the server
VM after the chain succeeds, reads
/var/lib/bty/images/.bty-storage.json, and asserts
format_version == 1. A regression in the lifespan write (skipped
on certain paths, written to the wrong directory, malformed JSON)
fails the PXE chain test loudly rather than just the unit tests –
the operator-visible failure mode is “appliance boots but rejects
its own image_root on the NEXT restart”, which is exactly the
class of thing only a real-boot test catches.
The expected version (1) is hardcoded in the cijoe script; if the
on-disk layout actually changes the operator-bumps
STORAGE_FORMAT_VERSION in bty.catalog AND the literal in the
PXE test in lockstep.
[0.33.20] - 2026-05-26¶
bty-state-init tool. Sibling of bty-state-migrate: wipes,
partitions (GPT), formats (ext4, label BTY_IMAGE_STORE), and
mounts a target disk at /var/lib/bty. Differs from migrate by
NOT copying the existing state – it’s for starting fresh.
Why two tools¶
bty-state-migrate /dev/sdX– relocate the current state to a separate disk. Copies/var/lib/btycontents onto the new disk before mount.bty-state-init /dev/sdX– prepare a fresh disk, discard any existing state. Leaves the mount empty; bty-web’s first-start lifespan stampsstate.db+ writesimages/.bty-storage.jsonso the populate path lives in one place (the v0.33.19 storage marker logic).
Safety rails (same as migrate)¶
refuses to format the rootfs disk
refuses a device with currently-mounted partitions
refuses when
/var/lib/btyis already a separate mount (bty-state-initis more conservative than migrate’s no-op: unmount first if you really want a reset)confirmation prompt unless
--yes
Tests + docs¶
tests/test_state_init.py– subprocess smoke tests for the argument parser + non-destructive validation rails (help, no-args, unknown-flag, non-root / non-block-device)docs/src/walkthrough-image-store.mdcarries a new “Starting from scratch with a fresh disk” section explaining when to use which tool
Suite 850 -> 854.
[0.33.19] - 2026-05-26¶
Storage format version + inventory-format decoupling. Operator-
pointed: the on-disk layout needs a version number independent of
bty.__version__, and unconventional files in image_root should
warn loudly so the operator notices.
Added¶
bty.catalog.STORAGE_FORMAT_VERSION= 1. Constant naming the on-disk layout / filename grammar version. Independent of the bty package version; bumped ONLY when the convention changes (filename pattern, sidecar shape, etc.). v1 covers the v0.31.0+ scheme (catalog-<ref:12>-<slug>.<ext>for cache files, operator-typed for the rest,.sha256sidecars,.partialupload-in-progress, mid-fetch tempfiles)..bty-storage.jsonmarker. Written intoimage_rooton first bty-web start; carries{format_version, created_at, created_by_bty_version}. Read on every subsequent start.bty.catalog.check_or_write_storage_marker(image_root). Idempotent on matching version. On mismatch raisesStorageFormatMismatchwith operator-facing remediation text (drop to a shell with Alt+F2, archive image_root, restart). On malformed JSON (interrupted write, etc.) same error.bty.catalog.is_recognised_image_store_filename(name). Predicate listing every legal name shape. Used by the lifespan startup to scan image_root + log warnings for unconventional files (operator notes, half-downloaded tools, stray scripts) so they’re visible rather than silently ignored.
Clarified¶
bty_export_versionin inventory.json is independent ofbty.__version__. Documented in_portability.py: the number bumps ONLY when theinventory.jsonshape changes, NOT on every bty release. A v3 bundle written by bty v0.33.2 must remain importable by any future bty release that still understands v3. Pre-1.0 policy: bundles don’t migrate across major-format bumps; import refuses any version != current.
Tests¶
Six new in tests/test_catalog.py:
marker written on fresh image_root
marker idempotent on matching version (doesn’t drift created_at)
StorageFormatMismatch on stamp mismatch (+ remediation text)
StorageFormatMismatch on malformed JSON
recognised-filename predicate accepts documented shapes
recognised-filename predicate rejects operator-droppings
Suite 844 -> 850.
Deferred¶
bty-state-init CLI tool (operator-typed: wipe + format + populate
a fresh state disk) is bigger; queued for a follow-up round once
the storage-format-version pattern proves out under hardware
testing.
[0.33.18] - 2026-05-26¶
/pxe/{mac}/plan contract tests for the operator-facing edge cases. Operator framing: most hardware bugs are QEMU-testable; the appliance contract surfaces (plan endpoint) is exactly that.
Pinned three plan-shape invariants¶
Extensionless catalog name -> URL filename synthesis. An oras catalog entry’s title is the layer annotation, typically a descriptive string with no file extension (
"nosi fedora-sysdev (x86_64, rolling)"). The live env’s bty detects format from the URL’s last segment; an extensionless URL gets “format not recognised” + flash refused. The handler synthesisesimage.<fmt>for the URL while keeping the descriptive title in the plan’snamefield. Pinned.Real filename round-trips unchanged. When the catalog name HAS a detectable extension (
demo.img.gz), the URL keeps it verbatim. The synthesis triggers ONLY on the extensionless case.Orphan-ref plan falls back to interactive. Operator deletes a catalog entry while a machine is bound to its ref. The bound machine’s
/pxe/{mac}/planMUST NOT 500 – it returnsmode=interactiveso the live env’s wizard lets the operator pick another image.
Coverage¶
Suite 841 -> 844.
[0.33.17] - 2026-05-26¶
Appliance upgrade path: integration test caught a real bug. Operator-pointed: test the path where an existing appliance gets reflashed but the state disk (state.db, images/, backups/) survives.
The bug¶
Writing the end-to-end integration test surfaced a fourth contract that nothing pinned:
Old appliance ran for a while; image_root has a
catalog-<ref:12>-<slug>.<ext>cache file the old release fetcheda
.sha256sidecar the old HashManager wrote.
OS gets reflashed; the state disk survives untouched.
New bty-web starts: state.db rotates (contract #1 – pinned).
Operator re-adds the catalog entry by URL (the catalog table got rotated out with state.db).
New row’s
disk_image_shais NULL because the operator didn’t give a sha_url. The cached file is on disk; the sidecar carries its actual sha.Pre-this-fix:
merge_with_catalogpopulated theUnifiedImage.sha256fromentry.sha256(NULL) – and/imagesdefensively dropscached=True + sha=Nonerows because it can’t build the/images/<sha>/<name>URL without the sha. The cached entry vanished from/ui/images. The operator would think the upgrade lost their cached images and re-download them.
The fix¶
bty.images.merge_with_catalog pass 2 now reads the cached file’s
.sha256 sidecar when cache_hit=True and entry.sha256 is None.
The sidecar is the canonical content hash; the catalog row’s NULL
is just “we haven’t observed it via our own download path.” After
the fix, the upgraded appliance recognises every cached image the
old release left behind.
Added tests¶
test_appliance_upgrade_with_persistent_state_disk– the full E2E scenario intest_web.py. Old appliance state seeded with v0.30.0 marker + a bound machine + a cached catalog file + operator-typed image + pre-upgrade export bundle. New bty-web starts, rotates state.db, auto-imports operator-typed, refuses to auto-import catalog-prefixed (v0.33.1), import bundle restores hardware identity, operator re-adds catalog entry, /images recognises the cached file via sidecar (the bug above).test_merge_with_catalog_picks_up_sidecar_for_uncached_entry– unit-level regression test intest_images.pypinning the sidecar-read invariant so a refactor can’t silently regress.
Operator impact¶
Upgrade the appliance + reflash the OS disk. After bty-web starts,
re-add catalog entries by URL (no sha_url needed). The previously-
cached images surface as cached=True in /ui/images because
their sidecars carry the canonical hash. No re-downloads.
[0.33.16] - 2026-05-26¶
Full reflash-cycle state-machine tests. Operator-pointed: “how about the challenging re-flash-application-path?”. This IS the reason bty exists, and only one slice (operator-upsert resets the bit) had explicit coverage. v0.30.2 fixed a real “flash-once behaved like flash-always” bug; nothing pinned that contract.
Added (6 tests)¶
End-to-end state-machine coverage in tests/test_web.py:
test_reflash_lifecycle_bty_flash_always_alternates– full cycle: PXE serves flash chain (bit=0) ->/boot/{name}?mac=arms bit -> next PXE serves sanboot AND clears bit -> next PXE serves flash chain again. The clear-on-consume is what makes the alternation work; the bit re-arms on the next /boot fetch.test_reflash_lifecycle_bty_flash_once_is_terminal– the v0.30.2 regression test: after one flash + /boot arm, every subsequent /pxe must serve sanboot WITHOUT clearing the bit. Terminal state. Operator re-saves the machine to start a fresh flash cycle.test_pxe_done_does_not_mutate_boot_mode– the v0.25 mode/state contract:/pxe/{mac}/doneupdates last_flashed_atlogs
machine.flashedbut MUST NOT mutate boot_mode (mode is intent; state is the bit).
test_boot_fetch_does_not_arm_sanboot_machine– the WHERE clause in_arm_flasher_bootconfines arming to flash + inventory policies. A stray/boot?mac=for an ipxe-exit machine MUST NOT leak the bit (would surprise the operator if they later switched policy).test_boot_fetch_arms_bty_inventory– inventory mode also consumes the bit (boot live env, post inventory, sanboot, recycle). Pin that/boot?mac=does arm for bty-inventory.test_reflash_lifecycle_pxe_offered_event_per_iteration– two full cycles produce exactly 4netboot.pxe.offeredevents in the audit log with alternating offer_kind. The operator’s visibility into “did the box come back?” is the audit timeline; a refactor that drops events on the sanboot branch is caught.
Coverage¶
Suite 833 -> 839. The reflash state machine – the actual appliance contract – is now pinned end-to-end.
[0.33.15] - 2026-05-26¶
Edge-case audit + targeted 4xx-path tests. Operator-pointed question: are edge cases and variations tested? Honest audit of all 72 (method, path) pairs:
71/72 had at least one status-code assertion
41/72 had at least one 4xx error path tested
29/72 had ONLY success-path coverage (some legitimately have no 4xx –
/healthz,/version, static UI renders behind 303 auth redirects)
Closed the realistic operator-triggerable error paths in this batch:
POST /workers/backupsinvalid trigger -> 422 (was untested; Pydantic enum rejection now pinned)HEAD /boot/{name}missing artifact -> 404 with empty body (UEFI HTTP-Boot firmware HEADs before GET; a 500 here would break boot order fallback)HEAD /images/{key}missing -> 404 (same UEFI HEAD-probe story)DELETE /catalog/entriesunknown src -> 404 (stale UI tab clicking delete on a deleted row gets a clean error, not 500)DELETE /catalog/entriesno auth -> 401
Suite 828 -> 833.
What’s still uncovered¶
Routes with no 4xx test left (less realistic / lower risk):
Most read-only GETs that auth-redirect on 401 already (these have implicit error coverage via the auth gate)
/healthz,/version,/pxe-bootstrap.ipxe– legitimately success-onlyPOST /ui/settings/{upstream,backup,tftp-control}form-validation edge cases (invalid retention, bogus tag)POST /ui/catalog/entrieshttp(s) branch edge cases (empty image_url, malformed sha_url)
These are real but lower-blast-radius than the ones closed here. Continue tackling per concrete operator hits.
[0.33.14] - 2026-05-26¶
UI catalog-entry add: oras:// branch tested. The JSON
POST /catalog/entries oras path is tested; the parallel Form
endpoint at POST /ui/catalog/entries had only its http(s)
branch covered. The oras branch (lines 913-975 of _ui.py –
~60 lines) was previously the largest uncovered block.
Three new tests in test_web_ui.py:
happy path: oras URL resolves via
oras.resolve_ref, row inserts with digest / name / format / size_bytes from the manifest, 303 to/ui/imagesresolve failure:
OrasErrorfromresolve_refredirects with?error=oras+resolve+failedrather than 500-ingduplicate src: re-submitting the same oras URL hits
UNIQUE(src)and redirects with?error=already+exists
Coverage¶
_ui.py91% -> 94%Total suite: 825 -> 828 tests
Overall: 91% -> 92%
[0.33.13] - 2026-05-26¶
Backup manager helper coverage. Two small utility-function gaps closed:
_resolve_max_parallel: tests forBTY_BACKUP_MAX_PARALLELenv-var parsing (numeric, out-of-range, non-numeric fallback, unset). Same shape as the v0.33.10 hash-manager addition – three managers all read their own env var; operators who set one might typo a similar one._suppress_oserror: tests that the context manager swallowsOSErrorsubclasses (used around best-effort rmtree cleanup), propagates non-OSError exceptions unchanged, and exits cleanly on success. Pinned semantics so a future “simpler” rewrite that also swallows unrelated bugs would fail loudly.
Suite 823 -> 825.
[0.33.12] - 2026-05-26¶
Operator-facing edge cases get explicit tests.
_verify_sha256_manifest error paths (5 tests)¶
The sha256-manifest verifier runs after every netboot fetch. The
success path was tested via test_fetch_release_round_trips, but
each operator-facing failure mode had no dedicated test:
malformed manifest line (corrupted upstream / partial CDN upload)
referenced file missing from files_dir
self-reference + blank lines skipped (some sha256sum invocations emit these)
*/./filename prefix stripped (binary-mode and operator-edited manifests)empty manifest rejected (an empty file would otherwise “pass” verification silently)
/pxe/{mac} flash-mode-without-ref branch (1 test)¶
PUT /machines/{mac} accepts boot_mode=bty-flash-always without
a bty_image_ref bound – the machine is in a “policy picked but
image not yet selected” state. The PXE handler’s response to this
state landed in the ipxe_unknown.j2 fallback branch, which had
no test. Now pinned: the audit event records offer_kind="unknown".
Coverage¶
_releases.py83% -> ~95% (rough)Total suite: 817 -> 823 tests
Overall: 90% -> 91%
[0.33.11] - 2026-05-26¶
Two integration-level tests for behaviors that had no end-to-end coverage.
Schema-rotation runs on real app startup¶
The init_db-level rotation tests (test_web_db.py) prove the SQL
primitive works. But nothing pinned that create_app actually
invokes the rotation on its way up. A refactor that moved init_db
behind a flag, or skipped it during app build, would have left the
unit tests passing while real bty-web silently failed to rotate.
test_create_app_rotates_stale_state_db_end_to_end stamps a
state.db with a fake-old version, builds the app via create_app,
hits /healthz, and asserts:
the
state.db.<oldver>.<ts>.bakfile exists alongside the fresh DBthe fresh DB carries
bty.__version__and has no leftover rowsexactly one
system.schema_resetevent landedthe
.bakstill contains the pre-rotation operator row
DownloadManager: vanished-catalog-entry¶
_catalog.py had no test for the “operator deleted the catalog
entry mid-fetch” branch. Pre-this-round the worker would have
silently AttributeError’d on entry.sha256 if _lookup_entry
returned None at the worker’s second lookup. The branch has a
proper handler (status=failed with “catalog entry vanished”)
but no test pinned it.
test_run_handles_catalog_entry_vanished_mid_download enqueues a
job, monkey-patches _lookup_entry to return None after the
enqueue-time lookup, asserts the state lands at
status=failed with error="catalog entry vanished".
Coverage¶
_catalog.py81% -> 85%Total suite: 815 -> 817 tests
Overall: 88% -> 90%
[0.33.10] - 2026-05-26¶
Continue closing test gaps. Two specific holes:
/workers/backups HTTP layer¶
The BackupManager has direct tests, but the three HTTP routes that
wrap it (GET / POST / DELETE /workers/backups) had zero
end-to-end coverage. The /ui/backups page polls these three
endpoints; if the HTTP layer drifted, the UI would surface a
confusing 500 / 422 instead of the JSON shape the JS expects.
Added 5 tests in test_web.py:
test_workers_backups_get_requires_auth– unauth -> 401test_workers_backups_get_empty_returns_stable_shape– fresh fixture returns parseable envelopetest_workers_backups_post_runs_to_completion– POST enqueue -> 202; follow-up GET reflects the job; backup terminatestest_workers_backups_delete_unknown_returns_404– cancel against unknown id -> 404, not 500test_workers_backups_delete_requires_auth– cancel needs cookie
HashManager enqueue / cancel / parallelism¶
_hash.py was at 89%. The before-start error, env-var-driven
parallelism resolution, and explicit HashCancelled cancel path
had no tests. Added 3 tests in test_web_hash_manager.py:
test_enqueue_before_start_raises– RuntimeError if not startedtest_resolve_max_parallel_env_var–BTY_HASH_MAX_PARALLELparsing, including out-of-range and non-numeric fallbacktest_enqueue_explicit_hash_cancelled_lands_cancelled– the worker raisingHashCancelleddirectly flips status=cancelled (vs the cancel-during-IO-error race already covered)
Coverage¶
_hash.py89% -> 96%_app.py90% -> 91% (workers/backups + side-effect coverage)Total suite: 807 -> 815 tests
[0.33.9] - 2026-05-26¶
Targeted test coverage for the worst gap. Operator-pointed critique: every API endpoint / function / UX experience must be tested, otherwise it’s broken. The three real bugs found in the v0.33.x arc (PXE INSERT race, netboot fetch URL, session secret) all lived in code paths that lacked dedicated tests.
A coverage run pytest pass showed _release_mgr at 66% – the
worst gap. The manager wraps _releases.fetch_release (where
v0.33.7’s bug lived) in an asyncio worker pool. The 66% was
incidental coverage from /ui/netboot integration tests that
don’t reach the failed / cancelled / dedup / backfill branches.
Added¶
tests/test_web_release_mgr.py– 14 dedicated tests covering:enqueue input validation (path-traversal tag, before-start error)
happy path: queued -> running -> completed with per-artifact state propagation + terminal audit event
same-tag dedup while running (operator double-clicks “Fetch artifacts”)
FetchError -> failed with error preserved + audit event
unexpected exception -> failed with typed prefix
FetchCancelled -> cancelled with NO audit event (operator- initiated, not a failure)
cancel-vs-IO-error race: cancel flag set + FetchError -> cancelled (the manager re-classifies)
backfill from completed event
backfill from failed event
backfill dedups per-tag to newest verdict
backfill soft-fails on corrupt DB
ReleaseArtifactState.to_dictshape pinReleaseFetchState.to_dictincludes artifacts sub-array
Coverage¶
_release_mgr 66% -> 93%. Total suite 793 -> 807 tests.
Acknowledgement of scope¶
This closes one gap. The remaining 12 percentage points (88% ->
100%) is multi-day work and would require coverage on async SSE
streams, lifespan teardown, and several rare error branches in
_app.py. Future rounds tackle specific gaps as they get
correlated with operator-reported failures.
[0.33.8] - 2026-05-26¶
Harden session-secret resolution against empty / corrupt values. Third real bug this evening, found by reasoning rather than survey.
The bug¶
_resolve_secret_key in bty.web.__init__ silently passed an empty
string through to SessionMiddleware when:
BTY_SESSION_SECRET=""was set in env (operator “clearing” the override by setting it to empty)the on-disk
session-secretfile existed but was empty / pure whitespace (e.g. a half-written file from a crashed first boot –Path.write_textisn’t atomic, so aSIGKILLbetween open and write left an empty truncated file that the next start loaded as the HMAC key)an operator
touch-ed the file expecting bty-web to populate it
SessionMiddleware happily accepts secret_key="" and signs every
session cookie with a predictable HMAC. On a LAN segment with even
one curious user, forging an admin session cookie is trivial.
The fix¶
_resolve_secret_key now treats any empty / whitespace value from
either env or file as “not set” and falls through to the
generate-and-persist path. Generation uses a same-dir tempfile +
Path.replace (atomic rename) so a crash mid-write either leaves
the previous file intact or no file at all – never a truncated /
empty one. The empty file (if present) gets overwritten by the
rename.
Tests¶
Five regression tests in test_smoke.py:
test_resolve_secret_key_rejects_empty_envtest_resolve_secret_key_rejects_whitespace_envtest_resolve_secret_key_rejects_empty_filetest_resolve_secret_key_rejects_whitespace_filetest_resolve_secret_key_persist_is_atomic– no.tmpdebris after a successful generate
Operator impact¶
If your appliance has an empty session-secret file, the next
bty-web start regenerates one. All existing session cookies
invalidate – operators re-log in once, with bty-web now signing
cookies under an actual key. Subsequent restarts reuse the
persisted key as before.
[0.33.7] - 2026-05-25¶
Fix the netboot fetch button when the operator clicks “Fetch netboot artifacts”. Second real hardware-reported bug this evening.
The bug¶
Operator running v0.33.4 clicked the fetch button and got:
boot release 'latest' fetch failed: GET https://github.com/safl/bty/releases/latest/download/bty-netboot-x86_64-v0.33.4.vmlinuz returned HTTP 404 Not Found
The asset filename embeds the running bty-web’s version
(intentional: multiple versions need to coexist in
BTY_BOOT_DIR during an upgrade, and the iPXE template refs
the matching version). GitHub’s /releases/latest/download/
redirects to whatever release is current – in this case v0.33.6,
whose asset list contains ...v0.33.6.vmlinuz, not
...v0.33.4.vmlinuz. The redirect target has the wrong asset
names; every fetch 404’d.
The fix¶
tag="latest" (the UI form’s default) is now normalised to
v<bty.__version__> inside fetch_release. The
releases/latest/download/... URL form is dropped entirely –
it never worked given version-pinned asset names. Operators
running v0.33.X always pull from the v0.33.X release’s tag,
which is the only release whose asset names this server can use.
Tests¶
Two regression tests in test_web_releases.py:
test_fetch_release_normalises_latest_to_running_version– the end-to-end success path: passtag="latest", assert the trio lands.test_fetch_release_url_construction_pins_to_running_version– invariant pin: a future refactor can’t reintroduce the broken/releases/latest/download/...URL form.
Operator impact¶
Click the “Fetch netboot artifacts” button. It works now.
[0.33.6] - 2026-05-25¶
Fix INSERT race in PXE auto-discovery. This is the kind of bug that bites on hardware and is hard to reproduce – exactly what the operator-pointed-out shortcoming of the prior polish rounds.
The bug¶
/pxe/{mac} and /pxe/{mac}/plan did:
row = SELECT * FROM machines WHERE mac = ?
if row is None:
INSERT INTO machines (mac, ...) VALUES (?, ...)
log machine.discovered
commit
bty-web runs the handler in a thread pool (sync def, not
async def). Two PXE requests for the same fresh MAC arriving
nearly simultaneously – iPXE retry, dnsmasq retransmit, BMC
twitching – both passed the SELECT with row=None. Both fired
the INSERT. The second one hit UNIQUE(mac) on the PK and
raised sqlite3.IntegrityError, FastAPI returned 500. iPXE’s own
retry would succeed on the next attempt, so the operator saw
intermittent 500s in the journal without a reliable repro path.
The fix¶
/pxe/{mac} and /pxe/{mac}/plan now do an atomic
INSERT ... ON CONFLICT(mac) DO UPDATE ... RETURNING ..., (created_at = ?) AS is_new.
The upsert is idempotent under contention; the is_new flag from
RETURNING tells the handler whether to log the machine.discovered
event. _now_iso() is microsecond-precise, so the
created_at-equality check distinguishes inserts from updates
reliably (two real PXE arrivals don’t collide on the microsecond).
Tests¶
Three regression tests in tests/test_web.py:
test_pxe_concurrent_discovery_no_race– N parallel/pxe/{mac}hits viaThreadPoolExecutor, asserts all 200 and exactly onemachine.discoveredevent.test_pxe_plan_concurrent_discovery_no_race– same shape for/pxe/{mac}/plan.test_pxe_discovery_returning_clause_is_race_safe_under_direct_sqlite_repro– pins the SQL shape against a real sqlite DB with two connections, so a future “simplify the upsert” refactor can’t silently regress to plain INSERT.
Operator impact¶
If you’re running bty-web on hardware with PXE-retry-happy iPXE firmware or a flaky lab switch, the intermittent 500s on /pxe go away. No data migration needed: the SQL upsert against an existing machine row is a no-op no-op (last_seen_at + last_seen_ip update, discovered_at preserved via COALESCE).
[0.33.5] - 2026-05-25¶
Round 5: error-message hygiene + CLI help drift.
Fixed¶
404 detail leaked
reprquotes.POST /catalog/downloadswith a missing name returned a 404 whosedetailfield was"'no catalog entry named ...'"– with literal single quotes, becausestr(KeyError("msg"))is"'msg'"(KeyError appliesreprto its arg, unlike other built-in exceptions). The handler now readsexc.args[0]so the operator-visible detail is plain text. Regression test pinned intest_web.py.
Changed (CLI help text)¶
bty-web export --helppreviously claimed “write machines + catalog + image files to a bundle directory” – post-v0.33.2 metadata-only, the bundle holds onlyinventory.jsonwith per-machine hw identity. Updated help to “write a metadata-only inventory bundle (mac + lshw + known_disks)”; thedestargument help now says “bundle directory to create (holds inventory.json)”.bty-web import --helpupdated to “load an inventory bundle” (was “load a bundle directory”).Subparser preamble comment in
bty.web.__init__._run_portabilityupdated to match: no longer claims to move catalog or image files, only inventory.
Operator impact¶
bty-web export --help now describes the post-v0.33.2 reality.
404 details on the catalog-downloads enqueue path no longer carry
extra single-quotes around the message.
[0.33.4] - 2026-05-25¶
Two more rounds following Round 1+2 of v0.33.3. Different angles: operator-facing doc truth, and cross-manager consistency.
Round 3: doc-truth fixes¶
A survey of operator-visible text caught two drifts from the v0.33.x reshape:
docs/src/operations.md– the storage-classification table used to claim thebty-web exportbundle “covers exactly” the precious paths (state.db + images/). It now distinguishes the two precious classes: state.db carries via the v3 metadata bundle; image bytes do NOT travel in the bundle and move viarsync/ disk-copy / catalog re-fetch.docs/src/reference.md– the “State export / import format” section was a stub (“populated alongside the feature”). Now documents v3: shape ofinventory.json, version semantics, what import restores.
Round 4: unify byte counters to bytes_done¶
Four manager classes (Download / Hash / ReleaseFetch / Backup)
each tracked progress under a different field name:
bytes_downloaded / bytes_hashed / bytes_done /
bytes_written. The downloads.html template already carried a
shim normalising bytes_downloaded -> bytes_done so the single
progress-bar JS could render both – the giveaway that the
inconsistency was a known papercut.
All four now expose bytes_done. JSON API output for /workers/*
and /catalog/downloads / /catalog/hashes / /workers/backups
endpoints carry the same field name. Progress-callback type-alias
docstrings in bty.catalog + bty.images updated to the new
parameter name.
Removed: the JS normalisation shim in
_templates/ui/downloads.htmland the parallel pseudo-object trick in_templates/ui/hashing.html.Documented:
BackupManager._run_onenow carries an inline comment explaining why – unlike its three siblings – it does NOT pollstate._cancelin the worker loop (metadata-only export finishes in milliseconds; the Protocol shape still requires the field).
Operator impact¶
Breaking (pre-1.0 OK): if you scrape the bty-web JSON API, the
bytes_downloaded/bytes_hashed/bytes_writtenkeys are gone. Readbytes_doneinstead.
[0.33.3] - 2026-05-25¶
Two simplification rounds following v0.33.2’s metadata-only backup shape. Each round was an after-the-fact “this is now overkill” catch.
Round 1: drop the tar-stream download wrapper¶
iter_bundle_tar streamed a custom tar archive of the bundle dir
into the HTTP response via a _ChunkBuf file-like; the gymnast
made sense when bundles were multi-GiB. v3 bundles are one
inventory.json, so /ui/backups/{id}/download now serves the
file directly via FileResponse as application/json.
Content-Disposition is attachment; filename="<id>.json" so the
browser saves it with a self-describing name.
Removed:
bty.web._backup.iter_bundle_tar,_ChunkBuf, thetarfile/mypyignore hack, and the two tar-roundtrip tests.Changed: the operator’s download button now reads “.json” (was “.tar”).
Round 2: dead-code sweep¶
_dir_size->_bundle_size: was a fullos.walkof the bundle dir; v3 bundles are one file so it’s a single stat. Existing on-disk v2 bundles now report just their inventory size, which is what’s actually portable across the upgrade.Dropped v1 fallback in
_read_bundle: line previously readinventory.get("exported_by_bty_version") or inventory. get("bty_version")to be charitable to pre-v0.31.0 bundles. Pre-1.0 policy refuses v1 on import; the listing-page fallback was dead.Kept
BackupState._cancel: the field looked dead (metadata-only export finishes in milliseconds; no cancel check in_run_one), but it’s required by the_BaseAsyncManagerProtocol that the cancel API operates on. Removing it would break the shape contract for no win.
Operator impact¶
The backup download button gives you a .json you can jq (or
diff across appliances) directly. No more “unpack the tar first”
step.
[0.33.2] - 2026-05-25¶
Backups are metadata-only. No image bytes. v0.31.0 through
v0.33.1 shipped full image_root in every backup bundle, which
produced multi-GiB “backups” dominated by catalog cache files the
appliance can just re-fetch. The v3 bundle format is just
inventory.json (renamed from manifest.json – the file is a
machine inventory; “manifest” stays reserved for the catalog
manifest TOML) with the per-machine hardware identity (mac +
lshw + known_disks). A backup now fits in dozens of KiB and
finishes in milliseconds. The two JSON fields decode to native
objects/arrays (not re-encoded strings), so the file is
jq-readable as-is.
The data model the user named: backup = mac + lshw + lsblk. Import
= add the machines with that hardware attached. Image files carry
their bty_image_ref prefix in the filename
(catalog-<ref:12>-<slug>.<ext>); they associate with catalog
entries automatically when both exist on the same appliance (v0.33.1
fix). No image bytes ever travel in a backup.
Changed¶
export_bundle(state_path, dest, *, now)– dropped theimage_rootparameter; the bundle no longer copies image bytes. ReturnsExportSummary(machines, dest); thefilescount is gone.import_bundle(state_path, src, *, now)– same drop; returnsImportSummary(machines). Half-import rollback is gone because there’s nothing to roll back – the only mutation is theINSERT OR REPLACEover themachinestable._EXPORT_VERSION = 3. v1 (pre-v0.31.0) and v2 (v0.31.0.. v0.33.1, with image bytes) both refuse on import. Pre-1.0 policy: regenerate on the source release.BackupManager.start(state_path, backups_root)– dropped theimage_rootparameter; backups no longer touch image_root.BackupState/BackupOnDisk– dropped thefilesfield. The Backups page shows machine count + bytes-on-disk (bundle directory size) only.bty-web export/bty-web importCLI – same arg drop; the help text reflects metadata-only semantics./ui/backupsintro copy – now says “metadata-only … image bytes live in BTY_IMAGE_ROOT and are NOT included”.
Operator impact¶
A scheduled or “Back up now” run produces a single-file bundle whose
inventory.jsonlists the per-machine hardware identity. Tens of KiB even with hundreds of machines.Existing v2 bundles on disk still list on
/ui/backups(with blank metadata if their manifest is unparseable), butbty-web importagainst them returnsBundleVersionMismatch. Regenerate on the source release if you need a v3 bundle.After a fresh-appliance reflash +
bty-web import <bundle>, the machines re-appear withboot_mode=bty-inventoryand just their hw_lshw + known_disks. The operator re-binds image + policy. If the image-store disk survived the reflash, thecatalog-<ref:12>-<slug>.<ext>files associate with their catalog entries automatically.
[0.33.1] - 2026-05-25¶
Fix duplicate /ui/images rows when a catalog entry is cached.
v0.33.0 (and the v0.31.0+ catalog-prefix rollout before it)
auto-imported every file under BTY_IMAGE_ROOT as a synthetic
catalog_entries row with src = file://<name>. For files in
the catalog-<ref:12>-<slug>.<ext> cache form, that minted a
second row whose bty_image_ref didn’t match the upstream
catalog entry’s ref, so /ui/images rendered both the upstream
entry (“nosi fedora-sysdev (x86_64, rolling)”) and a synthetic
twin showing the raw filename as Name. The synthetic row also
carried a file: source pointing at the same on-disk file as
the upstream entry’s local source – a give-away that the merge
was treating one image as two.
Fixed¶
bty.images.merge_with_catalogpass 1 skips catalog-prefixed filenames; pass 2 picks them up via the cache-hit lookup against the real catalog entry._app._auto_import_dir_scan_rowsskips catalog-prefixed filenames; the upstream catalog entry already owns them.
Added¶
bty.catalog.is_catalog_cache_filename(name)predicate – the one place the dir-scan paths consult to recognise cache files.Two regression tests (
test_merge_with_catalog_skips_catalog_ cache_files_in_dir_scan,test_auto_import_skips_catalog_cache_ files) covering the visible-screenshot shape end-to-end.
Operator impact¶
After upgrading + restarting bty-web, /ui/images shows one row
per logical image again. Existing duplicate catalog_entries
rows from a v0.33.0 DB clear on the next state.db rotation
(or via DELETE FROM catalog_entries WHERE src LIKE 'file://catalog-%'
if you want to drop them in place).
[0.33.0] - 2026-05-25¶
Schema mismatches now auto-rotate; the recovery wizard is gone.
The v0.32.x interactive recovery flow (browser checklist, polling
JS, os._exit(0) dance, _recovery.py, recovery.html) was
overengineered for what state.db actually is. The DB holds
machine bindings (re-discovered on next PXE contact), an audit log
(cosmetic), a catalog cache index (regenerated), and a handful of
settings – all of it regenerable. Operator-irreplaceable state
lives in image files under BTY_IMAGE_ROOT, which neither code
path ever touches.
The new shape: on bty_version mismatch (or a pre-versioning DB
with no marker), _db.init_db renames state.db to
state.db.<from>.<UTC-iso>.bak, unlinks the WAL sidecars, and
creates a fresh DB stamped with the running version. A
system.schema_reset event is recorded so the dashboard tripwire
surfaces the rotation; operators acknowledge from /ui/events.
No wizard, no polling, no operator confirmation step – the
appliance just works after systemctl restart bty-web.
Removed¶
bty.web._recoverymodule (build_recovery_app + all POST handlers + the per-action exit scheduler)._templates/ui/recovery.htmlwizard template.bty.web._db.VersionMismatchError– no longer raised; schema mismatch is non-exceptional.bty.web._db.check_db/DbCheckResult/DbState– the non-mutating probe used by the recovery dispatch.15 recovery-mode integration tests (
tests/test_web_recovery.py).Recovery-mode routes documentation in operations.md + reference.md.
Added¶
bty.web._db._rotate_to_bak(state_path, from_version)– renamesstate.dbto a timestamped.bak, drops sidecars, returns the new path. Collision-safe (numeric suffix on same-second double-rotation).system.schema_resetevent kind, recorded byinit_dbwhen rotation fires.details = {from_version, to_version, archived_at}; surfaces as an unacknowledged dashboard tripwire.8 new
init_dbrotation tests (tests/test_web_db.py): pre-versioning rotation, mismatch rotation, event recording, idempotent no-op, sidecar cleanup, collision handling, forensics-preservation,.bak-untouched-on-idempotent.
Operator impact¶
Upgrade path is no-op-on-the-operator. Update bty-web, systemctl restart, dashboard shows an unacknowledged
system.schema_resetevent the operator dismisses. Machine bindings rebuild as PXE clients re-check-in.Recovering specific rows from an old DB is
sqlite3 /var/lib/bty/state.db.<from>.<ts>.bak "SELECT ...". The.bakis a normal sqlite file;rmto discard.Hardware-inventory preservation still goes through
bty-web export(before upgrade) +bty-web import(after), same as v0.31.0+ – the slim v2 bundle format is unchanged.
[0.32.4] - 2026-05-25¶
Round 7 polish: machine-delete UX feedback + docs caught up to v0.32.0’s recovery wizard.
Changed¶
POST /ui/machines/<mac>/deletenow flashes the outcome. Previously the form silently 303’d to/ui/machineswhether or not the row existed – a stale tab clicking delete on an already-removed MAC got the same redirect as a real delete with no signal. v0.32.4 returns?deleted=<mac>on real removal (green success banner) or?missing=<mac>on no-op (yellow info banner: “was not found – already deleted, or never bound”). Banners auto-dismiss after 5s.
Documentation¶
docs/src/operations.md– upgrade section now describes the v0.32.0+ recovery wizard checklist alongside the CLI-driven alternative for headless / scripted upgrades.docs/src/reference.md– new “Recovery-mode routes (v0.32.0+)” table documenting/,/ui/recovery,/ui/recovery/status,/ui/recovery/wipe,/ui/recovery/wipe-and-import,/healthz, the catchall 503, and the error-response shape (400 / 404 / 409 / 500 / 507).
[0.32.3] - 2026-05-25¶
Round 4 of the post-v0.32.0 improvement grind: security boundary DRY-up, concurrency hardening, observability fixes.
Added¶
bty.web._security.validate_basename– shared rejector for path-traversal-y inputs (NUL,/,\\,.,.., empty). Replaces five duplicated implementations of the same rule across_catalog._reject_traversal_name,_hash._reject_traversal_name,_app.delete_catalog_cache,_recovery._wipe_and_import, and thee45f93bD5 test path. Carries alabelkwarg so 400-message text identifies which field was rejected. 21 new tests intests/test_web_security.pylock down the accept/reject matrix.
Fixed¶
Concurrent wipe race in the recovery wizard. v0.32.0’s
_schedule_exit_after_responsewould spawn TWOos._exit(0)threads if a secondPOST /ui/recovery/wipearrived during the 500ms response-flush window. v0.32.3 gates on a module-level_exit_scheduledflag so only one exit thread runs. The second request’s wipe step is already idempotent (the unlink helper soft-skips missing files), so this just closes the race on the exit side.sqlite WAL deadlock starves shutdown.
open_dbconnected without an explicit timeout. If a future WAL writer wedges (out-of-process holder, broken NFS), the lifespan teardown would hang forever waiting onconn.close(). v0.32.3 setstimeout=5.0explicitly so lock waits are bounded – matches sqlite’s stdlib default but makes the contract auditable.Silent exception swallows on DB-write failures. Three back-fill paths in
_catalog.pyand_hash.pycaughtException: passto keep the appliance up when state.db is briefly unavailable – correct behaviour, but a corrupt DB that rejects every back-fill vanished from the journal too. v0.32.3 logs the exception withlog.exception(...)so a repeating failure shows up underjournalctl -u bty-web --grep backfillwithout changing the soft-fail semantics.
[0.32.2] - 2026-05-25¶
Round 2 of the improvement grind: a startup perf fix + test coverage on the recovery-dispatch branch and bundle preflight.
Fixed¶
bty-webstartup ranimages.list_images(image_root)twice. Once inside_auto_import_dir_scan_rows, once just below for the hash-enqueue loop. On a Pi-class appliance with many image files, two full inode walks per startup add up. v0.32.2 hoists the scan into the lifespan caller, threads the result through both consumers via a new_auto_import_dir_scan_rows(scanned)parameter. Single-call invariant verified against the existing test suite (no behavior change; only the count oflist_imagesinvocations differs).
Added¶
test_wipe_and_import_rejects_bundle_with_missing_manifest: recovery wizard’s preflight refuses a bundle dir whosemanifest.jsonis absent BEFORE wiping state.db. Locks the invariant that a bad operator pick can’t leave the appliance with both wiped state AND no successful import.test_create_app_dispatches_to_recovery_when_db_is_pre_versioning: integration test assertingcreate_appreturns the recovery app (not the full app) whencheck_dbreports needs_recovery. A regression that moves the recovery dispatch belowinit_db()would now fail at test time instead of in production journals.
[0.32.1] - 2026-05-25¶
Polish pass on v0.32.0’s recovery wizard + the underlying import path. Fixes the rougher edges a real operator would hit on first use.
Fixed¶
Recovery wizard polls forever on a wedged restart. The JS side now caps polling at 60 attempts (1 minute), then renders a red error card with a
journalctl -u bty-webhint instead of spinning silently. Previously a bty-web that didn’t come back (broken state.db path, port conflict, dependency missing) left the operator staring at “Waiting for it to come back…” forever.Double-click race on the recovery POST actions. Page-level
inflightflag now blocks every action button + cancels beforeunload navigation while a wipe/import is in progress. A hard-refresh during the destructive action no longer slips a second POST through before the first finishes.Half-imported state after a copy failure.
import_bundle’s file-copy loop now tracks every destination it creates; anOSErrormid-loop unlinks the partial copies before re-raising so the operator sees image_root in its pre-import state instead of a half-loaded mess. Re-raises as an annotatedOSErrornaming the failing file count + the cleanup count. New regression test (test_import_rolls_back_partial_file_copies_on_oserror).Wizard rendered the destructive checklist against an OK DB. When the operator’s browser polled
/ui/recoverybetween the wipe response and systemd’s restart, the OLD recovery-mode process answered with the full action checklist against a now- fresh state.db. v0.32.1 detectsnot needs_recoveryin the render path and shows a “Recovery complete – redirecting” banner instead, with a 1.5s auto-redirect to/ui/dashboard.Unreadable bundle file-dirs silently showed “0 files”. The picker now flags any bundle whose
files/subdir raisesOSErroroniterdirasunreadable=True(same UI state as a missing manifest), so the operator sees the bundle disabledlabeled “unreadable manifest” instead of selecting an unloadable one.
Changed¶
POST /ui/recovery/wipe-and-importerror responses now categorise:BundleVersionMismatch-> 409, missing manifest / bundle -> 404,PermissionError-> 500 with chmod hint,OSError(disk full) -> 507 Insufficient Storage with a free-space hint, anything else -> 500 withTypeName: message. Operator sees a recoverable next step in the wizard error panel instead of a bare 500.
[0.32.0] - 2026-05-25¶
Recovery-mode UI: when bty-web boots against a state.db that
doesn’t match the running release (v0.31.x’s hard-check refusal
class), it now serves an interactive recovery wizard on the
same port instead of dying with a journal-only error. The
operator hits the appliance URL in a browser, lands on a styled
checklist, picks a recovery strategy, and bty-web executes it.
Added¶
bty.web._db.check_db(path)– non-mutating probe that returns aDbCheckResultdescribing the state (OK,FRESH,PRE_VERSIONING,MISMATCH) without raising. Replaces the “always raise, let the caller try / except” shape for the recovery flow’s needs.bty.web._recovery.build_recovery_app(...)– minimal FastAPI app served whencheck_dbreports a mismatch. Routes:GET /-> redirect to/ui/recoveryGET /ui/recovery-> the wizard pageGET /ui/recovery/status-> JSON poll (the wizard’s JS polls this to detect when bty-web has restarted into normal mode and auto-redirects)POST /ui/recovery/wipe-> unlinkstate.db(+ sqlite sidecars) and scheduleos._exit(0)so systemd’sRestart=on-failurebrings up a fresh processPOST /ui/recovery/wipe-and-importbody={"backup_id": "..."}-> wipe + import a v2 bundle frombackups_root/<backup_id>, then exitGET /healthz-> 503 with reason (so automated probers see the unhealthy state)Everything else -> 503 with a meta-refresh back to
/ui/recovery(so a bookmarked normal-mode URL doesn’t leave the operator on a JSON error page)
Recovery wizard template (
_templates/ui/recovery.html): ultra-explicit banner (“bty-web needs operator attention”), current state summary (stored version vs running version, at-risk row counts), three recovery strategy cards (wipe-and-fresh / wipe-and-import-from-backup / manual shell recipe), and an auto-progressing checklist (steps 3 + 4 tick green as the operator’s chosen action proceeds + bty-web restarts into normal mode).
Changed¶
create_appdispatches to the recovery app whencheck_dbreports a needs-recovery state, instead of lettinginit_dbraise out into the journal. The full app is built only when the DB is OK or FRESH; everything else goes through the wizard.
Why this matters¶
The v0.31.0 hard-check was correct policy (refuse to silently
mix schemas) but the UX was painful: bty-web died in a journal
loop, the operator had to ssh in and read the systemd error to
find the rm state.db recovery command. v0.32.0 keeps the
policy (no silent migration) but moves the recovery
conversation into the browser where the operator already is.
[0.31.1] - 2026-05-25¶
Critical fix for the v0.31.0 hard bty_version DB check, plus a
quality grind through the documentation + tests left stale by the
v0.31.0 cache→images merge.
Fixed¶
Hard
bty_versioncheck leaked acrosssystemd Restart=on-failureretries. v0.31.0’s_db.init_dbranconn.executescript(SCHEMA)BEFORE the refuse condition;sqlite3.executescriptissues an implicit COMMIT, so the newbty_versiontable (empty) got committed to disk even when the call then raisedVersionMismatchError. systemd retried 5s later; the second call saw the marker table existed, treated the empty row as “fresh DB, stamp it”, and silently accepted the franken-state (old machine inventory + audit log carried into v0.31.0 under a stampedbty_version=0.31.0row). Surfaced in production: an operator withbty-state-migrate(state.db on a separate disk) upgraded the appliance disk, hit the bug, and saw oldbty-flash-once+ events in the v0.31.0 UI.Fix: refuse BEFORE
executescriptruns so no mutation slips through. Same shape for the version-mismatch path – the refuse branch leaves the DB exactly as it was, so the operator’sbty-web exporton the OLD release reads consistent state. Regression test (test_init_db_refuses_pre_versioning_db_across_restart_retries) exercises three consecutiveinit_dbcalls against the same pre-versioning DB and asserts no marker table appears after the failed first call.
Changed¶
Documentation sweep for v0.31.0’s cache→images merge.
README.md,AGENTS.md,docs/src/operations.md,docs/src/walkthrough-image-store.md,docs/src/walkthrough-catalog.md,docs/src/walkthrough-usb.md,docs/src/reference.md,docs/src/dependencies.md, anddocs/src/flows.mdupdated to describe the newBTY_IMAGE_ROOT-only layout, thecatalog-<ref:12>-<slug(name)>.<ext>naming, and the hard-version-check upgrade flow.AGENTS.md’sboot_policyreferences updated to the v0.23.0boot_modevocabulary with the v0.25.0 mode/state split documented. New “Catalog file naming” section inreference.mdexplains the URL-keyed name scheme._ui.py:_row_to_dictdocstring updated – no longer cites the dropped_REQUIRED_COLUMNS/StaleSchemaErrormachinery; cites thebty_versionhard check instead.
Added¶
local_filename_foredge-case tests intests/test_catalog.py: unicode names (slug strips to ASCII), very long names (no truncation; ref-prefix still disambiguates), consecutive separator collapse, leading-dot format normalisation, format=None default, empty/all-non-ASCII name fallback toimage, idempotence on same inputs, distinct URLs producing distinct filenames. Pure-function coverage so the on-disk dedup contract is locked in.
[0.31.0] - 2026-05-25¶
BREAKING: state.db wipes on upgrade. bty-web now refuses to start
on an old DB. The cross-release path is bty-web export (slim bundle
of images + cached files + hardware inventory) then wipe + import on
the new release.
Added¶
Hard
bty_versioncheck at bty-web startup. state.db carries the exactbty.__version__that created it in a newbty_versiontable; on startup, mismatch (or a pre-versioning DB with data tables but no marker row) raisesVersionMismatchErrorwith an operator-actionable recovery message and bty-web refuses to start. Pre-1.0 policy is no migration apparatus – every release wipes state (or migrates via export/wipe/import). Replaces the soft per- columnStaleSchemaErrormachinery which let pre-versioning DBs silently survive into incompatible code paths (the v0.30.x footgun that motivated this release).
Changed¶
Cache → image_root merge. No more separate
BTY_CATALOG_CACHE_DIR//var/lib/bty/cache/. Catalog-fetched files now land under the operator’sBTY_IMAGE_ROOT(i.e./var/lib/bty/images/) with a URL-keyed name:catalog-<bty_image_ref[:12]>-<slug(name)>.<ext>. Operator-typed files keep their original filenames. One mental model, one directory, onelsto inventory everything. The catalog-prefix12-hex namespace guarantees same-URL idempotency and rules out cross-entry collisions at any plausible homelab catalog size.
Operator impact on upgrade: existing files in
/var/lib/bty/cache/are orphaned (not deleted). Re-fetch via the Downloads UI lands them at the new URL-keyed path; the operator canrm -rf /var/lib/bty/cache/to reclaim disk once they’re confident.Slim export/import format (bundle version 2).
bty-web exportnow carries:What
Why
Everything under
BTY_IMAGE_ROOT(flatfiles/subdir)Re-downloads are expensive; bytes ARE the value
Per-machine:
mac+hw_lshw+known_disks(+timestamps)Hardware inventory is expensive to re-collect via PXE
Drops (operator re-types on the destination): catalog entries, machine bindings (
boot_mode/bty_image_ref/target_disk_serial/sanboot_drive/hostname),saw_flasher_bootstate, audit log, settings, backups. Version 1 bundles (pre-v0.31.0) are not migratable – regenerate on the source release before upgrading.fetch_to_cacheno longer requiresentry.sha256. Rolling-tag ORAS entries (oras://...:latest) get a stable URL-keyed local filename and benefit from on-disk dedup the same way sha-pinned entries do. Verification still fires when a sha is pinned.
Removed¶
bty.catalog.default_cache_dir(). Callers usebty.images.default_image_root()everywhere the cache used to be.BTY_CATALOG_CACHE_DIRenv var and the “Image cache” row on the Settings page.BackupState.catalog_entriesandBackupState.images(-> singleBackupState.filesfield); same forBackupOnDisk.The per-column
_REQUIRED_COLUMNS/_ADDITIVE_COLUMNS/StaleSchemaErrormachinery inbty.web._db(superseded by the version-stamp check).
[0.30.2] - 2026-05-25¶
Fixes bty-flash-once re-flashing on every PXE boot instead of
terminating after one flash. Real operator-impact bug surfaced by
audit logs showing the same machine flashing twice within three
minutes.
Fixed¶
bty-flash-onceis now actually one-shot. The/boot/<name>?mac=arm site’s WHERE clause was missingbty-flash-once, so thesaw_flasher_bootbit never got set for that mode – which made the plan resolver’s “bit set -> sanboot the just-flashed disk” branch unreachable. Every post-flash PXE contact fell through to the flash branch and re-served the flash chain, an infinite reflash loop on the mode whose entire point is “flash once”. Fix is one-line: include'bty-flash-once'in the arm WHERE clause alongsidebty-flash-alwaysandbty-inventory. Added an e2e regression test (test_e2e_flash_once_terminates_after_first_flash) that asserts the bit stays set across multiple post-flash PXE contacts; the existingtest_e2e_boot_artifact_mac_arms_only_alternating _policiestest now covers all three bit-consuming policies + both non-armed modes (bty-tui, ipxe-exit) so this regression class can’t slip past CI again.
[0.30.1] - 2026-05-25¶
CI gap close: the release pipeline now asserts the bty wizard actually rendered on tty1 of the freshly-baked USB ISO, not just that the partition grew. No operator-facing behaviour change.
Changed¶
test-usb-grow+test-usb-ventoyassert wizard renders on tty1. Both tasks now grep/dev/vcs1(the kernel’s text-snapshot of tty1’s framebuffer) forPick an image source– a string only the rendered wizard produces. The wrapper’s pre-Richbty is starting...deliberately doesn’t match, so a bty that prints the banner then crashes fails the assertion. 60s read budget for cold-cache import chains. Catches a real-shaped regression class (failed Rich init, BtyTui constructor crash, wrapper exit before exec) that v0.27..v0.30 would have shipped undetected.
[0.30.0] - 2026-05-24¶
The “SSE polish” release. Two follow-ups to v0.29.0’s bus migration: push-driven progress counters and a small shared-JS extraction so future helper additions don’t fan out to three places.
Added¶
Throttled progress events via SSE. v0.29.0 fired SSE only on state transitions (queued -> running -> terminal), so a long- running download / hash showed a frozen byte counter until the 30s safety poll. New
_BaseAsyncManager._fire_progress(key, state)debounces per-key to at most one event per second; the catalog download / hash / release-fetch progress callbacks publish through it. Progress counters tick at ~1 Hz instead of frozen- for-30-seconds, without flooding the bus on a fast NVMe read./static/bty-utils.js– sharedwindow.btyUtils.esc(s)+window.btyUtils.fmtBytes(n)helpers. The three pages that copy-pasted these (Backups, Downloads, Hashing) now alias them locally so a future tweak (rounding precision, byte-unit labels, escape semantics) is a one-place change.
Changed¶
_layout.htmlincludes/static/bty-utils.jsalongside the existing htmx + sse vendored bundles. Every authed page seeswindow.btyUtils.
[0.29.0] - 2026-05-24¶
The “SSE everywhere” release. Worker pages stop polling every 2s and listen to push-driven Server-Sent Events instead; refresh latency drops from “up to 2 seconds” to “tens of milliseconds” and idle appliance load drops accordingly.
Added¶
Worker-events SSE stream at
GET /events/workers. Same in- process pub/sub bus that drives the machines stream, filtered to theworker-state-changedevent name. Payload is a JSON triple:{"kind": "backup|hash|download|release", "key": "<state-key>", "status": "<lifecycle>"}.State listener hook on
_BaseAsyncManager. Every observable status transition (queued -> running, queued -> cancelled in stop/cancel, running -> terminal in_run_one) fires_fire_state_change(state). Lifespan wires each manager (Backup / Hash / Download / Release) to publish to the bus.Shared
EventSourcein_layout.html. One subscription per tab; dispatchesbty-worker-state-changedCustomEvents ondocumentso each polling page taps in without opening its own connection. Abty-worker-events-connectedevent fires on successful (re)connect so pages catch up on anything missed during the disconnect window.
Changed¶
Backups / Hashing / Downloads / Netboot / Images pages migrated from
setInterval(refresh, 2000)to SSE + 30-second safety poll. The instant update path is the EventSource; the slow poll is the recovery net for a silently-dropped SSE connection (EventSource auto-reconnects on errors, so this is belt-and- braces). Each page filters bykindso it only refreshes when relevant.Navbar worker-badge poll cadence relaxed from 5s to 30s. Same SSE-fast-path, slow-poll-safety-net pattern; the navbar badges + the live indicator update on every worker transition via the shared EventSource.
Performance¶
Idle bty-web on a backupped fleet now generates ~1 HTTP request every 30 seconds per tab instead of ~3 per second across the worker pages – a ~90x drop in steady-state load.
[0.28.0] - 2026-05-24¶
The “auto-refresh + Backup card reshape” release. Three places where the operator used to have to manually reload the page now refresh on their own.
Added¶
Backups page auto-reloads on completion. When the polling loop observes a backup transitioning from active to terminal (the active-job count drops to 0), the page reloads so the on-disk listing + Recent activity cards reflect the new bundle + the new event without the operator pressing F5. Same trick as the Downloads page’s
seenCompletedKeys, but tracked as a closure- levellastActiveCount.Hashing page auto-reloads on completion. Mirrors the Backups pattern: when a hash job completes, the page reloads so the cached sha badge on /ui/images + the Recent activity card pick up the new state.
Netboot page auto-reloads when a release fetch completes. The artifact present/missing badges + the Recent events card were server-rendered and went stale until the operator hit refresh. New lightweight poller checks
/boot/releasesevery 2s; reloads when an active fetch hits a terminal state.
Changed¶
Backup schedule card reorganised. Retention + Destination + Last scheduled run move to the top of the form, above an
<hr>; the optional Enable + Cadence knobs live below. Retention applies to every successful backup (manual or scheduled) so it’s an always-relevant knob, not a property of the schedule./ui/images per-row Actions standardised to
btn-group btn- group-sm. Hash / Fetch / Cache-delete / Catalog-delete now sit flush in a Bootstrap button group, same idiom as the Backups page’s Download + Delete column. Thems-1margin spacing pattern is gone.
[0.27.0] - 2026-05-24¶
The “Backups page complete” + pre-1.0 strict-validation release.
Added¶
On-disk backup listing on
/ui/backups. A new “Backups on disk” card lists every bundle under$BTY_BACKUP_DIR– newest first – with machine / catalog / image counts pulled from each bundle’smanifest.json, total bytes-on-disk, and the bty version that produced it. Empty-state copy points the operator at the trigger / Settings cadence. A bundle with a missing or malformed manifest still lists with blank metadata so orphans are visible to the operator instead of silently hidden.Per-bundle tar download. Each on-disk row carries a
.tarbutton pointing atGET /ui/backups/{backup_id}/download, which streams an uncompressed tar viatarfile.open(mode="w|"). Members are rooted at the backup_id directory sotar -xf 2026-05-23T10-00-00Z.tarproduces a same-named folder ready forbty-web import. Multi-GiB bundles don’t materialise in memory – the per-chunk buffer holds at most one tar member at a time.Per-bundle delete. Each on-disk row carries a trash button hitting
DELETE /ui/backups/{backup_id}(with a JS confirm prompt).rmtree+ a newbackup.deletedaudit event with the snapshotted counts. The operator now has full CRUD over backups from the UI: trigger, download, delete; the schedulerretention still own automatic creation + pruning.
Retention number visible. The schedule summary on
/ui/backupsnow showsRetention: keep last N (prunes oldest on every successful backup). Retention already applied to every successful backup regardless of trigger; only the operator-facing surface was missing.
Changed¶
resolve_backup_*resolvers are strict. A non-canonical value in state.db now raisesSettingValueErrorinstead of silently coercing to a default. The Settings form already writes canonical values (“1” / “0” / known cadence / positive int), so live deployments are unaffected; only a hand-edit of state.db can trip the new path. The form handler returns 422 (not a silent 303 with the value coerced) on unknown cadence or non-numeric / sub-1 retention.?saved=upstreamreplaces?saved=1. The upstream-settings form’s success redirect uses the canonical key. Unknown?saved=Xvalues now skip the banner instead of falling back to a generic “Saved.” – a hand-crafted URL can’t echo arbitrary strings into the UI. Pre-1.0 policy: dropped the legacy"1"flash key with no back-compat shim.BTY_WEB_PORTvalidation is strict. A typo’d or out-of- range value hard-exits with status 2 + a clear error, instead of silently warning + falling back to 8080. Operators who set the env var meant it; silent fallback masks the bug until they notice the port didn’t take effect.
Fixed¶
/ui/backupsno longer hides empty state. The page previously required at least one in-flight or recent backup to look useful; fresh appliances showed an empty table with no hint of what to do next. The new on-disk card carries an explicit “click Back up now, or enable a schedule” pointer.
Removed¶
Historical “what used to be here (v0.14 - v0.17.1)” memorial block in
_sysconfig.py. Pre-1.0 doesn’t owe operators a tour of removed code; git history serves that purpose.Defensive pre-Phase-E JS branch in
downloads.html. The fallback forReleaseFetchStaterows without anartifactsarray served no live code path – the per-artifact split has been authoritative since landing.
Audit log¶
New event kind:
backup.deleted(operator-initiated removal of an on-disk bundle).
0.26.0 - 2026-05-24¶
The “workers reshape + scheduled backups + Ventoy CI” release.
Added¶
Scheduled backups (in-UI).
/ui/backupscarries a “Back up now” trigger + a schedule summary;/ui/settings#backup-scheduleexposes enable/cadence/retention. The scheduler polls every 60s so a Settings change takes effect without restart. Backups land under$BTY_BACKUP_DIR(default$BTY_STATE_DIR/backups/) as the same bundle shapebty-web exportproduces. Two new env vars:BTY_BACKUP_DIR,BTY_BACKUP_MAX_PARALLEL.Three operator-add triggers consolidated on
/ui/downloads: Fetch artifacts (netboot trio + sha256 manifest), Add image from URL (http(s):// / oras://), Upload image (local file via XHR PUT).Per-file netboot artifact downloads. A “Fetch artifacts” click enqueues four files; each shows as its own row in the Downloads list and the navbar Downloads counter ticks down per file.
Catalog-row state-aware buttons on
/ui/images. Per-row Fetch / Hash flips to “Downloading” / “Hashing” while a job is in flight; the catalog row auto-reloads when the job terminates so the cached badge + Action column update without manual refresh.Ventoy boot test in CI (
test-usb-ventoy). Installs Ventoy on a 4 GiB loop-attached disk, drops the bty .iso + a sentinel imagea catalog.toml, boots via QEMU, asserts the live env’s
bty-images-discover.servicefinds the operator drop via the Ventoy dm-mapper passthrough.
New audit event kinds:
backup.created/backup.failed/backup.pruned/settings.backup.updated.KNOWN_SUBJECT_KINDSgains"backup".Audit log subject filter on /ui/events: filterable by
subject_kind=backupfor backup history.
Changed¶
/ui/workers(merged status page) was split into three pages:/ui/downloads(active downloads + the three triggers + recent download events),/ui/hashing(active SHA-256 jobs + recent hash events),/ui/backups(active backups + Back-up-now + recent backup events). Each lights only its own navbar indicator./ui/imagesis now catalog-listing-only. The Add-image cardthe per-page job tables moved to
/ui/downloads//ui/hashingrespectively. The catalog list’s per-row Fetch / Hash buttons stay.
/ui/netbootis now inventory-only. Drops the Fetch-artifacts trigger; adds a “Fetch on the Downloads page” link.bty-images-discover.servicefinds Ventoy data partitions. Enumerates/dev/dm-*devices viablkid(in addition to lsblk) so the Ventoy linear passthrough is mountable even though the underlying/dev/sda1is held by Ventoy’s own dm-linear over the chained .iso.ReleaseFetchStateexposes per-artifact state (artifacts: dict[str, ReleaseArtifactState]) so each file in the trio + sha256 manifest can be tracked + cancelled independently.DownloadManager fallback to DB catalog entries. Operator-added rows (via the Add-image form) live in the
catalog_entriestable but NOT in the parsedcatalog.toml; the manager now looks at both. Without this, the per-row Fetch button on/ui/imagesreturned a silent 404 for any URL/oras entry added through the form.Downloads progress bar reads
bytes_downloadedfor catalog rows (in addition tobytes_donefor release artifacts). Pre-fix catalog downloads showed 0% in the UI regardless of actual progress because the JS only knew the release-artifact field name.
Removed¶
Legacy
/ui/workers,/ui/fetches,/ui/hashes,/ui/downloads(merged page + sub-pages) – replaced by the three split pages above. Pre-1.0 so no redirects.Dead
_parse_boot_orderhelper fromsrc/bty/flash.py+ its 2 unused tests.
Fixed¶
Ventoy boot under QEMU:
VTOY_SECONDARY_TIMEOUT=1auto- confirms Ventoy’s “Boot in normal mode” secondary menu; previously the test sat at the menu forever waiting for keyboard input the test harness can’t supply./dev/dm-1mount inside the live env:bty-images-discoverwas trying to mount/dev/sda1directly (held open by Ventoy’s dm-mapper); now also probes the dm passthrough.bty-usb-grow.servicerace under Ventoy:systemctl is-active --waitis the wrong primitive (returns immediately for inactive-with-queued-job); switched tois-system-running --waitin the test sync barrier.PXE chain test SSH diagnostics on
/healthztimeout: tries multiple credentials (odus.321+odus+ cfg override) before declaring auth failure, so partial cloud-init / stale cloudinit-userdata.user runs still surface the bty-web journal.BackupManager
status=completedrace: was flipped before_prune_old_backupsran (outside the lock for non-blocking), letting external observers exit poll loops mid-housekeeping. Status now flips AFTER the prune so “completed” means truly done.Docker build context loader:
cijoe/_build/+cijoe-output/cijoe-archive/(root-owned from sudo’d live-build steps) now in.dockerignore+ swept bymake docker-clean. Pre-fixmake docker-buildfailed with “error from sender: open cijoe/_build/…/credstore: permission denied”.
Documentation¶
docs/src/flows.md: events table now covers allKNOWN_EVENT_KINDS(0/0 undocumented). Subject-kind list corrected (boot→netboot+ addedbackup).docs/src/operations.md: new “Scheduled backups (UI-driven)” subsection with the on-disk layout + env vars + audit-log mapping.docs/src/reference.md: /ui/workers route block replaced with per-page descriptions for /ui/downloads / /ui/hashing / /ui/backups.docs/src/walkthrough-catalog.md: Action column description covers the new “Downloading” / “Hashing” busy state; page names match the actual URLs.