Step 8b Port Audit — container-specs.sh → apps/*/manifest.yml

Last updated: 2026-04-23

This audit is the scope-lock for Step 8b of docs/rust-orchestrator-migration.md. Every container currently declared in scripts/container-specs.sh:ALL_CONTAINER_SPECS must be ported faithfully to apps/<id>/manifest.yml before Step 8c can delete the bash scripts.

Findings in short:

  • scripts/container-specs.sh lists 30 containers across 5 tiers.
  • apps/*/manifest.yml exists for 27 app ids, but the overlap is partial and most of the overlapping manifests are aspirational stubs written in the original design phase, never reconciled against production behavior. The image references, container names, network topology, env, and health checks disagree with what actually runs on .116 and .228.
  • Only the three UI apps (bitcoin-ui, electrs-ui, lnd-ui) plus aiui are truly ported (Step 7 scope).
  • The Rust schema (core/container/src/manifest.rs::AppManifest) is missing several fields needed for a faithful port: archy-net network selection, custom_args, entrypoint override, derived host env (e.g. HOST_MDNS), secret-file env injection, and data-dir UID/GID mapping.

Table — every spec, mapped

Legend for Status:

  • PORTED — manifest exists and matches reality (Step 7 done).
  • ⚠ STUB — apps/<id>/manifest.yml exists but disagrees with container-specs.sh (image, name, network, env, or health wrong).
  • MISSING — no manifest file on disk.
  • — N/A — intentionally out of Step 8b (optional app with no spec, or already managed by a different system).
| Tier | Spec name (container-specs.sh) | Actual container name | Image source | apps/<id>/ matches? | Status | Notes |
|---|---|---|---|---|---|---|
| 0 | archy-mempool-db | archy-mempool-db | $MARIADB_IMAGE | mempool/ | | Existing manifest (if any) targets mempool combined stack, not the DB sidecar. Likely a companion of apps/mempool. |
| 0 | archy-btcpay-db | archy-btcpay-db | $BTCPAY_POSTGRES_IMAGE | btcpay-server/ | | Existing manifest describes only the app container. DB is a silent companion in the current model. |
| 0 | immich_postgres | immich_postgres | $IMMICH_POSTGRES_IMAGE | (none) | | Optional. No apps/immich/ dir. |
| 0 | immich_redis | immich_redis | $VALKEY_IMAGE | (none) | | Optional. No apps/immich/ dir. |
| 1 | bitcoin-knots | bitcoin-knots | $BITCOIN_KNOTS_IMAGE | bitcoin-core/ | | apps/bitcoin-core/manifest.yml references bitcoin/bitcoin:28.4; production runs Bitcoin Knots at $ARCHY_REGISTRY/bitcoin-knots:latest. App id mismatch: spec is bitcoin-knots, manifest is bitcoin-core. Decide: rename spec or rename app id. |
| 1 | electrumx | electrumx | $ELECTRUMX_IMAGE | (none) | | Separate from electrs-ui. No apps/electrumx/ dir. |
| 2 | lnd | lnd | $LND_IMAGE | lnd/ | | Manifest exists; needs verification against current env/ports/caps. |
| 2 | mempool-api | mempool-api | $MEMPOOL_BACKEND_IMAGE | mempool/ | | Companion of apps/mempool. May need dedicated manifest or stack-form. |
| 2 | archy-mempool-web | archy-mempool-web | $MEMPOOL_WEB_IMAGE | mempool/ | | Companion. |
| 2 | archy-nbxplorer | archy-nbxplorer | $NBXPLORER_IMAGE | btcpay-server/ | | Companion of BTCPay. |
| 2 | btcpay-server | btcpay-server | $BTCPAY_IMAGE | btcpay-server/ | | Stub; env, ports, deps need reconciliation. |
| 2 | fedimint | fedimint | $FEDIMINT_IMAGE | fedimint/ | | This is the bug from yesterday. Stub references wrong image (fedimint/fedimintd:v0.10.0 instead of $ARCHY_REGISTRY/fedimintd:v0.10.0), wrong RPC target (bitcoin-core:8332 instead of bitcoin-knots:8332), missing HOST_MDNS env, missing archy-net, missing FM_BIND_P2P/FM_BIND_API, missing gateway ports etc. |
| 2 | fedimint-gateway | fedimint-gateway | $FEDIMINT_GATEWAY_IMAGE | (none) | | No manifest. Has complex LND-aware entrypoint in container-specs.sh:load_spec_fedimint-gateway. |
| 2 | immich_server | immich_server | $IMMICH_SERVER_IMAGE | (none) | | Optional. |
| 3 | homeassistant | homeassistant | $HOMEASSISTANT_IMAGE | home-assistant/ | | id mismatch: homeassistant vs home-assistant. |
| 3 | grafana | grafana | $GRAFANA_IMAGE | grafana/ | | Stub. |
| 3 | uptime-kuma | uptime-kuma | $UPTIME_KUMA_IMAGE | (none) | | Optional. |
| 3 | jellyfin | jellyfin | $JELLYFIN_IMAGE | (none) | | Optional. |
| 3 | photoprism | photoprism | $PHOTOPRISM_IMAGE | (none) | | Optional. |
| 3 | vaultwarden | vaultwarden | $VAULTWARDEN_IMAGE | (none) | | Optional. Known-bad container on .228 (see STATUS.md). |
| 3 | nextcloud | nextcloud | $NEXTCLOUD_IMAGE | (none) | | Optional. |
| 3 | searxng | searxng | $SEARXNG_IMAGE | searxng/ | | Stub. |
| 3 | onlyoffice | onlyoffice | $ONLYOFFICE_IMAGE | onlyoffice/ | | Stub. |
| 3 | filebrowser | filebrowser | $FILEBROWSER_IMAGE | (none) | | Critical: this is Archipelago baseline (bootstrapped by first-boot), not an optional app. Lost .filebrowser.json yesterday. Must have a manifest. |
| 3 | nginx-proxy-manager | nginx-proxy-manager | $NPM_IMAGE | (none) | | Optional. |
| 3 | portainer | portainer | $PORTAINER_IMAGE | (none) | | Optional. |
| 3 | ollama | ollama | $OLLAMA_IMAGE | ollama/ | | Stub. |
| 4 | archy-bitcoin-ui | archy-bitcoin-ui | localhost/bitcoin-ui:local | bitcoin-ui/ | | Step 7 done. |
| 4 | archy-lnd-ui | archy-lnd-ui | localhost/lnd-ui:local | lnd-ui/ | | Step 7 done. |
| 4 | archy-electrs-ui | archy-electrs-ui | localhost/electrs-ui:local | electrs-ui/ | | Step 7 done. |

Non-spec apps that already have manifests (outside container-specs.sh)

These are managed entirely by the install RPC today and already have adoption paths in the Rust orchestrator. They are not in 8b scope:

  • aiui, botfights, core-lightning, did-wallet, endurain, gitea, indeedhub, lightning-stack (stack), meshtastic, morphos-server, nostr-rs-relay, router, strfry, web5-dwn.

Schema gaps blocking faithful ports

core/container/src/manifest.rs::AppManifest currently supports:

  • container.image OR container.build (mutually exclusive, validated).
  • dependencies: Vec<Dependency>, resources: {cpu_limit, memory_limit, disk_limit}.
  • security: { capabilities, readonly_root, network_policy: string, apparmor_profile }.
  • ports: Vec<{host, container, protocol}>, volumes: Vec<{type, source, target, options}>.
  • environment: Vec<String> (each "KEY=VALUE").
  • health_check: {type, endpoint, path, interval, timeout, retries}.
  • devices: Vec<String>, extensions: HashMap<String, Value> (flatten).

What container-specs.sh uses that the schema does not express first-class:

| Need | Example from bash | Proposed schema addition |
|---|---|---|
| Join the named archy-net bridge | SPEC_NETWORK="archy-net" | container.network: Option<String> (Some("archy-net"), or None for isolated, or "host"). Existing security.network_policy left as-is for policy knobs (e.g. firewall isolation layer); this new field is literally the podman --network value. |
| Extra args / custom flags | SPEC_CUSTOM_ARGS="-server=1 -prune=550 ..." | container.custom_args: Vec<String>. |
| Entrypoint override | SPEC_ENTRYPOINT="gatewayd --data-dir /data ... lnd --lnd-rpc-host lnd:10009" | container.entrypoint: Option<Vec<String>>. |
| Host-derived env (mDNS hostname, host IP) | FM_P2P_URL=fedimint://$HOST_MDNS:8173 | container.derived_env: Vec<{key, template}> with a small allow-list of {{HOST_MDNS}}, {{HOST_IP}}, {{DISK_GB}} substitutions resolved at apply time. |
| Secret-file env (read from /var/lib/archipelago/secrets/<name>) | FM_BITCOIND_PASSWORD=$BITCOIN_RPC_PASS (from secret file in bash) | container.secret_env: Vec<{key, secret_file}>, secret_file relative to $SECRETS_DIR. Never logged. |
| Data dir UID/GID (for rootless mapped chown) | SPEC_DATA_UID="100070:100070" | container.data_uid: Option<String> (e.g. "100070:100070"). Applied as chown -R before container create. |
| Exec health check | SPEC_HEALTH_CMD="bitcoin-cli ..." | Extend HealthCheck so type: exec + command: Vec<String> works end-to-end; confirm the runtime honors it. |
| Optional/skip-when-not-installed semantics | SPEC_OPTIONAL="true" | Already covered: BootReconciler only installs if an AppManifest is registered. For baseline-on-first-boot containers (filebrowser), we use the same install path. No schema change. |
| Local-image flag (don't pull) | SPEC_LOCAL_IMAGE="true" | Already covered: container.build vs container.image. |

Everything else (tier ordering, dependency tree, readonly_root, tmpfs mounts) is either already in the schema or folded into custom_args cleanly.
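
To make the network, entrypoint, and custom_args rows concrete, here is a minimal std-only sketch of how those three fields could translate into a podman create argument vector. ContainerSketch and podman_create_args are hypothetical names, not the real ContainerConfig or orchestrator API. Mapping None to --network none is one possible reading of "None for isolated", and splitting the entrypoint into a program plus leading command arguments is an assumption about how the runtime call would be built.

```rust
/// Hypothetical stand-in for the container section of a manifest,
/// carrying only the three proposed fields plus name and image.
#[derive(Debug, Default, Clone)]
pub struct ContainerSketch {
    pub name: String,
    pub image: String,
    /// Literal `podman --network` value, e.g. "archy-net" or "host".
    pub network: Option<String>,
    /// Replaces the image entrypoint when set.
    pub entrypoint: Option<Vec<String>>,
    /// Extra arguments passed to the container command.
    pub custom_args: Vec<String>,
}

/// Build the arguments for `podman create` (without the `podman` binary itself).
pub fn podman_create_args(c: &ContainerSketch) -> Vec<String> {
    let mut args: Vec<String> = vec!["create".into(), "--name".into(), c.name.clone()];

    // Assumption: a missing network means "isolated", expressed as `--network none`.
    args.push("--network".into());
    args.push(c.network.clone().unwrap_or_else(|| "none".to_string()));

    // The first entrypoint element becomes `--entrypoint`; the remaining
    // elements are placed in front of custom_args as command arguments.
    let mut command: Vec<String> = Vec::new();
    if let Some(entrypoint) = &c.entrypoint {
        if let Some((program, rest)) = entrypoint.split_first() {
            args.push("--entrypoint".into());
            args.push(program.clone());
            command.extend(rest.iter().cloned());
        }
    }
    command.extend(c.custom_args.iter().cloned());

    // Everything after the image is handed to the entrypoint.
    args.push(c.image.clone());
    args.extend(command);
    args
}
```

Keeping the translation in one pure function means the mock-runtime tests planned for 8b.1 can assert on the exact argument vector without touching podman.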

tmpfs

SPEC_TMPFS="/tmp:rw,noexec,nosuid,size=256m ..." is used by grafana, searxng, and ollama. There is currently no first-class field for it. Proposed: volumes[].type: tmpfs with a new tmpfs_options field on Volume, or a dedicated container.tmpfs: Vec<{target, options}>. Either works; the Volume variant keeps all mount declarations in one place.
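
As an illustration of the dedicated-field variant, the sketch below parses a SPEC_TMPFS-style string into {target, options} entries and renders each back into a podman --tmpfs value. TmpfsMount and parse_spec_tmpfs are hypothetical names, and the assumption that entries are whitespace-separated target:options pairs comes from the single example above, not from the bash source.

```rust
/// Hypothetical shape for one entry of a `container.tmpfs` list.
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct TmpfsMount {
    pub target: String,
    pub options: Vec<String>,
}

impl TmpfsMount {
    /// Render as the value passed to `podman --tmpfs`.
    pub fn to_flag_value(&self) -> String {
        if self.options.is_empty() {
            self.target.clone()
        } else {
            format!("{}:{}", self.target, self.options.join(","))
        }
    }
}

/// Parse a whitespace-separated list of `target[:opt,opt,...]` entries.
pub fn parse_spec_tmpfs(spec: &str) -> Result<Vec<TmpfsMount>, String> {
    let mut mounts = Vec::new();
    for entry in spec.split_whitespace() {
        // Split at the first ':'; an entry without options is allowed.
        let (target, opts) = match entry.split_once(':') {
            Some((t, o)) => (t, o),
            None => (entry, ""),
        };
        if !target.starts_with('/') {
            return Err(format!("tmpfs target must be an absolute path: {:?}", entry));
        }
        let options = opts
            .split(',')
            .filter(|o| !o.is_empty())
            .map(str::to_string)
            .collect();
        mounts.push(TmpfsMount { target: target.to_string(), options });
    }
    Ok(mounts)
}
```

The same struct would also fit the Volume variant: options would simply become the tmpfs_options field on a volume whose type is tmpfs.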


Proposed commit sequence

Each item is a separate commit. None recreates a container on the fleet.

8b.0 — schema extensions, no manifest changes, no orchestrator changes

  1. feat(container/manifest): add network, custom_args, entrypoint, derived_env, secret_env, data_uid, tmpfs fields — add fields to ContainerConfig/SecurityPolicy/Volume, update validate(), add unit tests per new field. Backwards-compat: every existing apps/*/manifest.yml must still parse (verify with a parse_every_real_manifest test that walks apps/*/manifest.yml in the repo).

  2. feat(container/manifest): resolve derived_env against host facts — add HostFacts { host_ip, host_mdns, disk_gb } struct and resolve_env(facts) -> Vec<String> method; unit test with a fixed HostFacts.

  3. feat(container/manifest): resolve secret_env against a SecretsProvider — add trait SecretsProvider { fn read(&self, name: &str) -> Result<String>; }, stub FileSecretsProvider rooted at /var/lib/archipelago/secrets, unit test with a tmpdir provider.
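
The derived_env resolution in item 2 could look roughly like the following. This is a sketch under stated assumptions: the field names on HostFacts follow the text, but DerivedEnv, resolve_template, the free-function form of resolve_env, and the String error type are illustrative choices, not the planned signatures. Placeholders outside the three allow-listed names are rejected rather than passed through.

```rust
/// Host facts gathered once at apply time (field names as in the plan).
pub struct HostFacts {
    pub host_ip: String,
    pub host_mdns: String,
    pub disk_gb: u64,
}

/// Hypothetical shape for one `derived_env` entry.
pub struct DerivedEnv {
    pub key: String,
    pub template: String,
}

/// Substitute `{{NAME}}` placeholders; only allow-listed names are accepted.
pub fn resolve_template(template: &str, facts: &HostFacts) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find("}}")
            .ok_or_else(|| format!("unterminated placeholder in {:?}", template))?;
        match after[..end].trim() {
            "HOST_MDNS" => out.push_str(&facts.host_mdns),
            "HOST_IP" => out.push_str(&facts.host_ip),
            "DISK_GB" => out.push_str(&facts.disk_gb.to_string()),
            other => return Err(format!("placeholder {:?} is not on the allow-list", other)),
        }
        rest = &after[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}

/// Produce "KEY=VALUE" strings in the same form as `environment`.
pub fn resolve_env(derived: &[DerivedEnv], facts: &HostFacts) -> Result<Vec<String>, String> {
    let mut env = Vec::with_capacity(derived.len());
    for d in derived {
        env.push(format!("{}={}", d.key, resolve_template(&d.template, facts)?));
    }
    Ok(env)
}
```

With a fixed HostFacts whose host_mdns is "archy.local" (a made-up value), the template fedimint://{{HOST_MDNS}}:8173 under key FM_P2P_URL resolves to FM_P2P_URL=fedimint://archy.local:8173, which is the kind of assertion the unit test in item 2 would make.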

8b.1 — orchestrator honors the new fields

  1. feat(prod_orchestrator): honor network/custom_args/entrypoint on create — thread the new ResolvedContainerConfig into the runtime's create call. Mock-runtime unit tests for each field.
  2. feat(prod_orchestrator): chown data dir to data_uid before create — called from install_fresh. Unit test with a tmpdir.
  3. feat(prod_orchestrator): resolve derived_env + secret_env before create — wire in HostFacts + SecretsProvider. Unit test.
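
For the secret_env half of item 3, a minimal sketch of the provider trait and a file-backed implementation follows. The trait shape mirrors the one named in 8b.0; using io::Result as the Result type, the SecretEnv struct, the resolve_secret_env helper, and the path-traversal check are assumptions added for illustration. Nothing in the sketch prints or logs a secret value.

```rust
use std::fs;
use std::io;
use std::path::{Component, Path, PathBuf};

pub trait SecretsProvider {
    fn read(&self, name: &str) -> io::Result<String>;
}

/// Reads secrets from files under a root directory
/// (in production the plan roots this at /var/lib/archipelago/secrets).
pub struct FileSecretsProvider {
    root: PathBuf,
}

impl FileSecretsProvider {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }
}

impl SecretsProvider for FileSecretsProvider {
    fn read(&self, name: &str) -> io::Result<String> {
        // Reject absolute paths and `..` so a manifest cannot escape the root.
        let rel = Path::new(name);
        if rel.components().any(|c| !matches!(c, Component::Normal(_))) {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "secret_file must be a plain relative path",
            ));
        }
        let raw = fs::read_to_string(self.root.join(rel))?;
        // Secret files usually end with a newline; strip it.
        Ok(raw.trim_end_matches(&['\n', '\r'][..]).to_string())
    }
}

/// Hypothetical shape for one `secret_env` entry.
pub struct SecretEnv {
    pub key: String,
    pub secret_file: String,
}

/// Produce "KEY=VALUE" strings; the caller must not log the result.
pub fn resolve_secret_env(
    entries: &[SecretEnv],
    provider: &dyn SecretsProvider,
) -> io::Result<Vec<String>> {
    let mut env = Vec::with_capacity(entries.len());
    for e in entries {
        env.push(format!("{}={}", e.key, provider.read(&e.secret_file)?));
    }
    Ok(env)
}
```

Taking the provider as a trait object keeps the orchestrator test simple: a tmpdir-rooted FileSecretsProvider or an in-memory fake can be passed in without changing the call site.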

8b.2 — first real backend port: fedimint

  1. feat(apps/fedimint): port manifest from container-specs.sh with mDNS URLs + archy-net — rewrites apps/fedimint/manifest.yml using the new schema. Includes container_name: fedimint (no prefix), network: archy-net, derived_env: [FM_P2P_URL, FM_API_URL], secret_env: [FM_BITCOIND_PASSWORD, ...].
  2. feat(apps/fedimint-gateway): new manifest with LND-aware entrypoint — creates apps/fedimint-gateway/manifest.yml. Dynamic entrypoint is a 2-case template resolved by a derived field {{LND_AVAILABLE}} (presence of /var/lib/archipelago/lnd/tls.cert). May require a second commit to add that derived fact — scope-judge at write time.
  3. test(lifecycle): fedimint adoption + fresh-install — bats scaffold per docs/bulletproof-containers.md§Test harness.
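
The two-case entrypoint in item 2 can be sketched as a derived boolean fact plus a selection between two argument lists. TwoCaseEntrypoint, lnd_available, and select_entrypoint are hypothetical names. The actual gatewayd argument lists are elided in the bash spec, so the sketch deliberately leaves them to the caller instead of guessing them.

```rust
use std::path::Path;

/// Hypothetical two-case template: one entrypoint per value of LND_AVAILABLE.
pub struct TwoCaseEntrypoint {
    pub with_lnd: Vec<String>,
    pub without_lnd: Vec<String>,
}

/// Derived fact: LND counts as available when its TLS certificate exists.
/// In production the directory would be /var/lib/archipelago/lnd.
pub fn lnd_available(lnd_dir: &Path) -> bool {
    lnd_dir.join("tls.cert").is_file()
}

/// Pick the entrypoint that matches the derived fact.
pub fn select_entrypoint(tpl: &TwoCaseEntrypoint, lnd_available: bool) -> &[String] {
    if lnd_available {
        &tpl.with_lnd
    } else {
        &tpl.without_lnd
    }
}
```

Passing the directory in, rather than hardcoding the path, lets the unit test point lnd_available at a tmpdir and flip the fact by creating or removing tls.cert.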

8b.3 — remaining critical backends (one per commit)

  1. feat(apps/filebrowser): new manifest — baseline Archipelago service (fixes yesterday's .filebrowser.json loss by regenerating via custom_args: ["--config", "/data/.filebrowser.json"] + caps: [..., NET_BIND_SERVICE]).
  2. feat(apps/electrumx): new manifest.
  3. feat(apps/bitcoin-knots): rename-or-merge with apps/bitcoin-core/manifest.yml — decide naming once, update everywhere. Recommend: keep apps/bitcoin-core/ dir (it's the user-visible app name) and use extensions.container_name: bitcoin-knots to preserve adoption.
  4. feat(apps/lnd): reconcile stub against spec.
  5. feat(apps/btcpay-server + companions): multi-container stack. Reuse the existing stack path in api/rpc/package/stacks.rs OR decide to add container.companions: Vec<ContainerConfig>. Defer the decision until commits 10-13 (items 1-4 of this list, counting commits continuously from 8b.0) land.

8b.4 — mempool stack, optional apps

Continue one at a time until every ⚠ STUB or MISSING row above is PORTED.

8b.5 — port core/archipelago/src/api/rpc/package/update.rs

Replace reconcile-containers.sh calls with ContainerOrchestrator::upgrade(app_id). Unblocks 8c.

8c — delete bash scripts (per docs/rust-orchestrator-migration.md).


Runtime-only drift on .116 — write it into manifests, not scripts

Per docs/RESUME.md§Runtime-only fixes on .116, yesterday's patches are:

  1. ~archipelago/.config/containers/containers.conf (image_copy_tmp_dir = "storage") → lands in first-boot-setup.sh (renamed in Step 8c) OR in a Rust startup-side prereq hook. Not a per-manifest concern.
  2. Secrets ownership archipelago:archipelago → Rust orchestrator's ensure_secrets path (already exists; verify it chowns).
  3. /var/lib/archipelago/filebrowser-data/.filebrowser.json → handled by filebrowser's custom_args: ["--config", "/data/.filebrowser.json"] plus a pre-start hook (mirrors bitcoin_ui precedent) that writes the file if absent. Details in 8b.3 commit 10 (the filebrowser manifest, item 1 of the 8b.3 list).
  4. Fedimint data dir chown → handled by container.data_uid: "100000:100000" in the fedimint manifest.

All runtime-only fixes end up expressed as manifest fields or Rust-side hooks. None survives as bash.
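
Since data_uid is proposed as a plain "uid:gid" string, the manifest validator needs to reject malformed values before any chown runs. A minimal sketch of that parsing step, with DataOwner and parse_data_uid as hypothetical names:

```rust
/// Numeric owner parsed from a `data_uid` string such as "100070:100070".
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct DataOwner {
    pub uid: u32,
    pub gid: u32,
}

/// Parse "uid:gid"; both halves must be unsigned 32-bit integers.
pub fn parse_data_uid(s: &str) -> Result<DataOwner, String> {
    let (uid, gid) = s
        .split_once(':')
        .ok_or_else(|| format!("expected \"uid:gid\", got {:?}", s))?;
    let uid = uid
        .parse::<u32>()
        .map_err(|e| format!("bad uid {:?}: {}", uid, e))?;
    let gid = gid
        .parse::<u32>()
        .map_err(|e| format!("bad gid {:?}: {}", gid, e))?;
    Ok(DataOwner { uid, gid })
}
```

Parsing into numeric ids at validate() time would let install_fresh pass them straight to a chown call without shelling out, and it surfaces typos at manifest load instead of at container create.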


Open decisions (lock before writing code)

  1. bitcoin-knots vs bitcoin-core naming. Recommend: app id stays bitcoin-core (user-facing), container name becomes bitcoin-knots via extensions.container_name, image is Knots. Or rename both to bitcoin-knots for honesty. Pick one and apply everywhere.
  2. archy- prefix rule. Currently UI_APP_IDS in prod_orchestrator.rs hardcodes ["bitcoin-ui", "electrs-ui", "lnd-ui"] as the app ids whose containers get the archy- prefix. Several backends use archy- too (archy-mempool-db, archy-mempool-web, archy-nbxplorer, archy-btcpay-db). Recommend: drop the hardcoded list, rely on extensions.container_name everywhere, and audit all existing manifests to set it explicitly so adoption doesn't orphan running containers.
  3. Companions (mempool-api + mempool-web + mempool-db, btcpay-server + nbxplorer + btcpay-db). Two options: (a) one manifest per container with explicit deps and an "app group" id; (b) extend ContainerConfig with companions: Vec<…>. apps/lightning-stack/manifest.yml has already shipped and probably sets a precedent; check its shape before deciding.
  4. Keep container-specs.sh as the source of truth until 8b is fully ported? Yes. BootReconciler only acts on what's in apps/*/manifest.yml; anything not ported stays on the bash path until its commit lands. Zero-downtime migration.
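
The audit recommended in decision 2 amounts to comparing the name the current hardcoded rule produces with the name an explicit extensions.container_name would produce. The sketch below shows that comparison. It simplifies extensions to HashMap<String, String> (the real field holds arbitrary values), and legacy_container_name, explicit_container_name, and would_orphan are hypothetical helpers, not existing orchestrator functions.

```rust
use std::collections::HashMap;

/// The hardcoded list described in decision 2.
const UI_APP_IDS: [&str; 3] = ["bitcoin-ui", "electrs-ui", "lnd-ui"];

/// Current rule: UI apps get an `archy-` prefix, everything else uses the app id.
pub fn legacy_container_name(app_id: &str) -> String {
    if UI_APP_IDS.iter().any(|id| *id == app_id) {
        format!("archy-{}", app_id)
    } else {
        app_id.to_string()
    }
}

/// Proposed rule: an explicit `container_name` extension wins, else the app id.
pub fn explicit_container_name(app_id: &str, extensions: &HashMap<String, String>) -> String {
    match extensions.get("container_name") {
        Some(name) if !name.is_empty() => name.clone(),
        _ => app_id.to_string(),
    }
}

/// True when switching rules would look for a different container name,
/// i.e. the running container would no longer be adopted.
pub fn would_orphan(app_id: &str, extensions: &HashMap<String, String>) -> bool {
    legacy_container_name(app_id) != explicit_container_name(app_id, extensions)
}
```

Running would_orphan over every manifest before dropping UI_APP_IDS gives a concrete list of manifests that still need container_name set explicitly.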

Where to resume

After user approves this plan: commit 1 in 8b.0 (schema extensions + tests, no orchestrator or manifest changes). Smallest possible diff, highest leverage, and unblocks every subsequent port.

Validation Snapshot - 2026-04-28

  • Runtime cleanup: removed orphan bold_lichterman duplicate; retained managed filebrowser.
  • Launch policy alignment: local app launches are port-based; iframe-blocked apps (including gitea) are forced to new-tab.
  • App icon reliability: image fallback now retries .svg when .png does not exist.
  • Required stack verification on .116:
    • tests/lifecycle/bats/required-stack.bats -> PASS
    • ARCHY_ALLOW_DESTRUCTIVE=1 tests/lifecycle/bats/required-stack-destructive.bats -> PASS
  • Broad host-port probe confirms HTTP 200 responses for user-facing app UIs on mapped ports; non-HTTP ports intentionally excluded from HTTP pass/fail semantics.