docs(status): mark async-spawn lifecycle fix as shipped

Records the four landed commits, the .228 deploy (binary + frontend
paths, backups, md5), the manual LND Stop verification, and the
rollback incantation. Leaves the older "NEXT SESSION" design block
in place as historical reference with a note that it's stale.

Adds a follow-ups list: chaos matrix is now unblocked, bundled-app
RPCs are still sync (deprecate or mirror-async?), transitional_since
is in-memory only, and there are 22 pre-existing test failures in
unrelated modules that should get their own cleanup pass.
This commit is contained in:
archipelago
2026-04-23 05:30:45 -04:00
parent a8158b1ef5
commit 0ea4f96de9

View File

@@ -1,12 +1,43 @@
# RESUME HERE — Rust orchestrator migration
Updated: 2026-04-23 (Dashboard Stop UX bug diagnosed; async-spawn fix fully designed, ready to implement)
Updated: 2026-04-23 (Async-spawn lifecycle fix LANDED + deployed to .228, manual LND Stop verified — "absolutely beautiful" per user. Ready for chaos matrix / Step 11.)
**To resume this work, SSH into the ThinkPad and run `opencode` from `~/Projects/archy/`. Or work from the laptop via the SSHFS mount at `~/mnt/archy-thinkpad/`.**
---
## ⚡ NEXT SESSION — START HERE
## ✅ ASYNC-SPAWN LIFECYCLE FIX — SHIPPED
Landed 2026-04-23 in 4 commits on `main` (unpushed per user mirror protocol):
- `44cd5eef` `feat(rpc): spawn_transitional helper for async lifecycle ops` — new `api/rpc/transitional.rs` with `Op::{Stop,Start,Restart}` and `RpcHandler::spawn_transitional` / `flip_to_transitional` / `set_state` helpers. `install_log` re-exported so sibling modules can use it.
- `19a99ca9` `fix(rpc): async container stop/start/restart; widen state mapping``container.rs` start/stop rewritten + restart added; `container-list` now emits all transitional variants instead of falling back to `"unknown"`. `dispatcher.rs` registers `container-restart`. `package/runtime.rs` mirrored with `do_package_*` helpers inside `tokio::spawn` and revert-on-error.
- `6712810b` `fix(state): preserve transitional state across container scans``server.rs` scan merge now keeps transitional states while taking fresh observability fields; 1200s stuck-timeout escape hatch via `transitional_since: HashMap<String, Instant>`. Three passing `server::merge_tests`.
- `9ce28f08` `fix(ui): single-button lifecycle control with transitional labels``ContainerApps.vue` and `ContainerAppDetails.vue` use a single primary button driven by `getAppVisualState()`. **Dashboard now routes through `container-start`/`container-stop`** (the async RPCs) instead of the legacy synchronous `bundled-app-*` path. `ContainerStatus.vue` widened to render all new variants.
**Deployed to .228** (ThinkPad demo device):
- Binary at `/usr/local/bin/archipelago` (md5 `de86b63f74c7e6fe6e555ffe30b86b4f`), backup at `/usr/local/bin/archipelago.bak-pre-async-stop`.
- Frontend at `/opt/archipelago/web-ui/`, backup at `/opt/archipelago/web-ui.bak-pre-async-stop/`.
- Release build took 3m56s on .116. Deploy via scp + atomic `install -m 755` + `systemctl restart archipelago`. `nginx -t` + `systemctl reload nginx` for frontend.
**Manual verification**: user clicked Stop on LND in the dashboard. Button flipped to `Stopping…` instantly, held for the full graceful-stop window, transitioned to `Start` when `podman stop` completed. No mid-flight revert to Running. User sign-off: _"absolutely beautiful"_.
**Rollback (if ever needed)**:
```
ssh archy228 'sudo cp /usr/local/bin/archipelago.bak-pre-async-stop /usr/local/bin/archipelago && sudo rsync -a --delete /opt/archipelago/web-ui.bak-pre-async-stop/ /opt/archipelago/web-ui/ && sudo systemctl restart archipelago && sudo systemctl reload nginx'
```
### Follow-ups to consider
1. **Chaos matrix / Step 11** — the original next-step gated behind this fix. Now unblocked.
2. **bundled-app-start / bundled-app-stop** — still synchronous in the backend. Dashboard no longer calls them, but the RPC methods remain for any external caller. Decide: deprecate, or mirror the async-spawn treatment for parity.
3. **`transitional_since` persistence** — currently in-memory only, so a backend restart mid-stop loses the timeout anchor. Acceptable for now (scan loop re-observes live podman state and reconciles), but worth revisiting if crash-recovery stories tighten.
4. **Test regressions inventory** — the full `cargo test -p archipelago` run on .116 shows 22 pre-existing failures in unrelated modules (mesh/wallet/credentials/avatar/session/transport/update-mirrors/fips/identity_manager/image_versions). Unrelated to this work but tech debt. Log at `/tmp/cargo-test-all.log` on .116.
5. **Amend STATUS.md's older "NEXT SESSION — START HERE" section** (below) — it is now stale. Left in place for historical reference of how the fix was designed; delete on the next pass if it gets confusing.
---
## ⚡ NEXT SESSION — START HERE (historical — fix above is now shipped)
**Goal**: implement async-spawn lifecycle fix so the dashboard never shows a frozen spinner again. User mandate: _"best server containers in the world"_. Do not ship the chaos matrix (Step 11) until this lands and manual LND stop verifies instant RPC + live `Stopping…` label.