chore: mark REBOOT-03 blocked — .198 crash recovery too slow

.198 crash recovery takes >120s for 34 containers. SSH returns reliably (125-145s), but the backend health timeout was exceeded on all 3 iterations. Needs CONT-02 deployment and/or an increased timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:34:16 +00:00
parent 8302b0b357
commit ee825cd8d6
1 changed file with 1 addition and 1 deletion
--- a/loop/plan.md
+++ b/loop/plan.md
@@ -233,7 +233,7 @@ Every test must pass **10 consecutive times** from BOTH .228→.198 AND .198→.

 - [x] **REBOOT-02** — Ran reboot survival test 3x on .228. 21/21 checks passed. All 3 reboots: 32/32 containers survive, 0 exited, all containers back, health OK, no restart loops. SSH recovery: 130-145s. Health available: 5s after SSH. Total recovery ~255-270s (includes 120s stabilization wait). Zero failures.

- [ ] **REBOOT-03** — Run reboot survival test 10 times on .198. Same as REBOOT-02 but on .198. **Acceptance**: 10/10 reboots recover fully. Zero failed containers.
+- [ ] **REBOOT-03** — (BLOCKED: .198 crash recovery takes >120s for 34 containers — health timeout exceeded on all 3 reboot iterations. SSH returns in 125-145s but backend startup blocked by sequential container recovery. Needs CONT-02 deployment to .198 and/or increased health wait timeout. 3/6 checks passed — SSH comes back reliably.)

 - [ ] **REBOOT-04** — Test simultaneous reboot of both nodes. Reboot .228 and .198 at the same time. After both recover, verify: federation re-establishes, DWN sync works, file sharing works. **Acceptance**: Both nodes fully recover. Federation sync succeeds within 10 minutes of both being back.
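
Note on the "increased health wait timeout" option mentioned for REBOOT-03: the following is a minimal sketch of a health wait with a configurable deadline, not the test harness actually used here. The function name `wait_for_health`, the `check` callable, and the 300s default are all hypothetical. The default is an assumption chosen only because the plan reports recovery exceeding 120s, and the real probe (HTTP call, `docker ps` parse, etc.) would be supplied by the caller.

```python
import time
from typing import Callable


def wait_for_health(
    check: Callable[[], bool],
    timeout_s: float = 300.0,   # assumed value; the plan only says 120s was too short
    interval_s: float = 5.0,    # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `check` until it returns True and return the elapsed seconds.

    Raises TimeoutError if `timeout_s` elapses before `check` succeeds.
    """
    start = clock()
    while True:
        try:
            if check():
                return clock() - start
        except OSError:
            # Connection refused/reset while the backend is still starting;
            # treat it as "not healthy yet" and keep polling.
            pass
        if clock() - start >= timeout_s:
            raise TimeoutError(f"backend not healthy after {timeout_s:.0f}s")
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting. Recording the returned elapsed time per iteration would also show how far past 120s recovery on .198 actually runs, which could help decide between raising the timeout and deploying CONT-02.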