chore: mark REBOOT-03 blocked — .198 crash recovery too slow

.198 crash recovery takes >120s for 34 containers. SSH returns
reliably (125-145s) but backend health timeout exceeded on all
3 iterations. Needs CONT-02 deployment and/or increased timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Dorian
2026-03-14 03:34:16 +00:00
parent 8302b0b357
commit ee825cd8d6

View File

@@ -233,7 +233,7 @@ Every test must pass **10 consecutive times** from BOTH .228→.198 AND .198→.
- [x] **REBOOT-02** — Ran reboot survival test 3x on .228. 21/21 checks passed. All 3 reboots: 32/32 containers survive, 0 exited, all containers back, health OK, no restart loops. SSH recovery: 130-145s. Health available: 5s after SSH. Total recovery ~255-270s (includes 120s stabilization wait). Zero failures.
- [ ] **REBOOT-03**Run reboot survival test 10 times on .198. Same as REBOOT-02 but on .198. **Acceptance**: 10/10 reboots recover fully. Zero failed containers.
- [ ] **REBOOT-03**(BLOCKED: .198 crash recovery takes >120s for 34 containers — health timeout exceeded on all 3 reboot iterations. SSH returns in 125-145s but backend startup blocked by sequential container recovery. Needs CONT-02 deployment to .198 and/or increased health wait timeout. 3/6 checks passed — SSH comes back reliably.)
- [ ] **REBOOT-04** — Test simultaneous reboot of both nodes. Reboot .228 and .198 at the same time. After both recover, verify: federation re-establishes, DWN sync works, file sharing works. **Acceptance**: Both nodes fully recover. Federation sync succeeds within 10 minutes of both being back.