diff --git a/loop/plan.md b/loop/plan.md index e29815e4..01bdf189 100644 --- a/loop/plan.md +++ b/loop/plan.md @@ -291,7 +291,7 @@ Every test must pass **10 consecutive times** from BOTH .228→.198 AND .198→. ### Sprint 12: Deploy Script Hardening -- [ ] **DEPLOY-01** — Audit deploy-to-target.sh for reliability. Read the entire script. Check: error handling (set -e?), rollback on failure, health check after deploy, idempotency, atomic swaps for binary and frontend. Fix any issues. **Acceptance**: Deploy script has proper error handling, health verification, and rollback capability. +- [x] **DEPLOY-01** — Audited deploy-to-target.sh. Fixes: (1) `set -eo pipefail` for pipe error detection. (2) Fixed duplicate `NEED_INSTALL=""`. (3) --both path now fails on missing binary instead of `|| true`. (4) Added post-deploy health check on .198 (polls every 5s for 60s). Rollback is deferred to DEPLOY-03. - [ ] **DEPLOY-02** — Add canary deploy mode. Deploy to .198 first, run health checks, then deploy to .228. If .198 health fails, abort before touching .228. Add `--canary` flag to deploy script. **Acceptance**: `./scripts/deploy-to-target.sh --canary` deploys to .198, verifies, then .228. diff --git a/scripts/deploy-to-target.sh b/scripts/deploy-to-target.sh index b4478ae0..cf030d18 100755 --- a/scripts/deploy-to-target.sh +++ b/scripts/deploy-to-target.sh @@ -12,7 +12,7 @@ # ./scripts/deploy-to-target.sh --dry-run --live # Show what would be deployed without executing # -set -e +set -eo pipefail SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" PROJECT_DIR="$(dirname "$SCRIPT_DIR")" @@ -106,7 +106,6 @@ echo " Connected." # Install prerequisites if missing (rsync for code sync, python3 for Claude API proxy) echo "$(timestamp) Checking prerequisites..." ssh $SSH_OPTS "$TARGET_HOST" ' - NEED_INSTALL="" NEED_INSTALL="" command -v rsync >/dev/null 2>&1 || NEED_INSTALL="$NEED_INSTALL rsync" command -v python3 >/dev/null 2>&1 || NEED_INSTALL="$NEED_INSTALL python3" @@ -140,7 +139,10 @@ if [ "$BOTH" = true ]; then echo "" echo "📤 Copying to 192.168.1.198 (no rsync/cargo on that node)..." TARGET_198="archipelago@192.168.1.198" - scp $SSH_OPTS archipelago@192.168.1.228:$TARGET_DIR/core/target/release/archipelago /tmp/archipelago-both 2>/dev/null || true + if ! scp $SSH_OPTS archipelago@192.168.1.228:$TARGET_DIR/core/target/release/archipelago /tmp/archipelago-both 2>/dev/null; then + echo " ERROR: Failed to copy binary from .228 — is the build available?" + exit 1 + fi scp $SSH_OPTS /tmp/archipelago-both "$TARGET_198:/tmp/archipelago-new" ssh $SSH_OPTS archipelago@192.168.1.228 "cd $TARGET_DIR && tar cf - web/dist/neode-ui 2>/dev/null" | ssh $SSH_OPTS "$TARGET_198" "mkdir -p /tmp/web-deploy && cd /tmp/web-deploy && tar xf -" ssh $SSH_OPTS "$TARGET_198" ' @@ -229,7 +231,21 @@ if [ "$BOTH" = true ]; then ' 2>/dev/null || true ssh $SSH_OPTS "$TARGET_198" "sudo systemctl start archipelago && sudo systemctl restart nginx" - echo " ✅ 192.168.1.198 deployed" + + # Post-deploy health check on .198 + echo " Checking .198 health..." + HEALTH_198="fail" + for i in $(seq 1 12); do + sleep 5 + HEALTH_198=$(curl -s --max-time 5 "http://192.168.1.198/health" 2>/dev/null || echo "") + if [ "$HEALTH_198" = "OK" ]; then + echo " ✅ 192.168.1.198 deployed (health OK after $((i * 5))s)" + break + fi + done + if [ "$HEALTH_198" != "OK" ]; then + echo " ⚠️ 192.168.1.198 deployed but health check failed after 60s" + fi rm -f /tmp/archipelago-both exit 0 fi