feat: bitcoin-ui CSS fix, HTTPS proxy support, deploy script improvements

Bitcoin UI:
- Replace cdn.tailwindcss.com with locally bundled tailwind.css (CSP blocks external scripts)
- Make all asset paths relative for nginx proxy compatibility
- Add bitcoin-ui build/deploy to deploy-to-target.sh (was missing entirely)
- Use --network host (bitcoin-ui proxies Bitcoin RPC at 127.0.0.1:8332)

HTTPS mixed content fix:
- Add HTTPS_PROXY_PATHS in AppSession.vue: when parent page is HTTPS, iframe loads through nginx proxy instead of direct HTTP port
- Prevents browser blocking HTTP iframes inside HTTPS pages
- All Tailscale servers use HTTPS, this was breaking all app iframes

Deploy & first-boot improvements:
- first-boot-containers.sh auto-detects disk size for pruning vs txindex
- first-boot-containers.sh checks fallback source path for UI containers
- Added mempool-electrs to APP_PORTS mapping
- ElectrumX container creation in first-boot
- Podman doctor/fix/uptime skills added

Also includes: session persistence, identity management, LND transactions, ElectrumX status UI, nostr-provider improvements, Web5 enhancements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 12:58:35 +00:00
parent 4e54b8bd4d
commit 367b483a72
49 changed files with 6180 additions and 495 deletions
--- a/.claude/skills/podman-doctor/SKILL.md
+++ b/.claude/skills/podman-doctor/SKILL.md
@@ -0,0 +1,156 @@
+---
+name: podman-doctor
+description: >
+  Comprehensive Podman container diagnostic for Archipelago. Audits all running containers,
+  port mappings, network connectivity, health status, restart policies, and config consistency
+  across all 4 layers (backend Rust, Podman runtime, Nginx proxy, frontend routing).
+  Use when asked to "diagnose containers", "check podman", "why is app not working",
+  "container health check", "port not reachable", "audit containers", "podman status",
+  or when any container/app is misbehaving.
+allowed-tools: Bash Read Glob Grep
+---
+
+# Podman Doctor — Container Infrastructure Diagnostics
+
+Systematic diagnostic for Archipelago's Podman container stack. Catches port conflicts, network misconfigurations, health failures, missing restart policies, and config drift across all layers.
+
+**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
+
+If $ARGUMENTS is provided, focus diagnosis on that specific app/container. Otherwise run full audit.
+
+## Workflow
+
+### Step 1: Gather Runtime State
+
+Run these on the server:
+
+```bash
+# All containers with status, ports, networks
+sudo podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}"
+
+# Check for port conflicts on known ports
+sudo ss -tlnp | grep -E ":(80|443|3000|4080|5678|8080|8081|8082|8083|8085|8096|8123|8173|8174|8175|8240|8332|8333|8334|8888|9735|10009|11434|23000|50001)\b"
+```
+
+### Step 2: Check Restart Policies
+
+Every container MUST have `--restart unless-stopped`. A missing restart policy is the #1 cause of downtime after reboots.
+
+```bash
+for c in $(sudo podman ps -a --format "{{.Names}}"); do
+  echo -n "$c: "
+  sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}"
+done
+```
+
+**Red flag**: `no` or empty = container won't survive reboot.
+
+### Step 3: Verify Port Mapping Consistency
+
+Cross-reference these 4 layers — mismatches between ANY two cause "app not loading" bugs:
+
+**Layer 1 — Backend Config (Rust)**: Read `core/archipelago/src/api/rpc/package.rs`, look at `get_app_config()` port mappings.
+
+**Layer 2 — Podman Runtime**: `sudo podman ps --format "{{.Names}}: {{.Ports}}"`
+
+**Layer 3 — Nginx Proxy**: Read these for `/app/{id}/` location blocks:
+- `image-recipe/configs/nginx-archipelago.conf` (HTTP)
+- `image-recipe/configs/snippets/archipelago-https-app-proxies.conf` (HTTPS)
+
+**Layer 4 — Frontend Routing**: Read `neode-ui/src/stores/appLauncher.ts` — `PORT_TO_APP_ID` map.
+
+| Symptom | Root Cause |
+|---------|-----------|
+| App iframe shows 502/504 | Nginx proxies to wrong port, or container not running |
+| App loads wrong content | Port collision — two containers on same host port |
+| Works on port but not /app/ path | Missing nginx location block |
+| Frontend can't find app | Entry missing from PORT_TO_APP_ID in appLauncher.ts |
+
+### Step 4: Network Connectivity Audit
+
+```bash
+# Networks and their containers
+sudo podman network ls
+sudo podman network inspect archy-net 2>/dev/null || echo "WARNING: archy-net missing!"
+```
+
+**Must be on archy-net**: bitcoin-knots, lnd, electrs, mempool, btcpay-server, nbxplorer, fedimint, fedimint-gateway, nostr-rs-relay, indeedhub, ollama, open-webui
+
+**Must NOT be on archy-net**: grafana, nextcloud, filebrowser, vaultwarden, bitcoin-ui, lnd-ui, tailscale (host network)
+
+### Step 5: Health Check Status
+
+```bash
+# Containers with health checks — are they passing?
+for c in $(sudo podman ps --format "{{.Names}}"); do
+  health=$(sudo podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)
+  if [ -n "$health" ] && [ "$health" != "<no value>" ]; then
+    echo "$c: $health"
+  fi
+done
+
+# Containers WITHOUT health checks (gap in monitoring)
+for c in $(sudo podman ps --format "{{.Names}}"); do
+  hc=$(sudo podman inspect "$c" --format "{{.Config.Healthcheck}}" 2>/dev/null)
+  if [ "$hc" = "<nil>" ] || [ -z "$hc" ]; then
+    echo "NO HEALTHCHECK: $c"
+  fi
+done
+```
+
+### Step 6: Resource & Failure Analysis
+
+```bash
+# Resource usage
+sudo podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
+
+# Recent deaths (last 24h); --stream=false exits after printing instead of following new events
+sudo podman events --stream=false --filter event=died --since 24h 2>/dev/null | tail -20
+
+# OOM kills
+sudo podman ps -a --format "{{.Names}}" | while read c; do
+  oom=$(sudo podman inspect "$c" --format "{{.State.OOMKilled}}" 2>/dev/null)
+  [ "$oom" = "true" ] && echo "OOM KILLED: $c"
+done
+
+# Exited containers (the Status column shows the exit code; non-zero means failure)
+sudo podman ps -a --filter status=exited --format "{{.Names}}\t{{.Status}}"
+```
+
+### Step 7: Systemd Integration
+
+```bash
+systemctl is-active archipelago nginx
+systemctl list-units --type=service | grep -i podman
+systemctl list-timers --all | grep -i -E "podman|container|archipelago"
+```
+
+### Step 8: Generate Report
+
+Produce a structured report:
+
+```
+## Container Diagnostic Report
+
+### Summary
+- Total containers: X running, Y stopped, Z unhealthy
+- Port conflicts: [list or "none"]
+- Missing restart policies: [list or "none"]
+- Network issues: [list or "none"]
+- Health check gaps: [list]
+
+### Critical Issues (fix immediately)
+1. ...
+
+### Warnings (fix soon)
+1. ...
+
+### Recommended Actions
+1. ...
+```
+
+After diagnosis, suggest running `/podman-fix` for any issues found.
+
+## Port Reference
+
+See `references/port-map.md` for the canonical port assignment table across all 4 layers.
--- a/.claude/skills/podman-doctor/references/common-failures.md
+++ b/.claude/skills/podman-doctor/references/common-failures.md
@@ -0,0 +1,55 @@
+# Common Podman Failure Patterns
+
+## Container Won't Start
+
+| Error | Cause | Fix |
+|-------|-------|-----|
+| `exec format error` | Binary built on wrong arch | Rebuild on the Linux server |
+| `address already in use` | Port conflict | `ss -tlnp \| grep :PORT` to find offender |
+| `permission denied` | Missing capability or read-only root | Check `get_app_capabilities()`, add tmpfs |
+| `OCI runtime error` | Corrupt container state | `podman rm -f NAME`, then recreate the container |
+| `image not known` | Image not pulled | `podman pull IMAGE:TAG` |
+| `no such network` | Network missing | `podman network create archy-net` |
+
+## Container Starts But App Unreachable
+
+| Symptom | Check Layer | Fix |
+|---------|------------|-----|
+| Direct port works, /app/ doesn't | Nginx config | Add `/app/{id}/` location block |
+| Neither works | Podman ports | `podman port NAME` — verify mapping exists |
+| Port mapped but refused | Container logs | App crashing internally — check logs |
+| Works sometimes | Resources | Check OOM kills, CPU, disk space |
+| 502 Bad Gateway | Nginx→Container | Wrong port in proxy_pass or container restarted |
+
+## Container Keeps Dying
+
+| Pattern | Cause | Fix |
+|---------|-------|-----|
+| Exits immediately (code 1) | Config error | Check `podman logs NAME` |
+| Dies after minutes | OOM killed | Increase `--memory` limit |
+| Dies when dep restarts | No restart policy | Add `--restart unless-stopped` |
+| Crash loop | Repeated crash | Fix root cause, don't just restart |
+
+## Network Issues
+
+| Problem | Cause | Fix |
+|---------|-------|-----|
+| Can't resolve container names | Not on archy-net | Recreate with `--network=archy-net` |
+| Can't reach internet | DNS missing | Add `--dns 1.1.1.1` |
+| Container-to-container timeout | Different networks | Put both on same network |
+
+## Capability Reference
+
+| Capability | Apps That Need It | Failure Mode |
+|-----------|------------------|-------------|
+| CHOWN | nextcloud, homeassistant, btcpay, jellyfin, portainer | Can't chown during setup |
+| SETUID/SETGID | nextcloud, homeassistant, btcpay, jellyfin | Can't switch to service user |
+| DAC_OVERRIDE | nextcloud, homeassistant, btcpay | Can't access cross-UID files |
+| FOWNER | bitcoin-knots, lnd, fedimint | Can't modify data dir perms |
+| NET_BIND_SERVICE | nginx-proxy-manager, vaultwarden | Can't bind ports <1024 |
+
+## Read-Only Safe Apps
+
+Only these apps can run with `--read-only`: searxng, grafana, filebrowser, electrs, nostr-rs-relay, ollama, indeedhub
+
+All others need writable root or will fail silently.
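The "Container Won't Start" table in common-failures.md can serve as a first-pass triage helper. This is a sketch under assumptions, not part of the commit: `classify_start_error` is a hypothetical function, and the substrings it matches are the error texts listed in that table, which real Podman output may word differently.

```shell
# Map a container start error to the likely cause from the failure table.
# $1: error text (a single line or a whole log excerpt)
# The first matching pattern wins; matching is case-sensitive.
classify_start_error() {
  case "$1" in
    *"exec format error"*)      echo "wrong architecture: rebuild on the Linux server" ;;
    *"address already in use"*) echo "port conflict: find the owner with ss -tlnp" ;;
    *"permission denied"*)      echo "missing capability or read-only root" ;;
    *"OCI runtime error"*)      echo "corrupt container state: remove and recreate" ;;
    *"image not known"*)        echo "image not pulled: podman pull IMAGE:TAG" ;;
    *"no such network"*)        echo "network missing: podman network create archy-net" ;;
    *)                          echo "unknown: read the full container logs" ;;
  esac
}
```

A possible call is `classify_start_error "$(sudo podman logs --tail 50 CONTAINER_NAME 2>&1)"`. The result is only a hint for where to look next, since one log excerpt can contain several of these messages.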
--- a/.claude/skills/podman-doctor/references/port-map.md
+++ b/.claude/skills/podman-doctor/references/port-map.md
@@ -0,0 +1,71 @@
+# Archipelago Canonical Port Map
+
+All port assignments across the 4 configuration layers. When adding or debugging an app, every row must be consistent across all columns.
+
+## Bitcoin Stack
+
+| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
+|-----|-------------|-------------------|---------|------------|-------------|
+| bitcoin-knots | 8332, 8333 | 8332, 8333 | archy-net | /app/bitcoin-knots/ | 8332→bitcoin-knots |
+| bitcoin-ui | 8334 | 80 | bridge | /app/bitcoin-ui/ | 8334→bitcoin-knots |
+| electrs | 50001 | 50001 | archy-net | /app/electrs/ | 50001→electrs |
+| lnd | 9735, 10009, 8080 | 9735, 10009, 8080 | archy-net | /app/lnd/ | 10009→lnd |
+| lnd-ui (RTL) | 8081 | 80 | bridge | /app/lnd-ui/ | 8081→lnd |
+
+## Lightning & Payment
+
+| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
+|-----|-------------|-------------------|---------|------------|-------------|
+| btcpay-server | 23000 | 49392 | archy-net | /app/btcpay/ | 23000→btcpay-server |
+| nbxplorer | 24444 | 32838 | archy-net | N/A (internal) | N/A |
+| fedimint | 8173, 8174, 8175 | 8173, 8174, 8175 | archy-net | /app/fedimint/ | 8174→fedimint |
+| fedimint-gateway | 8175 | 8175 | archy-net | /app/fedimint-gateway/ | 8175→fedimint-gateway |
+
+## Explorer & Monitoring
+
+| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
+|-----|-------------|-------------------|---------|------------|-------------|
+| mempool | 4080 | 8080 | archy-net | /app/mempool/ | 4080→mempool |
+| grafana | 3000 | 3000 | bridge | /app/grafana/ | 3000→grafana (new tab) |
+
+## Self-Hosted Apps
+
+| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
+|-----|-------------|-------------------|---------|------------|-------------|
+| nextcloud | 8085 | 80 | bridge | /app/nextcloud/ | 8085→nextcloud |
+| vaultwarden | 8082 | 80 | bridge | /app/vaultwarden/ | 8082→vaultwarden (new tab) |
+| filebrowser | 8083 | 80 | bridge | /app/filebrowser/ | 8083→filebrowser |
+| searxng | 8888 | 8080 | bridge | /app/searxng/ | 8888→searxng |
+| photoprism | 2342 | 2342 | bridge | /app/photoprism/ | 2342→photoprism (new tab) |
+| jellyfin | 8096 | 8096 | bridge | /app/jellyfin/ | 8096→jellyfin |
+| homeassistant | 8123 | 8123 | bridge | /app/homeassistant/ | 8123→homeassistant (new tab) |
+| ollama | 11434 | 11434 | archy-net | /app/ollama/ | 11434→ollama |
+| open-webui | 3080 | 8080 | archy-net | /app/open-webui/ | 3080→open-webui |
+
+## Nostr & Social
+
+| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
+|-----|-------------|-------------------|---------|------------|-------------|
+| nostr-rs-relay | 7000 | 8080 | archy-net | /app/nostr-rs-relay/ | 7000→nostr-rs-relay |
+| indeedhub | 3001 | 3000 | archy-net | /app/indeedhub/ | 3001→indeedhub |
+
+## System
+
+| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
+|-----|-------------|-------------------|---------|------------|-------------|
+| tailscale | 8240 | 8240 | host | /app/tailscale/ | N/A |
+| nginx-proxy-manager | 81, 8443 | 81, 443 | bridge | N/A | 81→nginx-proxy-manager |
+
+## Multi-Container Stacks
+
+- **Immich**: immich-server (2283), immich-postgres (internal 5432), immich-redis (internal 6379); all on immich-net
+- **Penpot**: penpot-frontend (9001→80), penpot-backend, penpot-exporter, penpot-postgres, penpot-mailcatch; all on penpot-net
+- **Mempool**: mempool (4080→8080), mempool-db (internal 3306); on archy-net
+- **BTCPay**: btcpay-server (23000→49392), nbxplorer (24444→32838), btcpay-postgres (internal 5432); on archy-net
+
+## Key Notes
+
+- **archy-net apps** resolve each other by container name (e.g., `bitcoin-knots:8332`)
+- **bridge apps** are standalone — access services via host IP/port
+- **host network** (tailscale only) — shares host namespace, no port mapping
+- **New tab apps**: btcpay (23000), grafana (3000), vaultwarden (8082), photoprism (2342), homeassistant (8123) — X-Frame-Options blocks iframe
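The port map asks for every row to agree across all layers, and the Nginx column is one that can be checked mechanically. A minimal sketch, not part of the commit: `nginx_port_for_app` is a hypothetical helper, and it assumes location blocks are written as `location /app/APP_ID/ {` with a `proxy_pass http://127.0.0.1:HOST_PORT/;` line inside, without location modifiers such as `^~`.

```shell
# Print the port that the /app/APP_ID/ location block proxies to.
# $1: app id, $2: path to an nginx config file
# Prints nothing when the block or a port in proxy_pass is not found.
nginx_port_for_app() {
  awk -v want="/app/$1/" '
    $1 == "location" && $2 == want { inside = 1; next }
    inside && $1 == "proxy_pass" {
      # The first ":digits" run in the URL is the port (the scheme colon is followed by "/").
      if (match($2, /:[0-9]+/)) print substr($2, RSTART + 1, RLENGTH - 1)
      exit
    }
    inside && $1 == "}" { exit }
  ' "$2"
}
```

For example, `nginx_port_for_app mempool image-recipe/configs/nginx-archipelago.conf` should print the host port from the mempool row if the config agrees with the table; any other value, or no output, points at the Nginx layer.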
--- a/.claude/skills/podman-fix/SKILL.md
+++ b/.claude/skills/podman-fix/SKILL.md
@@ -0,0 +1,219 @@
+---
+name: podman-fix
+description: >
+  Fix Podman container issues on Archipelago — restart failed containers, repair port bindings,
+  fix network connectivity, add missing restart policies, and resolve config drift.
+  Use when asked to "fix container", "restart app", "fix port mapping", "container not working",
+  "app won't start", "fix podman", "repair container", "container down", or after /podman-doctor
+  identifies issues to fix.
+allowed-tools: Bash Read Edit Write Glob Grep
+---
+
+# Podman Fix — Container Remediation
+
+Targeted fix workflow for Podman container issues on Archipelago. Given a specific problem (from /podman-doctor or user report), diagnose the root cause and fix it.
+
+**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
+
+If $ARGUMENTS is provided, fix that specific app/issue. Otherwise ask what needs fixing.
+
+## Fix Procedures
+
+### Fix 1: Container Not Running
+
+```bash
+# Check why it stopped
+sudo podman logs --tail 50 CONTAINER_NAME
+sudo podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}"
+
+# If clean exit or crash — just restart
+sudo podman start CONTAINER_NAME
+
+# If corrupt state — remove and recreate
+sudo podman rm -f CONTAINER_NAME
+# Then recreate using the install flow (trigger from UI or re-run creation command)
+```
+
+**If container keeps crashing**: check logs for the actual error. Common causes:
+- Missing config file → check if volume mount has the config
+- Wrong permissions → `chown -R` the data directory
+- Dependency not ready → start dependency first, wait, then start this container
+
+### Fix 2: Missing Restart Policy
+
+The most common uptime killer. Fix a single container, or ALL containers at once:
+
+```bash
+# Fix a single container
+sudo podman update --restart unless-stopped CONTAINER_NAME
+
+# Fix ALL containers that have no restart policy
+for c in $(sudo podman ps -a --format "{{.Names}}"); do
+  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
+  if [ "$policy" = "no" ] || [ -z "$policy" ]; then
+    echo "Fixing restart policy for: $c"
+    sudo podman update --restart unless-stopped "$c"
+  fi
+done
+```
+
+**Also update the Rust source** so new installs get it right:
+- Check `core/archipelago/src/api/rpc/package.rs` `get_app_config()` for the app
+- Ensure `--restart` flag is in the podman run args
+
+### Fix 3: Port Mapping Issues
+
+#### Port conflict (address already in use)
+```bash
+# Find what's using the port
+sudo ss -tlnp | grep :PORT_NUMBER
+
+# If it's another container, either change one's port or stop the conflicting one
+sudo podman stop CONFLICTING_CONTAINER
+
+# If it's a host process
+sudo kill PID  # or stop the service
+```
+
+#### Port not mapped (container running but port unreachable)
+```bash
+# Check current port mappings
+sudo podman port CONTAINER_NAME
+
+# Can't add ports to running container — must recreate
+sudo podman stop CONTAINER_NAME
+sudo podman rm CONTAINER_NAME
+# Recreate with correct -p flags (use the Rust install flow or manual podman run)
+```
+
+#### Nginx proxy missing or wrong
+Read and fix the nginx config:
+- HTTP: `image-recipe/configs/nginx-archipelago.conf`
+- HTTPS: `image-recipe/configs/snippets/archipelago-https-app-proxies.conf`
+
+Add a location block:
+```nginx
+location /app/APP_ID/ {
+    proxy_pass http://127.0.0.1:HOST_PORT/;
+    proxy_set_header Host $host;
+    proxy_set_header X-Real-IP $remote_addr;
+    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+    proxy_set_header X-Forwarded-Proto $scheme;
+    proxy_http_version 1.1;
+    proxy_set_header Upgrade $http_upgrade;
+    proxy_set_header Connection $connection_upgrade;  # needs a map for $connection_upgrade in the http context
+    # Hide X-Frame-Options so it works in our iframe
+    proxy_hide_header X-Frame-Options;
+    proxy_hide_header Content-Security-Policy;
+}
+```
+
+After editing nginx config, deploy and reload:
+```bash
+# On server
+sudo nginx -t && sudo systemctl reload nginx
+```
+
+#### Frontend routing missing
+Edit `neode-ui/src/stores/appLauncher.ts`:
+- Add entry to `PORT_TO_APP_ID` map
+- If app blocks iframes, add port to the new-tab list in `resolveAppIdFromUrl()`
+
+### Fix 4: Network Issues
+
+#### Container not on archy-net (can't resolve other containers)
+```bash
+# Connect to archy-net without recreating
+sudo podman network connect archy-net CONTAINER_NAME
+
+# Verify
+sudo podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}"
+```
+
+#### archy-net doesn't exist
+```bash
+sudo podman network create archy-net
+# Then reconnect all containers that need it
+```
+
+#### DNS not working inside container
+```bash
+# Test DNS from inside container
+sudo podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \
+sudo podman exec CONTAINER_NAME ping -c1 bitcoin-knots
+
+# If DNS fails, recreate container with explicit DNS
+# Add --dns 1.1.1.1 to the podman run command
+```
+
+### Fix 5: Health Check Issues
+
+#### Add missing health check to running container
+Can't add to running container — must recreate with health check flags:
+```bash
+# Example for a web app
+sudo podman run ... \
+  --health-cmd "curl -f http://localhost:PORT/health || exit 1" \
+  --health-interval 30s \
+  --health-timeout 5s \
+  --health-retries 3 \
+  --health-start-period 60s \
+  IMAGE
+```
+
+#### Fix unhealthy container
+```bash
+# See what the health check is actually running
+sudo podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}"
+
+# Run the health check manually to see the error
+sudo podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND
+
+# Common fixes:
+# - curl not installed in container → use wget or nc instead
+# - Wrong port in health check → fix the check command
+# - App takes too long to start → increase --health-start-period
+```
+
+### Fix 6: Permission/Capability Issues
+
+```bash
+# Check what capabilities container has
+sudo podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}"
+
+# If missing required caps, must recreate with correct --cap-add flags
+# Refer to the capability reference in /podman-doctor references
+
+# Fix data directory permissions
+sudo chown -R 1000:1000 /var/lib/archipelago/APP_NAME/
+```
+
+### Fix 7: Full Config Consistency Fix
+
+When port map is inconsistent across layers, fix ALL layers:
+
+1. **Decide the correct port** (usually what's in package.rs)
+2. **Fix Podman**: recreate container with correct `-p` flags
+3. **Fix Nginx**: update location block's `proxy_pass` port
+4. **Fix Frontend**: update `PORT_TO_APP_ID` in appLauncher.ts
+5. **Deploy**: `./scripts/deploy-to-target.sh --live`
+6. **Verify**: `curl -I http://192.168.1.228/app/APP_ID/`
+
+## After Fixing
+
+Always verify the fix:
+```bash
+# Container running?
+sudo podman ps --filter name=CONTAINER_NAME
+
+# Port reachable?
+curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/
+
+# Via nginx proxy?
+curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1/app/APP_ID/
+
+# Health check passing?
+sudo podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}"
+```
+
+Run `/podman-doctor` again to confirm all issues are resolved.
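The checks under "After Fixing" in podman-fix can fail spuriously right after a restart, because an app may need time to come up (Fix 5 sets a 60s `--health-start-period` for that reason). The following is a hedged sketch of a polling wrapper, not part of the commit; `retry_until` is a hypothetical helper name and the attempt count and delay are arbitrary.

```shell
# Re-run a command until it succeeds or the attempt budget is spent.
# $1: max attempts, $2: delay in seconds between attempts, rest: the command
retry_until() {
  local tries="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$attempt" -lt "$tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

A possible use is `retry_until 12 5 curl -sf -o /dev/null http://127.0.0.1:PORT/`, which polls the direct port for up to about a minute before reporting failure.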
--- a/.claude/skills/podman-uptime/SKILL.md
+++ b/.claude/skills/podman-uptime/SKILL.md
@@ -0,0 +1,309 @@
+---
+name: podman-uptime
+description: >
+  Ensure 100% container uptime on Archipelago. Sets up systemd watchdog timers, verifies
+  restart policies, creates health check monitors, and configures auto-recovery for all
+  containers. Use when asked to "ensure uptime", "containers keep dying", "auto-restart",
+  "watchdog", "container monitoring", "uptime guarantee", "keep containers running",
+  "survive reboot", or to harden container reliability.
+allowed-tools: Bash Read Edit Write Glob Grep
+---
+
+# Podman Uptime — Container Reliability Guardian
+
+Ensures every Archipelago container survives reboots, recovers from crashes, and stays healthy. Sets up three layers of uptime defense: restart policies, a systemd watchdog with health-based auto-recovery, and dependency-aware startup order.
+
+**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
+
+## Layer 1: Restart Policies (Survive Reboots)
+
+Every container MUST have `--restart unless-stopped`. This is non-negotiable.
+
+### Audit and fix all containers
+
+```bash
+# Audit
+for c in $(sudo podman ps -a --format "{{.Names}}"); do
+  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
+  echo "$c: $policy"
+done
+
+# Fix any with "no" or empty policy
+for c in $(sudo podman ps -a --format "{{.Names}}"); do
+  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
+  if [ "$policy" = "no" ] || [ -z "$policy" ]; then
+    echo "Fixing: $c"
+    sudo podman update --restart unless-stopped "$c"
+  fi
+done
+```
+
+### Ensure podman auto-starts containers on boot
+
+```bash
+# Enable podman-restart service (restarts containers with restart policy on boot)
+sudo systemctl enable podman-restart.service 2>/dev/null || true
+
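+# Only if podman-restart doesn't exist: create it (skip the rest of this block when the packaged unit is present, since a copy in /etc/systemd/system overrides it)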
+cat <<'EOF' | sudo tee /etc/systemd/system/podman-restart.service
+[Unit]
+Description=Podman Start All Containers With Restart Policy
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=oneshot
+ExecStart=/usr/bin/podman start --all --filter restart-policy=unless-stopped
+RemainAfterExit=yes
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+sudo systemctl daemon-reload
+sudo systemctl enable podman-restart.service
+```
+
+## Layer 2: Systemd Watchdog (Detect and Recover)
+
+Create a systemd timer that checks container health every 2 minutes and restarts unhealthy or stopped containers.
+
+### Create the watchdog script
+
+```bash
+cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-container-watchdog.sh
+#!/bin/bash
+# Archipelago Container Watchdog
+# Checks all containers and restarts any that are stopped or unhealthy
+
+LOG_TAG="container-watchdog"
+
+# Restart any stopped containers that should be running (have restart policy)
+for c in $(sudo podman ps -a --filter status=exited --filter restart-policy=unless-stopped --format "{{.Names}}"); do
+  logger -t "$LOG_TAG" "Restarting stopped container: $c"
+  sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG"
+done
+
+# Restart unhealthy containers
+for c in $(sudo podman ps --filter health=unhealthy --format "{{.Names}}"); do
+  logger -t "$LOG_TAG" "Restarting unhealthy container: $c"
+  sudo podman restart "$c" 2>&1 | logger -t "$LOG_TAG"
+done
+
+# Check for containers in "created" state (never started)
+for c in $(sudo podman ps -a --filter status=created --format "{{.Names}}"); do
+  logger -t "$LOG_TAG" "Starting created container: $c"
+  sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG"
+done
+SCRIPT
+
+sudo chmod +x /usr/local/bin/archipelago-container-watchdog.sh
+```
+
+### Create the systemd timer
+
+```bash
+# Service unit
+cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-watchdog.service
+[Unit]
+Description=Archipelago Container Watchdog
+After=podman-restart.service
+
+[Service]
+Type=oneshot
+ExecStart=/usr/local/bin/archipelago-container-watchdog.sh
+EOF
+
+# Timer unit — runs every 2 minutes
+cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-watchdog.timer
+[Unit]
+Description=Run Archipelago Container Watchdog every 2 minutes
+
+[Timer]
+OnBootSec=120
+OnUnitActiveSec=120
+AccuracySec=30
+
+[Install]
+WantedBy=timers.target
+EOF
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now archipelago-watchdog.timer
+```
+
+### Verify watchdog is running
+
+```bash
+sudo systemctl status archipelago-watchdog.timer
+sudo systemctl list-timers | grep archipelago
+# Check watchdog logs
+sudo journalctl -t container-watchdog --since "1 hour ago" --no-pager
+```
+
+## Layer 3: Dependency-Aware Startup Order
+
+Some containers depend on others. The watchdog handles restarts, but initial boot order matters.
+
+### Create ordered startup script
+
+```bash
+cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-ordered-start.sh
+#!/bin/bash
+# Ordered container startup for Archipelago
+# Respects dependency chain: bitcoin → electrs/lnd → mempool/btcpay
+
+LOG_TAG="ordered-start"
+
+wait_for_container() {
+  local name=$1
+  local max_wait=${2:-60}
+  local waited=0
+  while [ $waited -lt $max_wait ]; do
+    status=$(sudo podman inspect "$name" --format "{{.State.Running}}" 2>/dev/null)
+    if [ "$status" = "true" ]; then
+      logger -t "$LOG_TAG" "$name is running"
+      return 0
+    fi
+    sleep 5
+    waited=$((waited + 5))
+  done
+  logger -t "$LOG_TAG" "WARNING: $name not running after ${max_wait}s"
+  return 1
+}
+
+# Tier 0: Infrastructure
+logger -t "$LOG_TAG" "Starting Tier 0: Infrastructure"
+sudo podman start tailscale 2>/dev/null
+
+# Tier 1: Bitcoin (foundation)
+logger -t "$LOG_TAG" "Starting Tier 1: Bitcoin"
+sudo podman start bitcoin-knots 2>/dev/null
+wait_for_container bitcoin-knots 120
+
+# Tier 2: Bitcoin-dependent services
+logger -t "$LOG_TAG" "Starting Tier 2: Bitcoin-dependent"
+sudo podman start electrs 2>/dev/null
+sudo podman start lnd 2>/dev/null
+wait_for_container electrs 90
+wait_for_container lnd 90
+
+# Tier 3: Services depending on Tier 2
+logger -t "$LOG_TAG" "Starting Tier 3: Second-order dependencies"
+sudo podman start mempool-db 2>/dev/null
+sleep 5
+sudo podman start mempool 2>/dev/null
+sudo podman start btcpay-postgres 2>/dev/null
+sudo podman start nbxplorer 2>/dev/null
+sleep 10
+sudo podman start btcpay-server 2>/dev/null
+
+# Tier 4: UI containers (their parent apps are running by now)
+logger -t "$LOG_TAG" "Starting Tier 4: UI containers"
+sudo podman start bitcoin-ui 2>/dev/null
+sudo podman start lnd-ui 2>/dev/null
+
+# Tier 5: Independent apps (start all remaining)
+logger -t "$LOG_TAG" "Starting Tier 5: Independent apps"
+sudo podman start --all 2>/dev/null
+
+logger -t "$LOG_TAG" "Startup sequence complete"
+SCRIPT
+
+sudo chmod +x /usr/local/bin/archipelago-ordered-start.sh
+```
+
+### Wire into boot sequence
+
+```bash
+cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-containers.service
+[Unit]
+Description=Archipelago Ordered Container Startup
+After=network-online.target podman.service
+Wants=network-online.target
+Before=archipelago.service
+
+[Service]
+Type=oneshot
+ExecStart=/usr/local/bin/archipelago-ordered-start.sh
+RemainAfterExit=yes
+TimeoutStartSec=300
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+sudo systemctl daemon-reload
+sudo systemctl enable archipelago-containers.service
+```
+
+## Verification Checklist
+
+After setting up all 3 layers, verify:
+
+```bash
+echo "=== Layer 1: Restart Policies ==="
+for c in $(sudo podman ps -a --format "{{.Names}}"); do
+  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
+  echo "  $c: $policy"
+done
+
+echo ""
+echo "=== Layer 2: Watchdog Timer ==="
+sudo systemctl is-active archipelago-watchdog.timer
+sudo systemctl list-timers | grep archipelago
+
+echo ""
+echo "=== Layer 3: Boot Services ==="
+sudo systemctl is-enabled podman-restart.service 2>/dev/null || echo "podman-restart: not found"
+sudo systemctl is-enabled archipelago-containers.service 2>/dev/null || echo "ordered-start: not found"
+sudo systemctl is-enabled archipelago-watchdog.timer 2>/dev/null || echo "watchdog: not found"
+
+echo ""
+echo "=== Container Health Summary ==="
+total=$(sudo podman ps -a --format "{{.Names}}" | wc -l)
+running=$(sudo podman ps --format "{{.Names}}" | wc -l)
+stopped=$((total - running))
+unhealthy=$(sudo podman ps --filter health=unhealthy --format "{{.Names}}" | wc -l)
+echo "  Total: $total | Running: $running | Stopped: $stopped | Unhealthy: $unhealthy"
+```
+
+## Reboot Test
+
+The ultimate uptime test — reboot the server and verify everything comes back:
+
+```bash
+# Before reboot: record running containers (/var/tmp survives reboots; /tmp is often cleared)
+sudo podman ps --format "{{.Names}}" | sort > /var/tmp/before-reboot.txt
+
+# Reboot
+sudo reboot
+
+# After reboot (wait ~3 minutes, then SSH back in):
+sudo podman ps --format "{{.Names}}" | sort > /var/tmp/after-reboot.txt
+
+# Compare
+diff /var/tmp/before-reboot.txt /var/tmp/after-reboot.txt
+# Should show no differences
+```
+
+## Monitoring
+
+Check uptime status anytime:
+```bash
+# Quick status
+sudo podman ps -a --format "table {{.Names}}\t{{.Status}}" | sort
+
+# Watchdog activity
+sudo journalctl -t container-watchdog --since "24 hours ago" --no-pager
+
+# Container events (starts, stops, deaths)
+sudo podman events --stream=false --since 24h --filter event=start --filter event=stop --filter event=died 2>/dev/null | tail -30
+```
+
+## Integration
+
+- Run `/podman-doctor` first to identify issues
+- Run `/podman-fix` for specific container repairs
+- Run `/podman-uptime` to set up permanent reliability infrastructure
+- Add to ISO build: copy watchdog scripts to `image-recipe/configs/` and enable in first-boot
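The reboot test in podman-uptime compares two sorted container lists with `diff`, which reports changes in both directions. To list only the containers that failed to come back, `comm -23` is an alternative. A minimal sketch, not part of the commit; `missing_after_reboot` is a hypothetical helper and both input files are assumed to be sorted, one name per line.

```shell
# List containers that were running before the reboot but are not running after.
# $1: sorted "before" list, $2: sorted "after" list (one name per line)
# Returns 1 when at least one container is missing, 0 otherwise.
missing_after_reboot() {
  local missing
  missing=$(comm -23 "$1" "$2")
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}
```

It would be called as `missing_after_reboot BEFORE_FILE AFTER_FILE` with the two lists recorded by the reboot test, and its exit status makes it usable in a script or alert.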