feat: bitcoin-ui CSS fix, HTTPS proxy support, deploy script improvements

Bitcoin UI:
- Replace cdn.tailwindcss.com with locally bundled tailwind.css (CSP blocks external scripts)
- Make all asset paths relative for nginx proxy compatibility
- Add bitcoin-ui build/deploy to deploy-to-target.sh (was missing entirely)
- Use --network host (bitcoin-ui proxies Bitcoin RPC at 127.0.0.1:8332)

HTTPS mixed content fix:
- Add HTTPS_PROXY_PATHS in AppSession.vue: when the parent page is HTTPS, the iframe loads through the nginx proxy instead of the direct HTTP port
- Prevents browsers from blocking HTTP iframes inside HTTPS pages
- All Tailscale servers use HTTPS, so this was breaking every app iframe

Deploy & first-boot improvements:
- first-boot-containers.sh auto-detects disk size to choose between pruning and txindex
- first-boot-containers.sh checks a fallback source path for UI containers
- Added mempool-electrs to the APP_PORTS mapping
- ElectrumX container creation in first-boot
- Podman doctor/fix/uptime skills added

Also includes: session persistence, identity management, LND transactions, ElectrumX status UI, nostr-provider improvements, Web5 enhancements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
156
.claude/skills/podman-doctor/SKILL.md
Normal file
@@ -0,0 +1,156 @@
---
name: podman-doctor
description: >
  Comprehensive Podman container diagnostic for Archipelago. Audits all running containers,
  port mappings, network connectivity, health status, restart policies, and config consistency
  across all 4 layers (backend Rust, Podman runtime, Nginx proxy, frontend routing).
  Use when asked to "diagnose containers", "check podman", "why is app not working",
  "container health check", "port not reachable", "audit containers", "podman status",
  or when any container/app is misbehaving.
allowed-tools: Bash Read Glob Grep
---

# Podman Doctor — Container Infrastructure Diagnostics

Systematic diagnostic for Archipelago's Podman container stack. Catches port conflicts, network misconfigurations, health failures, missing restart policies, and config drift across all layers.

**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`

If $ARGUMENTS is provided, focus the diagnosis on that specific app/container. Otherwise, run the full audit.

## Workflow

### Step 1: Gather Runtime State

Run these on the server:

```bash
# All containers with status, ports, networks
sudo podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}"

# Which process holds each known port (an unexpected owner means a port conflict)
sudo ss -tlnp | grep -E ":(80|81|443|2283|2342|3000|3001|3080|4080|5678|7000|8080|8081|8082|8083|8085|8096|8123|8173|8174|8175|8240|8332|8333|8334|8443|8888|9001|9735|10009|11434|23000|24444|50001)\b"
```
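
`ss` shows who holds a port. It does not show which expected port has nobody listening, and that is the other common cause of "port not reachable". The sketch below assumes the default `ss -tln` column layout, with the local address in column 4. `missing_listeners` is a hypothetical helper name; nothing by that name exists on the server:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each expected port that has no TCP listener.
# stdin: output of `ss -tln` (column 4 is "Local Address:Port", e.g. 0.0.0.0:8332 or [::]:443)
# args:  the ports you expect to be listening
missing_listeners() {
  local listening port
  # Keep only the text after the last ":" in column 4, i.e. the port number.
  listening=$(awk 'NR > 1 { sub(/.*:/, "", $4); print $4 }' | sort -u)
  for port in "$@"; do
    if ! grep -qx "$port" <<<"$listening"; then
      echo "$port"
    fi
  done
}
```

Usage: `sudo ss -tln | missing_listeners 80 443 8332 50001`. Every port it prints has nothing bound at the Podman/host layer, so the problem is below nginx.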

### Step 2: Check Restart Policies

Every container MUST have `--restart unless-stopped`. A missing policy is the #1 cause of downtime after reboots.

```bash
for c in $(sudo podman ps -a --format "{{.Names}}"); do
  echo -n "$c: "
  sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}"
done
```

**Red flag**: `no` or an empty value means the container is not restarted when it exits. On Podman, the policy alone does not bring containers back after a reboot. A boot-time unit has to start them (see `/podman-uptime`).
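
The loop above prints every policy. A small filter that lists only the containers needing attention keeps the report short. This is a sketch: the helper name is hypothetical, and it expects `NAME POLICY` pairs on stdin:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "NAME POLICY" lines and print the names whose
# policy is "no" or empty (the red-flag cases from Step 2).
containers_missing_restart() {
  local name policy
  while read -r name policy; do
    [ -z "$name" ] && continue
    case "$policy" in
      "" | no) echo "$name" ;;
    esac
  done
}
```

Usage: `for c in $(sudo podman ps -a --format "{{.Names}}"); do echo "$c $(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")"; done | containers_missing_restart`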

### Step 3: Verify Port Mapping Consistency

Cross-reference these 4 layers — a mismatch between ANY two causes "app not loading" bugs:

**Layer 1 — Backend Config (Rust)**: Read `core/archipelago/src/api/rpc/package.rs` and check the port mappings in `get_app_config()`.

**Layer 2 — Podman Runtime**: `sudo podman ps --format "{{.Names}}: {{.Ports}}"`

**Layer 3 — Nginx Proxy**: Read these for `/app/{id}/` location blocks:
- `image-recipe/configs/nginx-archipelago.conf` (HTTP)
- `image-recipe/configs/snippets/archipelago-https-app-proxies.conf` (HTTPS)

**Layer 4 — Frontend Routing**: Read the `PORT_TO_APP_ID` map in `neode-ui/src/stores/appLauncher.ts`.

| Symptom | Root Cause |
|---------|-----------|
| App iframe shows 502/504 | Nginx proxies to the wrong port, or the container is not running |
| App loads wrong content | Port collision: two containers on the same host port |
| Works on port but not /app/ path | Missing nginx location block |
| Frontend can't find app | No `PORT_TO_APP_ID` entry in appLauncher.ts |
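
Layer 3 can be read mechanically. The sketch below lists `APP_ID PORT` pairs from the nginx configs so they can be compared with the Layer 2 `podman ps` output. It assumes the `location /app/APP_ID/ { ... proxy_pass http://127.0.0.1:PORT/; }` shape used by the template in `/podman-fix`, and it skips blocks that proxy anywhere else. The function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "APP_ID PORT" for each `location /app/APP_ID/`
# block whose proxy_pass targets 127.0.0.1. Takes nginx config files as args.
nginx_app_ports() {
  awk '
    # Remember the app id of the most recent /app/ location block.
    match($0, "location /app/[^/ ]+/") {
      id = substr($0, RSTART + 14, RLENGTH - 15)   # 14 = length("location /app/")
    }
    # First loopback proxy_pass after that location: print id and port.
    id != "" && match($0, "proxy_pass http://127[.]0[.]0[.]1:[0-9]+") {
      print id, substr($0, RSTART + 28, RLENGTH - 28)   # 28 = length("proxy_pass http://127.0.0.1:")
      id = ""
    }
  ' "$@"
}
```

Usage: `nginx_app_ports image-recipe/configs/nginx-archipelago.conf image-recipe/configs/snippets/archipelago-https-app-proxies.conf | sort -u`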

### Step 4: Network Connectivity Audit

```bash
# Networks, and the archy-net definition
sudo podman network ls
sudo podman network inspect archy-net 2>/dev/null || echo "WARNING: archy-net missing!"

# Which network(s) each container is attached to
sudo podman ps -a --format "{{.Names}}\t{{.Networks}}"
```

**Must be on archy-net**: bitcoin-knots, lnd, electrs, mempool, btcpay-server, nbxplorer, fedimint, fedimint-gateway, nostr-rs-relay, indeedhub, ollama, open-webui

**Must NOT be on archy-net**: grafana, nextcloud, filebrowser, vaultwarden, bitcoin-ui, lnd-ui, tailscale (host network)
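
Checking the "Must be on archy-net" list by eye is error-prone. The sketch below takes `NAME NETWORKS` lines and the required names, and prints each required container that is not attached. It normalises brackets and commas in the networks field because the exact rendering of `{{.Networks}}` is an assumption that may vary between Podman versions. Containers that do not exist at all are not reported. The helper name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: given "NAME NETWORKS" lines on stdin and the names that
# must be on archy-net as arguments, print each required container that is not.
# Brackets and commas in the networks field are turned into spaces.
not_on_archy_net() {
  awk -v req=" $* " '
    {
      name = $1
      $1 = ""
      nets = " " $0 " "
      gsub(/,/, " ", nets); gsub(/\[/, " ", nets); gsub(/\]/, " ", nets)
      if (index(req, " " name " ") && !index(nets, " archy-net ")) print name
    }
  '
}
```

Usage: `sudo podman ps -a --format "{{.Names}} {{.Networks}}" | not_on_archy_net bitcoin-knots lnd electrs mempool btcpay-server nbxplorer fedimint fedimint-gateway nostr-rs-relay indeedhub ollama open-webui`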

### Step 5: Health Check Status

```bash
# Containers with health checks — are they passing?
for c in $(sudo podman ps --format "{{.Names}}"); do
  health=$(sudo podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)
  if [ -n "$health" ] && [ "$health" != "<no value>" ]; then
    echo "$c: $health"
  fi
done

# Containers WITHOUT health checks (gap in monitoring)
for c in $(sudo podman ps --format "{{.Names}}"); do
  hc=$(sudo podman inspect "$c" --format "{{.Config.Healthcheck}}" 2>/dev/null)
  if [ "$hc" = "<nil>" ] || [ -z "$hc" ]; then
    echo "NO HEALTHCHECK: $c"
  fi
done
```

### Step 6: Resource & Failure Analysis

```bash
# Resource usage
sudo podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"

# Recent deaths (last 24h); --stream=false makes podman events exit instead of waiting for new events
sudo podman events --stream=false --filter event=died --since 24h 2>/dev/null | tail -20

# OOM kills
sudo podman ps -a --format "{{.Names}}" | while read -r c; do
  oom=$(sudo podman inspect "$c" --format "{{.State.OOMKilled}}" 2>/dev/null)
  [ "$oom" = "true" ] && echo "OOM KILLED: $c"
done

# Exited containers (the Status column shows each exit code)
sudo podman ps -a --filter status=exited --format "{{.Names}}\t{{.Status}}"
```
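
A single death is noise. The same container dying repeatedly is the crash loop listed in `references/common-failures.md`. The sketch below ranks containers by their number of `died` events. It reads one name per line, as produced by `podman events --stream=false --format "{{.Name}}"`. `rank_deaths` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one container name per line (one line per "died"
# event) and print "NAME COUNT", most frequent first.
rank_deaths() {
  sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}
```

Usage: `sudo podman events --stream=false --since 24h --filter event=died --format "{{.Name}}" | rank_deaths`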

### Step 7: Systemd Integration

```bash
systemctl is-active archipelago nginx
systemctl list-units --type=service | grep -i podman
systemctl list-timers --all | grep -i -E "podman|container|archipelago"
```

### Step 8: Generate Report

Produce a structured report:

```
## Container Diagnostic Report

### Summary
- Total containers: X running, Y stopped, Z unhealthy
- Port conflicts: [list or "none"]
- Missing restart policies: [list or "none"]
- Network issues: [list or "none"]
- Health check gaps: [list]

### Critical Issues (fix immediately)
1. ...

### Warnings (fix soon)
1. ...

### Recommended Actions
1. ...
```

After diagnosis, suggest running `/podman-fix` for any issues found.

## Port Reference

See `references/port-map.md` for the canonical port assignment table across all 4 layers.
55
.claude/skills/podman-doctor/references/common-failures.md
Normal file
@@ -0,0 +1,55 @@
# Common Podman Failure Patterns

## Container Won't Start

| Error | Cause | Fix |
|-------|-------|-----|
| `exec format error` | Image or binary built for a different CPU architecture | Rebuild on (or for) the Linux server's architecture |
| `address already in use` | Port conflict | `ss -tlnp \| grep :PORT` to find the offender |
| `permission denied` | Missing capability or read-only root | Check `get_app_capabilities()`, add tmpfs |
| `OCI runtime error` | Corrupt container state | `podman rm -f NAME`, then recreate |
| `image not known` | Image not pulled | `podman pull IMAGE:TAG` |
| `no such network` | Network missing | `podman network create archy-net` |

## Container Starts But App Unreachable

| Symptom | Check Layer | Fix |
|---------|------------|-----|
| Direct port works, /app/ doesn't | Nginx config | Add `/app/{id}/` location block |
| Neither works | Podman ports | `podman port NAME`: verify the mapping exists |
| Port mapped but refused | Container logs | App crashing internally: check logs |
| Works sometimes | Resources | Check OOM kills, CPU, disk space |
| 502 Bad Gateway | Nginx→Container | Wrong port in proxy_pass, or container restarted |

## Container Keeps Dying

| Pattern | Cause | Fix |
|---------|-------|-----|
| Exits immediately (code 1) | Config error | Check `podman logs NAME` |
| Dies after minutes | OOM killed | Increase `--memory` limit |
| Dies when dep restarts | No restart policy | Add `--restart unless-stopped` |
| Crash loop | Repeated crash | Fix root cause, don't just restart |

## Network Issues

| Problem | Cause | Fix |
|---------|-------|-----|
| Can't resolve container names | Not on archy-net | `podman network connect archy-net NAME`, or recreate with `--network=archy-net` |
| Can't reach internet | DNS missing | Add `--dns 1.1.1.1` |
| Container-to-container timeout | Different networks | Put both on same network |

## Capability Reference

| Capability | Apps That Need It | Failure Mode |
|-----------|------------------|-------------|
| CHOWN | nextcloud, homeassistant, btcpay, jellyfin, portainer | Can't chown during setup |
| SETUID/SETGID | nextcloud, homeassistant, btcpay, jellyfin | Can't switch to service user |
| DAC_OVERRIDE | nextcloud, homeassistant, btcpay | Can't access cross-UID files |
| FOWNER | bitcoin-knots, lnd, fedimint | Can't modify data dir perms |
| NET_BIND_SERVICE | nginx-proxy-manager, vaultwarden | Can't bind ports <1024 |

## Read-Only Safe Apps

Only these apps can run with `--read-only`: searxng, grafana, filebrowser, electrs, nostr-rs-relay, ollama, indeedhub

All others need a writable root filesystem or will fail silently.
71
.claude/skills/podman-doctor/references/port-map.md
Normal file
@@ -0,0 +1,71 @@
# Archipelago Canonical Port Map

All port assignments across the 4 configuration layers. When adding or debugging an app, every row must be consistent across all columns.

## Bitcoin Stack

| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| bitcoin-knots | 8332, 8333 | 8332, 8333 | archy-net | /app/bitcoin-knots/ | 8332→bitcoin-knots |
| bitcoin-ui | 8334 | 80 | bridge | /app/bitcoin-ui/ | 8334→bitcoin-knots |
| electrs | 50001 | 50001 | archy-net | /app/electrs/ | 50001→electrs |
| lnd | 9735, 10009, 8080 | 9735, 10009, 8080 | archy-net | /app/lnd/ | 10009→lnd |
| lnd-ui (RTL) | 8081 | 80 | bridge | /app/lnd-ui/ | 8081→lnd |

## Lightning & Payment

| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| btcpay-server | 23000 | 49392 | archy-net | /app/btcpay/ | 23000→btcpay-server |
| nbxplorer | 24444 | 32838 | archy-net | N/A (internal) | N/A |
| fedimint | 8173, 8174, 8175 | 8173, 8174, 8175 | archy-net | /app/fedimint/ | 8174→fedimint |
| fedimint-gateway | 8175 | 8175 | archy-net | /app/fedimint-gateway/ | 8175→fedimint-gateway |

## Explorer & Monitoring

| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| mempool | 4080 | 8080 | archy-net | /app/mempool/ | 4080→mempool |
| grafana | 3000 | 3000 | bridge | /app/grafana/ | 3000→grafana (new tab) |

## Self-Hosted Apps

| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| nextcloud | 8085 | 80 | bridge | /app/nextcloud/ | 8085→nextcloud |
| vaultwarden | 8082 | 80 | bridge | /app/vaultwarden/ | 8082→vaultwarden (new tab) |
| filebrowser | 8083 | 80 | bridge | /app/filebrowser/ | 8083→filebrowser |
| searxng | 8888 | 8080 | bridge | /app/searxng/ | 8888→searxng |
| photoprism | 2342 | 2342 | bridge | /app/photoprism/ | 2342→photoprism (new tab) |
| jellyfin | 8096 | 8096 | bridge | /app/jellyfin/ | 8096→jellyfin |
| homeassistant | 8123 | 8123 | bridge | /app/homeassistant/ | 8123→homeassistant (new tab) |
| ollama | 11434 | 11434 | archy-net | /app/ollama/ | 11434→ollama |
| open-webui | 3080 | 8080 | archy-net | /app/open-webui/ | 3080→open-webui |

## Nostr & Social

| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| nostr-rs-relay | 7000 | 8080 | archy-net | /app/nostr-rs-relay/ | 7000→nostr-rs-relay |
| indeedhub | 3001 | 3000 | archy-net | /app/indeedhub/ | 3001→indeedhub |

## System

| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| tailscale | 8240 | 8240 | host | /app/tailscale/ | N/A |
| nginx-proxy-manager | 81, 8443 | 81, 443 | bridge | N/A | 81→nginx-proxy-manager |

## Multi-Container Stacks

- **Immich**: immich-server (2283), immich-postgres (internal 5432), immich-redis (internal 6379), all on immich-net
- **Penpot**: penpot-frontend (9001→80), penpot-backend, penpot-exporter, penpot-postgres, penpot-mailcatch, all on penpot-net
- **Mempool**: mempool (4080→8080), mempool-db (internal 3306), on archy-net
- **BTCPay**: btcpay-server (23000→49392), nbxplorer (24444→32838), btcpay-postgres (internal 5432), on archy-net

## Key Notes

- **archy-net apps** resolve each other by container name (e.g., `bitcoin-knots:8332`)
- **bridge apps** are standalone — access services via host IP/port
- **host network** (tailscale only) — shares host namespace, no port mapping
- **New tab apps**: btcpay (23000), grafana (3000), vaultwarden (8082), photoprism (2342), homeassistant (8123) — X-Frame-Options blocks iframe
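
The rows above are maintained by hand, so a quick mechanical check for host ports claimed twice is worth having. This is a sketch that assumes the `| App | Host Port(s) | ...` column order used in this file. The helper name is hypothetical. Against this file as written, it reports `8175: fedimint, fedimint-gateway`, because 8175 appears in both rows.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report host ports that appear in more than one row of
# the port-map tables. Column 2 is the app, column 3 the host port list.
duplicate_host_ports() {
  awk -F'|' '
    # Data rows have a digit in the Host Port(s) column; header and --- rows do not.
    NF >= 4 && $3 ~ /[0-9]/ {
      app = $2
      gsub(/^ +| +$/, "", app)
      n = split($3, ports, ",")
      for (i = 1; i <= n; i++) {
        p = ports[i]
        gsub(/[^0-9]/, "", p)
        if (p == "") continue
        count[p]++
        owners[p] = (count[p] == 1) ? app : owners[p] ", " app
      }
    }
    END {
      for (p in count) if (count[p] > 1) print p ": " owners[p]
    }
  ' "$@" | sort -n
}
```

Usage: `duplicate_host_ports .claude/skills/podman-doctor/references/port-map.md`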
219
.claude/skills/podman-fix/SKILL.md
Normal file
@@ -0,0 +1,219 @@
---
name: podman-fix
description: >
  Fix Podman container issues on Archipelago — restart failed containers, repair port bindings,
  fix network connectivity, add missing restart policies, and resolve config drift.
  Use when asked to "fix container", "restart app", "fix port mapping", "container not working",
  "app won't start", "fix podman", "repair container", "container down", or after /podman-doctor
  identifies issues to fix.
allowed-tools: Bash Read Edit Write Glob Grep
---

# Podman Fix — Container Remediation

Targeted fix workflow for Podman container issues on Archipelago. Given a specific problem (from /podman-doctor or a user report), diagnose the root cause and fix it.

**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`

If $ARGUMENTS is provided, fix that specific app/issue. Otherwise, ask what needs fixing.

## Fix Procedures

### Fix 1: Container Not Running

```bash
# Check why it stopped
sudo podman logs --tail 50 CONTAINER_NAME
sudo podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}"

# If clean exit or crash — just restart
sudo podman start CONTAINER_NAME

# If corrupt state — remove and recreate
sudo podman rm -f CONTAINER_NAME
# Then recreate using the install flow (trigger from UI or re-run creation command)
```

**If the container keeps crashing**: check the logs for the actual error. Common causes:
- Missing config file → check whether the volume mount contains the config
- Wrong permissions → `chown -R` the data directory
- Dependency not ready → start the dependency first, wait, then start this container

### Fix 2: Missing Restart Policy

The most common uptime killer. Fix one container, or all of them at once:

```bash
# Fix a single container
# (only newer Podman releases accept --restart in `podman update`; if yours
#  rejects it, recreate the container with --restart unless-stopped)
sudo podman update --restart unless-stopped CONTAINER_NAME

# Fix ALL containers that have no restart policy
for c in $(sudo podman ps -a --format "{{.Names}}"); do
  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
  if [ "$policy" = "no" ] || [ -z "$policy" ]; then
    echo "Fixing restart policy for: $c"
    sudo podman update --restart unless-stopped "$c"
  fi
done
```

**Also update the Rust source** so new installs get it right:
- Check `get_app_config()` in `core/archipelago/src/api/rpc/package.rs` for the app
- Ensure the `--restart` flag is in the podman run args

### Fix 3: Port Mapping Issues

#### Port conflict (address already in use)
```bash
# Find what's using the port (\b stops :80 from also matching :8080)
sudo ss -tlnp | grep -E ":PORT_NUMBER\b"

# If it's another container, either change one container's port or stop the conflicting one
sudo podman stop CONFLICTING_CONTAINER

# If it's a host process (its PID is in the users:((...)) column of the ss output)
sudo kill PID  # or stop the service
```
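
`ss` shows a process name, and for a published container port that is usually not the container's name. The sketch below maps a host port back to the container that publishes it, using the `0.0.0.0:HOST->CONTAINER/tcp` form that `podman ps` prints. It does not expand published port ranges such as `8173-8175->...`. The helper name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the container(s) publishing a given host port.
# stdin: output of `podman ps --format "{{.Names}} {{.Ports}}"`,
#        where ports look like 0.0.0.0:4080->8080/tcp
# Matching ":PORT->" avoids hits on the container-side port or on longer
# port numbers such as 18080.
container_for_port() {
  awk -v p=":$1->" 'index($0, p) { print $1 }'
}
```

Usage: `sudo podman ps -a --format "{{.Names}} {{.Ports}}" | container_for_port 8080`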

#### Port not mapped (container running but port unreachable)
```bash
# Check current port mappings
sudo podman port CONTAINER_NAME

# Can't add ports to a running container — must recreate
sudo podman stop CONTAINER_NAME
sudo podman rm CONTAINER_NAME
# Recreate with correct -p flags (use the Rust install flow or manual podman run)
```

#### Nginx proxy missing or wrong
Read and fix the nginx config:
- HTTP: `image-recipe/configs/nginx-archipelago.conf`
- HTTPS: `image-recipe/configs/snippets/archipelago-https-app-proxies.conf`

Add a location block:
```nginx
location /app/APP_ID/ {
    proxy_pass http://127.0.0.1:HOST_PORT/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    # $connection_upgrade must be defined once in the http context, e.g.
    # map $http_upgrade $connection_upgrade { default upgrade; '' close; }
    proxy_set_header Connection $connection_upgrade;
    # Hide X-Frame-Options and CSP so the app can render in our iframe
    proxy_hide_header X-Frame-Options;
    proxy_hide_header Content-Security-Policy;
}
```

After editing the nginx config, deploy it, then test and reload:
```bash
# On server
sudo nginx -t && sudo systemctl reload nginx
```

#### Frontend routing missing
Edit `neode-ui/src/stores/appLauncher.ts`:
- Add an entry to the `PORT_TO_APP_ID` map
- If the app blocks iframes, add its port to the new-tab list in `resolveAppIdFromUrl()`

### Fix 4: Network Issues

#### Container not on archy-net (can't resolve other containers)
```bash
# Connect to archy-net without recreating
sudo podman network connect archy-net CONTAINER_NAME

# Verify
sudo podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}"
```

#### archy-net doesn't exist
```bash
sudo podman network create archy-net
# Then reconnect all containers that need it (the "Must be on archy-net" list in /podman-doctor)
```

#### DNS not working inside container
```bash
# Test container-name resolution from inside the container
# (many images lack nslookup; ping is the fallback)
sudo podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \
  sudo podman exec CONTAINER_NAME ping -c1 bitcoin-knots

# If container names don't resolve, the container is usually not on archy-net (see above);
# an external DNS server cannot resolve container names.
# If only external names fail, recreate the container with an explicit upstream DNS:
# add --dns 1.1.1.1 to the podman run command
```

### Fix 5: Health Check Issues

#### Add missing health check to running container
You can't add one to a running container; recreate it with health check flags:
```bash
# Example for a web app (PORT is the port the app listens on inside the container)
sudo podman run ... \
  --health-cmd "curl -f http://localhost:PORT/health || exit 1" \
  --health-interval 30s \
  --health-timeout 5s \
  --health-retries 3 \
  --health-start-period 60s \
  IMAGE
```

#### Fix unhealthy container
```bash
# See what the health check is actually running
sudo podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}"

# Run the configured health check once (exit code 0 = healthy)
sudo podman healthcheck run CONTAINER_NAME

# Or run the check command by hand to see its error output
sudo podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND

# Common fixes:
# - curl not installed in container → use wget or nc instead
# - Wrong port in health check → fix the check command
# - App takes too long to start → increase --health-start-period
```

### Fix 6: Permission/Capability Issues

```bash
# Check what capabilities the container has
sudo podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}"

# If required caps are missing, recreate with the correct --cap-add flags
# (see the Capability Reference in .claude/skills/podman-doctor/references/common-failures.md)

# Fix data directory permissions
# (use the UID:GID the app runs as inside the container; 1000:1000 is only an example)
sudo chown -R 1000:1000 /var/lib/archipelago/APP_NAME/
```

### Fix 7: Full Config Consistency Fix

When the port map is inconsistent across layers, fix ALL layers:

1. **Decide the correct port** (usually what's in package.rs)
2. **Fix Podman**: recreate the container with the correct `-p` flags
3. **Fix Nginx**: update the location block's `proxy_pass` port
4. **Fix Frontend**: update `PORT_TO_APP_ID` in appLauncher.ts
5. **Deploy**: `./scripts/deploy-to-target.sh --live`
6. **Verify**: `curl -I http://192.168.1.228/app/APP_ID/`

## After Fixing

Always verify the fix:
```bash
# Container running?
sudo podman ps --filter name=CONTAINER_NAME

# Port reachable?
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/

# Via nginx proxy?
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1/app/APP_ID/

# Health check passing?
sudo podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}"
```

Run `/podman-doctor` again to confirm all issues are resolved.
309
.claude/skills/podman-uptime/SKILL.md
Normal file
@@ -0,0 +1,309 @@
---
name: podman-uptime
description: >
  Ensure 100% container uptime on Archipelago. Sets up systemd watchdog timers, verifies
  restart policies, creates health check monitors, and configures auto-recovery for all
  containers. Use when asked to "ensure uptime", "containers keep dying", "auto-restart",
  "watchdog", "container monitoring", "uptime guarantee", "keep containers running",
  "survive reboot", or to harden container reliability.
allowed-tools: Bash Read Edit Write Glob Grep
---

# Podman Uptime — Container Reliability Guardian

Ensures every Archipelago container survives reboots, recovers from crashes, and stays healthy. Sets up three layers of uptime defense: restart policies, a systemd watchdog that restarts stopped and unhealthy containers, and dependency-aware startup order at boot.

**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`

## Layer 1: Restart Policies (Survive Reboots)

Every container MUST have `--restart unless-stopped`. This is non-negotiable.

### Audit and fix all containers

```bash
# Audit
for c in $(sudo podman ps -a --format "{{.Names}}"); do
  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
  echo "$c: $policy"
done

# Fix any with "no" or empty policy
# (requires a Podman release whose `podman update` accepts --restart; otherwise recreate the container)
for c in $(sudo podman ps -a --format "{{.Names}}"); do
  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
  if [ "$policy" = "no" ] || [ -z "$policy" ]; then
    echo "Fixing: $c"
    sudo podman update --restart unless-stopped "$c"
  fi
done
```

### Ensure podman auto-starts containers on boot

Podman has no daemon, so a restart policy alone does not start containers after a reboot. A systemd unit has to do it.

```bash
if systemctl cat podman-restart.service >/dev/null 2>&1; then
  # Shipped with Podman: enable it. Check which restart policy its ExecStart
  # filters on (`systemctl cat podman-restart.service`); if it only matches
  # "always", the Layer 3 startup unit is what brings unless-stopped containers back.
  sudo systemctl enable podman-restart.service
else
  # Not shipped on this system: create an equivalent unit
  cat <<'EOF' | sudo tee /etc/systemd/system/podman-restart.service
[Unit]
Description=Podman Start All Containers With Restart Policy
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/bin/podman start --all --filter restart-policy=unless-stopped
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
  sudo systemctl daemon-reload
  sudo systemctl enable podman-restart.service
fi
```

## Layer 2: Systemd Watchdog (Detect and Recover)

Create a systemd timer that checks container health every 2 minutes and restarts unhealthy or stopped containers.

### Create the watchdog script

```bash
cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-container-watchdog.sh
#!/bin/bash
# Archipelago Container Watchdog
# Checks all containers and restarts any that are stopped or unhealthy

LOG_TAG="container-watchdog"

# Restart any stopped containers that should be running (have restart policy).
# Note: this also restarts containers an operator stopped on purpose; remove
# their restart policy first if they should stay down.
for c in $(sudo podman ps -a --filter status=exited --filter restart-policy=unless-stopped --format "{{.Names}}"); do
  logger -t "$LOG_TAG" "Restarting stopped container: $c"
  sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG"
done

# Restart unhealthy containers
for c in $(sudo podman ps --filter health=unhealthy --format "{{.Names}}"); do
  logger -t "$LOG_TAG" "Restarting unhealthy container: $c"
  sudo podman restart "$c" 2>&1 | logger -t "$LOG_TAG"
done

# Start containers in "created" state (never started)
for c in $(sudo podman ps -a --filter status=created --format "{{.Names}}"); do
  logger -t "$LOG_TAG" "Starting created container: $c"
  sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG"
done
SCRIPT

sudo chmod +x /usr/local/bin/archipelago-container-watchdog.sh
```
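
As written, the watchdog restarts a crash-looping container every two minutes, forever. `references/common-failures.md` warns against exactly that ("Fix root cause, don't just restart"). The sketch below adds a per-container restart budget that the script could consult before each `podman start`/`podman restart`. `STATE_DIR`, `MAX_RESTARTS`, `WINDOW` and `should_restart` are hypothetical names. The defaults (3 restarts per 15 minutes, state kept under `/run`) are assumptions to tune:

```bash
#!/usr/bin/env bash
# Hypothetical crash-loop guard: allow at most MAX_RESTARTS restarts of one
# container within WINDOW seconds, then refuse so a human can investigate.
STATE_DIR=${STATE_DIR:-/run/archipelago-watchdog}  # assumption: /run is cleared at boot
MAX_RESTARTS=${MAX_RESTARTS:-3}
WINDOW=${WINDOW:-900}

# Returns 0 (and records a timestamp) if NAME may be restarted now, 1 otherwise.
should_restart() {
  local name=$1 now file recent
  now=$(date +%s)
  file="$STATE_DIR/$name"
  mkdir -p "$STATE_DIR"
  touch "$file"
  # Keep only restart timestamps that fall inside the window.
  recent=$(awk -v cutoff=$((now - WINDOW)) '$1 >= cutoff' "$file")
  if [ -n "$recent" ] && [ "$(wc -l <<<"$recent")" -ge "$MAX_RESTARTS" ]; then
    printf '%s\n' "$recent" >"$file"
    return 1
  fi
  { [ -n "$recent" ] && printf '%s\n' "$recent"; echo "$now"; } >"$file"
  return 0
}
```

Inside the watchdog loops, this would wrap each restart, e.g. `if should_restart "$c"; then sudo podman start "$c"; else logger -t "$LOG_TAG" "Crash loop, not restarting: $c"; fi`.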

### Create the systemd timer

```bash
# Service unit
cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-watchdog.service
[Unit]
Description=Archipelago Container Watchdog
After=podman-restart.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/archipelago-container-watchdog.sh
EOF

# Timer unit — runs every 2 minutes
cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-watchdog.timer
[Unit]
Description=Run Archipelago Container Watchdog every 2 minutes

[Timer]
OnBootSec=120
OnUnitActiveSec=120
AccuracySec=30

[Install]
WantedBy=timers.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now archipelago-watchdog.timer
```

### Verify watchdog is running

```bash
sudo systemctl status archipelago-watchdog.timer
sudo systemctl list-timers | grep archipelago
# Check watchdog logs
sudo journalctl -t container-watchdog --since "1 hour ago" --no-pager
```

## Layer 3: Dependency-Aware Startup Order

Some containers depend on others. The watchdog handles restarts, but initial boot order matters.

### Create ordered startup script

```bash
cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-ordered-start.sh
#!/bin/bash
# Ordered container startup for Archipelago
# Respects dependency chain: bitcoin → electrs/lnd → mempool/btcpay

LOG_TAG="ordered-start"

# Wait until a container's process is running (not necessarily ready to serve)
wait_for_container() {
  local name=$1
  local max_wait=${2:-60}
  local waited=0 status
  while [ "$waited" -lt "$max_wait" ]; do
    status=$(sudo podman inspect "$name" --format "{{.State.Running}}" 2>/dev/null)
    if [ "$status" = "true" ]; then
      logger -t "$LOG_TAG" "$name is running"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  logger -t "$LOG_TAG" "WARNING: $name not running after ${max_wait}s"
  return 1
}

# Tier 0: Infrastructure
logger -t "$LOG_TAG" "Starting Tier 0: Infrastructure"
sudo podman start tailscale 2>/dev/null

# Tier 1: Bitcoin (foundation)
logger -t "$LOG_TAG" "Starting Tier 1: Bitcoin"
sudo podman start bitcoin-knots 2>/dev/null
wait_for_container bitcoin-knots 120

# Tier 2: Bitcoin-dependent services
logger -t "$LOG_TAG" "Starting Tier 2: Bitcoin-dependent"
sudo podman start electrs 2>/dev/null
sudo podman start lnd 2>/dev/null
wait_for_container electrs 90
wait_for_container lnd 90

# Tier 3: Services depending on Tier 2
logger -t "$LOG_TAG" "Starting Tier 3: Second-order dependencies"
sudo podman start mempool-db 2>/dev/null
sleep 5
sudo podman start mempool 2>/dev/null
# BTCPay: database first, then NBXplorer, then BTCPay Server (which needs both)
sudo podman start btcpay-postgres 2>/dev/null
sleep 5
sudo podman start nbxplorer 2>/dev/null
sleep 10
sudo podman start btcpay-server 2>/dev/null

# Tier 4: Independent apps (everything else except the UI containers,
# which are held back for Tier 5)
logger -t "$LOG_TAG" "Starting Tier 4: Independent apps"
for c in $(sudo podman ps -a --format "{{.Names}}"); do
  case "$c" in
    bitcoin-ui|lnd-ui) continue ;;
  esac
  sudo podman start "$c" 2>/dev/null
done

# Tier 5: UI containers (need parent apps running first)
logger -t "$LOG_TAG" "Starting Tier 5: UI containers"
sudo podman start bitcoin-ui 2>/dev/null
sudo podman start lnd-ui 2>/dev/null

logger -t "$LOG_TAG" "Startup sequence complete"
SCRIPT

sudo chmod +x /usr/local/bin/archipelago-ordered-start.sh
```
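
`wait_for_container` returns as soon as the container's process is running. For bitcoin-knots, that can be well before its RPC interface answers. The more general sketch below retries any command until it succeeds or a timeout expires. Passed `podman healthcheck run NAME` as the command, it waits for the container's own health check, which only works for containers that define one. `wait_until` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run COMMAND every INTERVAL seconds until it succeeds,
# giving up after TIMEOUT seconds. Returns 0 on success, 1 on timeout.
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
wait_until() {
  local timeout=$1 interval=$2 waited=0
  shift 2
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

In the ordered-start script, Tier 1 could then read `wait_until 120 5 sudo podman healthcheck run bitcoin-knots || logger -t "$LOG_TAG" "bitcoin-knots not healthy after 120s"`.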

### Wire into boot sequence

```bash
cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-containers.service
[Unit]
Description=Archipelago Ordered Container Startup
After=network-online.target podman.service
Wants=network-online.target
Before=archipelago.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/archipelago-ordered-start.sh
RemainAfterExit=yes
TimeoutStartSec=300

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable archipelago-containers.service
```

## Verification Checklist

After setting up all 3 layers, verify:

```bash
echo "=== Layer 1: Restart Policies ==="
for c in $(sudo podman ps -a --format "{{.Names}}"); do
  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
  echo " $c: $policy"
done

echo ""
echo "=== Layer 2: Watchdog Timer ==="
sudo systemctl is-active archipelago-watchdog.timer
sudo systemctl list-timers | grep archipelago

echo ""
echo "=== Layer 3: Boot Services ==="
# is-enabled prints the unit state and fails when the unit is disabled or missing
sudo systemctl is-enabled podman-restart.service 2>/dev/null || echo "podman-restart: not enabled or not found"
sudo systemctl is-enabled archipelago-containers.service 2>/dev/null || echo "ordered-start: not enabled or not found"
sudo systemctl is-enabled archipelago-watchdog.timer 2>/dev/null || echo "watchdog: not enabled or not found"

echo ""
echo "=== Container Health Summary ==="
total=$(sudo podman ps -a --format "{{.Names}}" | wc -l)
running=$(sudo podman ps --format "{{.Names}}" | wc -l)
stopped=$((total - running))
unhealthy=$(sudo podman ps --filter health=unhealthy --format "{{.Names}}" | wc -l)
echo " Total: $total | Running: $running | Stopped: $stopped | Unhealthy: $unhealthy"
```

## Reboot Test

The ultimate uptime test — reboot the server and verify everything comes back:

```bash
# Before reboot: record running containers
# (saved in the home directory, because /tmp is often cleared at boot)
sudo podman ps --format "{{.Names}}" | sort > ~/before-reboot.txt

# Reboot
sudo reboot

# After reboot (wait ~3 minutes, then SSH back in):
sudo podman ps --format "{{.Names}}" | sort > ~/after-reboot.txt

# Compare
diff ~/before-reboot.txt ~/after-reboot.txt
# Should show no differences
```
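
`diff` also reports containers that are new after the reboot, and those are harmless. The sketch below reports only the containers that did not come back, and it exits non-zero so it can gate a script. It assumes both lists are sorted, as the commands above produce them. The helper name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare two sorted name lists and report containers that
# were running before the reboot (file 1) but not after it (file 2).
# Returns 1 if anything is missing.
missing_after_reboot() {
  local missing
  missing=$(comm -23 "$1" "$2")   # lines only in the "before" file
  if [ -n "$missing" ]; then
    printf 'Not running after reboot:\n%s\n' "$missing"
    return 1
  fi
  echo "All containers from before the reboot are running."
}
```

Pass it the "before" file first and the "after" file second, using the two files recorded above.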

## Monitoring

Check uptime status anytime:
```bash
# Quick status (sorted by name, header kept on top)
sudo podman ps -a --sort names --format "table {{.Names}}\t{{.Status}}"

# Watchdog activity
sudo journalctl -t container-watchdog --since "24 hours ago" --no-pager

# Container events (starts, stops, deaths); --stream=false exits after printing past events
sudo podman events --stream=false --since 24h --filter event=start --filter event=stop --filter event=died 2>/dev/null | tail -30
```

## Integration

- Run `/podman-doctor` first to identify issues
- Run `/podman-fix` for specific container repairs
- Run `/podman-uptime` to set up permanent reliability infrastructure
- Add to ISO build: copy watchdog scripts to `image-recipe/configs/` and enable in first-boot