Production deployment checklist
A single page to walk through before pointing proxxx at a real cluster you care about. Each item is a one-line check + a verifiable command. Treat this as the minimum bar — your shop's own runbook may be stricter.
TIP
This page is for the operator deploying proxxx, not for the PVE cluster itself. Cluster hardening (corosync over a private ring, firewall rules, certificate rotation) is upstream Proxmox material; we link to it but don't restate it.
1. Verify the binary
[ ] Download from a tagged release, not main
TARGET=x86_64-unknown-linux-musl # or aarch64-apple-darwin
VERSION=0.1.6 # check the latest at /releases
gh release download v${VERSION} \
--repo fabriziosalmi/proxxx \
--pattern "*-${TARGET}.tar.gz" \
--pattern "*-${TARGET}.tar.gz.sha256"

[ ] Check the SHA-256 sidecar
shasum -a 256 -c proxxx-${VERSION}-${TARGET}.tar.gz.sha256
# → proxxx-...tar.gz: OK

[ ] Verify the sigstore keyless cosign signature (release ≥ next-tag-after-v0.1.6)
gh release download v${VERSION} \
--repo fabriziosalmi/proxxx \
--pattern "*-${TARGET}.tar.gz.cosign.bundle"
cosign verify-blob \
--bundle proxxx-${VERSION}-${TARGET}.tar.gz.cosign.bundle \
--certificate-identity-regexp 'https://github.com/fabriziosalmi/proxxx/.github/workflows/release.yml@.*' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
proxxx-${VERSION}-${TARGET}.tar.gz
# → Verified OK

The cert-identity-regexp pins the OIDC subject to this exact workflow path in this exact repo — a leaked sigstore cert from any other workflow or any other repo can't validate against these bundles. The transparency-log inclusion proof is embedded in the bundle, so verification works offline.
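To build intuition for what that pin accepts and rejects, the same regexp can be exercised against candidate identity strings with grep. This is a local sketch only: cosign evaluates the regexp against the Fulcio certificate's SAN, not against strings you type, and `matches_identity` is a name of ours, not a cosign command.

```shell
# Hypothetical helper: does an OIDC subject string match the identity
# regexp pinned in the cosign invocation above?
matches_identity() {
  printf '%s' "$1" |
    grep -Eq 'https://github.com/fabriziosalmi/proxxx/.github/workflows/release.yml@.*'
}

matches_identity 'https://github.com/fabriziosalmi/proxxx/.github/workflows/release.yml@refs/tags/v0.1.6' \
  && echo "pass: release workflow accepted"
matches_identity 'https://github.com/someone-else/proxxx/.github/workflows/release.yml@refs/tags/v0.1.6' \
  || echo "pass: foreign repo rejected"
```

Anything after the `@` passes (any ref, any tag); anything outside this repo-and-workflow prefix does not.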
[ ] Audit the CycloneDX SBOM (optional but recommended)
gh release download v${VERSION} --repo fabriziosalmi/proxxx \
--pattern "*.cdx.json" --pattern "*.cdx.json.sha256"
shasum -a 256 -c proxxx-${VERSION}.cdx.json.sha256
grype sbom:proxxx-${VERSION}.cdx.json # or trivy / cyclonedx-cli

2. Configure access
[ ] Use API tokens, not passwords
Tokens are revocable, scopable, and don't carry full account privilege when --privsep=1. Create with:
ssh root@<node>
pveum user token add operator@pve proxxx --privsep=1
# With --privsep=1 the token has no implicit privileges;
# grant the role to the token path explicitly:
pveum acl modify /vms/100 -tokens 'operator@pve!proxxx' -roles PVEVMAdmin

[ ] Pin verify_tls = true unless you know why not
Self-signed labs flip this to false for convenience. In production:
verify_tls = true

If you're running PVE behind a real cert (Let's Encrypt, internal CA, ACME via PVE itself), this is the only correct setting. Disabling TLS verification exposes the entire API + WebSocket traffic (including serial-console tickets) to any MITM on the path.
[ ] Store the token secret in the OS keychain or a 0600 file, not inline
# Option A: macOS keychain
security add-generic-password \
-a "$USER" -s proxxx -w "<token-uuid>"
# Option B: Linux secret-service (gnome-keyring / kwallet)
secret-tool store --label proxxx service proxxx account token_secret
# (proxxx reads via the `keyring` crate: secret-service/libsecret
# on Linux, the native keychain on macOS)
# Option C: 0600 file referenced from config.toml
mkdir -p ~/.config/proxxx
printf '%s' '<uuid>' > ~/.config/proxxx/token.secret
chmod 600 ~/.config/proxxx/token.secret
# token_secret_file = "~/.config/proxxx/token.secret" in config.toml

WARNING
proxxx refuses to read token_secret_file if the file is not mode 0600 (Unix). It will print Security Error: token_secret_file '<path>' has unsafe permissions <mode> and exit. This is intentional — don't chmod 644 to "fix" it.
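The gate itself lives in the Rust binary; for pre-flight audit scripts it can be mirrored in shell. A sketch assuming GNU coreutils `stat` (Linux; the BSD `stat` flags on macOS differ), with `check_secret_mode` being our name, not a proxxx command:

```shell
# Mirror of proxxx's 0600 gate for audit scripting (not the real check).
# Assumes GNU coreutils stat.
check_secret_mode() {
  local path=$1 mode
  mode=$(stat -c '%a' "$path") || return 1   # e.g. 600, 644
  if [ "$mode" != "600" ]; then
    echo "unsafe permissions $mode on $path"
    return 1
  fi
  echo "ok"
}
```

Run it over every `*_file` path referenced from config.toml before starting the daemons.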
[ ] Validate the connection works as the deploying user
proxxx ls nodes
proxxx ls guests --format json | jq '.[] | {vmid, name, status}'
proxxx perms <user>

3. Configure HITL (if any operator runs destructive ops)
[ ] Provision a dedicated Telegram bot
Don't reuse a bot that's also wired to other systems — the HITL daemon polls and acknowledges every callback, and a shared bot's other listeners may double-fire.
# In Telegram: chat with @BotFather
/newbot → name + username → copy the API token
/setprivacy → DISABLE (so the bot sees group messages)

[ ] Pin [telegram] in config.toml
[telegram]
bot_token = "<bot-api-token>"
chat_id = "<your-numeric-chat-id>"

The bot token resolves with the same hierarchy as the PVE token: PROXXX_TELEGRAM_BOT_TOKEN env, bot_token_file, keychain, inline.
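That hierarchy, reduced to shell. `PROXXX_TELEGRAM_BOT_TOKEN` and `bot_token_file` are the names from this page; `BOT_TOKEN_FILE` and `INLINE_BOT_TOKEN` are stand-ins for the config values, and the keychain step is stubbed out:

```shell
# Sketch of the documented lookup order: env → file → keychain → inline.
resolve_bot_token() {
  if [ -n "${PROXXX_TELEGRAM_BOT_TOKEN:-}" ]; then
    printf '%s\n' "$PROXXX_TELEGRAM_BOT_TOKEN"
  elif [ -n "${BOT_TOKEN_FILE:-}" ] && [ -r "${BOT_TOKEN_FILE}" ]; then
    cat "$BOT_TOKEN_FILE"
  # elif <keychain lookup via secret-tool / security succeeds> ...
  elif [ -n "${INLINE_BOT_TOKEN:-}" ]; then
    printf '%s\n' "$INLINE_BOT_TOKEN"
  else
    echo "no bot token configured" >&2
    return 1
  fi
}
```

The practical consequence: an exported env var silently wins over everything in config.toml, which is worth remembering when a rotated token "doesn't take".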
[ ] Configure [[policies]] rules
[[policies]]
when = { action = "delete", tag = "prod" }
require = "telegram-2of3" # or "telegram"
channel = "telegram"
[[policies]]
when = { action = "stop", vmid = "100" }
require = "telegram"
channel = "telegram"

Policies match by action, tag, vmid, or wildcard. The deny-on-timeout is hardcoded to 120 s — if the human doesn't approve in that window, the op is rejected (NOT auto-approved).
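Deny-on-timeout is worth internalizing; in miniature it looks like the sketch below (the real window is fixed at 120 s, we pass a short one for the demo, and `await_approval` is our name for illustration):

```shell
# Wait up to $1 seconds for the literal line "approve" on stdin.
# Silence, EOF, or any other reply is a denial — never an approval.
await_approval() {
  local window=$1 reply
  if read -r -t "$window" reply && [ "$reply" = "approve" ]; then
    echo "approved"
  else
    echo "denied"
  fi
}

echo approve | await_approval 2   # → approved
await_approval 1 < /dev/null      # → denied (EOF counts as silence)
```

The structural point: the approved branch is the only one that says yes; every other path, including a crashed or unreachable Telegram, falls through to no.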
[ ] Run the HITL daemon under a process supervisor
# systemd unit at /etc/systemd/system/proxxx-hitl.service:
[Unit]
Description=proxxx HITL approval daemon
After=network-online.target
[Service]
Type=simple
User=proxxx-ops
ExecStart=/usr/local/bin/proxxx hitl serve
Restart=on-failure
RestartSec=5
# Replay protection is a session-local consumed-txn-id set;
# pending approvals don't outlive the process, so no
# persistence layer is needed.
[Install]
WantedBy=multi-user.target

[ ] Test the round-trip end-to-end
proxxx hitl test --action delete --vmid 999
# → Telegram → tap Approve in <120s → daemon runs the op
# → daemon answers callback "✅ Done" → message lifecycle done

4. Configure alerting (optional)
[ ] Define [[alerts]] rules in config.toml
[[alerts]]
name = "node_offline"
when = "node_offline"
for_secs = 120
severity = "critical"
route = ["telegram", "ntfy:proxxx-prod"]
dedup_secs = 600

Predicates available: node_offline, storage_above, replication_failing. The dedup_secs window prevents re-fire spam.
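The dedup logic is simple enough to model in a few lines of shell: fire only when the last fire for this alert name is older than dedup_secs. This sketch keeps one epoch-stamp file per alert name, which is our simplification; the real daemon keeps this state in SQLite.

```shell
# Fire at most once per dedup window. State: one epoch-stamp file per
# alert name under $STATE_DIR.
STATE_DIR=$(mktemp -d)
should_fire() {
  local name=$1 dedup_secs=$2 now last
  now=$(date +%s)
  last=$(cat "$STATE_DIR/$name" 2>/dev/null || echo 0)
  if [ $(( now - last )) -ge "$dedup_secs" ]; then
    echo "$now" > "$STATE_DIR/$name"
    echo "fire"
  else
    echo "suppress"
  fi
}

should_fire node_offline 600   # → fire (first occurrence)
should_fire node_offline 600   # → suppress (inside the window)
```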
[ ] Run the alert daemon under a supervisor
ExecStart=/usr/local/bin/proxxx alerts watch --interval 30

The daemon persists its dedup window to SQLite (cache schema 1 → 2 since v0.1.2), so a routine restart doesn't re-fire every active alert. Persistence is local to the daemon's host — no shared state across replicas.
[ ] Test each route once
proxxx alerts test --route 'telegram'
proxxx alerts test --route 'ntfy:proxxx-prod'
proxxx alerts test --route 'webhook:https://hooks.example/notify'

5. Configure SSH layer (if running proxxx perms or proxxx patch apply)
[ ] Provision a dedicated SSH key for proxxx
Don't reuse your personal ~/.ssh/id_ed25519. proxxx maintains its own known_hosts at $XDG_CONFIG_HOME/proxxx/known_hosts — giving it a dedicated key keeps the audit trail separate.
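That known_hosts file is filled trust-on-first-use (the `tofu` default in the [ssh] block): record the key the first time a host is seen, hard-fail on any later mismatch. In miniature, against a simplified one-key-per-line format (the real file is OpenSSH known_hosts format, and `tofu_check` is our name for the sketch):

```shell
# TOFU sketch over a simplified "host key" line format.
KNOWN_HOSTS=$(mktemp)
tofu_check() {
  local host=$1 key=$2 pinned
  pinned=$(awk -v h="$host" '$1 == h { print $2 }' "$KNOWN_HOSTS")
  if [ -z "$pinned" ]; then
    echo "$host $key" >> "$KNOWN_HOSTS"
    echo "pinned"          # first sight: trust and record
  elif [ "$pinned" = "$key" ]; then
    echo "ok"              # seen before, key unchanged
  else
    echo "MISMATCH"        # key changed: refuse to connect
  fi
}

tofu_check pve1 key1   # → pinned
tofu_check pve1 key1   # → ok
tofu_check pve1 key2   # → MISMATCH
```

The corollary for operators: the first connection is the trust decision, so make it from a network path you trust, and expect a hard failure after any legitimate host-key rotation until the stale pin is removed.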
ssh-keygen -t ed25519 -f ~/.ssh/proxxx_ops -N "" \
-C "proxxx-ops@$(hostname)"
ssh-copy-id -i ~/.ssh/proxxx_ops.pub root@<each-node>

[ ] Configure [ssh] block
[ssh]
user = "root"
key_path = "~/.ssh/proxxx_ops"
strict_host_key_checking = "tofu" # default; pinning happens on first connect

[ ] Verify the round-trip
proxxx perms <user> # exercises the same path
# → table of effective ACLs; if you see the table, SSH works.

6. Lock down the operator host itself
[ ] Treat ~/.config/proxxx/config.toml as a credential file
chmod 600 ~/.config/proxxx/config.toml
ls -l ~/.config/proxxx/

[ ] Confirm shells aren't leaking secrets via history
# Bash:
echo $HISTFILE
grep -E 'PROXXX_TOKEN_SECRET|PROXXX_PBS_TOKEN_SECRET' ~/.bash_history
# Zsh: ~/.zsh_history. Fish: ~/.local/share/fish/fish_history.

If you find tokens in history, rotate them on the cluster side before deleting from history.
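To sweep all three shells in one pass (the paths are the defaults named above and may differ on your host; `scan_history` is our helper name):

```shell
# Scan the given history files for proxxx secret env vars.
# Exits 0 whether or not matches are found; matches go to stdout.
scan_history() {
  local f
  for f in "$@"; do
    [ -r "$f" ] && grep -H -E 'PROXXX_(PBS_)?TOKEN_SECRET' "$f"
  done
  return 0
}

scan_history ~/.bash_history ~/.zsh_history \
  ~/.local/share/fish/fish_history
```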
[ ] Pin a Rust version + build from source for reproducibility (optional)
git clone https://github.com/fabriziosalmi/proxxx.git
cd proxxx
git checkout v${VERSION}
cargo build --release --target x86_64-unknown-linux-musl
# Compare your binary's sha256 against the release sha256;
# a byte-identical match needs the same Rust toolchain and flags.

7. Operational runbook
[ ] Pin the build into your fleet inventory
proxxx version --json
# → { "version": "0.1.6", "git_sha_short": "...", "audit_ignores_count": 1, ... }

Snapshot this output into your inventory (Ansible facts / Salt grain / Puppet fact) so a security advisory triage can answer "which hosts have <vulnerable version>" instantly.
[ ] Subscribe to release notifications
GitHub repo → Watch → Custom → Releases. Or pin the release feed: https://github.com/fabriziosalmi/proxxx/releases.atom.
[ ] Document your local risk-override policy
--allow-risk bypasses the pre-flight gate. If your shop permits this for any class of op (e.g. patch-apply during a planned window), document who can use it and for what in your runbook. The flag is ungated by design — proxxx trusts the operator who typed --yes AND --allow-risk.
[ ] Test recovery: token revocation
ssh root@<node>
pveum user token remove operator@pve proxxx
# → next proxxx call from the operator host should 401
proxxx ls nodes # expect: HTTP 401 No ticket

If it doesn't 401, the token wasn't actually scoped — re-issue with --privsep=1 and grant the role to the token path.
See also
- Configuration schema — every TOML block by section.
- Security model — threat model + invariants.
- Pre-commit gate — what every release passes before tagging.
- Troubleshooting — error message → fix index.
- SECURITY.md — coordinated disclosure contact.