Troubleshooting
Service won't start
Check YAML syntax:
    python3 -c "import yaml; yaml.safe_load(open('/etc/lxc_autoscale/lxc_autoscale.yaml'))"
Check for Pydantic validation errors: the daemon validates all configuration at startup. Common errors:
    cpu_lower_threshold must be < cpu_upper_threshold -- thresholds are inverted or equal.
    min_cores must be <= max_cores -- core limits are inverted.
    Input should be 'normal', 'conservative' or 'aggressive' -- invalid behaviour value.
    Input should be 'cli' or 'api' -- invalid backend value.
These errors are printed to stderr and the daemon exits. Fix the YAML values and restart.
Verify Python dependencies:
    pip3 install -r /usr/local/bin/lxc_autoscale/requirements.txt
Check service logs:
    journalctl -u lxc_autoscale.service -n 50
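The validation errors listed above can also be checked by hand before restarting the service. The following is a minimal sketch, not the daemon's actual Pydantic models: it assumes the relevant settings have already been loaded into a plain dict, and the function name find_config_errors is hypothetical. Only the key names and error messages come from the list above.

```python
# Hypothetical pre-flight check. This is NOT the daemon's validation code;
# it only mirrors the common errors listed in this section.
ALLOWED_BEHAVIOURS = {"normal", "conservative", "aggressive"}
ALLOWED_BACKENDS = {"cli", "api"}


def find_config_errors(settings: dict) -> list[str]:
    """Return one message per misconfiguration found in `settings`."""
    errors = []

    lower = settings.get("cpu_lower_threshold")
    upper = settings.get("cpu_upper_threshold")
    if lower is not None and upper is not None and not lower < upper:
        errors.append("cpu_lower_threshold must be < cpu_upper_threshold")

    min_cores = settings.get("min_cores")
    max_cores = settings.get("max_cores")
    if min_cores is not None and max_cores is not None and not min_cores <= max_cores:
        errors.append("min_cores must be <= max_cores")

    behaviour = settings.get("behaviour")
    if behaviour is not None and behaviour not in ALLOWED_BEHAVIOURS:
        errors.append("Input should be 'normal', 'conservative' or 'aggressive'")

    backend = settings.get("backend")
    if backend is not None and backend not in ALLOWED_BACKENDS:
        errors.append("Input should be 'cli' or 'api'")

    return errors


# Example values chosen to trigger every check (illustrative only).
example = {
    "cpu_lower_threshold": 80,
    "cpu_upper_threshold": 80,
    "min_cores": 4,
    "max_cores": 2,
    "behaviour": "fast",
    "backend": "rest",
}
for message in find_config_errors(example):
    print(message)
```

Keys that are absent are skipped rather than reported, since this sketch makes no assumption about which settings are mandatory.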
Scaling not working
Verify containers are running:
    pct list
Check if containers are in the ignore list: review ignore_lxc in the config file.
Check thresholds: ensure the gap between upper and lower thresholds is not too narrow.
First cycle returns 0% CPU: this is expected. The cgroup measurement stores a raw sample on the first cycle and computes the delta on the second cycle. Scaling begins on the second poll interval.
Review the log for messages like "already at max cores" or "not enough available cores on host".
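The 0% reading on the first cycle follows from how delta-based measurement works: a cumulative CPU-time counter only becomes a percentage once two samples exist. The sketch below illustrates the idea under stated assumptions. The class name CpuDeltaSampler, the microsecond unit (as in the usage_usec field of a cgroup v2 cpu.stat file), and the normalization by core count are all illustrative and may differ from what the daemon actually does.

```python
class CpuDeltaSampler:
    """Turns cumulative CPU-time readings into a usage percentage.

    Hypothetical helper for illustration; not the daemon's implementation.
    """

    def __init__(self) -> None:
        # container id -> (cumulative usage in microseconds, timestamp in seconds)
        self._previous = {}

    def sample(self, ctid: str, usage_usec: int, now_sec: float, cores: int = 1) -> float:
        previous = self._previous.get(ctid)
        self._previous[ctid] = (usage_usec, now_sec)
        if previous is None:
            # First cycle: only a raw sample is stored, so there is no delta yet.
            return 0.0
        prev_usage, prev_time = previous
        elapsed_usec = (now_sec - prev_time) * 1_000_000
        if elapsed_usec <= 0 or cores <= 0:
            return 0.0
        percent = (usage_usec - prev_usage) / (elapsed_usec * cores) * 100
        # Clamp to a sane range in case of counter resets or clock jitter.
        return max(0.0, min(100.0, percent))


sampler = CpuDeltaSampler()
# Two polls 300 seconds apart for a 2-core container (made-up readings).
first = sampler.sample("101", usage_usec=10_000_000, now_sec=0.0, cores=2)
second = sampler.sample("101", usage_usec=160_000_000, now_sec=300.0, cores=2)
print(first, second)
```

With a sampler like this, no scaling decision can be based on real data until the second poll, which matches the behavior described above.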
High CPU usage by the daemon
Increase the poll interval (e.g. poll_interval: 600).
Reduce monitored containers by adding non-critical ones to ignore_lxc.
TIP
CPU and memory measurement uses host-side cgroup reads instead of pct exec, which dramatically reduces daemon overhead compared to v1.x.
Permission errors
Verify the service runs as root: check /etc/systemd/system/lxc_autoscale.service.
Check file permissions:
    ls -la /etc/lxc_autoscale/
    ls -la /var/log/lxc_autoscale.log
Config file permission warning: if the daemon logs "Config file is readable by group/others", run:
    chmod 600 /etc/lxc_autoscale/lxc_autoscale.yaml
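To see whether the config file would trigger that warning without reading ls output by eye, the group and other read bits can be tested directly. This is a small sketch using only the standard library. The function name readable_by_group_or_others is hypothetical, and the daemon's own check may look at additional bits.

```python
import os
import stat

CONFIG_PATH = "/etc/lxc_autoscale/lxc_autoscale.yaml"


def readable_by_group_or_others(path: str) -> bool:
    """Return True if the group or other read permission bit is set on `path`."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))


# Only inspect the file if it exists on this machine.
if os.path.exists(CONFIG_PATH):
    if readable_by_group_or_others(CONFIG_PATH):
        print("Config file is readable by group/others; consider chmod 600")
    else:
        print("Config file permissions look restrictive enough")
```

After chmod 600 the mode is rw for the owner only, so the function returns False.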
Config changes not taking effect
Restart the service after editing the YAML file:
    systemctl restart lxc_autoscale.service
Remote SSH execution issues
Test SSH connectivity:
    ssh -p <port> <user>@<proxmox_host> "pct list"
Host key verification failure: if you see "Server host key not found in known_hosts", add the host key:
    ssh-keyscan -H <proxmox_host> >> ~/.ssh/known_hosts
Verify credentials: check ssh_user, ssh_password (or ssh_key_path), and proxmox_host in the config.
Ensure use_remote_proxmox: true is set.
SSH policy: if ssh_host_key_policy is set to reject (the default), connections to hosts not in known_hosts will be refused. This is the correct behavior. Do not set it to auto in production.
REST API backend issues
proxmoxer not installed:
    RuntimeError: proxmoxer is required for the REST API backend.
Fix: pip install proxmoxer
Missing API host:
    ValueError: proxmox_api.host is required when backend=api
Fix: add proxmox_api.host to the configuration.
Authentication failure: verify token_name and token_value match the API token created in the Proxmox UI. Ensure the token has the required permissions (VM.Audit, VM.Config.CPU, VM.Config.Memory).
SSL verification failure: if the Proxmox host uses a self-signed certificate, set proxmox_api.verify_ssl: false. For production, use a valid certificate.
No nodes found:
    RuntimeError: No Proxmox nodes found via API
The API token may lack permissions to list nodes, or the Proxmox host is unreachable.
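Several of the REST API problems above come down to missing or incomplete settings, which can be spotted before the daemon ever contacts Proxmox. The sketch below is illustrative only. It assumes the configuration is already loaded into a dict, that backend is a top-level key, and that token_name and token_value sit under proxmox_api next to host and verify_ssl. The function name check_api_settings is hypothetical, and it does not verify token permissions or reachability.

```python
def check_api_settings(config: dict) -> list[str]:
    """Return human-readable problems with the REST API settings, if any.

    Hypothetical helper; the key layout is an assumption, not the daemon's schema.
    """
    problems = []
    if config.get("backend") != "api":
        # Nothing to check when the REST API backend is not selected.
        return problems

    api = config.get("proxmox_api") or {}
    if not api.get("host"):
        problems.append("proxmox_api.host is required when backend=api")
    for key in ("token_name", "token_value"):
        if not api.get(key):
            problems.append(f"proxmox_api.{key} is empty or missing")
    if api.get("verify_ssl") is False:
        problems.append("proxmox_api.verify_ssl is false; use a valid certificate in production")
    return problems


# Placeholder values for illustration; the host is left out on purpose.
sample_config = {
    "backend": "api",
    "proxmox_api": {"token_name": "autoscale", "token_value": "", "verify_ssl": False},
}
for problem in check_api_settings(sample_config):
    print(problem)
```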
Notification issues
Notifications not arriving: check the log for errors like "Gotify notification failed" or "Failed to send email".
Notifications backed off: if a channel fails 3 times consecutively, it is suppressed for 10 cycles. Look for "consecutive failures, backing off" in the log. The channel retries automatically after the backoff period.
SMTP timeouts: verify the SMTP server is reachable and the port is correct. Notifications are sent asynchronously, so SMTP delays do not block scaling.
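The backoff rule described above (3 consecutive failures, then 10 suppressed cycles, then an automatic retry) can be pictured with a small state machine. This is a sketch under assumptions: the class name ChannelBackoff and its methods are hypothetical, and the daemon may count cycles or reset its failure counter differently.

```python
class ChannelBackoff:
    """Suppresses a notification channel after repeated failures.

    Illustrative only; defaults mirror the 3-failure / 10-cycle rule above.
    """

    def __init__(self, max_failures: int = 3, backoff_cycles: int = 10) -> None:
        self.max_failures = max_failures
        self.backoff_cycles = backoff_cycles
        self.consecutive_failures = 0
        self.cycles_to_skip = 0

    def should_send(self) -> bool:
        """Call once per cycle; returns False while the channel is backed off."""
        if self.cycles_to_skip > 0:
            self.cycles_to_skip -= 1
            return False
        return True

    def record(self, success: bool) -> None:
        """Record the outcome of a send attempt."""
        if success:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            # Start the backoff window and begin counting failures afresh.
            self.cycles_to_skip = self.backoff_cycles
            self.consecutive_failures = 0


# Simulate 15 cycles against a channel that always fails.
backoff = ChannelBackoff()
attempted = []
for cycle in range(15):
    if backoff.should_send():
        attempted.append(cycle)
        backoff.record(success=False)
print(attempted)
```

In this simulation the channel is tried on the first three cycles, skipped for the next ten, and tried again afterwards, which is why a gap in notifications does not necessarily mean the channel has been disabled for good.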