Troubleshooting

Service won't start

Check YAML syntax:

```bash
python3 -c "import yaml; yaml.safe_load(open('/etc/lxc_autoscale/lxc_autoscale.yaml'))"
```

Check for Pydantic validation errors: the daemon validates all configuration at startup. Common errors:
- cpu_lower_threshold must be < cpu_upper_threshold -- thresholds are inverted or equal.
- min_cores must be <= max_cores -- core limits are inverted.
- Input should be 'normal', 'conservative' or 'aggressive' -- invalid behaviour value.
- Input should be 'cli' or 'api' -- invalid backend value.
These errors are printed to stderr and the daemon exits. Fix the YAML values and restart.
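The constraints behind these messages can be checked before restarting the daemon. The following is a minimal sketch in plain Python, not the daemon's actual Pydantic model: the function name check_config is hypothetical, and the key names behaviour and backend are assumed from the messages above.

```python
ALLOWED_BEHAVIOUR = ("normal", "conservative", "aggressive")
ALLOWED_BACKEND = ("cli", "api")


def check_config(cfg: dict) -> list:
    """Return the startup errors a config dict would likely trigger (empty if none).

    Hypothetical helper: it mirrors the documented messages, not the real model.
    """
    errors = []

    lower = cfg.get("cpu_lower_threshold")
    upper = cfg.get("cpu_upper_threshold")
    if lower is not None and upper is not None and not lower < upper:
        errors.append("cpu_lower_threshold must be < cpu_upper_threshold")

    min_cores = cfg.get("min_cores")
    max_cores = cfg.get("max_cores")
    if min_cores is not None and max_cores is not None and not min_cores <= max_cores:
        errors.append("min_cores must be <= max_cores")

    # Key names "behaviour" and "backend" are assumptions based on the error text.
    behaviour = cfg.get("behaviour")
    if behaviour is not None and behaviour not in ALLOWED_BEHAVIOUR:
        errors.append("behaviour: Input should be 'normal', 'conservative' or 'aggressive'")

    backend = cfg.get("backend")
    if backend is not None and backend not in ALLOWED_BACKEND:
        errors.append("backend: Input should be 'cli' or 'api'")

    return errors
```

Passing the dictionary returned by yaml.safe_load to this function gives a quick list of likely problems. An empty list does not guarantee the daemon will accept the file, because the real model may enforce further rules.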

Verify Python dependencies:

```bash
pip3 install -r /usr/local/bin/lxc_autoscale/requirements.txt
```

Check service logs:

```bash
journalctl -u lxc_autoscale.service -n 50
```

Scaling not working

Verify containers are running:
```bash
pct list
```
Check if containers are in the ignore list: review ignore_lxc in the config file.
Check thresholds: ensure the gap between upper and lower thresholds is not too narrow.
First cycle returns 0% CPU: this is expected. The cgroup measurement stores a raw sample on the first cycle and computes the delta on the second cycle. Scaling begins on the second poll interval.
Review the log for messages like "already at max cores" or "not enough available cores on host".
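The 0% reading on the first cycle follows from how a delta-based measurement works. Below is a minimal sketch of the idea, not the daemon's own code: the class name CpuDeltaSampler is hypothetical, and the input is assumed to be a cumulative CPU time in microseconds, such as the usage_usec field of a cgroup v2 cpu.stat file.

```python
class CpuDeltaSampler:
    """Turn cumulative CPU time samples into a usage percentage per container."""

    def __init__(self):
        # container id -> (cumulative usage in microseconds, timestamp in seconds)
        self._previous = {}

    def percent(self, ctid, usage_usec, now_s, cores=1):
        previous = self._previous.get(ctid)
        self._previous[ctid] = (usage_usec, now_s)

        # First cycle: only a raw sample exists, so there is no delta yet.
        if previous is None:
            return 0.0

        used = usage_usec - previous[0]
        elapsed = now_s - previous[1]
        # Guard against clock problems or a counter reset (container restart).
        if elapsed <= 0 or used < 0 or cores <= 0:
            return 0.0

        # CPU microseconds consumed divided by CPU microseconds available.
        return min(100.0, used * 100.0 / (elapsed * 1_000_000 * cores))
```

With this scheme a usable value first appears on the second call for a given container, which matches scaling starting on the second poll interval.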

High CPU usage by the daemon

Increase the poll interval (e.g. poll_interval: 600).
Reduce monitored containers by adding non-critical ones to ignore_lxc.

TIP

CPU and memory measurement uses host-side cgroup reads instead of pct exec, which dramatically reduces daemon overhead compared to v1.x.

Permission errors

Verify the service runs as root: check /etc/systemd/system/lxc_autoscale.service.

Check file permissions:

```bash
ls -la /etc/lxc_autoscale/
ls -la /var/log/lxc_autoscale.log
```

Config file permission warning: if the daemon logs "Config file is readable by group/others", run:
```bash
chmod 600 /etc/lxc_autoscale/lxc_autoscale.yaml
```
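A check of this kind can be reproduced with the standard library to confirm that the chmod took effect. This is a sketch under the assumption that the warning is triggered by the group or other read bits. The function name is hypothetical and the daemon's actual test may differ.

```python
import os
import stat


def readable_by_group_or_others(path):
    """Return True if the file's mode grants read access to group or others."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))
```

For example, readable_by_group_or_others("/etc/lxc_autoscale/lxc_autoscale.yaml") should return False once the mode is 600.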

Config changes not taking effect

Restart the service after editing the YAML file:

```bash
systemctl restart lxc_autoscale.service
```

Remote SSH execution issues

Test SSH connectivity:

```bash
ssh -p <port> <user>@<proxmox_host> "pct list"
```

Host key verification failure: if you see Server host key not found in known_hosts, add the host key:
```bash
ssh-keyscan -H <proxmox_host> >> ~/.ssh/known_hosts
```
Verify credentials: check ssh_user, ssh_password (or ssh_key_path), and proxmox_host in the config.
Ensure use_remote_proxmox: true is set.
SSH policy: if ssh_host_key_policy is set to reject (the default), connections to hosts not in known_hosts will be refused. This is the correct behavior. Do not set it to auto in production.
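Because ssh-keyscan -H stores hashed host names, searching known_hosts for the plain host name will not find the new entry. The sketch below shows one way to confirm that a host is present, covering both plain and hashed entries in the OpenSSH format. The function name is hypothetical, wildcard and negated patterns are ignored, and marker lines such as @revoked are skipped, so treat it as a diagnostic aid rather than a replacement for real host key verification.

```python
import base64
import hashlib
import hmac


def host_in_known_hosts(host, known_hosts_text, port=22):
    """Return True if host has a plain or hashed entry in the given known_hosts text."""
    # Non-default ports are recorded as "[host]:port".
    name = host if port == 22 else "[%s]:%d" % (host, port)

    for line in known_hosts_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and marker lines (@cert-authority, @revoked).
        if not line or line.startswith("#") or line.startswith("@"):
            continue

        pattern = line.split()[0]
        if pattern.startswith("|1|"):
            # Hashed entry: |1|base64(salt)|base64(HMAC-SHA1(salt, name))
            try:
                _, _, salt_b64, hash_b64 = pattern.split("|")
                salt = base64.b64decode(salt_b64)
                expected = base64.b64decode(hash_b64)
            except ValueError:
                continue  # malformed entry
            digest = hmac.new(salt, name.encode(), hashlib.sha1).digest()
            if hmac.compare_digest(digest, expected):
                return True
        elif name in pattern.split(","):
            # Plain entry: comma-separated host names and addresses.
            return True

    return False
```

Read the file of the user the daemon runs as and pass its contents as known_hosts_text. On the command line, ssh-keygen -F <proxmox_host> performs a comparable lookup.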

REST API backend issues

proxmoxer not installed:
```
RuntimeError: proxmoxer is required for the REST API backend.
```
Fix: pip install proxmoxer

Missing API host:
```
ValueError: proxmox_api.host is required when backend=api
```
Fix: add proxmox_api.host to the configuration.
Authentication failure: verify token_name and token_value match the API token created in the Proxmox UI. Ensure the token has the required permissions (VM.Audit, VM.Config.CPU, VM.Config.Memory).
SSL verification failure: if the Proxmox host uses a self-signed certificate, set proxmox_api.verify_ssl: false. For production, use a valid certificate.
No nodes found:
```
RuntimeError: No Proxmox nodes found via API
```
The API token may lack permissions to list nodes, or the Proxmox host is unreachable.
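To separate the two causes, a plain TCP check shows whether the host is reachable at all before looking at token permissions. This is a minimal sketch: the function name is hypothetical, and port 8006 is the Proxmox VE default for the API and web interface, so adjust it if your setup differs.

```python
import socket


def tcp_reachable(host, port=8006, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If this returns True for the value of proxmox_api.host and the error persists, the token permissions are the more likely cause.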

Notification issues

Notifications not arriving: check the log for errors like "Gotify notification failed" or "Failed to send email".
Notifications backed off: if a channel fails 3 times consecutively, it is suppressed for 10 cycles. Look for "consecutive failures, backing off" in the log. The channel retries automatically after the backoff period.
SMTP timeouts: verify the SMTP server is reachable and the port is correct. Notifications are sent asynchronously, so SMTP delays do not block scaling.
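The backoff rule described above (3 consecutive failures, then 10 suppressed cycles) can be sketched as a small state machine. This illustrates the behaviour and is not the daemon's implementation. The class and method names are hypothetical.

```python
class ChannelBackoff:
    """Suppress a notification channel after repeated consecutive failures."""

    def __init__(self, max_failures=3, backoff_cycles=10):
        self.max_failures = max_failures
        self.backoff_cycles = backoff_cycles
        self.consecutive_failures = 0
        self.cycles_to_skip = 0

    def should_send(self):
        """Call once per cycle; returns False while the channel is backed off."""
        if self.cycles_to_skip > 0:
            self.cycles_to_skip -= 1
            return False
        return True

    def record(self, success):
        """Record the outcome of a send attempt."""
        if success:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            # Start the backoff window and reset the counter so the channel retries afterwards.
            self.cycles_to_skip = self.backoff_cycles
            self.consecutive_failures = 0
```

A single success resets the failure counter, so only uninterrupted failures trigger suppression. This is why a channel that fails intermittently may never log the backoff message.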
