Observability

Health Endpoints

Zion exposes three built-in endpoints that bypass routing and upstream forwarding:

| Endpoint | Response | Purpose |
| --- | --- | --- |
| GET /healthz | 200 ok | Liveness probe (is the process alive?) |
| GET /readyz | 200 ready | Readiness probe (is the process ready to serve?) |
| GET /metrics | Prometheus text format | Metrics scraping |

These endpoints are handled before rate limiting and routing, so probes and scrapes are never rate limited and keep responding under load.
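A minimal sketch of a client-side check against the liveness and readiness endpoints, assuming the response bodies are `ok` and `ready` as listed in the table above (possibly followed by a newline). The base URL, the function names, and the disabled certificate verification are illustrative assumptions, not part of Zion.

```python
import ssl
import urllib.error
import urllib.request

# Expected (status, body) pairs, taken from the endpoint table above.
EXPECTED = {"/healthz": (200, "ok"), "/readyz": (200, "ready")}


def probe_passes(path: str, status: int, body: str) -> bool:
    """Return True if a response matches what is documented for `path`."""
    want_status, want_body = EXPECTED[path]
    return status == want_status and body.strip() == want_body


def probe(base_url: str, path: str, timeout: float = 2.0) -> bool:
    """Send GET `path` to a Zion instance and check the response.

    `base_url` (for example "https://zion-host:443") is a placeholder.
    Certificate verification is disabled only to tolerate self-signed certs.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout, context=ctx) as resp:
            body = resp.read().decode("utf-8", "replace")
            return probe_passes(path, resp.status, body)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure, or an HTTP error status.
        return False
```

Calling `probe("https://zion-host:443", "/readyz")` would return `True` only when the instance answers with the documented status and body.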

Prometheus Metrics

GET /metrics returns counters and gauges in Prometheus text exposition format (text/plain; version=0.0.4).

Counter Reference

| Metric | Type | Description |
| --- | --- | --- |
| zion_requests_total | counter | Total HTTP requests processed |
| zion_requests_by_status{class="2xx"} | counter | Successful responses |
| zion_requests_by_status{class="4xx"} | counter | Client errors |
| zion_requests_by_status{class="5xx"} | counter | Server errors |
| zion_waf_denied | counter | Requests denied by WAF |
| zion_rate_limited | counter | Requests denied by rate limiter |
| zion_cache_hits | counter | Responses served from RAM cache |
| zion_cache_misses | counter | Cache misses (fetched from upstream) |
| zion_websocket_upgrades | counter | WebSocket upgrades completed |
| zion_connections_total | counter | Total TLS connections accepted |
| zion_tls_handshake_errors | counter | Failed TLS handshakes |

All counters are lock-free atomic u64 values. Incrementing a counter costs ~2ns (single fetch_add with Relaxed ordering).
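To read these counters from a script rather than from Prometheus, the exposition text can be parsed line by line. The following is a minimal sketch that handles only the simple `name value` and `name{labels} value` forms; the sample body and its numbers are invented for illustration, and `parse_metrics` is a hypothetical helper.

```python
import re

# Matches `name 12` or `name{label="v"} 12`. Comment lines are skipped separately.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text: str) -> dict:
    """Map each series (metric name plus its label set, if any) to its value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank, # HELP, or # TYPE line
            continue
        match = _SAMPLE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[name + (labels or "")] = float(value)
    return samples


# Illustrative scrape body; the numbers are invented for this example.
SAMPLE_BODY = """\
# TYPE zion_requests_total counter
zion_requests_total 1500
zion_requests_by_status{class="2xx"} 1400
zion_requests_by_status{class="5xx"} 12
zion_cache_hits 900
zion_cache_misses 300
"""

metrics = parse_metrics(SAMPLE_BODY)
```

A full client library would also handle escaped label values and timestamps, which this sketch deliberately ignores.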

Runtime Resource Gauges

/metrics also exposes process self-introspection gauges, so you can watch the daemon's own footprint live. A slow leak (for example, ~1 MB per 1000 connections) shows up as a rising RSS slope, with no need to restart under a profiler.

| Metric | Type | Description |
| --- | --- | --- |
| zion_active_connections | gauge | Currently active TLS connections |
| zion_process_resident_memory_bytes | gauge | Resident set size of the Zion process, in bytes (Linux /proc/self/status VmRSS; 0 on other platforms) |
| zion_process_open_fds | gauge | Open file descriptors held by the process (Linux /proc/self/fd; 0 on other platforms) |

The two process_* gauges are read from /proc/self at scrape time, never on the hot connection path. Because the /metrics render is cached for one second, the two small file reads happen at most once per second, however often the endpoint is scraped. The same values are surfaced in /_zion/snapshot.json and in the zion top TUI (the "rss" and "open fds" rows).

Both gauges are Linux-only; on macOS and Windows they render as 0 so that one dashboard works across hosts. Run zion doctor to confirm that the host actually exposes /proc/self/status: a hardened container runtime that masks /proc will also report 0 here, and the check warns you up front.
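The sampling described above can be sketched as follows. This is an illustration of the technique (read VmRSS from /proc/self/status, count entries in /proc/self/fd, fall back to 0 elsewhere), not Zion's implementation; the function names are hypothetical, and when run it measures the Python process itself.

```python
import os
import re


def parse_vmrss_bytes(status_text: str) -> int:
    """Extract VmRSS from /proc/self/status content and convert kB to bytes."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    # The kernel reports this field in units of 1024 bytes.
    return int(match.group(1)) * 1024 if match else 0


def resident_memory_bytes() -> int:
    """Current RSS in bytes, or 0 where /proc/self/status is unavailable."""
    try:
        with open("/proc/self/status", encoding="ascii", errors="replace") as f:
            return parse_vmrss_bytes(f.read())
    except OSError:
        return 0


def open_fds() -> int:
    """Number of entries in /proc/self/fd, or 0 where it is unavailable."""
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return 0
```

Returning 0 instead of raising mirrors the documented behavior on non-Linux hosts and on runtimes that mask /proc.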

Prometheus Scrape Config

yaml
scrape_configs:
  - job_name: zion
    static_configs:
      - targets: ['zion-host:443']
    scheme: https
    tls_config:
      insecure_skip_verify: true  # if using self-signed certs

Grafana Dashboard Queries

text
# Request rate
rate(zion_requests_total[5m])

# 5xx error rate (server errors per second)
rate(zion_requests_by_status{class="5xx"}[5m])

# WAF deny rate
rate(zion_waf_denied[5m])

# Cache hit ratio (cumulative since process start)
zion_cache_hits / (zion_cache_hits + zion_cache_misses)

# TLS handshake failure rate
rate(zion_tls_handshake_errors[5m])

# Memory-leak slope: RSS growth rate over 30m. A sustained positive
# slope under flat request traffic is the silent-leak signal.
deriv(zion_process_resident_memory_bytes[30m])

# File-descriptor leak: open fds climbing without bound (alert if it
# approaches the `zion doctor` fd soft limit).
zion_process_open_fds
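
The cache hit ratio query above divides raw counters, so it reports the ratio accumulated since the process started. A minimal sketch of the same calculation over a recent window, using two scrapes taken some time apart, might look like this. The sample values are invented, and `counter_delta` and `windowed_hit_ratio` are hypothetical helpers.

```python
from typing import Mapping, Optional


def counter_delta(prev: float, curr: float) -> float:
    """Increase of a counter between two scrapes.

    If the counter went backwards the process restarted, so the current
    value is itself the increase since the reset.
    """
    return curr - prev if curr >= prev else curr


def windowed_hit_ratio(prev: Mapping[str, float], curr: Mapping[str, float]) -> Optional[float]:
    """Cache hit ratio over the interval between two scrapes, or None if idle."""
    hits = counter_delta(prev["zion_cache_hits"], curr["zion_cache_hits"])
    misses = counter_delta(prev["zion_cache_misses"], curr["zion_cache_misses"])
    total = hits + misses
    return hits / total if total > 0 else None


# Invented sample values for two scrapes taken some time apart.
earlier = {"zion_cache_hits": 900.0, "zion_cache_misses": 300.0}
later = {"zion_cache_hits": 1000.0, "zion_cache_misses": 310.0}

lifetime_ratio = later["zion_cache_hits"] / (
    later["zion_cache_hits"] + later["zion_cache_misses"]
)
recent_ratio = windowed_hit_ratio(earlier, later)
```

With these numbers the lifetime ratio and the recent ratio differ noticeably, which is why a windowed view can be more useful on a dashboard that is meant to show current behavior.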

X-Request-ID

Every HTTPS response includes an X-Request-ID header for request tracing.

Behavior:

  • If the incoming request contains X-Request-ID, Zion preserves that value
  • If the header is absent, Zion generates a unique ID in the format {timestamp_hex}-{counter_hex} (e.g., 191a2b3c4d5e-0042)
  • The ID is forwarded to the upstream in the request headers
  • The same ID is returned in the response headers for client correlation

The counter is a global atomic u64, ensuring uniqueness across all concurrent requests.
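The behavior above can be sketched as follows. Two details are assumptions, because the text does not specify them: the timestamp unit (milliseconds here) and the counter padding (at least four hex digits, matching the example ID). The function names are hypothetical, and a lock stands in for the lock-free atomic counter.

```python
import threading
import time
from typing import Mapping

# Process-wide counter. Zion is described as using an atomic u64; a lock
# around a plain integer is a simple stand-in for this illustration.
_lock = threading.Lock()
_counter = 0


def _next_counter() -> int:
    global _counter
    with _lock:
        _counter += 1
        return _counter


def generate_request_id() -> str:
    """Build `{timestamp_hex}-{counter_hex}`.

    Assumptions: millisecond timestamp, counter padded to four hex digits.
    """
    timestamp_ms = time.time_ns() // 1_000_000
    return f"{timestamp_ms:x}-{_next_counter():04x}"


def resolve_request_id(headers: Mapping[str, str]) -> str:
    """Reuse the caller's X-Request-ID if present, otherwise generate one."""
    for name, value in headers.items():
        if name.lower() == "x-request-id" and value:
            return value
    return generate_request_id()
```

The value returned by `resolve_request_id` is the one that would be attached to both the upstream request and the client response.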

Structured Logging

Configure log format in [server]:

toml
[server]
log_format = "json"   # or "text" (default)

Text Format (default, development)

config loaded from zion.toml
  route /api/{*rest} -> backend [waf=strict, cache=off]
ZION ONLINE.

JSON Format (production)

json
{"ts":"1712000000","level":"info","event":"config","msg":"loaded from zion.toml"}
{"ts":"1712000000","level":"info","event":"shutdown","msg":"signal received, draining..."}

JSON logs are structured for ingestion by Loki, ELK, Datadog, or any log aggregator. Fields:

| Field | Description |
| --- | --- |
| ts | Unix timestamp (seconds) |
| level | info, warn, or error |
| event | Event category (e.g., config, health, shutdown, tls) |
| msg | Human-readable message |
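
A minimal sketch of consuming these logs, using the two sample lines shown above. Note that `ts` appears as a string of Unix seconds in the samples, so it is converted explicitly; the `time` key and the `parse_log_line` helper are additions for this example only.

```python
import json
from datetime import datetime, timezone

# The two sample lines from the JSON format example.
LOG_LINES = [
    '{"ts":"1712000000","level":"info","event":"config","msg":"loaded from zion.toml"}',
    '{"ts":"1712000000","level":"info","event":"shutdown","msg":"signal received, draining..."}',
]


def parse_log_line(line: str) -> dict:
    """Decode one JSON log line and add `time` as an aware UTC datetime."""
    record = json.loads(line)
    # `ts` is a string of Unix seconds in the samples, so convert via int().
    record["time"] = datetime.fromtimestamp(int(record["ts"]), tz=timezone.utc)
    return record


records = [parse_log_line(line) for line in LOG_LINES]
shutdown_events = [r for r in records if r["event"] == "shutdown"]
```

Filtering on `event` or `level` in this way is the same operation a log aggregator would perform on the indexed fields.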

Upstream Health Monitoring

Zion runs a background health checker that probes every unique upstream URL over HTTP every 30 seconds:

  • Sends GET / to each upstream
  • Healthy = 2xx or 3xx response within 5 seconds
  • State transitions (UP -> DOWN, DOWN -> UP) are logged
  • Health state is stored as an atomic boolean per upstream

The health checker uses a separate HTTP client and does not affect the main proxy connection pool.
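
The health rule and the transition logging described above can be sketched as follows. This is an illustration under stated assumptions rather than Zion's code: the class and function names are hypothetical, a plain dictionary stands in for the per-upstream atomic booleans, and upstreams are assumed to start in the UP state.

```python
import logging
from typing import Dict, Optional

CHECK_TIMEOUT_SECS = 5.0  # "within 5 seconds", from the list above

log = logging.getLogger("health")


def is_healthy(status: Optional[int], elapsed_secs: float) -> bool:
    """Healthy means a 2xx or 3xx response within the timeout.

    `status` is None when the request failed outright (refused, timed out).
    """
    return status is not None and 200 <= status < 400 and elapsed_secs <= CHECK_TIMEOUT_SECS


class HealthTable:
    """Tracks one boolean per upstream and reports state transitions."""

    def __init__(self) -> None:
        self._up: Dict[str, bool] = {}

    def record(self, upstream: str, healthy: bool) -> Optional[str]:
        """Store the result; return the transition string if the state changed."""
        previous = self._up.get(upstream, True)  # assumption: upstreams start as UP
        self._up[upstream] = healthy
        if previous == healthy:
            return None
        transition = "UP -> DOWN" if previous else "DOWN -> UP"
        log.warning("upstream %s: %s", upstream, transition)
        return transition
```

A scheduler would call `record` once per upstream on each 30-second cycle, passing the result of `is_healthy` for that upstream's GET / request.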

Released under the MIT License.