Configuration Reference

The AI DLP Proxy is configured via a config.yaml file located in the root directory.

Structure

yaml
proxy:
  # ... network settings ...
dlp:
  # ... engine settings ...
upstream:
  # ... forwarding settings ...

Proxy Settings

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| port | int | 8080 | The TCP port where the proxy listens for incoming connections. |
| host | string | 0.0.0.0 | The interface to bind to. 0.0.0.0 listens on all interfaces. |
| ssl_bump | bool | true | Enables HTTPS interception. Requires CA cert installation on clients. |
| metrics_port | int | 9090 | Port for the Prometheus metrics server. |
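
For example, a minimal proxy block that overrides only the listening port might look like the sketch below (the port value 3128 is illustrative, not a documented default):

yaml
proxy:
  host: "0.0.0.0"     # listen on all interfaces (default)
  port: 3128          # example value; the default is 8080
  ssl_bump: true      # required for HTTPS inspection
  metrics_port: 9090  # Prometheus metrics endpoint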

DLP Settings

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| static_terms_file | string | terms.txt | Path to the file containing static keywords. Ignored if the provider is vault. |
| ml_enabled | bool | true | Enables the ML-based PII detection engine (Presidio). |
| ml_threshold | float | 0.5 | Confidence threshold (0.0-1.0). Higher values reduce false positives but may miss some PII. |
| nlp_model | string | en_core_web_lg | spaCy model to use. Options: en_core_web_lg (accurate), en_core_web_sm (fast). |
| entities | list | null | List of entities to detect (e.g., ["PERSON", "EMAIL_ADDRESS"]). null detects all supported types. |
| replacement_token | string | [REDACTED] | The string used to replace sensitive data. |
| secrets_provider.type | string | file | Source of static terms. Options: file, vault. |
| secrets_provider.vault.url | string | - | URL of the Vault server (e.g., http://localhost:8200). |
| secrets_provider.vault.path | string | - | Path to the KV secret (e.g., aidlp/terms). |
| secrets_provider.vault.token | string | - | Vault token. Recommended: use the VAULT_TOKEN env var instead. |
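
The dotted secrets_provider.* keys map to nested YAML. A minimal sketch using the default file provider (values shown are the documented defaults):

yaml
dlp:
  static_terms_file: "terms.txt"   # one term per line
  ml_enabled: true
  ml_threshold: 0.5                # default; raise to cut false positives
  nlp_model: "en_core_web_lg"
  replacement_token: "[REDACTED]"
  secrets_provider:
    type: "file"                   # read static terms from static_terms_file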

Upstream Settings

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| default_scheme | string | https | Default protocol for upstream requests if not specified. |
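
A minimal sketch of the corresponding block (https is the documented default; set http only if an upstream cannot terminate TLS):

yaml
upstream:
  default_scheme: "https"  # used when a request does not specify a scheme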

Environment Variables

Sensitive configuration can be overridden via environment variables:

  • VAULT_TOKEN: Authentication token for HashiCorp Vault.
  • DOCKERHUB_USERNAME / DOCKERHUB_TOKEN: Used in CI/CD for publishing images.
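
For example, when the proxy runs under Docker Compose, the Vault token can be forwarded from the host environment rather than written into config.yaml. The service and image names below are hypothetical:

yaml
# docker-compose.yml (illustrative)
services:
  aidlp-proxy:                       # hypothetical service name
    image: aidlp-proxy:latest        # hypothetical image name
    environment:
      - VAULT_TOKEN=${VAULT_TOKEN}   # forwarded from the host shell
    ports:
      - "8080:8080"                  # proxy listener
      - "9090:9090"                  # Prometheus metrics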

Full Configuration Example

yaml
# config.yaml
proxy:
  # The port the proxy listens on for incoming traffic
  port: 8080
  # The port for Prometheus metrics
  metrics_port: 9090
  # Enable SSL interception (required for DLP)
  ssl_bump: true

dlp:
  # Path to file containing static sensitive terms (one per line)
  static_terms_file: "terms.txt"

  # Enable Machine Learning based detection
  ml_enabled: true

  # Confidence threshold (0.0 - 1.0)
  # Higher = fewer false positives, potentially more missed PII
  ml_threshold: 0.8

  # NLP Model to use
  # "en_core_web_lg" (Accurate, Slower)
  # "en_core_web_sm" (Fast, Less Accurate)
  nlp_model: "en_core_web_sm"

  # Specific entities to detect. If null, detects all.
  # See Presidio docs for full list.
  entities:
    - "PERSON"
    - "PHONE_NUMBER"
    - "EMAIL_ADDRESS"
    - "CREDIT_CARD"

  # String to replace sensitive data with
  replacement_token: "[REDACTED]"

  # Secrets Provider Configuration
  secrets_provider:
    # "file" or "vault"
    type: "vault"
    vault:
      url: "http://localhost:8200"
      path: "aidlp/terms"
      # Token can also be set via VAULT_TOKEN env var
      # token: "hvs.xxx"
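
# Upstream forwarding settings (value shown is the documented default)
upstream:
  # Protocol used when a request does not specify a scheme
  default_scheme: "https"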