Redaction Engine
The AI DLP Proxy uses a Hybrid Redaction Engine to ensure both high performance and high accuracy when sanitizing sensitive data.
How it Works
The redaction process happens in two stages for every request body:
1. Static Redaction (Fast)
The first layer uses FlashText, an algorithm optimized for replacing keywords in a single pass.
- Purpose: Detects known secrets (API keys, specific project codenames, internal tokens).
- Source: Keywords are loaded from terms.txt or HashiCorp Vault.
- Performance: Extremely fast (microseconds). Matching time scales with the length of the text and is largely independent of the number of keywords.
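To illustrate why keyword count barely affects matching time, here is a minimal standard-library sketch of trie-based keyword replacement in the spirit of FlashText. It is not the FlashText library itself and not the proxy's actual code: the function names, the `[REDACTED]` mask, and the sample keywords are hypothetical. Unlike FlashText, this simplified version matches substrings rather than whole words.

```python
from typing import Dict, Iterable

_END = "_end_"  # sentinel key marking the end of a keyword in the trie


def build_trie(keywords: Iterable[str]) -> Dict:
    """Build a character trie from a list of keywords."""
    root: Dict = {}
    for word in keywords:
        word = word.strip()
        if not word:
            continue  # skip blank entries
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def redact_static(text: str, trie: Dict, mask: str = "[REDACTED]") -> str:
    """Scan left to right, replacing the longest keyword match at each position."""
    out = []
    i, n = 0, len(text)
    while i < n:
        node, j, match_end = trie, i, -1
        # Walk the trie as far as the text allows, remembering the longest match.
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if _END in node:
                match_end = j
        if match_end != -1:
            out.append(mask)
            i = match_end  # jump past the redacted keyword
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical keywords, standing in for entries from terms.txt or Vault.
trie = build_trie(["sk-test-12345", "Project Falcon"])
print(redact_static("Use sk-test-12345 for Project Falcon.", trie))
# prints: Use [REDACTED] for [REDACTED].
```

The work done at each position is bounded by the length of the longest keyword, not by how many keywords are in the trie, which is the property the static layer relies on.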
2. Machine Learning Redaction (Smart)
The second layer uses Microsoft Presidio (backed by the spaCy en_core_web_lg model).
- Purpose: Detects PII (Personally Identifiable Information) that follows patterns or context, such as:
  - Names
  - Phone Numbers
  - Email Addresses
  - Credit Card Numbers
  - Crypto Wallets
- Performance: Slower than the static layer (milliseconds), but it runs asynchronously to minimize the impact on request latency.
- Configuration:
  - ml_enabled: Can be toggled off for maximum speed.
  - ml_threshold: Confidence score (0.0 - 1.0) to filter false positives.
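The sketch below shows one way the two stages and the two settings could fit together. It uses only the standard library. The `Detection` class is a hypothetical stand-in for an analyzer result (entity type, character span, confidence score), and `demo_static` and `fake_detect` are placeholder functions with hard-coded values, not the proxy's real components. Running the detector in a worker thread via `asyncio.to_thread` is an assumption about what "asynchronously" means here. Overlapping detections are not handled.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    """Hypothetical stand-in for one ML detection result."""
    entity_type: str
    start: int
    end: int
    score: float


def apply_detections(text: str, detections: List[Detection], ml_threshold: float) -> str:
    """Replace each span whose score meets the threshold with an <ENTITY_TYPE> tag."""
    kept = [d for d in detections if d.score >= ml_threshold]
    # Replace from the end of the string so earlier offsets stay valid.
    for d in sorted(kept, key=lambda d: d.start, reverse=True):
        text = text[:d.start] + f"<{d.entity_type}>" + text[d.end:]
    return text


async def redact(
    text: str,
    static_redact: Callable[[str], str],
    ml_detect: Callable[[str], List[Detection]],
    *,
    ml_enabled: bool,
    ml_threshold: float,
) -> str:
    # Stage 1: static keyword redaction.
    text = static_redact(text)
    if not ml_enabled:
        return text  # ML stage skipped for maximum speed
    # Stage 2: run the blocking detector in a worker thread so the event loop stays free.
    detections = await asyncio.to_thread(ml_detect, text)
    return apply_detections(text, detections, ml_threshold)


def demo_static(text: str) -> str:
    # Placeholder for the static layer; KEY123 is a made-up secret.
    return text.replace("KEY123", "[REDACTED]")


def fake_detect(text: str) -> List[Detection]:
    # Hard-coded scores for the demo; a real detector would compute these.
    found = []
    for needle, label, score in [("Alice", "PERSON", 0.85), ("555-0100", "PHONE_NUMBER", 0.30)]:
        idx = text.find(needle)
        if idx != -1:
            found.append(Detection(label, idx, idx + len(needle), score))
    return found


sample = "Alice called 555-0100 about KEY123."
print(asyncio.run(redact(sample, demo_static, fake_detect, ml_enabled=True, ml_threshold=0.5)))
# prints: <PERSON> called 555-0100 about [REDACTED].
```

In this demo, the low-scoring phone number detection (0.30) falls below the 0.5 threshold and is left in place. Lowering `ml_threshold` to 0.2 would redact it as well, at the cost of more false positives in general.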