Skip to content

Marketplace Plugins

18 optional plugins using the BasePlugin SDK. All are disabled by default -- enable via manifest.yaml or the SOC UI.

Pre-Flight Ring

Max Tokens Enforcer

Clamps max_tokens to a hard ceiling. Clients cannot exceed it. Optional default injection when the field is absent.

ConfigDefaultDescription
ceiling4096Hard upper bound on max_tokens
inject_defaultfalseInject ceiling when client omits max_tokens
log_clamptrueLog warning on clamp events

System Prompt Enforcer

Injects, prepends, appends, or replaces the system prompt in every request. Clients cannot bypass it.

ConfigDefaultDescription
prompt""The enforced system prompt
mode"prepend"prepend, append, or replace
skip_if_emptyfalseSkip when request has no messages

Smart Budget Guard

Per-session and per-team budget enforcement with SQLite persistence and cost estimation.

ConfigDefaultDescription
session_budget_usd5.0Max spend per session
team_budget_usd100.0Max spend per team/API key
warn_threshold0.8Warning at this % of budget

Agentic Loop Breaker

Detects AI agents stuck in retry loops via SHA-256 prompt hashing with sliding window.

ConfigDefaultDescription
max_repeats3Identical prompts before blocking
window_seconds120Sliding window duration
hash_messages3Trailing messages to fingerprint

Per-Model Rate Limiter

Granular rate limiting per (tenant, model) pair with sliding window counters.

ConfigDefaultDescription
default_rpm60Requests per minute for unlisted models
window_seconds60Sliding window duration

Topic Blocklist

Blocks requests containing forbidden topics via keyword, whole-word, or regex matching.

ConfigDefaultDescription
topics[]Keywords or regex patterns to block
action"block"block, warn, or log
match_mode"keyword"keyword, whole_word, or regex
case_sensitivefalseCase-sensitive matching
scan_roles["user"]Message roles to scan

Prompt Complexity Scorer

Scores prompt complexity (0-1) on 4 signals for intelligent model routing.

ConfigDefaultDescription
depth_weight0.3Weight for token depth signal
turns_weight0.2Weight for conversation turn count
code_weight0.25Weight for code block density
instruction_weight0.25Weight for instruction density

Model Downgrader

Automatically downgrades expensive models for simple prompts (10-20x cost savings). Works with the Complexity Scorer.

ConfigDefaultDescription
complexity_threshold0.3Downgrade when complexity is below this score

Tool Guard

Strips or blocks restricted tools/functions from agentic AI requests based on user RBAC roles. Prevents tool injection attacks in autonomous agent workflows.

ConfigDefaultDescription
restricted_tools[]Tool/function names that require admin role
action"strip"strip (remove silently) or block (reject request)
admin_roles["admin"]Roles allowed to use restricted tools

Context Window Guard

Blocks requests exceeding the target model's context window (returns clear 413 instead of cryptic upstream 400).

ConfigDefaultDescription
safety_margin0.9Block at this fraction of context window

Routing Ring

A/B Model Router

Routes a configurable percentage of traffic to a variant model for live A/B experimentation. Supports sticky sessions.

ConfigDefaultDescription
control_model"gpt-4o"Primary model
variant_model"gpt-4o-mini"Model under test
split_pct0.1Fraction routed to variant
stickytruePin sessions to the same arm
experiment_id"ab_test"Tag for audit log tracking

Tenant QoS Router

Routes requests to different models based on user/tenant tier. Free-tier users get redirected to cheaper models, premium users get the model they requested. SaaS B2B cost control.

ConfigDefaultDescription
tier_mapping{free: gpt-4o-mini, premium: ""}Maps tier name to target model (empty = use requested)
default_tier"free"Tier for users with no explicit mapping
force_downgradetrueAlways downgrade non-premium users

Post-Flight Ring

Response Quality Gate

Detects empty, refused ("I cannot..."), apology-only, and truncated LLM responses.

ConfigDefaultDescription
min_length20Minimum response length (chars)
refusal_threshold2Refusal patterns to flag
check_truncationtrueDetect mid-sentence cutoff

Latency SLA Guard

Measures TTFT and total latency with rolling percentiles, flags SLA violations.

ConfigDefaultDescription
ttft_p95_ms500TTFT P95 target
total_p95_ms3000Total latency P95 target
hard_limit_ms10000Hard SLA breach threshold
window_size500Rolling window size

Canary Detector

Detects system prompt leakage in responses (data exfiltration protection). Optional auto-block mode.

ConfigDefaultDescription
min_leak_chars50Minimum leaked characters to trigger
similarity_threshold0.6Fraction of system prompt found
block_on_leakfalseAuto-block leaked responses

Schema Enforcer

Validates LLM JSON responses against a client-provided JSON schema. Catches semantically invalid responses (missing required fields, wrong types) before they reach the client application. Supports warn (pass through with log) and block (return 422) modes.

ConfigDefaultDescription
action"warn"warn (pass through) or block (return 422)
max_schema_size8192Maximum schema size in bytes

Background Ring

Token Counter

Extracts real token counts from API responses and corrects budget heuristic estimates with actual data.

ConfigDefaultDescription
cost_per_1k_input0.003USD per 1K input tokens
cost_per_1k_output0.015USD per 1K output tokens

Shadow Traffic

Dark launch / A/B model comparison. After the primary response is returned to the user, asynchronously sends the same prompt to a "shadow" model for comparison. Results are stored in SQLite for SOC dashboard analysis. Enables safe model migration evaluation with real production traffic.

ConfigDefaultDescription
shadow_model""Model to send shadow traffic to
shadow_provider""Provider for shadow model (empty = auto-detect)
sample_rate0.05Fraction of requests to shadow (0.0-1.0)
store_responsestruePersist comparison data to SQLite

MIT License