Quick Start
Get LLMProxy running in under 60 seconds.
One command
```shell
git clone https://github.com/fabriziosalmi/llmproxy && cd llmproxy
./install.sh
```

The installer detects your platform, verifies prerequisites, generates a proxy auth key in .env, and boots the service via Docker Compose v2 (recommended) or a local Python 3.12+ virtualenv. For CI or scripted use, ./install.sh --docker, ./install.sh --local, or ./install.sh --check skip the interactive prompt.
Prerequisites
| Install path | Requirements |
|---|---|
| Docker (recommended) | Docker Engine + Docker Compose v2 plugin (docker compose). Legacy docker-compose v1 is unsupported — it breaks against modern urllib3. On Debian/Ubuntu: sudo apt install docker-compose-plugin. |
| Local venv | Python 3.12+ (Ubuntu 22.04 ships 3.10 — use deadsnakes PPA or the Docker path). |
./install.sh --check reports the exact command needed to satisfy any missing prerequisite before committing to an install mode.
Onboarding mode
The proxy starts even with zero providers configured. Inference requests return 503 until at least one endpoint is active, but the admin UI, health probe, and POST /api/v1/registry are all reachable immediately so you can finish setup from the browser.
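Onboarding mode is scriptable: poll the health probe until the proxy answers, then finish configuration. A minimal sketch (the exact /health path is an assumption here; verify it against your version):

```shell
#!/usr/bin/env sh
# Wait for LLMProxy to come up. Inference still returns 503 until a
# provider is active; this only checks that the service itself is alive.
# NOTE: the /health path is an assumption, not confirmed by these docs.
PROXY_URL="${PROXY_URL:-http://localhost:8090}"

wait_ready() {
  i=0
  while [ "$i" -lt 30 ]; do
    # Returns 0 as soon as the health probe responds.
    curl -fsS "$PROXY_URL/health" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Call `wait_ready` in a setup script before registering endpoints via the admin API.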
On first boot you get four ways to add a provider — pick whichever hurts least:
1. Auto-discovery (zero config)
If Ollama, LM Studio, vLLM, or LiteLLM is already running on the host, the proxy probes 127.0.0.1 and host.docker.internal on the standard ports and registers the responders automatically. No YAML, no env var.
To extend discovery to remote hosts (Tailscale peers, LAN nodes), set:
```shell
# in .env
LLM_PROXY_DISCOVERY_PEERS=100.98.112.23,100.66.12.82,100.108.97.78:8000
```

Each entry can be a bare host (probes all four standard ports) or host:port (probes that port only, against every supported protocol signature). Discovered entries get a stable id like lmstudio-100-98-112-23, so multiple peers never collide.
Disable entirely with LLM_PROXY_LOCAL_DISCOVERY=0.
2. Env-declared endpoints (no YAML)
Declare any OpenAI-compatible endpoint directly in .env:
```shell
LLM_PROXY_ENDPOINT_LOCAL_URL=http://192.168.1.50:1234/v1
LLM_PROXY_ENDPOINT_LOCAL_MODELS=llama-3.3-70b,qwen-2.5-coder-32b
# LLM_PROXY_ENDPOINT_LOCAL_KEY=sk-…   # leave blank for no-auth servers
```

The env-declared endpoint becomes the local provider on next boot.
3. Cloud provider keys
Set any of the standard keys in .env (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, …) and the matching entry from config.yaml activates. No YAML edits required unless you want to change defaults.
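For example, a single key in .env is enough to activate the matching provider (values below are placeholders, not real keys):

```shell
# .env: placeholder values, substitute your own keys
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
```

On next boot the openai and anthropic entries from config.yaml come up with their default model lists.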
4. Admin UI wizard
Open http://localhost:8090/ui, go to Endpoints, and fill the form — name, URL, provider, optional API key and model list. New entries take effect live without restart.
Boot banner
On startup the proxy prints a summary to stdout — visible in docker compose logs llmproxy or the terminal:
```text
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
LLMProxy is ready  http://localhost:8090/v1
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Active providers (3):
  [config]          openai (openai)
                    gpt-5.4, gpt-5.4-mini
  [auto-discovery]  ollama-auto (ollama)
                    llama3.2:3b, qwen2.5-coder:7b
  [auto-discovery]  lmstudio-100-98-112-23 (openai-compatible)
                    qwen/qwen2.5-coder-14b, +3 more
WAF:  ON (byte-level ASGI injection firewall)
Auth: required; Bearer key in $LLM_PROXY_API_KEYS → sk-proxy-a…7890
Smoke test:
  curl http://localhost:8090/v1/chat/completions \
    -H "Authorization: Bearer $(grep -oP "^LLM_PROXY_API_KEYS=\K[^,]+" .env | head -1)" \
    -H 'Content-Type: application/json' \
    -d '{"model":"gpt-5.4-mini","messages":[{"role":"user","content":"hi"}]}'
Admin UI: http://localhost:8090/ui
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

The smoke-test curl uses the first available model, so it runs successfully after copy-paste. Note the Authorization header uses double quotes so the embedded $(grep …) command substitution actually expands.
First request
```shell
# Your proxy key was generated by install.sh; read it from .env:
export LLMPROXY_KEY=$(grep -oP '^LLM_PROXY_API_KEYS=\K[^,]+' .env | head -1)

curl http://localhost:8090/v1/chat/completions \
  -H "Authorization: Bearer $LLMPROXY_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Model aliases
Shorthand names resolve to real model ids:
```shell
curl http://localhost:8090/v1/chat/completions \
  -H "Authorization: Bearer $LLMPROXY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "fast", "messages": [{"role": "user", "content": "Hello!"}]}'
```

Defaults: gpt4 → gpt-4o, claude → claude-sonnet, fast → gpt-4o-mini, cheap → gemini-2.5-flash-lite.
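Resolution is just a lookup before routing. A minimal sketch using the default table above (illustrative, not the proxy's internals):

```python
# Default alias table from the docs above; deployments can override it.
ALIASES = {
    "gpt4": "gpt-4o",
    "claude": "claude-sonnet",
    "fast": "gpt-4o-mini",
    "cheap": "gemini-2.5-flash-lite",
}

def resolve_model(name: str) -> str:
    """Map a shorthand alias to a real model id; unknown names pass through."""
    return ALIASES.get(name, name)
```

Unknown names passing through unchanged means real model ids keep working alongside aliases.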
Cross-provider fallback
If OpenAI is down, LLMProxy walks the configured fallback chain automatically:
```text
gpt-4o → claude-sonnet → gemini-2.5-pro
```

No client-side changes needed.
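Conceptually, the proxy tries each model in the chain in order and returns the first success. A toy sketch of that walk (not the actual implementation):

```python
from typing import Callable

FALLBACK_CHAIN = ["gpt-4o", "claude-sonnet", "gemini-2.5-pro"]

def complete_with_fallback(chain: list[str], call: Callable[[str], str]) -> str:
    """Try each model in order; re-raise the last error if all fail."""
    last_err: Exception | None = None
    for model in chain:
        try:
            return call(model)
        except Exception as err:  # a provider outage surfaces as an exception
            last_err = err
    if last_err is not None:
        raise last_err
    raise RuntimeError("empty fallback chain")
```

The client sees a single successful response regardless of which provider served it.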
Using with OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="your-secret-key-1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Using with Cursor / Continue / OpenWebUI
Point any OpenAI-compatible client to:
```text
Base URL: http://localhost:8090/v1
API Key:  (contents of LLM_PROXY_API_KEYS)
```

LLMProxy exposes /v1/models for automatic model discovery.
Disabling the WAF
The byte-level ASGI firewall is enabled by default. Disable via env or config when fronting the proxy with another WAF or debugging a false positive:
```shell
LLM_PROXY_FIREWALL_ENABLED=0   # in .env — restart required
```

Or in config.yaml:

```yaml
security:
  firewall:
    enabled: false
```

The admin UI surfaces the live state and the reason it is off (env:… or config:…). The toggle is env/config-only by design — a one-click UI switch would make L1 injection defense trivially removable.
Next steps
- Configuration — Full config.yaml reference
- Endpoints — Provider matrix and env-based endpoint syntax
- Security — Understanding the security pipeline
- Plugins — Enable marketplace plugins
- SOC Dashboard — Real-time monitoring