
Endpoints Reference

LLMProxy supports 15 LLM providers with automatic request/response translation. Endpoints can be declared through config.yaml, .env variables, or the admin UI, or registered automatically via local/Tailscale auto-discovery.

Endpoint sources

| Source | When to use | Persistence | UI tag |
| --- | --- | --- | --- |
| config.yaml endpoints: block | Production defaults, multi-provider routing | Versioned | [config] |
| .env LLM_PROXY_ENDPOINT_<NAME>_* | Local/custom OpenAI-compatible hosts, no YAML edit | .env | [env] |
| POST /api/v1/registry (admin UI) | Ad-hoc additions at runtime | endpoints.db + live config | [ui] |
| Auto-discovery | Zero-config onboarding for local/Tailscale providers | In-memory only (re-probed at boot) | [auto-discovery] |

The four sources coexist — auto-discovery never clobbers an explicitly configured endpoint (collisions get a -auto or -<host> suffix so both entries remain visible).
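
A runtime registration via POST /api/v1/registry carries the same fields as a config.yaml entry. A minimal sketch, assuming the proxy listens on localhost:8080 and accepts the config.yaml field names in the JSON body (the admin UI issues this call on your behalf; check your deployment for the authoritative schema):

bash
# Hypothetical runtime registration. localhost:8080 and the exact body
# schema are assumptions; the fields mirror a config.yaml endpoint entry.
curl -X POST http://localhost:8080/api/v1/registry \
  -H "Content-Type: application/json" \
  -d '{
        "id": "vllm-lab",
        "provider": "openai-compatible",
        "base_url": "http://10.0.0.5:8000/v1",
        "models": ["mixtral-8x22b"]
      }'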

Env-declared endpoints

Declare an OpenAI-compatible endpoint entirely through environment variables:

bash
LLM_PROXY_ENDPOINT_<NAME>_URL=http://host:port/v1       # required
LLM_PROXY_ENDPOINT_<NAME>_KEY=sk-...                    # optional; omit for no-auth
LLM_PROXY_ENDPOINT_<NAME>_MODELS=model-a,model-b        # optional
LLM_PROXY_ENDPOINT_<NAME>_PROVIDER=openai-compatible    # optional; default openai-compatible

<NAME> becomes the endpoint id (lowercased). Several entries can coexist.

Examples

bash
# LM Studio on the LAN, no auth
LLM_PROXY_ENDPOINT_LMSTUDIO_URL=http://192.168.1.50:1234/v1
LLM_PROXY_ENDPOINT_LMSTUDIO_MODELS=llama-3.3-70b,qwen-2.5-coder-32b

# Remote vLLM with an API key
LLM_PROXY_ENDPOINT_VLLM_URL=https://inference.internal.example.com/v1
LLM_PROXY_ENDPOINT_VLLM_KEY=sk-internal-...
LLM_PROXY_ENDPOINT_VLLM_MODELS=mixtral-8x22b

Auto-discovery

At boot the proxy probes four well-known OpenAI-compatible services:

| Service | Default port | Probe path | Adapter |
| --- | --- | --- | --- |
| Ollama | 11434 | GET /api/tags | ollama |
| LM Studio | 1234 | GET /v1/models | openai-compatible |
| vLLM | 8000 | GET /v1/models | openai-compatible |
| LiteLLM | 4000 | GET /v1/models | openai-compatible |

Hosts probed:

  • 127.0.0.1 — bare-metal / host-network deployments
  • host.docker.internal — Docker Desktop (macOS/Windows) plus Linux when extra_hosts: host.docker.internal:host-gateway is set (provided in the shipped docker-compose.yml)
  • Anything listed in LLM_PROXY_DISCOVERY_PEERS
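
To debug why a service was or was not picked up, you can replay the probes by hand using the paths and default ports from the table above; a JSON model list in the response is what a successful probe should look like:

bash
# Manual versions of the boot-time probes (paths/ports from the table above).
curl -s http://127.0.0.1:11434/api/tags     # Ollama
curl -s http://127.0.0.1:1234/v1/models    # LM Studio
curl -s http://127.0.0.1:8000/v1/models    # vLLM
curl -s http://127.0.0.1:4000/v1/models    # LiteLLM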

LLM_PROXY_DISCOVERY_PEERS

Comma-separated list of remote hosts to probe. Each entry is either a bare host (probed on all four standard ports against every signature) or host:port (only that port is probed, but still matched against every signature, so an Ollama on a custom port is detected).

bash
LLM_PROXY_DISCOVERY_PEERS=100.98.112.23,100.66.12.82,100.108.97.78:8000,nas.lan

Accepts IPs, DNS names, and Tailscale addresses. Unresolvable hosts get a single warning and are skipped.

Naming

  • Local hits (loopback / host gateway) register as their bare provider name (ollama, lmstudio, vllm, litellm).
  • Remote peers register as <provider>-<host-with-dashes> (e.g. lmstudio-100-98-112-23), so multiple nodes never collide.
  • If the preferred id is already taken (e.g. config.yaml ships an ollama entry pointing at a stale localhost:11434), the discovered endpoint registers as <provider>-auto and both remain visible — the operator decides which one wins.

Disabling discovery

bash
LLM_PROXY_LOCAL_DISCOVERY=0

…or in config.yaml:

yaml
discovery:
  local_scan: false
  # peers: ["100.98.112.23", "100.108.97.78:8000"]   # equivalent to env var

Supported Providers

| Provider | Base URL | Auth | Models |
| --- | --- | --- | --- |
| OpenAI | api.openai.com/v1 | Bearer | gpt-4o, gpt-4o-mini, gpt-4.1, o3-mini, embeddings |
| Anthropic | api.anthropic.com/v1 | x-api-key | claude-sonnet-4, claude-haiku-4.5, claude-opus-4 |
| Google | generativelanguage.googleapis.com | API key | gemini-2.5-pro, gemini-2.5-flash, embeddings |
| Azure | {resource}.openai.azure.com | api-key | gpt-4o, gpt-4o-mini |
| Ollama | localhost:11434 | None | llama3.3, qwen3, phi-4, gemma3, embeddings |
| Groq | api.groq.com/openai/v1 | Bearer | llama-3.3-70b, mixtral-8x7b |
| Together | api.together.xyz/v1 | Bearer | Llama-3.3-70B, Mixtral-8x7B |
| Mistral | api.mistral.ai/v1 | Bearer | mistral-large, mistral-small, codestral |
| DeepSeek | api.deepseek.com/v1 | Bearer | deepseek-chat, deepseek-reasoner |
| xAI | api.x.ai/v1 | Bearer | grok-3, grok-3-mini |
| Perplexity | api.perplexity.ai | Bearer | sonar-pro, sonar |
| OpenRouter | openrouter.ai/api/v1 | Bearer | All models via unified API |
| Fireworks | api.fireworks.ai/inference/v1 | Bearer | llama-v3p3-70b-instruct |
| SambaNova | api.sambanova.ai/v1 | Bearer | Meta-Llama-3.3-70B-Instruct |
| OpenAI-Compatible | Custom | Bearer | Any OpenAI-compatible API |

Configuration Examples

OpenAI

yaml
endpoints:
  openai:
    provider: "openai"
    base_url: "https://api.openai.com/v1"
    api_key_env: "OPENAI_API_KEY"
    models: ["gpt-4o", "gpt-4o-mini", "text-embedding-3-small"]
    rate_limit: { rpm: 3500, tpm: 60000 }

Anthropic

yaml
  anthropic:
    provider: "anthropic"
    base_url: "https://api.anthropic.com/v1"
    api_key_env: "ANTHROPIC_API_KEY"
    models: ["claude-sonnet-4-20250514", "claude-haiku-4-5-20251001"]
    rate_limit: { rpm: 1000 }

Google

yaml
  google:
    provider: "google"
    base_url: "https://generativelanguage.googleapis.com/v1beta"
    api_key_env: "GOOGLE_API_KEY"
    models: ["gemini-2.5-pro", "gemini-2.5-flash", "text-embedding-004"]

Ollama (Local)

yaml
  ollama:
    provider: "ollama"
    base_url: "http://localhost:11434"
    auth_type: "none"
    models: ["llama3.3", "qwen3", "phi-4", "nomic-embed-text"]

OpenAI-Compatible (Custom)

For any provider with an OpenAI-compatible API:

yaml
  infercom:
    provider: "openai-compatible"
    base_url: "https://api.infercom.ai/v1"
    api_key_env: "INFERCOM_API_KEY"
    models: ["MiniMax-M2.5", "DeepSeek-R1"]

Format Translation

LLMProxy automatically translates between provider formats:

  • OpenAI ↔ Anthropic: Messages format, system prompt handling, streaming events
  • OpenAI ↔ Google: Content parts, role mapping, safety settings
  • OpenAI ↔ Azure: Deployment URL construction, API version headers
  • OpenAI ↔ Ollama: Direct pass-through (Ollama uses OpenAI format)
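
In practice a client speaks one dialect regardless of the target. A minimal sketch, assuming the proxy exposes an OpenAI-compatible /v1/chat/completions front end on localhost:8080 (both the address and any client auth are deployment-specific placeholders):

bash
# OpenAI-format request targeting an Anthropic model; the proxy rewrites the
# body into Anthropic's Messages format (system prompt lifted into the
# top-level system field) and maps the response back. Address is a placeholder.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "claude-sonnet-4-20250514",
        "messages": [
          {"role": "system", "content": "Answer in one sentence."},
          {"role": "user", "content": "What does a format-translating proxy do?"}
        ]
      }'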

Multimodal content (images) is also translated:

  • Anthropic: base64 or url → source format
  • Google: inlineData or fileData format
  • MIME type auto-detection
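
As a sketch of the image path, an OpenAI-style image_url part sent to an Anthropic model would be rewritten into the source block described above (proxy address is again a placeholder; the data URI is truncated for brevity):

bash
# OpenAI-format multimodal message; for an Anthropic target the proxy maps
# "image_url" to Anthropic's "source" block (base64 or url) and auto-detects
# the MIME type, per the list above.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "claude-sonnet-4-20250514",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "data:image/png;base64,iVBORw0..."}}
          ]
        }]
      }'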
