LLM Gateway
The Pai LLM Gateway lets developers outside the cluster route their LLM requests through Pai. This gives teams centralized API key management, per-user token budgets, model access control, and audit logging — without distributing API keys to individual machines.
This is the companion to the MCP Gateway: same token type (`AccessKey`), same `pai gateway …` CLI pattern, different upstream (an LLM provider instead of an MCP server).
How it works
Developer laptop Pai cluster LLM Provider
+---------------+ HTTPS +------------------+ HTTPS +------------+
| Claude Code | -----------> | Pai Gateway | ----------> | Anthropic |
| (any LLM | pak_... | - auth check | real key | OpenAI |
| client) | AccessKey | - token budget | from MP | Gemini |
+---------------+ | - audit log | +------------+
+------------------+
- Admin creates a `ModelProvider` with `externalAccess.enabled: true`.
- Admin (or the developer, via CLI) mints an `AccessKey` bound to the ModelProvider.
- Developer runs `pai login` + `eval $(pai gateway env)`.
- Claude Code / any OpenAI- or Anthropic-compatible client routes through Pai.
The developer's machine never sees the real LLM API key.
Setup
1. Create a ModelProvider with external access
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: anthropic
spec:
  provider: anthropic
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
  externalAccess:
    enabled: true
    # maxTokensPerDay: 2000000 # optional: separate budget for external usage
Or via CLI:
pai create model-provider anthropic --provider anthropic --api-key sk-ant-...
2. Mint an AccessKey for the developer
pai access-key create --name alice-laptop \
--model-provider anthropic \
--allowed-cidr 10.0.0.0/8 \
--allowed-model claude-haiku-4-5
The CLI prints the raw pak_... once — store it securely (1Password,
vault, etc.). AccessKey restrictions can only narrow what the ModelProvider
already permits. Rotate with pai access-key rotate alice-laptop.
3. Developer onboarding (3 commands)
# Connect to Pai
pai login https://api.pairun.dev --access-key pak_...
# Configure local environment
eval $(pai gateway env)
# Use Claude Code normally — requests route through Pai
claude
The pai gateway env command outputs:
export ANTHROPIC_BASE_URL=https://api.pairun.dev/ext/v1
export ANTHROPIC_API_KEY=sk-ant-api03-pak-...-AA
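With these variables set, any Anthropic-compatible client can talk to the gateway. A minimal stdlib-only sketch of what such a client does (the request is built but not sent here, and the model name is illustrative):

```python
import json
import os
import urllib.request

# Read the variables `pai gateway env` exports; the fallbacks mirror the
# example output above and are placeholders, not working credentials.
base_url = os.environ.get("ANTHROPIC_BASE_URL", "https://api.pairun.dev/ext/v1")
api_key = os.environ.get("ANTHROPIC_API_KEY", "sk-ant-api03-pak-example-AA")

body = json.dumps({
    "model": "claude-haiku-4-5",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "ping"}],
}).encode()

req = urllib.request.Request(
    f"{base_url}/messages",        # the gateway's Anthropic-compatible endpoint
    data=body,
    headers={
        "x-api-key": api_key,      # Anthropic-style auth header
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted in this sketch.
```

Clients like Claude Code do the equivalent of this automatically once the two environment variables are set.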
For a more detailed setup with available models listed:
pai gateway setup
4. Add to shell profile (optional)
# Add to ~/.zshrc or ~/.bashrc for persistence
pai gateway env >> ~/.zshrc
Gateway endpoints
The external proxy is served under the /ext/v1/ prefix:
| Endpoint | Compatible with |
|---|---|
| `POST /ext/v1/chat/completions` | OpenAI SDK, LangChain, CrewAI |
| `POST /ext/v1/messages` | Anthropic SDK, Claude Code |
| `POST /ext/v1/messages/count_tokens` | Anthropic SDK pre-flight token counting (Anthropic providers only; other providers return 501) |
Authentication: the AccessKey is wrapped in an Anthropic-compatible API key
format (sk-ant-api03-pak-<key>-AA) so Claude Code accepts it without
modification. Clients that accept a raw bearer (OpenAI SDK, LangChain) can pass
the pak_... directly.
Access control
| Control | Where | Effect |
|---|---|---|
| `spec.externalAccess.enabled` | ModelProvider | Must be `true` for external requests |
| `spec.externalAccess.maxTokensPerDay` | ModelProvider | Separate daily budget across all external usage |
| `spec.allowedModels` / `deniedModels` | ModelProvider | Which models this provider exposes |
| `spec.restrictions.allowedModels` | AccessKey | Per-key model allowlist (narrows the ModelProvider) |
| `spec.restrictions.allowedCIDRs` | AccessKey | Client IP allowlist |
| `spec.limits.maxTokensPerDay` | AccessKey | Per-key daily token cap |
For `allowedCIDRs`, the gateway trusts `X-Forwarded-For` only when the direct peer is in the configured `gateway.externalProviderGateway.trustedProxies`; otherwise the peer address is used. See the External Provider Gateway guide for details.
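The checks in the table compose as an all-must-pass chain. An illustrative re-implementation (field names mirror the CRD paths above; the helper itself is a sketch, not gateway code):

```python
import ipaddress

def check_access(model: str, client_ip: str, provider: dict, key: dict) -> str:
    """Return "ok" or the first reason the request is denied."""
    if not provider.get("externalAccess", {}).get("enabled"):
        return "denied: externalAccess.enabled is false"
    exposed = provider.get("allowedModels")          # None = all models exposed
    if exposed is not None and model not in exposed:
        return "denied: model not exposed by ModelProvider"
    key_models = key.get("restrictions", {}).get("allowedModels")
    if key_models is not None and model not in key_models:
        return "denied: model not in AccessKey allowlist"
    cidrs = key.get("restrictions", {}).get("allowedCIDRs")
    if cidrs and not any(
        ipaddress.ip_address(client_ip) in ipaddress.ip_network(c) for c in cidrs
    ):
        return "denied: client IP outside allowedCIDRs"
    return "ok"

# The key minted in the setup example above:
provider = {"externalAccess": {"enabled": True}, "allowedModels": None}
key = {"restrictions": {"allowedModels": ["claude-haiku-4-5"],
                        "allowedCIDRs": ["10.0.0.0/8"]}}
```

With these inputs, `check_access("claude-haiku-4-5", "10.1.2.3", provider, key)` passes, while a request from outside `10.0.0.0/8` or for a different model is denied — matching the "restrictions can only narrow" rule.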
Token format
The AccessKey is wrapped to look like a valid Anthropic API key:
pak_a7f3b9c2e1d5 -> sk-ant-api03-pak-a7f3b9c2e1d5-AA
The gateway strips the wrapper, looks up the AccessKey by hash, validates the
per-key restrictions, and then uses the ModelProvider's real API key for the
upstream LLM call. The developer never sees the real key.
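The wrap/unwrap step is pure string manipulation and can be sketched directly (the SHA-256 digest for the "lookup by hash" step is an assumed choice; the doc does not specify the hash):

```python
import hashlib

def wrap(pak: str) -> str:
    # pak_a7f3b9c2e1d5 -> sk-ant-api03-pak-a7f3b9c2e1d5-AA
    return "sk-ant-api03-pak-" + pak.removeprefix("pak_") + "-AA"

def unwrap(wrapped: str) -> str:
    # Strip the Anthropic-compatible wrapper to recover the raw AccessKey.
    return "pak_" + wrapped.removeprefix("sk-ant-api03-pak-").removesuffix("-AA")

def lookup_digest(pak: str) -> str:
    # The gateway stores only a hash of the key; SHA-256 is assumed here.
    return hashlib.sha256(pak.encode()).hexdigest()
```

The roundtrip is lossless: `unwrap(wrap(k)) == k`, so the same AccessKey works both raw (OpenAI-style bearer) and wrapped (Anthropic-style `x-api-key`).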
Audit
All external proxy requests are logged in the gateway audit chain with
workload=ext:<accesskey-name>, making it easy to track per-developer usage.
Token consumption is visible via pai get metrics and on each key:
AccessKey.status.tokensToday.
Cost tracking
The gateway computes per-call USD cost from a built-in price table and exposes it three ways:
- Response header `x-pai-cost-usd` on every LLM response.
- Prometheus metric `pai_cost_usd_total{workload,model,kind}`, where `kind` is `input | cached_input | output`.
- `pai get agents`: a `COST` column showing today's cumulative spend.
Prices are USD per 1M tokens. The table ships with defaults for common Anthropic, OpenAI, and Gemini models; override or extend via the helm chart:
# values.yaml
gateway:
  modelPrices:
    "claude-sonnet-4-*":
      input: 3.00
      output: 15.00
      cached_input: 0.30
    "my-vllm-llama-70b":
      input: 0.50
      output: 1.00
Keys support a trailing wildcard (claude-sonnet-4-*); exact matches win
over wildcards. Overrides merge on top of the built-ins and are hot-reloaded
from the pai-model-prices ConfigMap every 60s — no gateway restart needed.
When no price is known for a model, tokens still flow and the call is audited;
only the $ figure is omitted.
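The matching rules above can be sketched against the example table (values copied from the `values.yaml` snippet; the wildcard logic is spelled out rather than delegated to a glob library):

```python
PRICES = {  # USD per 1M tokens
    "claude-sonnet-4-*": {"input": 3.00, "output": 15.00, "cached_input": 0.30},
    "my-vllm-llama-70b": {"input": 0.50, "output": 1.00},
}

def price_for(model: str):
    if model in PRICES:  # exact matches win over wildcards
        return PRICES[model]
    for pattern, price in PRICES.items():
        if pattern.endswith("*") and model.startswith(pattern[:-1]):
            return price  # trailing-wildcard match
    return None  # unknown model: the call still proceeds, cost is omitted

def cost_usd(model: str, input_tokens: int, output_tokens: int,
             cached_input_tokens: int = 0):
    price = price_for(model)
    if price is None:
        return None
    return (input_tokens * price["input"]
            + output_tokens * price["output"]
            + cached_input_tokens * price.get("cached_input", 0.0)) / 1_000_000
```

For example, 1,000 input and 100 output tokens on a `claude-sonnet-4-*` match cost (1000 × 3.00 + 100 × 15.00) / 1M = $0.0045.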
Daily cost cap
Pair cost tracking with spec.rateLimits.maxCostPerDayUSD
on an Agent to enforce a hard per-agent spend ceiling (HTTP 429 once the cap is
hit).
Reliability: retries + fallbacks
The gateway wraps every upstream call in a retry loop (429, 5xx, connection
errors) with exponential backoff + jitter. When retries exhaust on the primary
model, the ModelProvider.spec.fallbacks chain takes over. See
ModelProvider → Reliability
for full details.
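The retry policy described above can be sketched as a retryability predicate plus full-jitter exponential backoff (the gateway's actual base delay, cap, and attempt count are not documented here and are assumptions):

```python
import random

def retryable(status, conn_error: bool = False) -> bool:
    # Retry on 429, any 5xx, or a connection-level error (no status at all).
    return conn_error or status == 429 or (
        status is not None and 500 <= status <= 599
    )

def backoff_delays(attempts: int = 4, base: float = 0.5, cap: float = 30.0):
    # "Full jitter": sleep a uniform amount in [0, min(cap, base * 2^i)).
    return [min(cap, base * 2 ** i) * random.random() for i in range(attempts)]
```

Only when all attempts against the primary model are exhausted does the `fallbacks` chain take over.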
Tracing headers
Every LLM response carries x-pai-call-id, x-pai-model-id,
x-pai-input-tokens, x-pai-output-tokens, x-pai-duration-ms,
x-pai-retries, and — when applicable — x-pai-cost-usd,
x-pai-cached-input-tokens, x-pai-fell-back-from. Clients can correlate
calls without access to gateway logs.
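A client-side helper for that correlation might look like this (illustrative only; it simply collects the `x-pai-*` headers case-insensitively):

```python
def pai_trace(headers: dict) -> dict:
    # Pull the x-pai-* tracing headers off a response into one dict,
    # e.g. to log them keyed by x-pai-call-id.
    return {k.lower(): v for k, v in headers.items()
            if k.lower().startswith("x-pai-")}

resp_headers = {  # sample response headers
    "content-type": "application/json",
    "x-pai-call-id": "c-123",
    "X-Pai-Input-Tokens": "42",
}
```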
Related
- MCP Gateway — the MCP-side companion to this document.
- External Provider Gateway — unified AccessKey model across LLM, Provider, and MCP surfaces.