# ModelProvider

A ModelProvider represents a single LLM API subscription (Anthropic, OpenAI, Google, etc.). One resource covers all models from that provider — no need to create a separate resource per model.
Agents reference models as `provider/model-id`:

```yaml
spec:
  models:
    - anthropic/claude-sonnet-4-6
    - google/gemini-2.5-flash
```
## Quick start

```bash
# Create a ModelProvider (the API key is stored securely for you)
pai create model-provider anthropic --provider anthropic --api-key sk-ant-...

# Verify
pai get model-providers
# NAME        PROVIDER    MAX/DAY   READY   AGE
# anthropic   anthropic             true    5s
```
## Full spec

```yaml
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: anthropic
spec:
  provider: anthropic          # anthropic | openai | gemini | openrouter | azure-openai | vllm
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
  endpoint: ""                 # custom URL (required for azure-openai, vllm)
  apiVersion: ""               # API version query param (required for azure-openai)
  allowedModels: []            # empty = all models allowed
  deniedModels: []             # takes precedence over allowedModels
  maxTokensPerDay: 5000000     # provider-wide budget across all agents
  maxTokensPerRequest: 200000  # per-request context window limit
  retry:                       # transient-error retry policy (default 3 attempts)
    maxAttempts: 3
    initialBackoffMs: 200
    maxBackoffMs: 5000
  fallbacks:                   # tried after retries exhaust
    - anthropic/claude-haiku-4-5
  guards:                      # org-wide baseline — applies to ALL traffic
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce
  externalAccess:
    enabled: false             # expose to developers outside Pai (laptops, CI)
    maxTokensPerDay: 2000000   # separate budget for external usage
```
## Fields

| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | LLM provider backend: anthropic, openai, gemini, openrouter, azure-openai, vllm |
| apiKeySecretRef | object | Yes | {name, key} — reference to a Secret holding the API key |
| endpoint | string | No | Custom API endpoint URL (required for azure-openai and vllm). For azure-openai, the resource URL (e.g. https://myresource.openai.azure.com). |
| apiVersion | string | No | API version query parameter. Required for azure-openai (e.g. 2024-06-01). Ignored by other providers. |
| allowedModels | string[] | No | Model IDs allowed. Empty = all allowed. For azure-openai, these are deployment names. |
| deniedModels | string[] | No | Model IDs denied. Takes precedence over allowedModels. |
| maxTokensPerDay | integer | No | Hard daily budget across ALL agents using this provider |
| maxTokensPerRequest | integer | No | Max tokens per single request |
| retry.maxAttempts | integer | No | Total attempts including the first try. Default 3, clamped to [1,10]. |
| retry.initialBackoffMs | integer | No | Backoff before the second attempt. Doubles each retry (with ±25% jitter). Default 200. |
| retry.maxBackoffMs | integer | No | Upper cap on the backoff. Default 5000. |
| fallbacks | string[] | No | Ordered provider/model-id references tried when retries on the primary exhaust. |
| guards | object[] | No | Provider-wide LLM guards (prompt injection / jailbreak scanning) |
| guards[].binding | string | Yes | GuardBinding name |
| guards[].scan | object | No | What to scan: {prompts, responses, toolResults} |
| guards[].enforcement | string | No | Override: enforce or audit |
| externalAccess.enabled | boolean | No | Expose to external developers via the LLM Gateway |
| externalAccess.maxTokensPerDay | integer | No | Separate daily budget for external usage |
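The allow/deny precedence can be sketched as a small predicate — a minimal illustration of the rules in the table above, assuming a hypothetical helper name (`model_permitted` is not part of Pai):

```python
def model_permitted(model_id, allowed_models=(), denied_models=()):
    """deniedModels takes precedence over allowedModels; an empty
    allowedModels list means every model from the provider is allowed."""
    if model_id in denied_models:
        return False
    return not allowed_models or model_id in allowed_models
```

Note that a model listed in both fields is denied, matching the precedence rule in the spec.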
## Status

| Field | Description |
|---|---|
| status.ready | true when the referenced Secret exists and the spec is valid |
| status.message | Error details when ready is false |
| status.tokensToday | Total tokens consumed today across all agents |
| status.externalUrl | External proxy URL (when externalAccess is enabled) |
## Two-layer guard model

Guards can be set at two levels:

| Layer | Scope | Use case |
|---|---|---|
| ModelProvider guards | All traffic through this provider | Org-wide baseline — "no prompt injection ever reaches Anthropic" |
| Agent guards | Per-agent | Tighter rules (e.g., also scan tool results for a specific agent) |

Both layers run on every request, with ModelProvider guards executing first. For external gateway requests (/ext/), ModelProvider guards are the only layer, since there is no agent.

```yaml
spec:
  guards:
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce   # block prompt injection at the provider level
```
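The layering rule reduces to a small dispatch. A sketch, with illustrative names (`guard_layers` is not the gateway's actual API):

```python
def guard_layers(provider_guards, agent_guards, external=False):
    """Return the guard lists to evaluate, in execution order.

    ModelProvider guards always run first; external (/ext/) requests
    have no agent, so only the provider layer applies.
    """
    if external:
        return [provider_guards]
    return [provider_guards, agent_guards]
```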
## Azure OpenAI

Azure OpenAI requires two extra fields: endpoint (the resource URL) and apiVersion. Model IDs are your Azure deployment names, not the upstream model name.

```yaml
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: azure
spec:
  provider: azure-openai
  endpoint: https://myresource.openai.azure.com
  apiVersion: "2024-06-01"
  apiKeySecretRef:
    name: azure-openai-key
    key: api-key
  allowedModels: [gpt-4o-deployment, gpt-4o-mini-deployment]
```

Agents reference deployments through the provider prefix:

```yaml
spec:
  models:
    - azure/gpt-4o-deployment
```

The gateway rewrites the request to
`{endpoint}/openai/deployments/{deployment}/chat/completions?api-version={apiVersion}`
and injects the api-key header automatically.
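The URL rewrite is easy to reproduce client-side when debugging. A minimal sketch of the template above (the helper name is hypothetical):

```python
def azure_chat_url(endpoint, deployment, api_version):
    """Build the rewritten Azure OpenAI chat-completions URL:
    {endpoint}/openai/deployments/{deployment}/chat/completions?api-version={apiVersion}
    """
    return (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")
```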
## Reliability: retries + fallbacks

The gateway retries transient failures and falls back to alternate models when the primary provider is exhausted.

### Retries

Every upstream call is wrapped in a retry loop that reacts to 408, 425, 429, 500, 502, 503, and 504 responses, and to connection / read / timeout errors. Backoff is exponential (initialBackoffMs × 2^attempt, capped at maxBackoffMs) with ±25% jitter.

```yaml
spec:
  retry:
    maxAttempts: 5         # try up to 5 times total
    initialBackoffMs: 500  # 500ms, then 1000ms, then 2000ms...
    maxBackoffMs: 10000    # ...capped at 10s
```

Defaults are maxAttempts: 3, initialBackoffMs: 200, maxBackoffMs: 5000. Set maxAttempts: 1 to disable retries entirely.
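The schedule can be sketched in a few lines — an illustration, assuming the exponent counts completed attempts (so the delay before retry n is initialBackoffMs × 2^(n−1), capped, then jittered); the function name is hypothetical:

```python
import random

def backoff_ms(retry_number, initial_ms=200, max_ms=5000, jitter=0.25):
    """Delay before retry `retry_number` (1 = delay before the 2nd attempt).

    Doubles each retry, caps at max_ms, then applies ±25% jitter.
    """
    base = min(initial_ms * 2 ** (retry_number - 1), max_ms)
    return base * random.uniform(1 - jitter, 1 + jitter)
```

With the defaults this yields roughly 200ms, 400ms, 800ms, ... capped at 5000ms.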
### Fallbacks

When retries on the primary model exhaust, the gateway walks the fallbacks list in order. Each entry is a full provider/model-id reference that must resolve to another ModelProvider (cross-provider fallback is allowed — Anthropic → OpenAI → Gemini is a valid chain).

```yaml
spec:
  fallbacks:
    - anthropic/claude-haiku-4-5  # cheaper sibling first
    - openai/gpt-4o               # then cross-provider
```

Fallbacks fire on 5xx, 429, and connection failures. Responses from a fallback carry x-pai-fell-back-from: <primary-model> and increment the pai_upstream_fallbacks_total{from_model,to_model} counter.

Limitations:

- Fallbacks are live on the OpenAI-compatible path (/v1/chat/completions). The Anthropic-native path (/v1/messages) retries but does not fall back yet.
- Streaming requests only fall back on pre-stream failures — once the first byte has reached the client, the gateway can't roll the response back.
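The fallback walk described above can be sketched as follows — `try_model` stands in for the per-candidate retry loop, and all names are illustrative rather than the gateway's internals:

```python
def call_with_fallbacks(primary, fallbacks, try_model):
    """Try the primary, then each fallback in order.

    try_model(model) returns a response dict, or None once that model's
    retries are exhausted on a retryable failure (5xx / 429 / connection).
    """
    for model in (primary, *fallbacks):
        resp = try_model(model)
        if resp is not None:
            if model != primary:
                # mirror the gateway's fallback marker header
                resp.setdefault("headers", {})["x-pai-fell-back-from"] = primary
            return resp
    raise RuntimeError("primary and all fallbacks exhausted")
```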
## Tracing + cost headers

Every successful response carries these headers so clients and sidecars can correlate calls without log access:

| Header | Meaning |
|---|---|
| x-pai-call-id | Opaque 16-char call identifier |
| x-pai-model-id | Model that actually served the request (post-fallback) |
| x-pai-input-tokens / x-pai-output-tokens | Token counts from the upstream |
| x-pai-cached-input-tokens | Prompt-cache hits (Anthropic / OpenAI) |
| x-pai-cost-usd | USD cost, computed from the price table |
| x-pai-duration-ms | Gateway-observed latency, including retries |
| x-pai-retries | Number of retry attempts on the winning binding |
| x-pai-fell-back-from | Set only if a fallback was used |

Cost is looked up from a built-in price table for the common Anthropic / OpenAI / Gemini models; operators extend it via gateway.modelPrices in the Helm chart. When no price is known for a model, the header is omitted but tokens still flow.
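On the client side, pulling the tracing fields out of a response header map is straightforward. A sketch (`parse_pai_headers` is a hypothetical helper; only the header names come from the table above — note x-pai-cost-usd may be absent when no price is known):

```python
def parse_pai_headers(headers):
    """Extract x-pai-* tracing fields from a case-insensitive header map."""
    h = {k.lower(): v for k, v in headers.items()}
    return {
        "call_id": h.get("x-pai-call-id"),
        "model_id": h.get("x-pai-model-id"),
        "input_tokens": int(h.get("x-pai-input-tokens", 0)),
        "output_tokens": int(h.get("x-pai-output-tokens", 0)),
        # omitted when the model has no entry in the price table
        "cost_usd": float(h["x-pai-cost-usd"]) if "x-pai-cost-usd" in h else None,
        "fell_back_from": h.get("x-pai-fell-back-from"),
    }
```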