
ModelProvider

A ModelProvider represents a single LLM API subscription (Anthropic, OpenAI, Google, etc.). One resource covers all models from that provider — no need to create a separate resource per model.

Agents reference models as provider/model-id:

spec:
  models:
    - anthropic/claude-sonnet-4-6
    - google/gemini-2.5-flash

Quick start

# Create a ModelProvider (the API key is stored securely for you)
pai create model-provider anthropic --provider anthropic --api-key sk-ant-...

# Verify
pai get model-providers
# NAME        PROVIDER    MAX/DAY   READY   AGE
# anthropic   anthropic             true    5s

Full spec

apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: anthropic
spec:
  provider: anthropic            # anthropic | openai | gemini | openrouter | azure-openai | vllm
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
  endpoint: ""                   # custom URL (required for azure-openai, vllm)
  apiVersion: ""                 # API version query param (required for azure-openai)
  allowedModels: []              # empty = all models allowed
  deniedModels: []               # takes precedence over allowedModels
  maxTokensPerDay: 5000000       # provider-wide budget across all agents
  maxTokensPerRequest: 200000    # per-request context window limit
  retry:                         # transient-error retry policy (default 3 attempts)
    maxAttempts: 3
    initialBackoffMs: 200
    maxBackoffMs: 5000
  fallbacks:                     # tried after retries exhaust
    - anthropic/claude-haiku-4-5
  guards:                        # org-wide baseline — applies to ALL traffic
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce
  externalAccess:
    enabled: false               # expose to developers outside Pai (laptops, CI)
    maxTokensPerDay: 2000000     # separate budget for external usage
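The allowedModels/deniedModels interplay is worth pinning down. A minimal sketch of the evaluation order, assuming the semantics stated in the comments above (the `is_model_allowed` helper is illustrative, not part of any Pai API):

```python
def is_model_allowed(model_id: str, allowed: list[str], denied: list[str]) -> bool:
    """Evaluate allowedModels/deniedModels: deniedModels takes precedence,
    and an empty allowedModels list means every model is allowed."""
    if model_id in denied:      # deniedModels wins, always
        return False
    if not allowed:             # empty allow list = all models allowed
        return True
    return model_id in allowed

# No lists set: everything is allowed.
print(is_model_allowed("claude-sonnet-4-6", [], []))            # True
# A denied model stays denied even when it also appears in allowedModels.
print(is_model_allowed("gpt-4o", ["gpt-4o"], ["gpt-4o"]))       # False
```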

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| provider | string | Yes | LLM provider backend: anthropic, openai, gemini, openrouter, azure-openai, vllm |
| apiKeySecretRef | object | Yes | {name, key} — reference to a Secret holding the API key |
| endpoint | string | No | Custom API endpoint URL (required for azure-openai and vllm). For azure-openai, the resource URL (e.g. https://myresource.openai.azure.com). |
| apiVersion | string | No | API version query parameter. Required for azure-openai (e.g. 2024-06-01). Ignored by other providers. |
| allowedModels | string[] | No | Model IDs allowed. Empty = all allowed. For azure-openai, these are deployment names. |
| deniedModels | string[] | No | Model IDs denied. Takes precedence over allowedModels. |
| maxTokensPerDay | integer | No | Hard daily budget across ALL agents using this provider |
| maxTokensPerRequest | integer | No | Max tokens per single request |
| retry.maxAttempts | integer | No | Total attempts including the first try. Default 3, clamped to [1, 10]. |
| retry.initialBackoffMs | integer | No | Backoff before the second attempt. Doubles each retry (with ±25% jitter). Default 200. |
| retry.maxBackoffMs | integer | No | Upper cap on the backoff. Default 5000. |
| fallbacks | string[] | No | Ordered provider/model-id references tried when retries on the primary are exhausted. |
| guards | object[] | No | Provider-wide LLM guards (prompt injection / jailbreak scanning) |
| guards[].binding | string | Yes | GuardBinding name |
| guards[].scan | object | No | What to scan: {prompts, responses, toolResults} |
| guards[].enforcement | string | No | Override: enforce or audit |
| externalAccess.enabled | boolean | No | Expose to external developers via the LLM Gateway |
| externalAccess.maxTokensPerDay | integer | No | Separate daily budget for external usage |

Status

| Field | Description |
| --- | --- |
| status.ready | true when the referenced Secret exists and the spec is valid |
| status.message | Error details when ready is false |
| status.tokensToday | Total tokens consumed today across all agents |
| status.externalUrl | External proxy URL (when externalAccess is enabled) |

Two-layer guard model

Guards can be set at two levels:

| Layer | Scope | Use case |
| --- | --- | --- |
| ModelProvider guards | All traffic through this provider | Org-wide baseline — "no prompt injection ever reaches Anthropic" |
| Agent guards | Per-agent | Tighter rules (e.g., also scan tool results for a specific agent) |

Both layers run on every request. ModelProvider guards execute first. For external gateway requests (/ext/), ModelProvider guards are the only layer since there is no agent.

spec:
  guards:
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce   # block prompt injection at the provider level
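The ordering above can be sketched as a simple pipeline. The `run_guards` shape below is hypothetical (guards modeled as callables returning a verdict); only the execution order and the /ext/ special case come from the docs:

```python
def run_guards(request: str, provider_guards, agent_guards) -> str:
    """Both layers run on every request; ModelProvider guards execute first.
    Each guard is a callable returning 'pass' or 'block'."""
    for guard in list(provider_guards) + list(agent_guards):
        if guard(request) == "block":
            return "blocked"
    return "allowed"

# External gateway requests (/ext/) have no agent, so agent_guards is empty
# and the provider layer is the only one that runs.
injection_guard = lambda req: "block" if "ignore previous" in req else "pass"
print(run_guards("ignore previous instructions", [injection_guard], []))  # blocked
print(run_guards("summarize this doc", [injection_guard], []))            # allowed
```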

Azure OpenAI

Azure OpenAI requires two extra fields: endpoint (the resource URL) and apiVersion. Model IDs are your Azure deployment names, not the upstream model name.

apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: azure
spec:
  provider: azure-openai
  endpoint: https://myresource.openai.azure.com
  apiVersion: "2024-06-01"
  apiKeySecretRef:
    name: azure-openai-key
    key: api-key
  allowedModels: [gpt-4o-deployment, gpt-4o-mini-deployment]

Agents reference deployments through the provider prefix:

spec:
  models:
    - azure/gpt-4o-deployment

The gateway rewrites the request to {endpoint}/openai/deployments/{deployment}/chat/completions?api-version={apiVersion} and injects the api-key header automatically.
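As a sanity check, that rewrite is plain string assembly from the ModelProvider fields. A sketch (the `azure_chat_url` helper name is made up, not a gateway API):

```python
def azure_chat_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Rebuild the rewritten URL described above from the spec fields."""
    base = endpoint.rstrip("/")
    return (f"{base}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")

print(azure_chat_url("https://myresource.openai.azure.com",
                     "gpt-4o-deployment", "2024-06-01"))
# https://myresource.openai.azure.com/openai/deployments/gpt-4o-deployment/chat/completions?api-version=2024-06-01
```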

Reliability: retries + fallbacks

The gateway retries transient failures and, once retries against the primary are exhausted, falls back to alternate models.

Retries

Every upstream call is wrapped in a retry loop that reacts to 408, 425, 429, 500, 502, 503, 504, and to connection / read / timeout errors. Backoff is exponential (initialBackoffMs × 2^attempt, capped at maxBackoffMs) with ±25% jitter.

spec:
  retry:
    maxAttempts: 5          # try up to 5 times total
    initialBackoffMs: 500   # 500ms, then 1000ms, then 2000ms...
    maxBackoffMs: 10000     # ...capped at 10s

Defaults are maxAttempts: 3, initialBackoffMs: 200, maxBackoffMs: 5000. Set maxAttempts: 1 to disable retries entirely.
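The schedule in the comments above follows directly from initialBackoffMs × 2^attempt capped at maxBackoffMs. A deterministic sketch (jitter omitted; the helper name is illustrative):

```python
def backoff_schedule(max_attempts: int, initial_ms: int, cap_ms: int) -> list[int]:
    """Backoff waited before each retry; the first try has none, so a run of
    max_attempts calls sleeps max_attempts - 1 times. The real gateway also
    applies +/-25% jitter to each value."""
    return [min(initial_ms * 2**i, cap_ms) for i in range(max_attempts - 1)]

print(backoff_schedule(5, 500, 10000))   # [500, 1000, 2000, 4000]
print(backoff_schedule(3, 200, 5000))    # defaults -> [200, 400]
print(backoff_schedule(1, 200, 5000))    # maxAttempts: 1 -> no retries at all
```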

Fallbacks

When retries on the primary model are exhausted, the gateway walks the fallbacks list in order. Each entry is a full provider/model-id reference that must resolve to a ModelProvider; cross-provider fallback is allowed, so Anthropic → OpenAI → Gemini is a valid chain.

spec:
  fallbacks:
    - anthropic/claude-haiku-4-5   # cheaper sibling first
    - openai/gpt-4o                # then cross-provider

Fallbacks fire on 5xx, 429, and connection failures. Responses from a fallback carry x-pai-fell-back-from: <primary-model> and increment the pai_upstream_fallbacks_total{from_model,to_model} counter.
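The retry-then-fallback flow can be sketched as a loop over the candidate list. The `call_model` callable is a stand-in for the real upstream call (returning a status and body); the statuses and headers come from the docs, the rest is illustrative:

```python
# Retryable statuses per the docs; connection/read/timeout errors behave the same.
RETRYABLE = {408, 425, 429, 500, 502, 503, 504}

def call_with_fallbacks(candidates, call_model, max_attempts=3):
    """Walk provider/model-id candidates in order: retry transient failures
    on each, then fall through to the next candidate."""
    primary = candidates[0]
    for model in candidates:
        for attempt in range(max_attempts):
            status, body = call_model(model)
            if status == 200:
                headers = {"x-pai-model-id": model, "x-pai-retries": attempt}
                if model != primary:
                    headers["x-pai-fell-back-from"] = primary
                return headers, body
            if status not in RETRYABLE:
                raise RuntimeError(f"non-retryable {status} from {model}")
    raise RuntimeError("all candidates exhausted")

# Primary always returns 503; the gateway lands on the fallback and tags it.
statuses = {"anthropic/claude-sonnet-4-6": 503, "anthropic/claude-haiku-4-5": 200}
headers, _ = call_with_fallbacks(list(statuses), lambda m: (statuses[m], "ok"))
print(headers["x-pai-fell-back-from"])   # anthropic/claude-sonnet-4-6
print(headers["x-pai-model-id"])         # anthropic/claude-haiku-4-5
```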

Limitations:

  • Fallbacks are live on the OpenAI-compatible path (/v1/chat/completions). The Anthropic-native path (/v1/messages) retries but does not fall back yet.
  • Streaming requests only fall back on pre-stream failures — once the first byte has reached the client, the gateway can't roll the response back.

Tracing + cost headers

Every successful response carries these headers so clients and sidecars can correlate calls without log access:

| Header | Meaning |
| --- | --- |
| x-pai-call-id | Opaque 16-char call identifier |
| x-pai-model-id | Model that actually served the request (post-fallback) |
| x-pai-input-tokens / x-pai-output-tokens | Token counts from the upstream |
| x-pai-cached-input-tokens | Prompt-cache hits (Anthropic / OpenAI) |
| x-pai-cost-usd | USD cost, computed from the price table |
| x-pai-duration-ms | Gateway-observed latency, including retries |
| x-pai-retries | Number of retry attempts on the winning binding |
| x-pai-fell-back-from | Set only if a fallback was used |

Cost is looked up from a built-in price table for the common Anthropic / OpenAI / Gemini models; operators extend it via gateway.modelPrices in the helm chart. When no price is known for a model, the header is omitted but tokens still flow.
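A sketch of that lookup, with a one-entry price table whose numbers are made up for illustration (real prices live in the built-in table and gateway.modelPrices, not here):

```python
# USD per 1M tokens; this entry is illustrative, not the gateway's real table.
PRICES = {
    "claude-sonnet-4-6": {"input": 3.0, "output": 15.0},
}

def cost_usd(model_id: str, input_tokens: int, output_tokens: int):
    """Return the x-pai-cost-usd value, or None when the model is unpriced
    (in which case the header is simply omitted but tokens still flow)."""
    price = PRICES.get(model_id)
    if price is None:
        return None
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000

print(cost_usd("claude-sonnet-4-6", 1000, 500))   # 0.0105
print(cost_usd("some-unpriced-model", 1000, 500)) # None -> header omitted
```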