# ModelProvider

A ModelProvider represents a single LLM API subscription (Anthropic, OpenAI, Google, etc.). One resource covers all models from that provider — no need to create a separate resource per model.
Agents reference models as `provider/model-id`:

```yaml
spec:
  models:
    - anthropic/claude-sonnet-4-6
    - google/gemini-2.5-flash
```
## Quick start

```bash
# Create a ModelProvider (the API key is stored securely for you)
pai create model-provider anthropic --provider anthropic --api-key sk-ant-...

# Verify
pai get model-providers
# NAME        PROVIDER    MAX/DAY   READY   AGE
# anthropic   anthropic             true    5s
```
## Full spec

```yaml
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: anthropic
spec:
  provider: anthropic          # anthropic | openai | gemini | openrouter | azure-openai | vllm
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
  endpoint: ""                 # custom URL (required for azure-openai, vllm)
  apiVersion: ""               # API version query param (required for azure-openai)
  allowedModels: []            # empty = all models allowed
  deniedModels: []             # takes precedence over allowedModels
  maxTokensPerDay: 5000000     # provider-wide budget across all agents
  maxTokensPerRequest: 200000  # per-request context window limit
  retry:                       # transient-error retry policy (default 3 attempts)
    maxAttempts: 3
    initialBackoffMs: 200
    maxBackoffMs: 5000
  fallbacks:                   # tried after retries exhaust
    - anthropic/claude-haiku-4-5
  guards:                      # org-wide baseline — applies to ALL traffic
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce
  externalAccess:
    enabled: false             # expose to developers outside Pai (laptops, CI)
    maxTokensPerDay: 2000000   # separate budget for external usage
```
## Fields

| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | LLM provider backend: anthropic, openai, gemini, openrouter, azure-openai, vllm |
| apiKeySecretRef | object | Yes | {name, key} — reference to a Secret holding the API key |
| endpoint | string | No | Custom API endpoint URL (required for azure-openai and vllm). For azure-openai, the resource URL (e.g. https://myresource.openai.azure.com). |
| apiVersion | string | No | API version query parameter. Required for azure-openai (e.g. 2024-06-01). Ignored by other providers. |
| allowedModels | string[] | No | Model IDs allowed. Empty = all allowed. For azure-openai, these are deployment names. |
| deniedModels | string[] | No | Model IDs denied. Takes precedence over allowedModels. |
| maxTokensPerDay | integer | No | Hard daily budget across ALL agents using this provider |
| maxTokensPerRequest | integer | No | Max tokens per single request |
| retry.maxAttempts | integer | No | Total attempts including the first try. Default 3, clamped to [1,10]. |
| retry.initialBackoffMs | integer | No | Backoff before the second attempt. Doubles each retry (with ±25% jitter). Default 200. |
| retry.maxBackoffMs | integer | No | Upper cap on the backoff. Default 5000. |
| fallbacks | string[] | No | Ordered provider/model-id references tried when retries on the primary exhaust. |
| guards | object[] | No | Provider-wide LLM guards (prompt injection / jailbreak scanning) |
| guards[].binding | string | Yes | GuardBinding name |
| guards[].scan | object | No | What to scan: {prompts, responses, toolResults} |
| guards[].enforcement | string | No | Override: enforce or audit |
| externalAccess.enabled | boolean | No | Expose to external developers via the LLM Gateway |
| externalAccess.maxTokensPerDay | integer | No | Separate daily budget for external usage |
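The allow/deny precedence can be sketched as a small predicate — a minimal illustration of the rules in the table above, assuming a hypothetical helper name (`model_permitted` is not part of Pai):

```python
def model_permitted(model_id, allowed_models=(), denied_models=()):
    """deniedModels takes precedence over allowedModels; an empty
    allowedModels list means every model from the provider is allowed."""
    if model_id in denied_models:
        return False
    return not allowed_models or model_id in allowed_models
```

Note that a model listed in both fields is denied, matching the precedence rule in the spec.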
## Status

| Field | Description |
|---|---|
| status.ready | true when the referenced Secret exists and the spec is valid |
| status.message | Error details when ready is false |
| status.tokensToday | Total tokens consumed today across all agents |
| status.externalUrl | External proxy URL (when externalAccess is enabled) |
## Two-layer guard model

Guards can be set at two levels:

| Layer | Scope | Use case |
|---|---|---|
| ModelProvider guards | All traffic through this provider | Org-wide baseline — "no prompt injection ever reaches Anthropic" |
| Agent guards | Per-agent | Tighter rules (e.g., also scan tool results for a specific agent) |

Both layers run on every request, with ModelProvider guards executing first. For external gateway requests (/ext/), ModelProvider guards are the only layer, since there is no agent.

```yaml
spec:
  guards:
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce   # block prompt injection at the provider level
```
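The layering rule reduces to a small dispatch. A sketch, with illustrative names (`guard_layers` is not the gateway's actual API):

```python
def guard_layers(provider_guards, agent_guards, external=False):
    """Return the guard lists to evaluate, in execution order.

    ModelProvider guards always run first; external (/ext/) requests
    have no agent, so only the provider layer applies.
    """
    if external:
        return [provider_guards]
    return [provider_guards, agent_guards]
```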
## Azure OpenAI

Azure OpenAI requires two extra fields: endpoint (the resource URL) and apiVersion. Model IDs are your Azure deployment names, not the upstream model name.

```yaml
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: azure
spec:
  provider: azure-openai
  endpoint: https://myresource.openai.azure.com
  apiVersion: "2024-06-01"
  apiKeySecretRef:
    name: azure-openai-key
    key: api-key
  allowedModels: [gpt-4o-deployment, gpt-4o-mini-deployment]
```

Agents reference deployments through the provider prefix:

```yaml
spec:
  models:
    - azure/gpt-4o-deployment
```

The gateway rewrites the request to
`{endpoint}/openai/deployments/{deployment}/chat/completions?api-version={apiVersion}`
and injects the api-key header automatically.
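The URL rewrite is easy to reproduce client-side when debugging. A minimal sketch of the template above (the helper name is hypothetical):

```python
def azure_chat_url(endpoint, deployment, api_version):
    """Build the rewritten Azure OpenAI chat-completions URL:
    {endpoint}/openai/deployments/{deployment}/chat/completions?api-version={apiVersion}
    """
    return (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")
```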
## Reliability: retries + fallbacks

The gateway retries transient failures and falls back to alternate models when the primary provider is exhausted.

### Retries

Every upstream call is wrapped in a retry loop that reacts to 408, 425, 429, 500, 502, 503, and 504 responses, and to connection / read / timeout errors. Backoff is exponential (initialBackoffMs × 2^attempt, capped at maxBackoffMs) with ±25% jitter.

```yaml
spec:
  retry:
    maxAttempts: 5         # try up to 5 times total
    initialBackoffMs: 500  # 500ms, then 1000ms, then 2000ms...
    maxBackoffMs: 10000    # ...capped at 10s
```

Defaults are maxAttempts: 3, initialBackoffMs: 200, maxBackoffMs: 5000. Set maxAttempts: 1 to disable retries entirely.
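The schedule can be sketched in a few lines — an illustration, assuming the exponent counts completed attempts (so the delay before retry n is initialBackoffMs × 2^(n−1), capped, then jittered); the function name is hypothetical:

```python
import random

def backoff_ms(retry_number, initial_ms=200, max_ms=5000, jitter=0.25):
    """Delay before retry `retry_number` (1 = delay before the 2nd attempt).

    Doubles each retry, caps at max_ms, then applies ±25% jitter.
    """
    base = min(initial_ms * 2 ** (retry_number - 1), max_ms)
    return base * random.uniform(1 - jitter, 1 + jitter)
```

With the defaults this yields roughly 200ms, 400ms, 800ms, ... capped at 5000ms.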
### Fallbacks

When retries on the primary model exhaust, the gateway walks the fallbacks list in order. Each entry is a full provider/model-id reference that must resolve to another ModelProvider (cross-provider fallback is allowed — Anthropic → OpenAI → Gemini is a valid chain).

```yaml
spec:
  fallbacks:
    - anthropic/claude-haiku-4-5  # cheaper sibling first
    - openai/gpt-4o               # then cross-provider
```

Fallbacks fire on 5xx, 429, and connection failures. Responses from a fallback carry x-pai-fell-back-from: <primary-model> and increment the pai_upstream_fallbacks_total{from_model,to_model} counter.

Limitations:

- Fallbacks are live on the OpenAI-compatible path (/v1/chat/completions). The Anthropic-native path (/v1/messages) retries but does not fall back yet.
- Streaming requests only fall back on pre-stream failures — once the first byte has reached the client, the gateway can't roll the response back.
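The fallback walk described above can be sketched as follows — `try_model` stands in for the per-candidate retry loop, and all names are illustrative rather than the gateway's internals:

```python
def call_with_fallbacks(primary, fallbacks, try_model):
    """Try the primary, then each fallback in order.

    try_model(model) returns a response dict, or None once that model's
    retries are exhausted on a retryable failure (5xx / 429 / connection).
    """
    for model in (primary, *fallbacks):
        resp = try_model(model)
        if resp is not None:
            if model != primary:
                # mirror the gateway's fallback marker header
                resp.setdefault("headers", {})["x-pai-fell-back-from"] = primary
            return resp
    raise RuntimeError("primary and all fallbacks exhausted")
```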
## Tracing + cost headers

Every successful response carries these headers so clients and sidecars can correlate calls without log access:

| Header | Meaning |
|---|---|
| x-pai-call-id | Opaque 16-char call identifier |
| x-pai-model-id | Model that actually served the request (post-fallback) |
| x-pai-input-tokens / x-pai-output-tokens | Token counts from the upstream |
| x-pai-cached-input-tokens | Prompt-cache hits (Anthropic / OpenAI) |
| x-pai-cost-usd | USD cost, computed from the price table |
| x-pai-duration-ms | Gateway-observed latency, including retries |
| x-pai-retries | Number of retry attempts on the winning binding |
| x-pai-fell-back-from | Set only if a fallback was used |

Cost is looked up from a built-in price table for the common Anthropic / OpenAI / Gemini models; operators extend it via gateway.modelPrices in the Helm chart. When no price is known for a model, the header is omitted but tokens still flow.
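On the client side, pulling the tracing fields out of a response header map is straightforward. A sketch (`parse_pai_headers` is a hypothetical helper; only the header names come from the table above — note x-pai-cost-usd may be absent when no price is known):

```python
def parse_pai_headers(headers):
    """Extract x-pai-* tracing fields from a case-insensitive header map."""
    h = {k.lower(): v for k, v in headers.items()}
    return {
        "call_id": h.get("x-pai-call-id"),
        "model_id": h.get("x-pai-model-id"),
        "input_tokens": int(h.get("x-pai-input-tokens", 0)),
        "output_tokens": int(h.get("x-pai-output-tokens", 0)),
        # omitted when the model has no entry in the price table
        "cost_usd": float(h["x-pai-cost-usd"]) if "x-pai-cost-usd" in h else None,
        "fell_back_from": h.get("x-pai-fell-back-from"),
    }
```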