ModelProvider
A ModelProvider represents a single LLM API subscription -- one resource covers all models from that provider. Agents reference models as provider/model-id (e.g., anthropic/claude-sonnet-4-6), and Pai handles API key injection behind the scenes.
pai create model-provider <name> --provider <type> --api-key <key>
pai get model-providers
pai delete model-provider <name>
Quick create (CLI)
pai create model-provider anthropic \
  --provider anthropic \
  --api-key sk-ant-...
This single command stores your API key and gives your agents access to every Anthropic model -- Claude Sonnet, Haiku, Opus -- without creating a separate resource for each one.
YAML format
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: anthropic
spec:
  provider: anthropic
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
  maxTokensPerDay: 1000000
Apply with:
pai apply -f model-provider.yaml
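The manifest references the API key through a Secret rather than inline. Assuming the same secret workflow shown in the NIM example below, create the Secret before applying:

pai add secret anthropic-key --from-literal api-key=sk-ant-...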
Using models in an agent
Reference models as provider/model-id -- the ModelProvider's name followed by the model ID -- in your Agent:
spec:
  models:
    - anthropic/claude-sonnet-4-6
    - google/gemini-2.5-flash
The agent can use any model in the list. The first entry is the primary model. Multiple models let the agent fall back if one provider hits its token budget.
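Only spec.models is documented on this page; in a complete manifest the fragment above might sit in an Agent like this sketch (the Agent kind and metadata follow the conventions used elsewhere here, but are assumptions):

apiVersion: pai.io/v1
kind: Agent
metadata:
  name: my-agent                      # hypothetical name
spec:
  models:
    - anthropic/claude-sonnet-4-6     # primary model
    - google/gemini-2.5-flash         # fallback if the first provider is over budget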
Supported providers
| Provider | Example models |
|---|---|
| anthropic | claude-sonnet-4-6, claude-haiku-4-5, claude-opus-4-6 |
| gemini | gemini-2.5-flash, gemini-2.0-flash, gemini-2.5-pro |
| openai | gpt-4o, gpt-4o-mini, o1, o3-mini |
| openrouter | Any model on OpenRouter (use the full slug, e.g. meta-llama/llama-3.1-70b-instruct) |
| azure-openai | Your Azure deployment names (see Azure OpenAI) |
| vllm | Any model served by your vLLM deployment (see OpenAI-compatible endpoints below) |
OpenAI-compatible endpoints (NVIDIA NIM, vLLM, Together AI, Fireworks, …)
Many LLM services expose an OpenAI-compatible API at a custom URL: NVIDIA NIM, self-hosted vLLM, Together AI, Fireworks, DeepInfra, Anyscale, Groq, and others. Use provider: openai with a custom endpoint to connect them.
NVIDIA NIM
NVIDIA NIM hosts open-weight models (Llama, GLM, Qwen, Mistral, etc.) behind an OpenAI-compatible endpoint at https://integrate.api.nvidia.com/v1.
# Store your API key
pai add secret nvidia-api-key --from-literal api-key=nvapi-...

# Create a ModelProvider that points at NIM
pai apply -f - <<'EOF'
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: nvidia
spec:
  provider: openai
  endpoint: https://integrate.api.nvidia.com/v1
  apiKeySecretRef:
    name: nvidia-api-key
    key: api-key
  allowedModels:
    - z-ai/glm4.7
    - meta/llama-3.1-70b-instruct
    - qwen/qwen3-next-80b-a3b-instruct
EOF
Agents then reference NIM models through the ModelProvider name:
spec:
  models:
    - nvidia/z-ai/glm4.7
    - nvidia/meta/llama-3.1-70b-instruct
Self-hosted vLLM
Point provider: vllm at your in-cluster vLLM service:
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: vllm-local
spec:
  provider: vllm
  endpoint: http://vllm.vllm-system.svc.cluster.local:8000/v1
  apiKeySecretRef:   # vLLM can be token-gated or open
    name: vllm-token
    key: api-key
  allowedModels:
    - meta-llama/Meta-Llama-3-70B-Instruct
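If your vLLM server is token-gated, store the token the same way as in the NIM example -- the secret name must match apiKeySecretRef.name:

pai add secret vllm-token --from-literal api-key=<your-vllm-token>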
Other OpenAI-compatible services
The same pattern works for any service speaking OpenAI's API: Together AI, Fireworks, DeepInfra, Anyscale, Groq, OctoAI, Cloudflare Workers AI, and in-house gateways. Use provider: openai, set endpoint to the service's base URL, and list the models in allowedModels.
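As a sketch, a Together AI provider might look like this -- the endpoint URL and model slug below are assumptions, so verify them against the service's own docs:

apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: together
spec:
  provider: openai
  endpoint: https://api.together.xyz/v1   # assumed OpenAI-compatible base URL
  apiKeySecretRef:
    name: together-api-key
    key: api-key
  allowedModels:
    - meta-llama/Llama-3.1-70B-Instruct-Turbo   # example slug; list every model agents should use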
allowedModels matters here
For anthropic / gemini / openai / openrouter, Pai knows the catalogue and validates model names automatically. For custom endpoints Pai can't discover what's available, so allowedModels is the source of truth -- add every model agents should be able to use.
Token budgets
maxTokensPerDay sets a hard daily cap across all models from this provider. When the limit is reached, requests return HTTP 429 until midnight UTC. If the agent has models from another provider in its list, it can fall back automatically.
maxTokensPerRequest caps how many tokens a single request may consume.
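Both limits live on the ModelProvider spec. For example, to allow one million tokens per day but cap any single request at 8,192 tokens (values illustrative):

spec:
  maxTokensPerDay: 1000000
  maxTokensPerRequest: 8192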
Check current consumption:
pai get metrics
Security controls
Restrict which models can be used
Narrow which of the provider's models agents are allowed to call. Useful for blocking expensive models across a whole subscription.
spec:
  allowedModels:
    - claude-sonnet-4-6
    - claude-haiku-4-5
  deniedModels:
    - claude-opus-4-6   # block the most expensive model
deniedModels takes precedence over allowedModels. If both are empty, all models from the provider are available.
LLM guards
Guards scan every LLM request routed through this provider for prompt injection and jailbreak attempts. This is the org-wide baseline -- it applies to all agents and external gateway traffic using this provider.
spec:
  guards:
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce
Guards set on a ModelProvider run before any per-agent guards. See Prompt Guard for details on configuring GuardBindings.
Exposing via the LLM Gateway
A ModelProvider can be exposed to developers outside Pai (laptops, CI). When externalAccess.enabled is set, the Pai Gateway accepts LLM requests from any machine authenticated with a valid AccessKey — no agent needed.
This turns Pai into a centralized LLM gateway for your team: developers route their local tools (Claude Code, SDKs, scripts) through Pai and get centralized API key management, per-user token budgets, guard scanning, and full audit logging — without ever seeing the real API key.
spec:
  externalAccess:
    enabled: true
    maxTokensPerDay: 2000000   # separate budget for external usage
Developer onboarding
Once a ModelProvider has external access enabled, developers connect in three commands:
# 1. Log in with an AccessKey from your admin
pai login https://api.pairun.dev --access-key pak_...
# 2. Configure local environment
eval $(pai gateway env)
# 3. Use Claude Code normally — requests route through Pai
claude
pai gateway env outputs the environment variables that redirect LLM traffic through the Pai Gateway. The AccessKey is wrapped in a format that Claude Code accepts as an API key — the real Anthropic key never leaves Pai.
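The exact variables depend on the tool being redirected; for Claude Code the output is shaped roughly like this (illustrative only -- the gateway path shown is an assumption, so run pai gateway env to see the real values):

export ANTHROPIC_BASE_URL=https://api.pairun.dev/gateway   # assumed path; points Claude Code at the Pai Gateway
export ANTHROPIC_API_KEY=pak_...                           # your AccessKey, standing in for the real Anthropic key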
External access controls
| Field | Description |
|---|---|
| externalAccess.enabled | Must be true for external requests to be accepted |
| externalAccess.maxTokensPerDay | Separate daily budget for external usage. If unset, shares the main budget. |
| guards | Guard scanning applies to external requests too -- this is the only protection layer for external traffic |
For the full setup guide, see LLM Gateway.
Field reference
| Field | Required | Description |
|---|---|---|
| provider | Yes | LLM provider type (anthropic, gemini, openai, openrouter, azure-openai, vllm) |
| apiKeySecretRef | Yes | {name, key} reference to a Secret holding the API key |
| endpoint | No | Custom API endpoint URL. Required for azure-openai and vllm. |
| allowedModels | No | Model IDs agents may use. Empty = all allowed. |
| deniedModels | No | Model IDs to block. Takes precedence over allowedModels. |
| maxTokensPerDay | No | Hard daily token budget across all usage of this provider |
| maxTokensPerRequest | No | Maximum tokens per single request |
| guards | No | Provider-wide LLM guards. Applied to all traffic -- agents and external. See Prompt Guard. |
| externalAccess.enabled | No | Expose this provider to developers outside Pai (laptops, CI). See LLM Gateway. |
| externalAccess.maxTokensPerDay | No | Separate daily budget for external usage |
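Putting it all together, a ModelProvider exercising every optional field might look like the following sketch (values are illustrative, composed from the examples above):

apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: anthropic
spec:
  provider: anthropic
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
  allowedModels:
    - claude-sonnet-4-6
    - claude-haiku-4-5
  deniedModels:
    - claude-opus-4-6          # deniedModels wins over allowedModels
  maxTokensPerDay: 1000000     # hard daily cap across all models
  maxTokensPerRequest: 8192    # per-request ceiling
  guards:
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce
  externalAccess:
    enabled: true
    maxTokensPerDay: 2000000   # separate budget for gateway traffic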