
ModelProvider

A ModelProvider represents a single LLM API subscription -- one resource covers all models from that provider. Agents reference models as provider/model-id (e.g., anthropic/claude-sonnet-4-6), and Pai handles API key injection behind the scenes.

pai create model-provider <name> --provider <type> --api-key <key>
pai get model-providers
pai delete model-provider <name>

Quick create (CLI)

pai create model-provider anthropic \
  --provider anthropic \
  --api-key sk-ant-...

This single command stores your API key and gives your agents access to every Anthropic model -- Claude Sonnet, Haiku, Opus -- without creating a separate resource for each one.

YAML format

apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: anthropic
spec:
  provider: anthropic
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
  maxTokensPerDay: 1000000

Apply with:

pai apply -f model-provider.yaml
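
Note that the manifest above references a Secret named anthropic-key, so that Secret must exist before the apply succeeds. A sketch using the `pai add secret` command shown later on this page (the secret and key names simply match what the example manifest expects):

```shell
# Store the Anthropic API key in a Secret the manifest can reference
pai add secret anthropic-key --from-literal api-key=sk-ant-...

# Then create the ModelProvider that points at it
pai apply -f model-provider.yaml
```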

Using models in an agent

Reference models as provider/model-id in your Agent:

spec:
  models:
    - anthropic/claude-sonnet-4-6
    - google/gemini-2.5-flash

The agent can use any model in the list. The first entry is the primary model. Multiple models let the agent fall back if one provider hits its token budget.
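
Put together, a minimal Agent manifest with a fallback model might look like this. This is a sketch: only the `models` list is confirmed by this page, while the `kind: Agent` shape and its `apiVersion` are assumed here to mirror ModelProvider.

```yaml
apiVersion: pai.io/v1   # assumed to match ModelProvider
kind: Agent             # assumed resource kind
metadata:
  name: my-agent
spec:
  models:
    - anthropic/claude-sonnet-4-6   # primary model
    - google/gemini-2.5-flash       # fallback if the first provider hits its budget
```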

Supported providers

| Provider | Example models |
|---|---|
| `anthropic` | claude-sonnet-4-6, claude-haiku-4-5, claude-opus-4-6 |
| `gemini` | gemini-2.5-flash, gemini-2.0-flash, gemini-2.5-pro |
| `openai` | gpt-4o, gpt-4o-mini, o1, o3-mini |
| `openrouter` | Any model on OpenRouter (use the full slug, e.g. meta-llama/llama-3.1-70b-instruct) |
| `azure-openai` | Your Azure deployment names -- see Azure OpenAI |
| `vllm` | Any model served by your vLLM deployment -- see OpenAI-compatible endpoints below |

OpenAI-compatible endpoints (NVIDIA NIM, vLLM, Together AI, Fireworks, …)

Many LLM services expose an OpenAI-compatible API at a custom URL: NVIDIA NIM, self-hosted vLLM, Together AI, Fireworks, DeepInfra, Anyscale, Groq, and others. Use provider: openai with a custom endpoint to connect them.

NVIDIA NIM

NVIDIA NIM hosts open-weight models (Llama, GLM, Qwen, Mistral, etc.) behind an OpenAI-compatible endpoint at https://integrate.api.nvidia.com/v1.

# Store your API key
pai add secret nvidia-api-key --from-literal api-key=nvapi-...

# Create a ModelProvider that points at NIM
pai apply -f - <<'EOF'
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: nvidia
spec:
  provider: openai
  endpoint: https://integrate.api.nvidia.com/v1
  apiKeySecretRef:
    name: nvidia-api-key
    key: api-key
  allowedModels:
    - z-ai/glm4.7
    - meta/llama-3.1-70b-instruct
    - qwen/qwen3-next-80b-a3b-instruct
EOF

Agents then reference NIM models through the ModelProvider name:

spec:
  models:
    - nvidia/z-ai/glm4.7
    - nvidia/meta/llama-3.1-70b-instruct

Self-hosted vLLM

Point provider: vllm at your in-cluster vLLM service:

apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: vllm-local
spec:
  provider: vllm
  endpoint: http://vllm.vllm-system.svc.cluster.local:8000/v1
  apiKeySecretRef: # vLLM can be token-gated or open
    name: vllm-token
    key: api-key
  allowedModels:
    - meta-llama/Meta-Llama-3-70B-Instruct
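
Since vLLM speaks the OpenAI API, you can sanity-check the endpoint and discover the exact model IDs to put in allowedModels from inside the cluster using the standard OpenAI-compatible /v1/models route (the Authorization header is only needed if your deployment is token-gated):

```shell
# List the models your vLLM deployment actually serves
curl -s http://vllm.vllm-system.svc.cluster.local:8000/v1/models \
  -H "Authorization: Bearer $VLLM_TOKEN"
```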

Other OpenAI-compatible services

The same pattern works for any service speaking OpenAI's API: Together AI, Fireworks, DeepInfra, Anyscale, Groq, OctoAI, Cloudflare Workers AI, and in-house gateways. Use provider: openai, set endpoint to the service's base URL, and list the models in allowedModels.
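
As a hedged sketch of that pattern for Together AI (check the service's docs for its current base URL; the secret name and model slug here are illustrative):

```yaml
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: together
spec:
  provider: openai
  endpoint: https://api.together.xyz/v1 # Together AI's OpenAI-compatible base URL
  apiKeySecretRef:
    name: together-api-key # illustrative secret name
    key: api-key
  allowedModels:
    - meta-llama/Llama-3.3-70B-Instruct-Turbo # illustrative slug
```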

Why allowedModels matters here

For anthropic / gemini / openai / openrouter, Pai knows the catalogue and validates model names automatically. For custom endpoints Pai can't discover what's available, so allowedModels is the source of truth — add every model agents should be able to use.

Token budgets

maxTokensPerDay sets a hard daily cap across all models from this provider. When the limit is reached, requests return HTTP 429 until midnight UTC. If the agent has models from another provider in its list, it can fall back automatically.

maxTokensPerRequest limits the context window for a single request.
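
Both fields sit at the top level of the spec; a sketch combining them (values illustrative):

```yaml
spec:
  maxTokensPerDay: 1000000    # hard daily cap; requests get 429 until midnight UTC
  maxTokensPerRequest: 32000  # per-request ceiling (value illustrative)
```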

Check current consumption:

pai get metrics

Security Controls

Restrict which models can be used

Narrow which of the provider's models agents are allowed to call. Useful for blocking expensive models across a whole subscription.

spec:
  allowedModels:
    - claude-sonnet-4-6
    - claude-haiku-4-5
  deniedModels:
    - claude-opus-4-6 # block the most expensive model

deniedModels takes precedence over allowedModels. If both are empty, all models from the provider are available.

LLM guards

Guards scan every LLM request routed through this provider for prompt injection and jailbreak attempts. This is the org-wide baseline -- it applies to all agents and external gateway traffic using this provider.

spec:
  guards:
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce

Guards set on a ModelProvider run before any per-agent guards. See Prompt Guard for details on configuring GuardBindings.

Exposing via the LLM Gateway

A ModelProvider can be exposed to developers outside Pai (laptops, CI). When externalAccess.enabled is set, the Pai Gateway accepts LLM requests from any machine authenticated with a valid AccessKey — no agent needed.

This turns Pai into a centralized LLM gateway for your team: developers route their local tools (Claude Code, SDKs, scripts) through Pai and get centralized API key management, per-user token budgets, guard scanning, and full audit logging — without ever seeing the real API key.

spec:
  externalAccess:
    enabled: true
    maxTokensPerDay: 2000000 # separate budget for external usage

Developer onboarding

Once a ModelProvider has external access enabled, developers connect in three commands:

# 1. Log in with an AccessKey from your admin
pai login https://api.pairun.dev --access-key pak_...

# 2. Configure local environment
eval $(pai gateway env)

# 3. Use Claude Code normally — requests route through Pai
claude

pai gateway env outputs the environment variables that redirect LLM traffic through the Pai Gateway. The AccessKey is wrapped in a format that Claude Code accepts as an API key — the real Anthropic key never leaves Pai.
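
The exact output depends on your deployment, but for Claude Code the redirect is typically done with ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN-style overrides. A hedged illustration only -- the variable names, URL path, and values below are assumptions, not Pai's documented output:

```shell
# Hypothetical output of `pai gateway env`
export ANTHROPIC_BASE_URL=https://api.pairun.dev/gateway/anthropic  # illustrative path
export ANTHROPIC_AUTH_TOKEN=pak_...                                 # the AccessKey, not the real key
```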

External access controls

| Field | Description |
|---|---|
| `externalAccess.enabled` | Must be `true` for external requests to be accepted |
| `externalAccess.maxTokensPerDay` | Separate daily budget for external usage. If unset, shares the main budget. |
| `guards` | Guard scanning applies to external requests too -- this is the only protection layer for external traffic |

For the full setup guide, see LLM Gateway.

Field reference

| Field | Required | Description |
|---|---|---|
| `provider` | Yes | LLM provider type (`anthropic`, `gemini`, `openai`, `openrouter`, `azure-openai`, `vllm`) |
| `apiKeySecretRef` | Yes | `{name, key}` reference to a Secret holding the API key |
| `endpoint` | No | Custom API endpoint URL. Required for `azure-openai` and `vllm`. |
| `allowedModels` | No | Model IDs agents may use. Empty = all allowed. |
| `deniedModels` | No | Model IDs to block. Takes precedence over `allowedModels`. |
| `maxTokensPerDay` | No | Hard daily token budget across all usage of this provider |
| `maxTokensPerRequest` | No | Maximum tokens per single request |
| `guards` | No | Provider-wide LLM guards, applied to all traffic -- agents and external. See Prompt Guard. |
| `externalAccess.enabled` | No | Expose this provider to developers outside Pai (laptops, CI). See LLM Gateway. |
| `externalAccess.maxTokensPerDay` | No | Separate daily budget for external usage |
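
Pulling the fields together, a maximal ModelProvider sketch (all field names come from the table above; the values are illustrative):

```yaml
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: anthropic
spec:
  provider: anthropic
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
  allowedModels:
    - claude-sonnet-4-6
    - claude-haiku-4-5
  deniedModels:
    - claude-opus-4-6
  maxTokensPerDay: 1000000
  maxTokensPerRequest: 32000
  guards:
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce
  externalAccess:
    enabled: true
    maxTokensPerDay: 2000000
```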