ModelProvider
A ModelProvider represents a single LLM API subscription -- one resource covers all models from that provider. Agents reference models as provider/model-id (e.g., anthropic/claude-sonnet-4-6), and Pai handles API key injection behind the scenes.
pai create model-provider <name> --provider <type> --api-key <key>
pai get model-providers
pai delete model-provider <name>
Quick create (CLI)
pai create model-provider anthropic \
  --provider anthropic \
  --api-key sk-ant-...
This single command stores your API key and gives your agents access to every Anthropic model -- Claude Sonnet, Haiku, Opus -- without creating a separate resource for each one.
YAML format
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: anthropic
spec:
  provider: anthropic
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
  maxTokensPerDay: 1000000
Apply with:
pai apply -f model-provider.yaml
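The manifest references the API key through a Secret rather than inline. Assuming the same secret workflow shown in the NIM example below, create the Secret before applying:

pai add secret anthropic-key --from-literal api-key=sk-ant-...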
Using models in an agent
Reference models as provider/model-id -- the ModelProvider's name followed by the model ID -- in your Agent:
spec:
  models:
    - anthropic/claude-sonnet-4-6
    - google/gemini-2.5-flash
The agent can use any model in the list. The first entry is the primary model. Multiple models let the agent fall back if one provider hits its token budget.
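Only spec.models is documented on this page; in a complete manifest the fragment above might sit in an Agent like this sketch (the Agent kind and metadata follow the conventions used elsewhere here, but are assumptions):

apiVersion: pai.io/v1
kind: Agent
metadata:
  name: my-agent                      # hypothetical name
spec:
  models:
    - anthropic/claude-sonnet-4-6     # primary model
    - google/gemini-2.5-flash         # fallback if the first provider is over budget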
Supported providers
| Provider | Example models |
|---|---|
| anthropic | claude-sonnet-4-6, claude-haiku-4-5, claude-opus-4-6 |
| gemini | gemini-2.5-flash, gemini-2.0-flash, gemini-2.5-pro |
| openai | gpt-4o, gpt-4o-mini, o1, o3-mini |
| openrouter | Any model on OpenRouter (use the full slug, e.g. meta-llama/llama-3.1-70b-instruct) |
| azure-openai | Your Azure deployment names (see Azure OpenAI) |
| vllm | Any model served by your vLLM deployment (see OpenAI-compatible endpoints below) |
OpenAI-compatible endpoints (NVIDIA NIM, vLLM, Together AI, Fireworks, …)
Many LLM services expose an OpenAI-compatible API at a custom URL: NVIDIA NIM, self-hosted vLLM, Together AI, Fireworks, DeepInfra, Anyscale, Groq, and others. Use provider: openai with a custom endpoint to connect them.
NVIDIA NIM
NVIDIA NIM hosts open-weight models (Llama, GLM, Qwen, Mistral, etc.) behind an OpenAI-compatible endpoint at https://integrate.api.nvidia.com/v1.
# Store your API key
pai add secret nvidia-api-key --from-literal api-key=nvapi-...

# Create a ModelProvider that points at NIM
pai apply -f - <<'EOF'
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: nvidia
spec:
  provider: openai
  endpoint: https://integrate.api.nvidia.com/v1
  apiKeySecretRef:
    name: nvidia-api-key
    key: api-key
  allowedModels:
    - z-ai/glm4.7
    - meta/llama-3.1-70b-instruct
    - qwen/qwen3-next-80b-a3b-instruct
EOF
Agents then reference NIM models through the ModelProvider name:
spec:
  models:
    - nvidia/z-ai/glm4.7
    - nvidia/meta/llama-3.1-70b-instruct
Self-hosted vLLM
Point provider: vllm at your in-cluster vLLM service:
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: vllm-local
spec:
  provider: vllm
  endpoint: http://vllm.vllm-system.svc.cluster.local:8000/v1
  apiKeySecretRef:   # vLLM can be token-gated or open
    name: vllm-token
    key: api-key
  allowedModels:
    - meta-llama/Meta-Llama-3-70B-Instruct
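If your vLLM server is token-gated, store the token the same way as in the NIM example -- the secret name must match apiKeySecretRef.name:

pai add secret vllm-token --from-literal api-key=<your-vllm-token>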
Other OpenAI-compatible services
The same pattern works for any service speaking OpenAI's API: Together AI, Fireworks, DeepInfra, Anyscale, Groq, OctoAI, Cloudflare Workers AI, and in-house gateways. Use provider: openai, set endpoint to the service's base URL, and list the models in allowedModels.
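As a sketch, a Together AI provider might look like this -- the endpoint URL and model slug below are assumptions, so verify them against the service's own docs:

apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: together
spec:
  provider: openai
  endpoint: https://api.together.xyz/v1   # assumed OpenAI-compatible base URL
  apiKeySecretRef:
    name: together-api-key
    key: api-key
  allowedModels:
    - meta-llama/Llama-3.1-70B-Instruct-Turbo   # example slug; list every model agents should use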
allowedModels matters here
For anthropic / gemini / openai / openrouter, Pai knows the catalogue and validates model names automatically. For custom endpoints Pai can't discover what's available, so allowedModels is the source of truth -- add every model agents should be able to use.
Token budgets
maxTokensPerDay sets a hard daily cap across all models from this provider. When the limit is reached, requests return HTTP 429 until midnight UTC. If the agent has models from another provider in its list, it can fall back automatically.
maxTokensPerRequest caps how many tokens a single request may consume.
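Both limits live on the ModelProvider spec. For example, to allow one million tokens per day but cap any single request at 8,192 tokens (values illustrative):

spec:
  maxTokensPerDay: 1000000
  maxTokensPerRequest: 8192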
Check current consumption:
pai get metrics
Security controls
Restrict which models can be used
Narrow which of the provider's models agents are allowed to call. Useful for blocking expensive models across a whole subscription.
spec:
  allowedModels:
    - claude-sonnet-4-6
    - claude-haiku-4-5
  deniedModels:
    - claude-opus-4-6   # block the most expensive model
deniedModels takes precedence over allowedModels. If both are empty, all models from the provider are available.
LLM guards
Guards scan every LLM request routed through this provider for prompt injection and jailbreak attempts. This is the org-wide baseline -- it applies to all agents and external gateway traffic using this provider.
spec:
  guards:
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce
Guards set on a ModelProvider run before any per-agent guards. See Prompt Guard for details on configuring GuardBindings.
Exposing via the LLM Gateway
A ModelProvider can be exposed to developers outside Pai (laptops, CI). When externalAccess.enabled is set, the Pai Gateway accepts LLM requests from any machine authenticated with a valid AccessKey — no agent needed.
This turns Pai into a centralized LLM gateway for your team: developers route their local tools (Claude Code, SDKs, scripts) through Pai and get centralized API key management, per-user token budgets, guard scanning, and full audit logging — without ever seeing the real API key.
spec:
  externalAccess:
    enabled: true
    maxTokensPerDay: 2000000   # separate budget for external usage
Developer onboarding
Once a ModelProvider has external access enabled, developers connect in three commands:
# 1. Log in with an AccessKey from your admin
pai login https://api.pairun.dev --access-key pak_...
# 2. Configure local environment
eval $(pai gateway env)
# 3. Use Claude Code normally — requests route through Pai
claude
pai gateway env outputs the environment variables that redirect LLM traffic through the Pai Gateway. The AccessKey is wrapped in a format that Claude Code accepts as an API key — the real Anthropic key never leaves Pai.
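The exact variables depend on the tool being redirected; for Claude Code the output is shaped roughly like this (illustrative only -- the gateway path shown is an assumption, so run pai gateway env to see the real values):

export ANTHROPIC_BASE_URL=https://api.pairun.dev/gateway   # assumed path; points Claude Code at the Pai Gateway
export ANTHROPIC_API_KEY=pak_...                           # your AccessKey, standing in for the real Anthropic key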
External access controls
| Field | Description |
|---|---|
| externalAccess.enabled | Must be true for external requests to be accepted |
| externalAccess.maxTokensPerDay | Separate daily budget for external usage. If unset, shares the main budget. |
| guards | Guard scanning applies to external requests too -- this is the only protection layer for external traffic |
For the full setup guide, see LLM Gateway.
Field reference
| Field | Required | Description |
|---|---|---|
| provider | Yes | LLM provider type (anthropic, gemini, openai, openrouter, azure-openai, vllm) |
| apiKeySecretRef | Yes | {name, key} reference to a Secret holding the API key |
| endpoint | No | Custom API endpoint URL. Required for azure-openai and vllm. |
| allowedModels | No | Model IDs agents may use. Empty = all allowed. |
| deniedModels | No | Model IDs to block. Takes precedence over allowedModels. |
| maxTokensPerDay | No | Hard daily token budget across all usage of this provider |
| maxTokensPerRequest | No | Maximum tokens per single request |
| guards | No | Provider-wide LLM guards. Applied to all traffic -- agents and external. See Prompt Guard. |
| externalAccess.enabled | No | Expose this provider to developers outside Pai (laptops, CI). See LLM Gateway. |
| externalAccess.maxTokensPerDay | No | Separate daily budget for external usage |
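Putting it all together, a ModelProvider exercising every optional field might look like the following sketch (values are illustrative, composed from the examples above):

apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: anthropic
spec:
  provider: anthropic
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
  allowedModels:
    - claude-sonnet-4-6
    - claude-haiku-4-5
  deniedModels:
    - claude-opus-4-6          # deniedModels wins over allowedModels
  maxTokensPerDay: 1000000     # hard daily cap across all models
  maxTokensPerRequest: 8192    # per-request ceiling
  guards:
    - binding: prompt-guard-default
      scan:
        prompts: true
      enforcement: enforce
  externalAccess:
    enabled: true
    maxTokensPerDay: 2000000   # separate budget for gateway traffic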