Model

A ModelBinding connects an LLM provider and model to your Pai workspace. Agents reference models by name — the real API key is managed by Pai and never exposed to the agent.

pai add model <name> --provider <p> --api-key <secret>
pai get models
pai delete model <name>

Quick add (CLI)

pai add model gemini-flash \
  --provider gemini \
  --api-key gemini-key \
  --max-tokens-day 1000000 \
  --max-tokens-request 32000

YAML format

apiVersion: pai.io/v1
kind: ModelBinding
metadata:
  name: gemini-flash
spec:
  provider: gemini
  model: gemini-2.0-flash
  maxTokensPerDay: 1000000
  maxTokensPerRequest: 32000
  apiKeySecretRef:
    name: gemini-key
    key: api-key

Apply with:

pai apply -f model.yaml

Field reference

Field                  Required  Description
provider               Yes       LLM provider (see table below)
model                  Yes       Model identifier string
maxTokensPerDay        No        Hard daily token budget across all requests to this model
maxTokensPerRequest    No        Maximum tokens allowed per single request
apiKeySecretRef.name   No        Name of the Secret containing the API key
apiKeySecretRef.key    No        Key within the Secret (default: token)
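
The rules in this table can be checked locally before running pai apply. What follows is a minimal Python sketch, not part of Pai: normalize_model_binding is a hypothetical helper that takes an already-parsed spec dictionary. It assumes that "default: token" means an omitted apiKeySecretRef.key is treated as token, and that the two token limits must be positive integers (the table does not state this).

```python
from copy import deepcopy

# Fields marked "Yes" in the field reference.
REQUIRED_FIELDS = ("provider", "model")


def normalize_model_binding(spec: dict) -> dict:
    """Hypothetical helper: validate a ModelBinding spec and fill the documented default."""
    missing = [f for f in REQUIRED_FIELDS if not spec.get(f)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")

    result = deepcopy(spec)  # leave the caller's dictionary untouched

    # Assumption: token limits, when present, must be positive integers.
    for field in ("maxTokensPerDay", "maxTokensPerRequest"):
        value = result.get(field)
        if value is not None and (not isinstance(value, int) or value <= 0):
            raise ValueError(f"{field} must be a positive integer")

    # Assumption: an omitted apiKeySecretRef.key falls back to "token".
    ref = result.get("apiKeySecretRef")
    if ref is not None:
        ref.setdefault("key", "token")
    return result


spec = {
    "provider": "gemini",
    "model": "gemini-2.0-flash",
    "apiKeySecretRef": {"name": "gemini-key"},
}
print(normalize_model_binding(spec)["apiKeySecretRef"])
# {'name': 'gemini-key', 'key': 'token'}
```

The examples on this page set key: api-key explicitly, so the default only matters when the key is left out.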

Supported providers

provider     Models
anthropic    claude-sonnet-4-6, claude-haiku-4-5, claude-opus-4-6
gemini       gemini-2.5-flash, gemini-2.0-flash, gemini-2.5-pro
openai       gpt-4o, gpt-4o-mini, o1, o3-mini
openrouter   Any model on OpenRouter (use the full slug, e.g. meta-llama/llama-3.1-70b-instruct)

Examples

Anthropic Claude:

apiVersion: pai.io/v1
kind: ModelBinding
metadata:
  name: claude-sonnet
spec:
  provider: anthropic
  model: claude-sonnet-4-6
  maxTokensPerDay: 500000
  maxTokensPerRequest: 16000
  apiKeySecretRef:
    name: anthropic-key
    key: api-key

OpenRouter (access many models with one key):

apiVersion: pai.io/v1
kind: ModelBinding
metadata:
  name: llama-70b
spec:
  provider: openrouter
  model: meta-llama/llama-3.1-70b-instruct
  maxTokensPerDay: 2000000
  apiKeySecretRef:
    name: openrouter-key
    key: api-key

Using a model in an agent

Reference the model by name in your AgentWorkload:

spec:
  modelBindings:
    - gemini-flash
    - claude-sonnet

The agent can use any model in the list. Multiple models let the agent choose based on task complexity.

Token budgets

When a model hits its maxTokensPerDay limit, requests to that model return an error until midnight UTC. The agent can fall back to another model in its modelBindings list if one is available.
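
The budget and fallback behavior described above can be modeled in a few lines of Python. This is an illustrative simulation, not Pai's implementation: DailyBudget and pick_model are hypothetical names. The sketch assumes that a request which would push usage past the limit is rejected outright, and that fallback tries models in the order they appear in modelBindings.

```python
from datetime import datetime, timezone


class DailyBudget:
    """Tracks tokens used per UTC day for one model (illustrative only)."""

    def __init__(self, max_tokens_per_day: int):
        self.max_tokens_per_day = max_tokens_per_day
        self.used = 0
        self.day = None  # UTC date the counter currently belongs to

    def try_consume(self, tokens: int, now: datetime) -> bool:
        """Record the request if it fits in today's budget; `now` must be timezone-aware."""
        today = now.astimezone(timezone.utc).date()
        if today != self.day:
            # First request after midnight UTC: start a fresh counter.
            self.day, self.used = today, 0
        if self.used + tokens > self.max_tokens_per_day:
            return False  # assumption: over-budget requests are rejected
        self.used += tokens
        return True


def pick_model(bindings, budgets, tokens, now):
    """Return the first binding, in list order, whose budget covers the request."""
    for name in bindings:
        if budgets[name].try_consume(tokens, now):
            return name
    return None  # every model in the list is exhausted


bindings = ["gemini-flash", "claude-sonnet"]
budgets = {
    "gemini-flash": DailyBudget(1_000_000),
    "claude-sonnet": DailyBudget(500_000),
}
late = datetime(2025, 1, 1, 23, 0, tzinfo=timezone.utc)
next_day = datetime(2025, 1, 2, 0, 5, tzinfo=timezone.utc)

print(pick_model(bindings, budgets, 990_000, late))     # gemini-flash
print(pick_model(bindings, budgets, 20_000, late))      # claude-sonnet (gemini-flash would exceed its limit)
print(pick_model(bindings, budgets, 20_000, next_day))  # gemini-flash (counter reset after midnight UTC)
```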

Check current consumption:

pai status my-agent
# Models:
# - gemini-flash (gemini / gemini-2.0-flash) — 45,230 / 1,000,000 tokens today