Model
A ModelBinding connects an LLM provider and model to your Pai workspace. Agents reference models by name; Pai manages the real API key and never exposes it to the agent.
pai add model <name> --provider <p> --api-key <secret>
pai get models
pai delete model <name>
Quick add (CLI)
pai add model gemini-flash \
  --provider gemini \
  --api-key gemini-key \
  --max-tokens-day 1000000 \
  --max-tokens-request 32000
YAML format
apiVersion: pai.io/v1
kind: ModelBinding
metadata:
  name: gemini-flash
spec:
  provider: gemini
  model: gemini-2.0-flash
  maxTokensPerDay: 1000000
  maxTokensPerRequest: 32000
  apiKeySecretRef:
    name: gemini-key
    key: api-key
Apply with:
pai apply -f model.yaml
Field reference
| Field | Required | Description |
|---|---|---|
| provider | Yes | LLM provider (see table below) |
| model | Yes | Model identifier string |
| maxTokensPerDay | No | Hard daily token budget across all requests to this model |
| maxTokensPerRequest | No | Maximum tokens allowed per single request |
| apiKeySecretRef.name | No | Name of the Secret containing the API key |
| apiKeySecretRef.key | No | Key within the Secret (default: token) |
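The field reference above can be turned into a quick pre-flight check before running `pai apply`. The following is a minimal sketch, not part of Pai: `validate_model_binding` is a hypothetical helper, it expects the manifest already loaded into a Python dict, and the rule that token limits must be positive integers is an assumption rather than documented behavior.

```python
from typing import Any

# Fields marked "Yes" in the Required column of the field reference.
REQUIRED_SPEC_FIELDS = ("provider", "model")
# Optional numeric limits from the field reference.
OPTIONAL_INT_FIELDS = ("maxTokensPerDay", "maxTokensPerRequest")


def validate_model_binding(manifest: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []

    if manifest.get("kind") != "ModelBinding":
        problems.append("kind must be ModelBinding")
    if not manifest.get("metadata", {}).get("name"):
        problems.append("metadata.name is missing")

    spec = manifest.get("spec", {})
    for field in REQUIRED_SPEC_FIELDS:
        if not spec.get(field):
            problems.append(f"spec.{field} is required")

    # Assumption: limits, when present, should be positive integers.
    for field in OPTIONAL_INT_FIELDS:
        value = spec.get(field)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"spec.{field} must be a positive integer")

    return problems
```

An empty result only means the manifest passed these local checks; Pai itself may apply further validation when the manifest is applied.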
Supported providers
| Provider | Models |
|---|---|
| anthropic | claude-sonnet-4-6, claude-haiku-4-5, claude-opus-4-6 |
| gemini | gemini-2.5-flash, gemini-2.0-flash, gemini-2.5-pro |
| openai | gpt-4o, gpt-4o-mini, o1, o3-mini |
| openrouter | Any model on OpenRouter (use the full slug, e.g. meta-llama/llama-3.1-70b-instruct) |
Examples
Anthropic Claude:
apiVersion: pai.io/v1
kind: ModelBinding
metadata:
  name: claude-sonnet
spec:
  provider: anthropic
  model: claude-sonnet-4-6
  maxTokensPerDay: 500000
  maxTokensPerRequest: 16000
  apiKeySecretRef:
    name: anthropic-key
    key: api-key
OpenRouter (access many models with one key):
apiVersion: pai.io/v1
kind: ModelBinding
metadata:
  name: llama-70b
spec:
  provider: openrouter
  model: meta-llama/llama-3.1-70b-instruct
  maxTokensPerDay: 2000000
  apiKeySecretRef:
    name: openrouter-key
    key: api-key
Using a model in an agent
Reference the model by name in your AgentWorkload:
spec:
  modelBindings:
    - gemini-flash
    - claude-sonnet
The agent can use any model in the list. Multiple models let the agent choose based on task complexity.
Token budgets
When a model hits its maxTokensPerDay limit, requests to that model return an error until midnight UTC. The agent can fall back to another model in its modelBindings list if one is available.
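To make the budget and fallback behavior concrete, here is a minimal sketch of how a daily cap with a midnight UTC reset and an ordered fallback list could work. It illustrates the rule described above and is not Pai's implementation: `ModelBudget` and `pick_model` are hypothetical names, and trying bindings in list order is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Optional


@dataclass
class ModelBudget:
    name: str
    max_tokens_per_day: Optional[int] = None  # None means no daily cap
    used_today: int = 0
    day: date = field(default_factory=lambda: datetime.now(timezone.utc).date())

    def _roll_over(self, now: datetime) -> None:
        # Reset the counter once the UTC calendar day changes.
        # `now` should be timezone-aware.
        today = now.astimezone(timezone.utc).date()
        if today != self.day:
            self.day = today
            self.used_today = 0

    def has_budget(self, now: datetime) -> bool:
        self._roll_over(now)
        return self.max_tokens_per_day is None or self.used_today < self.max_tokens_per_day

    def record(self, tokens: int, now: datetime) -> None:
        self._roll_over(now)
        self.used_today += tokens


def pick_model(bindings: list[ModelBudget], now: datetime) -> Optional[ModelBudget]:
    """Return the first binding with budget left, or None if all are exhausted."""
    for binding in bindings:
        if binding.has_budget(now):
            return binding
    return None
```

With `gemini-flash` listed before `claude-sonnet`, `pick_model` returns `gemini-flash` until its cap is reached, then `claude-sonnet`, and returns to `gemini-flash` after the next midnight UTC.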
Check current consumption:
pai status my-agent
# Models:
# - gemini-flash (gemini / gemini-2.0-flash) — 45,230 / 1,000,000 tokens today
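If you want to act on the consumption figures in a script, the per-model line can be parsed. The sketch below assumes the output has exactly the shape of the sample line above (name, provider / model in parentheses, a separator, then used / limit); the real `pai status` format may differ or change, and `parse_status_line` and `ModelUsage` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the single sample line in this section (an assumption).
STATUS_LINE = re.compile(
    r"^\s*#?\s*-\s*(?P<name>\S+)\s+"
    r"\((?P<provider>[^/\s]+)\s*/\s*(?P<model>[^)]+?)\s*\)"
    r"\s+\S+\s+"  # separator token between the model and the counters
    r"(?P<used>[\d,]+)\s*/\s*(?P<limit>[\d,]+)\s+tokens today\s*$"
)


class ModelUsage(NamedTuple):
    name: str
    provider: str
    model: str
    used: int
    limit: int

    @property
    def remaining(self) -> int:
        # Tokens left today, never negative.
        return max(self.limit - self.used, 0)


def parse_status_line(line: str) -> Optional[ModelUsage]:
    """Parse one per-model line; return None for lines that do not match."""
    match = STATUS_LINE.match(line)
    if match is None:
        return None
    return ModelUsage(
        name=match["name"],
        provider=match["provider"],
        model=match["model"],
        used=int(match["used"].replace(",", "")),
        limit=int(match["limit"].replace(",", "")),
    )
```

Lines that do not match, such as the `Models:` header, yield `None`, so a caller can feed the whole output through the function line by line.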