# Agent

An `AgentWorkload` is the core resource in Pai. It declares an AI agent: what container to run, which models it can use, which external services it can access, and how to expose it.

```shell
pai create -f agent.yaml
pai apply -f agent.yaml   # create or update
pai delete agent <name>
pai list                  # list all agents
pai status <name>         # detailed status
```
## Minimal example

```yaml
apiVersion: pai.io/v1
kind: AgentWorkload
metadata:
  name: my-agent
spec:
  image: ghcr.io/pai-platform/openclaw:latest
  modelBindings:
    - gemini-flash
```
## Full example

```yaml
apiVersion: pai.io/v1
kind: AgentWorkload
metadata:
  name: dev-assistant
spec:
  image: ghcr.io/pai-platform/openclaw:latest
  runAsUser: 1000
  modelBindings:
    - gemini-flash
    - claude-sonnet
  providers:
    - github-writer
    - telegram-bot
  inbound:
    port: 3000
    allowCIDRs:
      - "0.0.0.0/0"
  volumes:
    - name: workspace
      mountPath: /home/node/workspace
      size: "5Gi"
  configFiles:
    - path: /home/node/.openclaw/openclaw.json
      content: |
        {
          "channels": ["web", "telegram"],
          "defaultModel": "gemini-flash"
        }
  env:
    - name: LOG_LEVEL
      value: "info"
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "2"
      memory: "2Gi"
```
## Field reference

### Core

| Field | Type | Default | Description |
|---|---|---|---|
| `image` | string | — | Container image to run |
| `replicas` | integer | `1` | Number of instances |
| `runAsUser` | integer | `65532` | UID to run the container as. Cannot be `0` (root is rejected) |
| `command` | string[] | — | Override the container entrypoint + command |
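Putting the core fields together, a spec fragment might look like this (the `command` value is purely illustrative, not a real entrypoint for this image):

```yaml
spec:
  image: ghcr.io/pai-platform/openclaw:latest
  replicas: 2
  runAsUser: 1000
  command: ["node", "dist/server.js"]
```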
### Models

| Field | Type | Description |
|---|---|---|
| `modelBindings` | string[] | Names of Model resources the agent can use |
The agent's LLM calls are routed through the Pai gateway, which injects the real API key. The agent never sees credentials.
### Providers

| Field | Type | Description |
|---|---|---|
| `providers` | string[] | Names of Provider resources to attach |
Each provider gives the agent access to an external service (GitHub, Telegram, AWS, etc.) through the sidecar proxy.
### Inbound traffic

Expose a port for incoming requests (webhooks, web UI, APIs):

| Field | Type | Default | Description |
|---|---|---|---|
| `inbound.port` | integer | — | Container port the agent listens on |
| `inbound.allowCIDRs` | string[] | `[]` (deny all) | CIDR blocks allowed to reach the port |
When `inbound` is set, Pai provisions a public load balancer and exposes the port at a stable IP. The agent's public URL is always available in `status.url`, regardless of the `inbound` config.
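The allow-list semantics can be pictured with a short sketch. This is ordinary CIDR matching, not Pai's actual enforcement code: a client IP is admitted if it falls inside any listed block, and an empty list denies everyone.

```python
import ipaddress

def cidr_allows(allow_cidrs: list[str], client_ip: str) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR blocks."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allow_cidrs)

# "0.0.0.0/0" admits any IPv4 address; an empty list denies everything.
print(cidr_allows(["0.0.0.0/0"], "203.0.113.7"))   # True
print(cidr_allows(["10.0.0.0/8"], "203.0.113.7"))  # False
print(cidr_allows([], "203.0.113.7"))              # False
```

To expose a port only to your own network, list a narrow block such as `"198.51.100.0/24"` instead of `"0.0.0.0/0"`.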
### Storage

Persistent volumes survive pod restarts:

```yaml
volumes:
  - name: workspace      # becomes the PVC name
    mountPath: /data     # where it's mounted inside the container
    size: "10Gi"
```

| Field | Type | Description |
|---|---|---|
| `volumes[].name` | string | Volume name (used as PVC identifier) |
| `volumes[].mountPath` | string | Mount path inside the container |
| `volumes[].size` | string | Storage size (e.g. `"1Gi"`, `"50Gi"`) |
### Config files

Inject files into the container at startup via a ConfigMap. Files are seeded into volumes on first start using copy-no-clobber: existing files are never overwritten, so user edits survive pod restarts.

```yaml
configFiles:
  - path: /home/node/.myapp/config.json
    content: |
      { "setting": "value" }
```
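The seeding behavior can be sketched as follows. This is an illustration of copy-no-clobber semantics, not Pai's actual init code:

```python
import shutil
from pathlib import Path

def seed_file(src: Path, dest: Path) -> bool:
    """Copy src to dest only if dest does not exist yet (copy-no-clobber).

    Returns True if the file was seeded, False if an existing file was kept.
    """
    if dest.exists():
        return False  # keep the user's edits from a previous run
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return True
```

On first start the file is copied into place; on every later start the copy is skipped, which is why edits made inside the container persist across restarts.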
### Environment variables

```yaml
env:
  - name: LOG_LEVEL
    value: "debug"
  - name: API_ENDPOINT
    value: "https://api.example.com"
```

Personal env vars (git author name, email, etc.) can be set once with `pai config set-env` and are automatically injected into every agent you create.
### Resources

```yaml
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "2"
    memory: "2Gi"
```

| Field | Description |
|---|---|
| `requests.cpu` | Guaranteed CPU (e.g. `"500m"` = 0.5 cores) |
| `requests.memory` | Guaranteed memory (e.g. `"512Mi"`) |
| `limits.cpu` | CPU ceiling — agent is throttled if exceeded |
| `limits.memory` | Memory ceiling — agent is OOM-killed if exceeded |
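CPU quantities use the standard Kubernetes notation, where the `m` suffix means millicores. A minimal parser sketch (illustrative only; a real parser would also validate its input):

```python
def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity string to cores: "500m" -> 0.5, "2" -> 2.0."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)

print(parse_cpu("500m"))  # 0.5
print(parse_cpu("2"))     # 2.0
```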
### Token budgets

Override the default token limits for this agent:

```yaml
tokens:
  maxPerDay: 50000
  maxPerRequest: 8192
```
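One way to picture how the two limits interact (a hypothetical check, not Pai's actual accounting logic): a request is rejected if it exceeds `maxPerRequest` on its own, or if it would push the day's running total past `maxPerDay`.

```python
def request_allowed(tokens_today: int, requested: int,
                    max_per_day: int = 50000, max_per_request: int = 8192) -> bool:
    """Hypothetical budget check combining per-request and per-day limits."""
    if requested > max_per_request:
        return False  # single request too large
    return tokens_today + requested <= max_per_day

print(request_allowed(49000, 500))   # True
print(request_allowed(49000, 2000))  # False: would exceed the daily budget
print(request_allowed(0, 10000))     # False: exceeds the per-request cap
```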
### Expose directories

Serve files from the agent's filesystem at a public URL path:

```yaml
expose:
  - urlPath: /reports
    directory: /data/reports
```

`https://<hostname>.pairun.dev/reports` will serve the contents of `/data/reports` inside the container.
### Autoscaling

Automatically scale replicas up and down based on traffic or queue depth:

```yaml
autoscaling:
  minReplicas: 1
  maxReplicas: 5
  scaleUpCooldownSeconds: 60
  scaleDownCooldownSeconds: 300
  metrics:
    - type: tokenRate
      targetValuePerReplica: 500  # scale up at 500 tokens/min per replica
```

| Field | Default | Description |
|---|---|---|
| `minReplicas` | `1` | Minimum replica count (`0` = pause the agent) |
| `maxReplicas` | — | Maximum replica count (required) |
| `scaleUpCooldownSeconds` | `60` | Minimum seconds between scale-up events |
| `scaleDownCooldownSeconds` | `300` | Minimum seconds between scale-down events |
| `scaleDownStabilizationWindowSeconds` | `120` | Rolling window before allowing a scale-down |
| `pollIntervalSeconds` | `30` | How often metrics are evaluated |
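The replica target follows the usual target-tracking arithmetic (as in the Kubernetes HPA; assuming Pai's scaler works the same way): divide the observed metric by the per-replica target, round up, and clamp to the configured bounds.

```python
import math

def desired_replicas(metric_value: float, target_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Target tracking: enough replicas that each handles <= target_per_replica."""
    desired = math.ceil(metric_value / target_per_replica)
    return max(min_replicas, min(max_replicas, desired))

# 1800 tokens/min with a target of 500 per replica -> ceil(3.6) = 4
print(desired_replicas(1800, 500, min_replicas=1, max_replicas=5))  # 4
print(desired_replicas(0, 500, min_replicas=1, max_replicas=5))     # 1
```

The cooldown and stabilization fields above then rate-limit how often this computed target is actually applied.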
Metric types:

| Type | Description |
|---|---|
| `tokenRate` | Tokens per minute observed by the gateway |
| `http` | Poll any URL returning a JSON number (JIRA, SQS, custom queues) |

```yaml
metrics:
  - type: http
    url: "https://your-queue.example.com/depth"
    jsonPath: "queue.depth"
    targetValuePerReplica: 10
```
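Judging from the example, `jsonPath` looks like simple dotted-key navigation into the polled JSON body. A sketch under that assumption (the real syntax may support more):

```python
import json

def resolve_json_path(body: str, json_path: str) -> float:
    """Walk a dotted path like "queue.depth" through a JSON document."""
    value = json.loads(body)
    for key in json_path.split("."):
        value = value[key]
    return float(value)

print(resolve_json_path('{"queue": {"depth": 42}}', "queue.depth"))  # 42.0
```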
View live scaling status:

```shell
pai scaling my-agent
pai scaling my-agent --watch
```
### Browser automation (CDP relay)

Give the agent access to a Chrome browser running on your machine:

```yaml
cdpRelay:
  token: "my-secret-token"
```

Then run `pai relay my-agent --token my-secret-token` on your machine. See the Browser Automation guide for full setup instructions.
### Filesystem confinement

Restrict which paths the agent can write to (Linux kernel 5.13+ only, best-effort):

```yaml
filesystem:
  readOnlyPaths:
    - /etc/myapp/config.json
  writablePaths:
    - /home/node/.npm
```

| Field | Description |
|---|---|
| `readOnlyPaths` | Paths the agent cannot write to (read is always allowed) |
| `writablePaths` | Extra writable paths beyond volumes and `/tmp` |
### Operational runbook

Plain-English instructions for `pai chat` to use when querying agent status:

```yaml
ops:
  instructions: |
    Run `myapp status` to check health.
    Logs are at /data/logs/app.log.
```
## Status fields

| Field | Description |
|---|---|
| `status.phase` | `Creating` · `Pending` · `Running` · `Failed` · `Terminating` |
| `status.url` | Public HTTPS URL (e.g. `https://a7x3k9.pairun.dev`) |
| `status.tokensToday` | Token consumption for the current day |
| `status.message` | Failure reason when phase is `Failed` |
| `status.currentReplicas` | Current replica count (when autoscaling is active) |
| `status.lastScaleTime` | Timestamp of the last autoscaler action |