Architecture

Pai is built on Kubernetes and uses a two-tier proxy model to mediate all agent traffic. This page covers the core architecture, credential flow, DNS interception, TLS strategy, and security model.

High-level overview

Two-tier proxy model

All agent traffic is mediated by two proxy layers. Agents never call external services directly.

Tier 1: LLM Gateway

The LLM Gateway is a cluster-wide service that proxies all agent-to-LLM communication.

  • Exposes an OpenAI-compatible API to agents
  • Routes requests to the correct provider (Anthropic, OpenAI, Gemini, OpenRouter, vLLM, or any OpenAI-compatible endpoint) based on the ModelProvider
  • Injects API keys from Kubernetes Secrets -- agents never see provider credentials
  • Enforces token budgets (per-day and per-request limits)
  • Tracks cost and usage per workload
  • Retries transient upstream failures (429, 5xx, connection errors) with exponential backoff and jitter, then walks the fallback chain (ModelProvider.spec.fallbacks) once retries are exhausted. See the reliability guide
  • Enforces rate + concurrency limits (Agent.spec.rateLimits) — RPM, simultaneous in-flight requests, daily USD spend cap
  • Runs per-agent prompt-injection / jailbreak scans via GuardBindings before forwarding to the provider. Classifier calls go out to the pai-guard service (or any custom HTTP endpoint). Fail-open: a broken classifier never blocks traffic
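
The retry-then-fallback behavior described above can be sketched as follows. This is an illustrative sketch, not the gateway's actual code: `call`, `RetryableError`, and the provider list shape are assumptions for the example.

```python
import time

class RetryableError(Exception):
    """Stand-in for 429 / 5xx / connection errors the gateway retries."""

def call_with_fallbacks(call, providers, max_retries=3, base_delay=0.5):
    """Try each provider in order (primary first, then the fallback chain);
    retry transient failures with exponential backoff before moving on."""
    last_exc = None
    for provider in providers:
        for attempt in range(max_retries):
            try:
                return call(provider)
            except RetryableError as exc:
                last_exc = exc
                time.sleep(base_delay * (2 ** attempt))  # plus jitter in practice
    # every provider in the chain exhausted its retries
    raise last_exc if last_exc else RuntimeError("no providers configured")
```

The key property is that backoff happens *within* a provider, and the fallback chain is only walked once a provider's retries are spent.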

Tier 2: Service Binding Proxy (Sidecar)

Each agent pod includes a sidecar proxy that intercepts all calls to external services (GitHub, AWS, Azure, GCP, Telegram, Slack, etc.).

  • Runs as a sidecar container in every agent pod
  • Intercepts HTTPS traffic via DNS hijacking and iptables redirect
  • Routes requests to the correct provider plugin based on the Host header
  • Resolves actions (e.g., GET /repos/org/repo/pulls maps to pulls:read)
  • Evaluates policy (allow/deny lists) before forwarding
  • Injects credentials (Bearer tokens, SigV4 signatures, OAuth2 tokens)
  • Logs every request for audit

Agent Container
├── LLM calls --> PAI Gateway (port 8000) --> Anthropic / OpenAI / Gemini
└── Tool/service calls --> Binding Proxy Sidecar (port 8081 HTTP, 8443 HTTPS)
    ├── 1. Route: Host header --> PluginRegistry --> ProviderPlugin
    ├── 2. HTTP rules: check policy.httpRules (method + path globs)
    ├── 3. Resolve: plugin.resolve_action(method, path) --> "pulls:create"
    ├── 4. Policy: check allow/deny action lists
    ├── 5. Scope: check resource scope (repos, ARNs, projects)
    ├── 6. Audit: log(workload, binding, action, resource, allowed)
    ├── 7. Auth: plugin.inject_credentials(request)
    └── 8. Forward: proxy to real endpoint
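
The pipeline above can be condensed into a sketch. The plugin and policy shapes here are assumptions for illustration (the scope check, step 5, is omitted for brevity):

```python
from fnmatch import fnmatch

def handle(request, plugins, policy, audit_log):
    plugin = plugins.get(request["host"])                  # 1. route by Host header
    if plugin is None:
        return 502, "no plugin for host"
    for rule in policy.get("httpRules") or []:             # 2. method + path globs
        if rule["method"] == request["method"] and fnmatch(request["path"], rule["path"]):
            break
    else:
        if policy.get("httpRules"):
            return 403, "no matching httpRule"
    action = plugin.resolve_action(request["method"], request["path"])  # 3. resolve
    allowed = (action in policy.get("allow", []) and
               action not in policy.get("deny", []))       # 4. allow/deny lists
    # 5. scope check omitted in this sketch
    audit_log.append((request["host"], action, allowed))   # 6. audit every request
    if not allowed:
        return 403, f"action {action} denied"
    plugin.inject_credentials(request)                     # 7. auth injection
    return 200, "forwarded"                                # 8. forward (stubbed)
```

Note that the audit entry is written before the allow/deny decision is acted on, so denied requests are logged too.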

Gateway state store

The gateway uses a single in-cluster Redis pod to hold state that must survive pod restarts and be consistent across multiple gateway replicas:

  • Daily token counters (tokens:{workload}:{YYYYMMDD})
  • Cumulative USD cost (cost:{workload}:{YYYYMMDD}, stored as micro-dollars)
  • Per-minute RPM windows (rpm:{workload}:{min_epoch})
  • Concurrent-request counters (conc:{workload})
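
A minimal sketch of how those keys could be constructed; the key formats match the list above, while the helper names are illustrative:

```python
from datetime import datetime, timezone

def day_key(prefix, workload, now):
    """tokens:{workload}:{YYYYMMDD} / cost:{workload}:{YYYYMMDD}"""
    return f"{prefix}:{workload}:{now.strftime('%Y%m%d')}"

def rpm_key(workload, now):
    """rpm:{workload}:{min_epoch} -- one counter per minute window"""
    return f"rpm:{workload}:{int(now.timestamp()) // 60}"

def to_micro_dollars(usd):
    # cost:* counters store integer micro-dollars so increments stay atomic
    return round(usd * 1_000_000)
```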

The Redis pod is intentionally lean: 20m CPU / 32Mi memory requests, emptyDir storage, allkeys-lru eviction, 24mb maxmemory. One replica handles ~250k active counters.

Fail-open by design. When Redis is unreachable (outage, network partition, or redis.enabled: false in the helm chart), the gateway logs a single warning and transparently falls back to per-pod in-memory counters. No request is ever blocked on Redis health. The tradeoffs of memory mode:

  • Counters reset on gateway pod restart (one window's worth of leakage for RPM, potential re-allowance of daily budget)
  • Counters are per-pod, so multi-replica gateway deployments won't share state (enforce via sessionAffinity: ClientIP on the gateway Service if you need sticky routing)
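
The fail-open wrapper can be sketched like this, assuming a Redis client with an incrby method (the class name and shape are illustrative, not the gateway's real internals):

```python
from collections import defaultdict

class CounterStore:
    """Counter store that degrades to per-pod in-memory counters on any
    Redis error -- no request is ever blocked on Redis health."""

    def __init__(self, redis_client):
        self._redis = redis_client
        self._memory = defaultdict(int)
        self.degraded = False

    def incr(self, key, amount=1):
        if not self.degraded:
            try:
                return self._redis.incrby(key, amount)
            except Exception:
                self.degraded = True   # log one warning, then stay in memory mode
        self._memory[key] += amount
        return self._memory[key]
```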

To disable Redis entirely (single-replica deployments, dev clusters):

# values.yaml
redis:
  enabled: false

Prompt-injection guard layer

A third, optional layer sits inside the LLM Gateway: a classifier pipeline that scans agent requests for prompt-injection and jailbreak attempts before they reach the model. Agents opt in per-workload via spec.guards[], and operators choose what to scan (user prompts, assistant responses, tool results from specific tools).

  • The GuardBinding CRD points at a classifier HTTP endpoint. The shipped pai-guard service runs ProtectAI DeBERTa-v3 by default, but the endpoint is pluggable — you can point it at Llama-Guard, NeMo Guardrails, Lakera, Protect AI, or a home-grown classifier.
  • The controller resolves each agent's spec.guards[] against the referenced GuardBindings and writes a pai-{workload}-guards ConfigMap. The gateway polls these ConfigMaps cluster-wide and hot-reloads policy without a pod restart — same pattern as provider bindings.
  • Enforcement has two modes: audit (log violations, forward the request) and enforce (return HTTP 403). Per-agent enforcement can only tighten — agent authors can turn audit into enforce, but can never relax an enforced binding.
  • Violations land in the gateway's tamper-evident audit chain as GUARD.VIOLATION_ENFORCE / GUARD.VIOLATION_AUDIT events, with a sanitized copy of the flagged payload (emails, phone numbers, API keys, credit cards, JWTs, and long hex/base64 blobs replaced with [REDACTED:<kind>] tags). Classifier failures emit GUARD.UNAVAILABLE events — fail-open by design.
  • pai audit <agent> aggregates guard events into the per-agent view alongside SERVICE_CALL, LLM_CALL, and TOOL_CALL entries. See the pai audit CLI reference for filtering options.
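
The guard layer's fail-open contract can be sketched as follows. `classify` stands in for the HTTP call to the classifier endpoint and returns a dict like `{"flagged": bool}`; the event names match the audit events above, the function shape is an assumption:

```python
def guard_check(classify, payload, mode, events):
    """Return True to forward the request, False to return HTTP 403."""
    try:
        verdict = classify(payload)
    except Exception:
        events.append("GUARD.UNAVAILABLE")     # classifier down: fail open
        return True
    if not verdict["flagged"]:
        return True
    if mode == "enforce":
        events.append("GUARD.VIOLATION_ENFORCE")
        return False                           # gateway returns HTTP 403
    events.append("GUARD.VIOLATION_AUDIT")     # log the violation, still forward
    return True
```

The only path that blocks traffic is a successful classification in enforce mode; every failure mode forwards the request.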

See the prompt-injection guard guide for the full rollout workflow.

Credential flow

Agents never hold real credentials. The credential flow works as follows:

Key points:

  • Kubernetes Secrets hold the raw credentials (PATs, AWS keys, OAuth client secrets, GCP service account JSON)
  • The Controller injects secrets into the sidecar container only -- never into the agent container
  • OAuth2 providers (Azure, GCP) perform token exchange at startup and refresh tokens in the background
  • AWS SigV4 signing is computed per-request by the plugin (pure Python, no boto3)
  • The agent receives responses but never sees the auth headers
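
As a concrete example of per-request signing, the SigV4 signing-key derivation (the part the plugin computes in pure Python, no boto3) is the standard four-step HMAC chain from the SigV4 specification; request canonicalization is omitted here:

```python
import hashlib
import hmac

def sigv4_signing_key(secret_key, date_stamp, region, service):
    """Derive the SigV4 signing key: HMAC chain over date, region,
    service, and the fixed 'aws4_request' terminator."""
    def sign(key, msg):
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()
    k_date = sign(("AWS4" + secret_key).encode(), date_stamp)  # date_stamp: YYYYMMDD
    k_region = sign(k_date, region)
    k_service = sign(k_region, service)
    return sign(k_service, "aws4_request")
```

Because the key is derived from the request date, region, and service, the sidecar can sign each request independently with no long-lived session token to refresh.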

DNS interception

Pai uses DNS hijacking to transparently route agent traffic through the sidecar proxy.

How it works:

  1. The Controller adds hostAliases entries to the pod spec, pointing intercepted hostnames (e.g., api.github.com, generativelanguage.googleapis.com) to 127.0.0.1.
  2. An init container sets up iptables NAT rules to redirect port 443 traffic to the sidecar's HTTPS port (8443).
  3. The sidecar terminates TLS using a self-signed CA certificate, inspects the request, applies policy, injects credentials, and forwards to the real endpoint.
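
Taken together, the interception setup looks roughly like this in the generated pod spec. This is an illustrative fragment, not verbatim controller output; container names and field values are assumptions:

```yaml
# Illustrative pod spec fragment (not verbatim controller output)
spec:
  hostAliases:
    - ip: "127.0.0.1"
      hostnames:
        - api.github.com
        - generativelanguage.googleapis.com
  initContainers:
    - name: iptables-init
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]
      command:
        - sh
        - -c
        # real rules also exempt the sidecar's own UID so its upstream
        # traffic is not re-intercepted in a loop
        - iptables -t nat -A OUTPUT -p tcp --dport 443 -j REDIRECT --to-ports 8443
```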

TLS strategy

Pai uses two layers of TLS:

External TLS (inbound traffic)

  • Agent workloads with inbound.port configured get a unique hostname on the pairun.dev domain (e.g., a7x3k9.pairun.dev)
  • Pai creates a cert-manager Certificate resource for the hostname
  • cert-manager obtains a trusted TLS certificate from Let's Encrypt via DNS-01 challenge
  • External clients connect over HTTPS with a valid, publicly trusted certificate

Internal TLS (interception)

  • The sidecar generates a self-signed CA certificate at startup via entrypoint.sh
  • The CA cert includes SANs for all intercepted hosts (e.g., api.github.com, s3.amazonaws.com)
  • The CA cert is shared with the agent container via an emptyDir volume
  • Standard environment variables are set to make the agent trust the CA:
    • REQUESTS_CA_BUNDLE (Python requests)
    • SSL_CERT_FILE (generic)
    • NODE_EXTRA_CA_CERTS (Node.js)
    • GIT_SSL_CAINFO (Git)

DNS: auto-generated hostnames

When an agent workload declares an inbound.port, Pai:

  1. Generates a random 6-character hostname (e.g., a7x3k9)
  2. Creates a DNS record at a7x3k9.pairun.dev pointing to the load balancer
  3. Provisions a TLS certificate for the hostname
  4. Sets status.url on the Agent to https://a7x3k9.pairun.dev

The hostname is stable for the lifetime of the workload. Deleting and recreating the workload generates a new hostname.
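
The hostname generation in step 1 might look like this sketch; the lowercase-alphanumeric charset is an assumption based on examples like a7x3k9:

```python
import secrets
import string

def generate_hostname(domain="pairun.dev", length=6):
    """Generate a random label like 'a7x3k9' under the given domain."""
    alphabet = string.ascii_lowercase + string.digits
    label = "".join(secrets.choice(alphabet) for _ in range(length))
    return f"{label}.{domain}"
```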

Security model

Pai enforces defense-in-depth at every layer:

Pod-level hardening

Every agent pod receives the following controls automatically — no Agent configuration required:

| Control | Implementation | Effect |
|---|---|---|
| No service account token | automountServiceAccountToken: false | Pod cannot call the Kubernetes API |
| Non-root execution | runAsNonRoot: true, default UID 65532 | Agent cannot run as root |
| No privilege escalation | allowPrivilegeEscalation: false | No setuid/sudo escalation |
| Syscall filtering | seccompProfile: RuntimeDefault | Blocks ~300 dangerous syscalls (ptrace, mount, kexec, bpf, etc.) |
| Root UID rejection | Controller validates runAsUser != 0 at reconcile time | spec.runAsUser: 0 sets status.phase: Failed and skips deployment |
| Filesystem confinement | Landlock LSM via spec.filesystem (opt-in, kernel 5.13+) | Per-path write restrictions; agent cannot overwrite config or install cron jobs |

Filesystem confinement (Landlock)

When spec.filesystem.readOnlyPaths is set, Pai injects a Landlock LSM enforcer via LD_PRELOAD — no entrypoint change required:

  1. An init container copies pai-landlock.so from the proxy image into a shared pai-sandbox emptyDir volume.
  2. The controller sets LD_PRELOAD=/pai-sandbox/pai-landlock.so and PAI_LANDLOCK_RW=<writable paths> on the agent container.
  3. The dynamic linker loads pai-landlock.so before main(). Its constructor reads PAI_LANDLOCK_RW, applies Landlock, and returns. All child processes inherit the restrictions.

Algorithm:

  • Handled mask = ALL write rights (WRITE_FILE, REMOVE_*, MAKE_*, TRUNCATE, REFER)
  • Declared writable paths (spec.volumes mountPaths + /tmp + spec.filesystem.writablePaths) receive explicit write grants
  • spec.filesystem.readOnlyPaths receive NO write grant → kernel silently denies writes
  • READ is outside the handled mask → reading is always unrestricted everywhere

ABI probing: v3 (kernel 5.19+) → v2 (5.17+) → v1 (5.13+). If Landlock is unavailable, the wrapper logs a warning and execs the original command — startup is never blocked.
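
The probing fallback can be sketched as follows. `probe` is a stand-in for the Landlock version query syscall the real enforcer makes; the function shape is illustrative:

```python
def pick_landlock_abi(probe, wanted=(3, 2, 1)):
    """Return the highest supported Landlock ABI version, or None to
    fail open (warn and exec the original command unrestricted)."""
    try:
        supported = probe()          # kernel reports its maximum ABI version
    except OSError:
        return None                  # Landlock unavailable: never block startup
    for version in wanted:
        if supported >= version:
            return version
    return None
```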

Two-tier enforcement (automatic fallback):

| Tier | Mechanism | When active | Bypass resistance |
|---|---|---|---|
| 1 | Landlock LSM | Kernel 5.13+ with Landlock built in | Kernel-level, resists direct syscalls |
| 2 | libc interception | All other kernels (automatic fallback) | Covers libc callers (Node.js, Python, JVM); bypassed by direct syscalls |

Requirements:

  • Tier 1: Linux kernel 5.13+ with Landlock compiled in (e.g. Ubuntu 22.04+, k3s; not GKE/COS)
  • Tier 2: Dynamically linked runtime — works on any kernel

Choosing paths to protect

Only list paths in readOnlyPaths that the application itself never writes to. If the app writes to a file at startup (config persistence, atomic renames, etc.), protecting it will crash the agent. Good candidates are system paths like /etc/cron.d, /var/spool/cron, /etc/passwd.

Network isolation

  • A NetworkPolicy restricts agent egress to only the Pai gateway and the sidecar
  • The agent cannot reach the Kubernetes API, other pods, or the internet directly
  • All external access is mediated by the sidecar, which enforces per-binding policy
  • Inbound traffic (when configured) is restricted to specified CIDR blocks via loadBalancerSourceRanges and NetworkPolicy ipBlock rules

Credential isolation

  • Secrets are mounted only in the sidecar container
  • The agent container has no access to secrets via environment variables, volumes, or the Kubernetes API
  • OAuth2 tokens are held in memory by the sidecar and refreshed automatically
  • AWS SigV4 signatures are computed per-request and never exposed to the agent

Policy enforcement modes

Providers support two enforcement modes via spec.audit.enforcement:

  • enforce (default): Requests that violate policy.allow/policy.deny are blocked with HTTP 403.
  • audit: Violations are logged with an AUDIT (not blocked): prefix, but the request is forwarded. Use this when rolling out new policies to production agents: validate the policy against live traffic, then switch to enforce once confident.

The enforcement mode is evaluated per binding independently, so you can audit one binding while enforcing others.
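
Per-binding evaluation might look like this sketch; the binding dict shape is an assumption for illustration:

```python
def apply_policy(binding, action, log):
    """Return True to forward the request, False for HTTP 403.
    Each binding carries its own enforcement mode."""
    violation = action in binding.get("deny", []) or (
        binding.get("allow") is not None and action not in binding["allow"])
    if not violation:
        return True
    if binding.get("enforcement", "enforce") == "audit":
        log.append(f"AUDIT (not blocked): {binding['name']}:{action}")
        return True                  # forwarded despite the violation
    log.append(f"blocked: {binding['name']}:{action}")
    return False                     # enforce mode: HTTP 403
```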

Policy hot-reload

Updating a Provider (policy rules, httpRules, audit settings) takes effect on running agents without a pod restart.

How it works:

  1. When a Provider changes, the controller's watcher triggers a reconcile for all affected Agent resources.
  2. The controller writes the updated provider specs to a ConfigMap named pai-{workload}-providers.
  3. The Kubernetes kubelet automatically syncs the ConfigMap to the pod's volume mount (typically within ~1 minute).
  4. The sidecar proxy's background watcher thread (polling every 30 seconds) detects the file's mtime change and reloads _bindings in-place under a threading lock.
  5. The next request through that binding uses the updated policy.
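
The watcher's reload check (step 4) amounts to an mtime comparison; this sketch uses illustrative names and skips the threading lock and JSON parsing:

```python
import os

def maybe_reload(path, state):
    """Reload the bindings file only when its mtime has changed.
    Returns True if a reload happened."""
    mtime = os.stat(path).st_mtime
    if mtime == state.get("mtime"):
        return False                      # unchanged: keep current bindings
    with open(path) as f:
        state["bindings"] = f.read()      # real code parses provider JSON here
    state["mtime"] = mtime
    return True
```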

What triggers a reload vs. a restart:

| Change | Hot-reload | Restart required |
|---|---|---|
| policy.allow / policy.deny | ✅ | |
| policy.httpRules | ✅ | |
| audit.logRequests / audit.enforcement | ✅ | |
| scope.* | ✅ | |
| auth.secretRef (new credentials) | | ✅ (secrets are env vars) |
| spec.providers list change | | ✅ (changes sidecar config) |

Agent harness (session mode)

When a Session is created, the controller spawns a K8s Job running the Pai agent harness (platform/harness/). The harness is the runtime for image-free agents — it provides the agent loop, built-in tools, and the event stream server.

Session Job Pod
├── harness container (pai-harness:latest)
│   ├── Agent loop (Anthropic SDK → Pai LLM Gateway)
│   ├── Built-in tools: bash, read, write, edit, glob, grep, web_fetch, web_search
│   ├── SSE event stream: GET /stream (port 8091)
│   └── Inbound events: POST /events (port 8091)
└── sidecar container (same provider proxy as service Agents)
    └── Credential injection + policy enforcement

Startup sequence:

  1. Init container pai-packages-install installs spec.packages into /opt/pai-packages/
  2. Harness reads PAI_AGENT_DEFINITION (JSON) and PAI_SESSION_TITLE
  3. If PAI_SESSION_TITLE is set, the agent loop starts immediately with the title as the prompt
  4. Otherwise, the harness emits session.status_idle and waits for a user.message event

Event persistence:

All events emitted by the harness are periodically flushed to a ConfigMap named pai-session-{name}-events (key: events.jsonl, JSONL format). This ConfigMap is created by the controller before the Job starts and persists after pod termination — so the full event history is available via GET /sessions/{name}/events even after the session completes.

Custom tools:

When the model calls a custom tool (defined in an Agent's spec.tools[].type: custom), the harness:

  1. Emits agent.custom_tool_use on the event stream and blocks
  2. Waits for the caller to send a user.custom_tool_result event to POST /events
  3. Injects the result into the conversation and resumes
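
The handshake above can be sketched with a queue standing in for the POST /events channel; the event type names match the doc, the function shape is illustrative:

```python
import queue

def run_custom_tool(tool_name, args, emit, inbound, timeout=None):
    """Emit the tool-use event, block until the caller replies,
    return the result for injection into the conversation."""
    emit({"type": "agent.custom_tool_use", "tool": tool_name, "args": args})
    event = inbound.get(timeout=timeout)   # blocks on user.custom_tool_result
    assert event["type"] == "user.custom_tool_result"
    return event["result"]
```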

Provider plugin system

External service integrations are implemented as self-contained plugins in platform/proxy/providers/. Each plugin implements the ProviderPlugin abstract base class.

| Method | Purpose |
|---|---|
| provider_name() | Canonical name (e.g., "github") |
| default_hosts() | Hostnames to intercept (e.g., ["api.github.com"]) |
| host_patterns() | Wildcard patterns (e.g., ["*.amazonaws.com"]) |
| resolve_action() | Map HTTP request to a provider-specific action |
| inject_credentials() | Add auth headers before forwarding |
| start() / stop() | Lifecycle hooks (token exchange, cleanup) |
| refresh_token() | Background token refresh (Azure, GCP) |
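
A minimal plugin implementing part of this interface might look like the sketch below. The method names follow the table above; the base class and internals are assumptions, not the real code in platform/proxy/providers/:

```python
class TelegramPlugin:
    """Illustrative plugin sketch. Telegram puts the bot token in the URL
    path rather than an Authorization header."""

    def __init__(self, bot_token):
        self._token = bot_token

    def provider_name(self):
        return "telegram"

    def default_hosts(self):
        return ["api.telegram.org"]

    def resolve_action(self, method, path):
        # e.g. POST /sendMessage -> "messages:send"
        if path.endswith("/sendMessage") or path == "/sendMessage":
            return "messages:send"
        return "unknown"

    def inject_credentials(self, request):
        request["path"] = f"/bot{self._token}{request['path']}"
```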

Supported providers:

| Provider | Auth Method | Token Refresh |
|---|---|---|
| GitHub | PAT / GitHub App | No (static) |
| AWS | SigV4 (per-request HMAC signing) | No (signed per-request) |
| Azure | OAuth2 client credentials (Entra ID) | Yes (every ~50 min) |
| GCP | Service account JWT to OAuth2 | Yes (every ~45 min) |
| Telegram | Bot token | No (static) |
| Slack | OAuth2 / API key | Varies |

Adding a new provider requires creating a single file in platform/proxy/providers/ -- the registry auto-discovers it at startup.