Architecture

Pai is built on Kubernetes and uses a two-tier proxy model to mediate all agent traffic. This page covers the core architecture, credential flow, DNS interception, TLS strategy, and security model.

High-level overview

Two-tier proxy model

All agent traffic is mediated by two proxy layers. Agents never call external services directly.

Tier 1: LLM Gateway

The LLM Gateway is a cluster-wide service that proxies all agent-to-LLM communication.

  • Exposes an OpenAI-compatible API to agents
  • Routes requests to the correct provider (Anthropic, OpenAI, Gemini, OpenRouter, vLLM, or any OpenAI-compatible endpoint) based on the ModelProvider
  • Injects API keys from Kubernetes Secrets -- agents never see provider credentials
  • Enforces token budgets (per-day and per-request limits)
  • Tracks cost and usage per workload
  • Retries transient upstream failures (429, 5xx, connection errors) with exponential backoff and jitter, then walks the fallback chain (ModelProvider.spec.fallbacks) once retries are exhausted. See the reliability guide
  • Enforces rate + concurrency limits (Agent.spec.rateLimits) — RPM, simultaneous in-flight requests, daily USD spend cap
  • Runs per-agent prompt-injection / jailbreak scans via GuardBindings before forwarding to the provider. Classifier calls go out to the pai-guard service (or any custom HTTP endpoint). Fail-open: a broken classifier never blocks traffic
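
The retry-then-fallback behavior described above can be sketched as follows. This is an illustrative sketch, not the gateway's actual code: `call`, `RetryableError`, and the provider list shape are assumptions for the example.

```python
import time

class RetryableError(Exception):
    """Stand-in for 429 / 5xx / connection errors the gateway retries."""

def call_with_fallbacks(call, providers, max_retries=3, base_delay=0.5):
    """Try each provider in order (primary first, then the fallback chain);
    retry transient failures with exponential backoff before moving on."""
    last_exc = None
    for provider in providers:
        for attempt in range(max_retries):
            try:
                return call(provider)
            except RetryableError as exc:
                last_exc = exc
                time.sleep(base_delay * (2 ** attempt))  # plus jitter in practice
    # every provider in the chain exhausted its retries
    raise last_exc if last_exc else RuntimeError("no providers configured")
```

The key property is that backoff happens *within* a provider, and the fallback chain is only walked once a provider's retries are spent.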

Tier 2: Service Binding Proxy (Sidecar)

Each agent pod includes a sidecar proxy that intercepts all calls to external services (GitHub, AWS, Azure, GCP, Telegram, Slack, etc.).

  • Runs as a sidecar container in every agent pod
  • Intercepts HTTPS traffic via DNS hijacking and iptables redirect
  • Routes requests to the correct provider plugin based on the Host header
  • Resolves actions (e.g., GET /repos/org/repo/pulls maps to pulls:read)
  • Evaluates policy (allow/deny lists) before forwarding
  • Injects credentials (Bearer tokens, SigV4 signatures, OAuth2 tokens)
  • Logs every request for audit

Agent Container
├── LLM calls --> PAI Gateway (port 8000) --> Anthropic / OpenAI / Gemini
└── Tool/service calls --> Binding Proxy Sidecar (port 8081 HTTP, 8443 HTTPS)
    ├── 1. Route: Host header --> PluginRegistry --> ProviderPlugin
    ├── 2. HTTP rules: check policy.httpRules (method + path globs)
    ├── 3. Resolve: plugin.resolve_action(method, path) --> "pulls:create"
    ├── 4. Policy: check allow/deny action lists
    ├── 5. Scope: check resource scope (repos, ARNs, projects)
    ├── 6. Audit: log(workload, binding, action, resource, allowed)
    ├── 7. Auth: plugin.inject_credentials(request)
    └── 8. Forward: proxy to real endpoint
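
The pipeline above can be condensed into a sketch. The plugin and policy shapes here are assumptions for illustration (the scope check, step 5, is omitted for brevity):

```python
from fnmatch import fnmatch

def handle(request, plugins, policy, audit_log):
    plugin = plugins.get(request["host"])                  # 1. route by Host header
    if plugin is None:
        return 502, "no plugin for host"
    for rule in policy.get("httpRules") or []:             # 2. method + path globs
        if rule["method"] == request["method"] and fnmatch(request["path"], rule["path"]):
            break
    else:
        if policy.get("httpRules"):
            return 403, "no matching httpRule"
    action = plugin.resolve_action(request["method"], request["path"])  # 3. resolve
    allowed = (action in policy.get("allow", []) and
               action not in policy.get("deny", []))       # 4. allow/deny lists
    # 5. scope check omitted in this sketch
    audit_log.append((request["host"], action, allowed))   # 6. audit every request
    if not allowed:
        return 403, f"action {action} denied"
    plugin.inject_credentials(request)                     # 7. auth injection
    return 200, "forwarded"                                # 8. forward (stubbed)
```

Note that the audit entry is written before the allow/deny decision is acted on, so denied requests are logged too.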

Gateway state store

The gateway uses a single in-cluster Redis pod to hold state that must survive pod restarts and be consistent across multiple gateway replicas:

  • Daily token counters (tokens:{workload}:{YYYYMMDD})
  • Cumulative USD cost (cost:{workload}:{YYYYMMDD}, stored as micro-dollars)
  • Per-minute RPM windows (rpm:{workload}:{min_epoch})
  • Concurrent-request counters (conc:{workload})
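
A minimal sketch of how those keys could be constructed; the key formats match the list above, while the helper names are illustrative:

```python
from datetime import datetime, timezone

def day_key(prefix, workload, now):
    """tokens:{workload}:{YYYYMMDD} / cost:{workload}:{YYYYMMDD}"""
    return f"{prefix}:{workload}:{now.strftime('%Y%m%d')}"

def rpm_key(workload, now):
    """rpm:{workload}:{min_epoch} -- one counter per minute window"""
    return f"rpm:{workload}:{int(now.timestamp()) // 60}"

def to_micro_dollars(usd):
    # cost:* counters store integer micro-dollars so increments stay atomic
    return round(usd * 1_000_000)
```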

The Redis pod is intentionally lean: 20m CPU / 32Mi memory requests, emptyDir storage, allkeys-lru eviction, 24mb maxmemory. One replica handles ~250k active counters.

Fail-open by design. When Redis is unreachable (outage, network partition, or redis.enabled: false in the helm chart), the gateway logs a single warning and transparently falls back to per-pod in-memory counters. No request is ever blocked on Redis health. The tradeoffs of memory mode:

  • Counters reset on gateway pod restart (one window's worth of leakage for RPM, potential re-allowance of daily budget)
  • Counters are per-pod, so multi-replica gateway deployments won't share state (enforce via sessionAffinity: ClientIP on the gateway Service if you need sticky routing)
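
The fail-open wrapper can be sketched like this, assuming a Redis client with an incrby method (the class name and shape are illustrative, not the gateway's real internals):

```python
from collections import defaultdict

class CounterStore:
    """Counter store that degrades to per-pod in-memory counters on any
    Redis error -- no request is ever blocked on Redis health."""

    def __init__(self, redis_client):
        self._redis = redis_client
        self._memory = defaultdict(int)
        self.degraded = False

    def incr(self, key, amount=1):
        if not self.degraded:
            try:
                return self._redis.incrby(key, amount)
            except Exception:
                self.degraded = True   # log one warning, then stay in memory mode
        self._memory[key] += amount
        return self._memory[key]
```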

To disable Redis entirely (single-replica deployments, dev clusters):

# values.yaml
redis:
  enabled: false

Prompt-injection guard layer

A third, optional layer sits inside the LLM Gateway: a classifier pipeline that scans agent requests for prompt-injection and jailbreak attempts before they reach the model. Agents opt in per-workload via spec.guards[], and operators choose what to scan (user prompts, assistant responses, tool results from specific tools).

  • The GuardBinding CRD points at a classifier HTTP endpoint. The shipped pai-guard service runs ProtectAI DeBERTa-v3 by default, but the endpoint is pluggable — you can point it at Llama-Guard, NeMo Guardrails, Lakera, Protect AI, or a home-grown classifier.
  • The controller resolves each agent's spec.guards[] against the referenced GuardBindings and writes a pai-{workload}-guards ConfigMap. The gateway polls these ConfigMaps cluster-wide and hot-reloads policy without a pod restart — same pattern as provider bindings.
  • Enforcement has two modes: audit (log violations, forward the request) and enforce (return HTTP 403). Per-agent enforcement can only tighten — agent authors can turn audit into enforce, but can never relax an enforced binding.
  • Violations land in the gateway's tamper-evident audit chain as GUARD.VIOLATION_ENFORCE / GUARD.VIOLATION_AUDIT events, with a sanitized copy of the flagged payload (emails, phone numbers, API keys, credit cards, JWTs, and long hex/base64 blobs replaced with [REDACTED:<kind>] tags). Classifier failures emit GUARD.UNAVAILABLE events — fail-open by design.
  • pai audit <agent> aggregates guard events into the per-agent view alongside SERVICE_CALL, LLM_CALL, and TOOL_CALL entries. See the pai audit CLI reference for filtering options.
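
The guard layer's fail-open contract can be sketched as follows. `classify` stands in for the HTTP call to the classifier endpoint and returns a dict like `{"flagged": bool}`; the event names match the audit events above, the function shape is an assumption:

```python
def guard_check(classify, payload, mode, events):
    """Return True to forward the request, False to return HTTP 403."""
    try:
        verdict = classify(payload)
    except Exception:
        events.append("GUARD.UNAVAILABLE")     # classifier down: fail open
        return True
    if not verdict["flagged"]:
        return True
    if mode == "enforce":
        events.append("GUARD.VIOLATION_ENFORCE")
        return False                           # gateway returns HTTP 403
    events.append("GUARD.VIOLATION_AUDIT")     # log the violation, still forward
    return True
```

The only path that blocks traffic is a successful classification in enforce mode; every failure mode forwards the request.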

See the prompt-injection guard guide for the full rollout workflow.

Credential flow

Agents never hold real credentials. The credential flow works as follows:

Key points:

  • Kubernetes Secrets hold the raw credentials (PATs, AWS keys, OAuth client secrets, GCP service account JSON)
  • The Controller injects secrets into the sidecar container only -- never into the agent container
  • OAuth2 providers (Azure, GCP) perform token exchange at startup and refresh tokens in the background
  • AWS SigV4 signing is computed per-request by the plugin (pure Python, no boto3)
  • The agent receives responses but never sees the auth headers
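
As a concrete example of per-request signing, the SigV4 signing-key derivation (the part the plugin computes in pure Python, no boto3) is the standard four-step HMAC chain from the SigV4 specification; request canonicalization is omitted here:

```python
import hashlib
import hmac

def sigv4_signing_key(secret_key, date_stamp, region, service):
    """Derive the SigV4 signing key: HMAC chain over date, region,
    service, and the fixed 'aws4_request' terminator."""
    def sign(key, msg):
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()
    k_date = sign(("AWS4" + secret_key).encode(), date_stamp)  # date_stamp: YYYYMMDD
    k_region = sign(k_date, region)
    k_service = sign(k_region, service)
    return sign(k_service, "aws4_request")
```

Because the key is derived from the request date, region, and service, the sidecar can sign each request independently with no long-lived session token to refresh.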

DNS interception

Pai uses DNS hijacking to transparently route agent traffic through the sidecar proxy.

How it works:

  1. The Controller adds hostAliases entries to the pod spec, pointing intercepted hostnames (e.g., api.github.com, generativelanguage.googleapis.com) to 127.0.0.1.
  2. An init container sets up iptables NAT rules to redirect port 443 traffic to the sidecar's HTTPS port (8443).
  3. The sidecar terminates TLS using a self-signed CA certificate, inspects the request, applies policy, injects credentials, and forwards to the real endpoint.
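
Taken together, the interception setup looks roughly like this in the generated pod spec. This is an illustrative fragment, not verbatim controller output; container names and field values are assumptions:

```yaml
# Illustrative pod spec fragment (not verbatim controller output)
spec:
  hostAliases:
    - ip: "127.0.0.1"
      hostnames:
        - api.github.com
        - generativelanguage.googleapis.com
  initContainers:
    - name: iptables-init
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]
      command:
        - sh
        - -c
        # real rules also exempt the sidecar's own UID so its upstream
        # traffic is not re-intercepted in a loop
        - iptables -t nat -A OUTPUT -p tcp --dport 443 -j REDIRECT --to-ports 8443
```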

TLS strategy

Pai uses two layers of TLS:

External TLS (inbound traffic)

  • Agent workloads with inbound.port configured get a unique hostname on the pairun.dev domain (e.g., a7x3k9.pairun.dev)
  • Pai creates a cert-manager Certificate resource for the hostname
  • cert-manager obtains a trusted TLS certificate from Let's Encrypt via DNS-01 challenge
  • External clients connect over HTTPS with a valid, publicly trusted certificate

Internal TLS (interception)

  • The sidecar generates a self-signed CA certificate at startup via entrypoint.sh
  • The CA cert includes SANs for all intercepted hosts (e.g., api.github.com, s3.amazonaws.com)
  • The CA cert is shared with the agent container via an emptyDir volume
  • Standard environment variables are set to make the agent trust the CA:
    • REQUESTS_CA_BUNDLE (Python requests)
    • SSL_CERT_FILE (generic)
    • NODE_EXTRA_CA_CERTS (Node.js)
    • GIT_SSL_CAINFO (Git)

DNS: auto-generated hostnames

When an agent workload declares an inbound.port, Pai:

  1. Generates a random 6-character hostname (e.g., a7x3k9)
  2. Creates a DNS record at a7x3k9.pairun.dev pointing to the load balancer
  3. Provisions a TLS certificate for the hostname
  4. Sets status.url on the Agent to https://a7x3k9.pairun.dev

The hostname is stable for the lifetime of the workload. Deleting and recreating the workload generates a new hostname.
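
The hostname generation in step 1 might look like this sketch; the lowercase-alphanumeric charset is an assumption based on examples like a7x3k9:

```python
import secrets
import string

def generate_hostname(domain="pairun.dev", length=6):
    """Generate a random label like 'a7x3k9' under the given domain."""
    alphabet = string.ascii_lowercase + string.digits
    label = "".join(secrets.choice(alphabet) for _ in range(length))
    return f"{label}.{domain}"
```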

Security model

Pai enforces defense-in-depth at every layer:

Pod-level hardening

Every agent pod receives the following controls automatically — no Agent configuration required:

| Control | Implementation | Effect |
|---|---|---|
| No service account token | automountServiceAccountToken: false | Pod cannot call the Kubernetes API |
| Non-root execution | runAsNonRoot: true, default UID 65532 | Agent cannot run as root |
| No privilege escalation | allowPrivilegeEscalation: false | No setuid/sudo escalation |
| Syscall filtering | seccompProfile: RuntimeDefault | Blocks ~300 dangerous syscalls (ptrace, mount, kexec, bpf, etc.) |
| Root UID rejection | Controller validates runAsUser != 0 at reconcile time | spec.runAsUser: 0 sets status.phase: Failed and skips deployment |
| Filesystem confinement | Landlock LSM via spec.filesystem (opt-in, kernel 5.13+) | Per-path write restrictions; agent cannot overwrite config or install cron jobs |

Filesystem confinement (Landlock)

When spec.filesystem.readOnlyPaths is set, Pai injects a Landlock LSM enforcer via LD_PRELOAD — no entrypoint change required:

  1. An init container copies pai-landlock.so from the proxy image into a shared pai-sandbox emptyDir volume.
  2. The controller sets LD_PRELOAD=/pai-sandbox/pai-landlock.so and PAI_LANDLOCK_RW=<writable paths> on the agent container.
  3. The dynamic linker loads pai-landlock.so before main(). Its constructor reads PAI_LANDLOCK_RW, applies Landlock, and returns. All child processes inherit the restrictions.

Algorithm:

  • Handled mask = ALL write rights (WRITE_FILE, REMOVE_*, MAKE_*, TRUNCATE, REFER)
  • Declared writable paths (spec.volumes mountPaths + /tmp + spec.filesystem.writablePaths) receive explicit write grants
  • spec.filesystem.readOnlyPaths receive NO write grant → kernel silently denies writes
  • READ is outside the handled mask → reading is always unrestricted everywhere

ABI probing: v3 (kernel 5.19+) → v2 (5.17+) → v1 (5.13+). If Landlock is unavailable, the wrapper logs a warning and execs the original command — startup is never blocked.
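
The probing fallback can be sketched as follows. `probe` is a stand-in for the Landlock version query syscall the real enforcer makes; the function shape is illustrative:

```python
def pick_landlock_abi(probe, wanted=(3, 2, 1)):
    """Return the highest supported Landlock ABI version, or None to
    fail open (warn and exec the original command unrestricted)."""
    try:
        supported = probe()          # kernel reports its maximum ABI version
    except OSError:
        return None                  # Landlock unavailable: never block startup
    for version in wanted:
        if supported >= version:
            return version
    return None
```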

Two-tier enforcement (automatic fallback):

| Tier | Mechanism | When active | Bypass resistance |
|---|---|---|---|
| 1 | Landlock LSM | Kernel 5.13+ with Landlock built in | Kernel-level, resists direct syscalls |
| 2 | libc interception | All other kernels (automatic fallback) | Covers libc callers (Node.js, Python, JVM); bypassed by direct syscalls |

Requirements:

  • Tier 1: Linux kernel 5.13+ with Landlock compiled in (e.g. Ubuntu 22.04+, k3s; not GKE/COS)
  • Tier 2: Dynamically linked runtime — works on any kernel

Choosing paths to protect

Only list paths in readOnlyPaths that the application itself never writes to. If the app writes to a file at startup (config persistence, atomic renames, etc.), protecting it will crash the agent. Good candidates are system paths like /etc/cron.d, /var/spool/cron, /etc/passwd.

Network isolation

  • A NetworkPolicy restricts agent egress to only the Pai gateway and the sidecar
  • The agent cannot reach the Kubernetes API, other pods, or the internet directly
  • All external access is mediated by the sidecar, which enforces per-binding policy
  • Inbound traffic (when configured) is restricted to specified CIDR blocks via loadBalancerSourceRanges and NetworkPolicy ipBlock rules

Credential isolation

  • Secrets are mounted only in the sidecar container
  • The agent container has no access to secrets via environment variables, volumes, or the Kubernetes API
  • OAuth2 tokens are held in memory by the sidecar and refreshed automatically
  • AWS SigV4 signatures are computed per-request and never exposed to the agent

Policy enforcement modes

Providers support two enforcement modes via spec.audit.enforcement:

  • enforce (default): Requests that violate policy.allow/policy.deny are blocked with HTTP 403.
  • audit: Violations are logged with an AUDIT (not blocked): prefix, but the request is forwarded. Use this when rolling out new policies to production agents: validate the policy against live traffic, then switch to enforce once confident.

The enforcement mode is evaluated per binding independently, so you can audit one binding while enforcing others.
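
Per-binding evaluation might look like this sketch; the binding dict shape is an assumption for illustration:

```python
def apply_policy(binding, action, log):
    """Return True to forward the request, False for HTTP 403.
    Each binding carries its own enforcement mode."""
    violation = action in binding.get("deny", []) or (
        binding.get("allow") is not None and action not in binding["allow"])
    if not violation:
        return True
    if binding.get("enforcement", "enforce") == "audit":
        log.append(f"AUDIT (not blocked): {binding['name']}:{action}")
        return True                  # forwarded despite the violation
    log.append(f"blocked: {binding['name']}:{action}")
    return False                     # enforce mode: HTTP 403
```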

Policy hot-reload

Updating a Provider (policy rules, httpRules, audit settings) takes effect on running agents without a pod restart.

How it works:

  1. When a Provider changes, the controller's watcher triggers a reconcile for all affected Agent resources.
  2. The controller writes the updated provider specs to a ConfigMap named pai-{workload}-providers.
  3. The Kubernetes kubelet automatically syncs the ConfigMap to the pod's volume mount (typically within ~1 minute).
  4. The sidecar proxy's background watcher thread (polling every 30 seconds) detects the file's mtime change and reloads _bindings in-place under a threading lock.
  5. The next request through that binding uses the updated policy.
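
The watcher's reload check (step 4) amounts to an mtime comparison; this sketch uses illustrative names and skips the threading lock and JSON parsing:

```python
import os

def maybe_reload(path, state):
    """Reload the bindings file only when its mtime has changed.
    Returns True if a reload happened."""
    mtime = os.stat(path).st_mtime
    if mtime == state.get("mtime"):
        return False                      # unchanged: keep current bindings
    with open(path) as f:
        state["bindings"] = f.read()      # real code parses provider JSON here
    state["mtime"] = mtime
    return True
```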

What triggers a reload vs. a restart:

| Change | Hot-reload | Restart required |
|---|---|---|
| policy.allow / policy.deny | ✅ | |
| policy.httpRules | ✅ | |
| audit.logRequests / audit.enforcement | ✅ | |
| scope.* | ✅ | |
| auth.secretRef (new credentials) | | ✅ (secrets are env vars) |
| spec.providers list change | | ✅ (changes sidecar config) |

Agent harness (session mode)

When a Session is created, the controller spawns a K8s Job running the Pai agent harness (platform/harness/). The harness is the runtime for image-free agents — it provides the agent loop, built-in tools, and the event stream server.

Session Job Pod
├── harness container (pai-harness:latest)
│   ├── Agent loop (Anthropic SDK → Pai LLM Gateway)
│   ├── Built-in tools: bash, read, write, edit, glob, grep, web_fetch, web_search
│   ├── SSE event stream: GET /stream (port 8091)
│   └── Inbound events: POST /events (port 8091)
└── sidecar container (same provider proxy as service Agents)
    └── Credential injection + policy enforcement

Startup sequence:

  1. Init container pai-packages-install installs spec.packages into /opt/pai-packages/
  2. Harness reads PAI_AGENT_DEFINITION (JSON) and PAI_SESSION_TITLE
  3. If PAI_SESSION_TITLE is set, the agent loop starts immediately with the title as the prompt
  4. Otherwise, the harness emits session.status_idle and waits for a user.message event

Event persistence:

All events emitted by the harness are periodically flushed to a ConfigMap named pai-session-{name}-events (key: events.jsonl, JSONL format). This ConfigMap is created by the controller before the Job starts and persists after pod termination — so the full event history is available via GET /sessions/{name}/events even after the session completes.

Custom tools:

When the model calls a custom tool (defined in an Agent's spec.tools[].type: custom), the harness:

  1. Emits agent.custom_tool_use on the event stream and blocks
  2. Waits for the caller to send a user.custom_tool_result event to POST /events
  3. Injects the result into the conversation and resumes
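
The handshake above can be sketched with a queue standing in for the POST /events channel; the event type names match the doc, the function shape is illustrative:

```python
import queue

def run_custom_tool(tool_name, args, emit, inbound, timeout=None):
    """Emit the tool-use event, block until the caller replies,
    return the result for injection into the conversation."""
    emit({"type": "agent.custom_tool_use", "tool": tool_name, "args": args})
    event = inbound.get(timeout=timeout)   # blocks on user.custom_tool_result
    assert event["type"] == "user.custom_tool_result"
    return event["result"]
```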

Provider plugin system

External service integrations are implemented as self-contained plugins in platform/proxy/providers/. Each plugin implements the ProviderPlugin abstract base class.

| Method | Purpose |
|---|---|
| provider_name() | Canonical name (e.g., "github") |
| default_hosts() | Hostnames to intercept (e.g., ["api.github.com"]) |
| host_patterns() | Wildcard patterns (e.g., ["*.amazonaws.com"]) |
| resolve_action() | Map HTTP request to a provider-specific action |
| inject_credentials() | Add auth headers before forwarding |
| start() / stop() | Lifecycle hooks (token exchange, cleanup) |
| refresh_token() | Background token refresh (Azure, GCP) |
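
A minimal plugin implementing part of this interface might look like the sketch below. The method names follow the table above; the base class and internals are assumptions, not the real code in platform/proxy/providers/:

```python
class TelegramPlugin:
    """Illustrative plugin sketch. Telegram puts the bot token in the URL
    path rather than an Authorization header."""

    def __init__(self, bot_token):
        self._token = bot_token

    def provider_name(self):
        return "telegram"

    def default_hosts(self):
        return ["api.telegram.org"]

    def resolve_action(self, method, path):
        # e.g. POST /sendMessage -> "messages:send"
        if path.endswith("/sendMessage") or path == "/sendMessage":
            return "messages:send"
        return "unknown"

    def inject_credentials(self, request):
        request["path"] = f"/bot{self._token}{request['path']}"
```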

Supported providers:

| Provider | Auth Method | Token Refresh |
|---|---|---|
| GitHub | PAT / GitHub App | No (static) |
| AWS | SigV4 (per-request HMAC signing) | No (signed per-request) |
| Azure | OAuth2 client credentials (Entra ID) | Yes (every ~50 min) |
| GCP | Service account JWT to OAuth2 | Yes (every ~45 min) |
| Telegram | Bot token | No (static) |
| Slack | OAuth2 / API key | Varies |

Adding a new provider requires creating a single file in platform/proxy/providers/ -- the registry auto-discovers it at startup.