Screenshots
Pai includes a built-in screenshot tool that lets agents render any URL in a real headless Chromium browser. The PNG can be:
- Returned inline to the model's next turn — the agent literally sees the page and can answer "does the hero contain the new copy?", "is the checkout button red?", etc.
- Saved to disk at a path inside the agent's writable volumes
- Attached to an email using the platform email service
The platform runs Chromium centrally in a pai-screenshot service so individual agent images stay lean. No per-agent configuration is required.
Quick start
apiVersion: pai.io/v1
kind: Agent
metadata:
  name: visual-checker
spec:
  models:
    - anthropic/claude-sonnet-4-6
  system: |
    You are a visual QA agent. When given a URL, take a screenshot, then
    describe what's on the page and flag anything that looks broken.
The screenshot tool is enabled by default — no tools: block needed.
Tool reference
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Full http(s):// URL to render |
| save_path | string | No | Absolute path inside the container to save the PNG. Must point at a writable volume or /tmp |
| email_to | string | No | When set, the PNG is attached to an email sent via the platform email service |
| email_subject | string | No | Email subject (defaults to Screenshot of <url>) |
| email_body | string | No | Plain-text email body |
| return_image | boolean | No | Inline the PNG in the tool result so the model can reason about it. Default true. Set false for save/email-only flows to save vision tokens |
| full_page | boolean | No | Capture the entire scrollable page rather than just the viewport. Default false |
| viewport_width | integer | No | Browser viewport width in pixels |
| viewport_height | integer | No | Browser viewport height in pixels |
| wait_for | string | No | Either a load state (load, domcontentloaded, networkidle) or a CSS selector to wait for before capturing |
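Because wait_for accepts two kinds of value, a caller or service has to tell them apart. The following is a minimal Python sketch of one way to do that, assuming that anything other than the three documented load states is treated as a CSS selector. The helper name classify_wait_for is hypothetical and not part of Pai.

```python
# Hypothetical helper, not part of Pai. It only illustrates the two kinds of
# value that the wait_for parameter accepts.
LOAD_STATES = {"load", "domcontentloaded", "networkidle"}


def classify_wait_for(value: str) -> tuple[str, str]:
    """Return ("load_state", value) or ("selector", value)."""
    cleaned = value.strip()
    if not cleaned:
        raise ValueError("wait_for must not be empty")
    if cleaned.lower() in LOAD_STATES:
        return ("load_state", cleaned.lower())
    # Assumption: everything else is a CSS selector.
    return ("selector", cleaned)


print(classify_wait_for("networkidle"))
print(classify_wait_for("#checkout-button"))
```

Under this assumption a selector that happens to be spelled like a load state (for example an element literally named load) would be read as a load state, so a more specific selector is the safer choice.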
Inline images are automatically downsampled to ≤1568 px on the long edge — the same size Anthropic resizes to internally — so vision tokens aren't wasted.
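To make the 1568 px cap concrete, here is a rough Python sketch of the arithmetic. It assumes the aspect ratio is preserved and that dimensions are rounded to the nearest pixel. The function name is hypothetical, and the service's actual resizing code is not shown in this document.

```python
# Sketch of the arithmetic only. The function name and rounding rule are
# assumptions, not the service's real implementation.
MAX_LONG_EDGE = 1568  # pixel cap on the long edge, from the note above


def downsampled_size(width: int, height: int,
                     max_edge: int = MAX_LONG_EDGE) -> tuple[int, int]:
    """Scale (width, height) so that the longer side is at most max_edge."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return (width, height)  # already small enough, never upscale
    # Assumption: round to the nearest pixel and keep at least 1 px per side.
    new_width = max(1, round(width * max_edge / long_edge))
    new_height = max(1, round(height * max_edge / long_edge))
    return (new_width, new_height)


print(downsampled_size(1280, 800))   # default viewport
print(downsampled_size(1280, 6400))  # a tall full_page capture
```

With the default 1280 × 800 viewport nothing is resized. Under these assumptions, a full_page capture that is 1280 px wide and 6400 px tall would come back as 314 × 1568, which is worth keeping in mind when the model needs to read small text on long pages.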
Example: visual verification
apiVersion: pai.io/v1
kind: Agent
metadata:
  name: marketing-checker
spec:
  type: task
  agentDefinition: visual-checker
  title: |
    Take a screenshot of https://pairun.dev and confirm whether the hero
    headline includes the word "Agentic". If it doesn't, send an email to
    marketing@example.com flagging the issue with the screenshot attached.
The agent will:
- Call screenshot(url=…), receive the PNG inline, and read it.
- Decide whether the headline is correct.
- If not, call screenshot(url=…, email_to="marketing@example.com", email_body=…). The same tool sends the email with the PNG attached, all in one step.
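The steps above can be sketched as the sequence of tool arguments the agent would produce. This is an illustrative Python sketch only: plan_calls is a hypothetical helper, the headline_ok flag stands in for the model's own judgement of the image, and the email body text is made up for the example.

```python
# Hypothetical sketch: the dictionaries mirror the screenshot tool's documented
# parameters, but plan_calls itself is not part of Pai.
def plan_calls(url: str, headline_ok: bool, recipient: str) -> list[dict]:
    """Return the screenshot tool arguments the agent would use, in order."""
    # First call: the PNG comes back inline (return_image defaults to true).
    calls = [{"url": url}]
    if not headline_ok:
        # Follow-up call: same tool, now with the email parameters set.
        calls.append({
            "url": url,
            "email_to": recipient,
            "email_body": f'The hero headline on {url} does not include "Agentic".',
        })
    return calls


for call in plan_calls("https://pairun.dev", False, "marketing@example.com"):
    print(call)
```

Since the model has already seen the page by the time it sends the email, the second call could also pass return_image=false to avoid spending vision tokens twice.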
Example: archival pipeline
spec:
  type: task
  agentDefinition: visual-checker
  title: |
    Capture full-page screenshots of these URLs and save each one to
    /work/archive/<host>.png:
    - https://pairun.dev
    - https://docs.pairun.dev
  volumes:
    - name: work
      mountPath: /work
      size: 1Gi
The agent uses screenshot(url=…, save_path="/work/archive/pairun-dev.png", full_page=true, return_image=false) so the PNG is written without the model burning vision tokens it doesn't need.
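As an illustration of how the per-URL arguments could be derived, here is a small Python sketch. It assumes the file name is the URL's host with dots replaced by hyphens, which matches pairun-dev.png in the call above but is otherwise a guess. The helper archive_args is hypothetical.

```python
from urllib.parse import urlsplit


def archive_args(url: str, root: str = "/work/archive") -> dict:
    """Build screenshot arguments for a save-only, full-page capture."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Assumption: dots become hyphens, matching pairun-dev.png in the text above.
    filename = host.replace(".", "-") + ".png"
    return {
        "url": url,
        "save_path": f"{root}/{filename}",
        "full_page": True,
        "return_image": False,  # skip the inline image to save vision tokens
    }


for target in ("https://pairun.dev", "https://docs.pairun.dev"):
    print(archive_args(target)["save_path"])
```

The save_path stays under /work, the volume mounted in the task spec, because the tool requires the path to point at a writable volume or /tmp.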
Disable screenshot for an agent
tools:
  - type: screenshot
    enabled: false
Limits
| Limit | Default | Override (Helm) |
|---|---|---|
| Captures per workload per day | 200 | screenshot.dailyLimit |
| Default viewport | 1280 × 800 | screenshot.viewport.width / screenshot.viewport.height |
| Per-capture timeout | 30 s | screenshot.timeoutMs |
| Allowed hosts | any public URL | screenshot.allowedHosts (comma-separated) |
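The document does not say how screenshot.allowedHosts is matched, so the following Python sketch shows only one plausible reading: an exact, case-insensitive hostname match against the comma-separated list, with an empty list meaning no restriction. The function is hypothetical, and it does not attempt the "public URL" check (for example rejecting private addresses).

```python
from urllib.parse import urlsplit


def is_host_allowed(url: str, allowed_hosts: str) -> bool:
    """Check a URL against a comma-separated host list."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or parts.hostname is None:
        return False  # the tool only takes full http(s):// URLs
    allowed = {h.strip().lower() for h in allowed_hosts.split(",") if h.strip()}
    if not allowed:
        return True  # assumption: an empty list means no host restriction
    # Assumption: exact, case-insensitive hostname match (no wildcards).
    return parts.hostname.lower() in allowed


print(is_host_allowed("https://pairun.dev/pricing", "pairun.dev, docs.pairun.dev"))
print(is_host_allowed("https://example.com", "pairun.dev"))
```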
Deployment
The pai-screenshot service is disabled by default. Enable it in your production values file:
# values-prod.yaml
screenshot:
enabled: true
It runs as a single replica of the upstream mcr.microsoft.com/playwright/python image, requesting 200m CPU / 512Mi memory (limits 1 CPU / 1Gi memory), which is comfortable for a single Chromium instance. Increase the replica count if you expect concurrent captures from many agents.
What it doesn't do (yet)
- No PDF rendering or DOM dumps. These could be added later under the same service if needed.
- No authenticated capture. The service renders public URLs only — no cookie injection or login. If you need to screenshot something behind auth, that's a separate Provider plumbing job.