
Screenshots

Pai includes a built-in screenshot tool that lets agents render any URL in a real headless Chromium browser. The PNG can be:

  • Returned inline to the model's next turn — the agent literally sees the page and can answer "does the hero contain the new copy?", "is the checkout button red?", etc.
  • Saved to disk at a path inside the agent's writable volumes
  • Attached to an email using the platform email service

The platform runs Chromium centrally in a pai-screenshot service so individual agent images stay lean. No per-agent configuration is required.

Quick start

apiVersion: pai.io/v1
kind: Agent
metadata:
  name: visual-checker
spec:
  models:
    - anthropic/claude-sonnet-4-6
  system: |
    You are a visual QA agent. When given a URL, take a screenshot, then
    describe what's on the page and flag anything that looks broken.

The screenshot tool is enabled by default — no tools: block needed.

Tool reference

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Full http(s):// URL to render |
| save_path | string | No | Absolute path inside the container to save the PNG. Must point at a writable volume or /tmp |
| email_to | string | No | When set, the PNG is attached to an email sent via the platform email service |
| email_subject | string | No | Email subject (defaults to Screenshot of <url>) |
| email_body | string | No | Plain-text email body |
| return_image | boolean | No | Inline the PNG in the tool result so the model can reason about it. Default true. Set false for save/email-only flows to save vision tokens |
| full_page | boolean | No | Capture the entire scrollable page rather than just the viewport. Default false |
| viewport_width | integer | No | Browser viewport width in pixels |
| viewport_height | integer | No | Browser viewport height in pixels |
| wait_for | string | No | Either a load state (load, domcontentloaded, networkidle) or a CSS selector to wait for before capturing |
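
Because wait_for accepts two different kinds of value, it can help to see one way the distinction might be drawn. The sketch below treats the three documented load states as reserved words and anything else as a CSS selector. The function name classify_wait_for and the returned tuple shape are illustrative assumptions, not the service's actual implementation.

```python
# Hypothetical sketch: one way to interpret the `wait_for` parameter.
# Only the three load-state names come from the parameter table; the
# function name and return shape are assumptions for illustration.
from typing import Tuple

LOAD_STATES = {"load", "domcontentloaded", "networkidle"}


def classify_wait_for(value: str) -> Tuple[str, str]:
    """Return ("load_state", name) or ("selector", css)."""
    cleaned = value.strip()
    if not cleaned:
        raise ValueError("wait_for must not be empty")
    if cleaned.lower() in LOAD_STATES:
        return ("load_state", cleaned.lower())
    # Anything that is not a reserved load state is treated as a CSS selector.
    return ("selector", cleaned)
```

Under this reading, wait_for="networkidle" waits on the page load state, while wait_for="#hero h1" waits until that element exists.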

Inline images are automatically downsampled to ≤1568 px on the long edge — the same size Anthropic resizes to internally — so vision tokens aren't wasted.
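
To make the long-edge rule concrete, here is a small sketch of the arithmetic, assuming the aspect ratio is preserved and results are rounded to whole pixels. The function name and the rounding choice are assumptions; the service may round differently.

```python
# Sketch of the "long edge <= 1568 px" rule described above.
# The function name and rounding behaviour are assumptions.
MAX_LONG_EDGE = 1568


def downsampled_size(width: int, height: int, max_edge: int = MAX_LONG_EDGE):
    """Scale (width, height) so the longer side is at most max_edge."""
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return (width, height)  # already small enough, left untouched
    scale = max_edge / long_edge
    # Keep at least 1 px on the short side for extreme aspect ratios.
    return (max(1, round(width * scale)), max(1, round(height * scale)))
```

Under these assumptions a capture at the default 1280 × 800 viewport passes through unchanged, while a full-page capture of a long page is shrunk so that its height becomes 1568 px.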

Example: visual verification

apiVersion: pai.io/v1
kind: Agent
metadata:
  name: marketing-checker
spec:
  type: task
  agentDefinition: visual-checker
  title: |
    Take a screenshot of https://pairun.dev and confirm whether the hero
    headline includes the word "Agentic". If it doesn't, send an email to
    marketing@example.com flagging the issue with the screenshot attached.

The agent will:

  1. Call screenshot(url=…) — receives the PNG inline and reads it.
  2. Decide whether the headline is correct.
  3. If not, call screenshot(url=…, email_to="marketing@example.com", email_body=…) — the same tool sends the email with the PNG attached, all in one step.
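
The two tool calls in the steps above can be written out as plain argument dictionaries. This is only a sketch: the helper names are hypothetical, and only the parameter names and the default subject come from the tool reference. Setting return_image to false on the second call is a suggestion based on the return_image description, not something the example task requires.

```python
# Hypothetical helpers; only the parameter names come from the tool reference.
from typing import Optional


def inspect_call_args(url: str) -> dict:
    """Step 1: capture and get the PNG inline (return_image defaults to true)."""
    return {"url": url}


def email_call_args(url: str, to: str, body: str,
                    subject: Optional[str] = None) -> dict:
    """Step 3: capture again and email the PNG as an attachment."""
    args = {
        "url": url,
        "email_to": to,
        "email_body": body,
        # Assumption: the model already saw the page in step 1, so the
        # second call skips the inline image to save vision tokens.
        "return_image": False,
    }
    if subject is not None:
        args["email_subject"] = subject
    return args


def effective_subject(args: dict) -> str:
    """Documented default subject when email_subject is omitted."""
    return args.get("email_subject", f"Screenshot of {args['url']}")
```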

Example: archival pipeline

spec:
  type: task
  agentDefinition: visual-checker
  title: |
    Capture full-page screenshots of these URLs and save each one to
    /work/archive/<host>.png:
    - https://pairun.dev
    - https://docs.pairun.dev
  volumes:
    - name: work
      mountPath: /work
      size: 1Gi

The agent uses screenshot(url=…, save_path="/work/archive/pairun.dev.png", full_page=true, return_image=false) so the PNG is written without the model burning vision tokens it doesn't need.
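
As a rough illustration of the /work/archive/<host>.png pattern from the task, the sketch below derives a save path from each URL and builds the matching tool arguments. The helper names are hypothetical, and using the host name verbatim is an assumption; an agent may normalize file names differently.

```python
# Hypothetical helpers for the archival task; not part of Pai itself.
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def archive_path(url: str, root: str = "/work/archive") -> str:
    """Map a URL to <root>/<host>.png using the URL's host name."""
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no host in URL: {url!r}")
    return str(PurePosixPath(root) / f"{host}.png")


def archive_call_args(url: str) -> dict:
    """Arguments for a save-only, full-page capture."""
    return {
        "url": url,
        "save_path": archive_path(url),
        "full_page": True,
        "return_image": False,  # save-only flow, no vision tokens needed
    }
```

The root directory has to sit on a writable volume, which is why the task mounts the work volume at /work.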

Disable screenshot for an agent

tools:
  - type: screenshot
    enabled: false

Limits

| Limit | Default | Override (Helm) |
|---|---|---|
| Captures per workload per day | 200 | screenshot.dailyLimit |
| Default viewport | 1280 × 800 | screenshot.viewport.width / screenshot.viewport.height |
| Per-capture timeout | 30 s | screenshot.timeoutMs |
| Allowed hosts | any public URL | screenshot.allowedHosts (comma-separated) |
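
To show how a comma-separated screenshot.allowedHosts value could be applied, here is a minimal sketch. It assumes exact, case-insensitive host matching and treats an empty list as "no host restriction". The real service may match subdomains or wildcards differently, and this sketch does not check whether a host is actually public.

```python
# Hypothetical allow-list check; the matching semantics are assumptions.
from urllib.parse import urlsplit


def parse_allowed_hosts(raw: str) -> set:
    """Split a comma-separated host list, ignoring blanks and case."""
    return {h.strip().lower() for h in raw.split(",") if h.strip()}


def is_allowed(url: str, allowed_hosts_raw: str = "") -> bool:
    parts = urlsplit(url)
    # The tool only accepts http(s) URLs.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    allowed = parse_allowed_hosts(allowed_hosts_raw)
    if not allowed:
        return True  # empty list: assumed to mean no host restriction
    return parts.hostname in allowed
```

With allowedHosts set to "pairun.dev, docs.pairun.dev", this check accepts https://pairun.dev/pricing and rejects https://example.com.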

Deployment

The pai-screenshot service is disabled by default, so the screenshot tool only works once the service has been turned on. Enable it in your production Helm values:

# values-prod.yaml
screenshot:
  enabled: true

It runs as a single replica of the upstream mcr.microsoft.com/playwright/python image, requesting 200m CPU / 512Mi memory (limits: 1 CPU / 1Gi memory), which is comfortable for a single Chromium instance. Increase the replica count if you expect concurrent captures from many agents.

What it doesn't do (yet)

  • No PDF rendering or DOM dumps. These could be added later under the same service if needed.
  • No authenticated capture. The service renders public URLs only — no cookie injection or login. If you need to screenshot something behind auth, that's a separate Provider plumbing job.