# NVIDIA NIM
NVIDIA NIM hosts open-weight models (Llama, GLM, Qwen, Mistral, DeepSeek, and more) behind an OpenAI-compatible API. Use `provider: openai` with a custom endpoint to point a `ModelProvider` at NIM.
## Get an API key
Sign up at build.nvidia.com and create an API key. Keys start with `nvapi-`.
## Setup
NIM is OpenAI-compatible but lives at a custom URL. Use `provider: openai` with `--endpoint` and an `--allowed-models` list so Pai knows which models to expose (NIM's catalogue is large and changes frequently):
```bash
pai create model-provider nvidia \
  --provider openai \
  --endpoint https://integrate.api.nvidia.com/v1 \
  --allowed-models "z-ai/glm4.7,meta/llama-3.1-70b-instruct,qwen/qwen3-next-80b-a3b-instruct" \
  --api-key nvapi-...
```
Or as YAML:
```yaml
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: nvidia
spec:
  provider: openai
  endpoint: https://integrate.api.nvidia.com/v1
  apiKeySecretRef:
    name: nvidia-api-key
    key: api-key
  allowedModels:
    - z-ai/glm4.7
    - meta/llama-3.1-70b-instruct
    - qwen/qwen3-next-80b-a3b-instruct
```
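The `apiKeySecretRef` above points at a secret named `nvidia-api-key`. Assuming Pai follows the Kubernetes Secret convention its CRD shape suggests (an assumption — check the Pai secrets docs for the exact kind), the referenced secret might look like:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: nvidia-api-key
type: Opaque
stringData:
  api-key: nvapi-...   # the key created at build.nvidia.com
```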
Verify:
```bash
pai get model-providers
# NAME     PROVIDER   ENDPOINT                              MAX/DAY   LAST USED   AGE
# nvidia   openai     https://integrate.api.nvidia.com/v1   —         —           5s
```
## Supported models
NIM's catalogue changes — browse the full list at build.nvidia.com/models. Common picks:
| Example model | Reference |
|---|---|
| GLM 4.7 | `nvidia/z-ai/glm4.7` |
| Llama 3.1 70B Instruct | `nvidia/meta/llama-3.1-70b-instruct` |
| Qwen3 Next 80B | `nvidia/qwen/qwen3-next-80b-a3b-instruct` |
| DeepSeek V3 | `nvidia/deepseek-ai/deepseek-v3` |
| Mistral Large | `nvidia/mistralai/mistral-large-instruct` |
> **allowedModels matters here.** For native providers (`anthropic`, `gemini`, `openai`, `openrouter`) Pai knows the catalogue and validates model names automatically. For OpenAI-compatible endpoints Pai can't auto-discover what's available, so `allowedModels` is the source of truth. Add every model your agents should be able to use.
## Use in an agent
```yaml
spec:
  models:
    - nvidia/z-ai/glm4.7
```
Multiple models — first is primary, rest are fallbacks:
```yaml
spec:
  models:
    - nvidia/z-ai/glm4.7                  # primary
    - nvidia/meta/llama-3.1-70b-instruct  # fallback
```
Mix with other providers so an agent can fall back across subscriptions:
```yaml
spec:
  models:
    - anthropic/claude-sonnet-4-6  # primary
    - nvidia/z-ai/glm4.7           # fallback when Anthropic is over budget
```
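The fallback semantics are plain ordered iteration: try the primary, move to the next entry when a call fails or the provider is over budget. A minimal illustrative sketch (not Pai's actual gateway code; `call_model` is a hypothetical stand-in for a provider call):

```python
def call_with_fallback(models, call_model):
    """Try each model reference in order; return the first success.

    `models` mirrors the spec.models list; `call_model` is a hypothetical
    function that raises on quota exhaustion or provider errors.
    """
    last_error = None
    for ref in models:
        try:
            return ref, call_model(ref)
        except Exception as exc:  # e.g. HTTP 429 when a budget is hit
            last_error = exc
    raise RuntimeError(f"all models failed: {last_error}")


# Usage: the primary is over budget, so the fallback answers.
def fake_call(ref):
    if ref == "anthropic/claude-sonnet-4-6":
        raise RuntimeError("429: daily token budget exhausted")
    return "ok"

used, result = call_with_fallback(
    ["anthropic/claude-sonnet-4-6", "nvidia/z-ai/glm4.7"], fake_call
)
print(used, result)  # nvidia/z-ai/glm4.7 ok
```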
## Token budgets
Cap how many tokens this subscription burns per day across every agent — a hard safety net against runaway spend.
```yaml
apiVersion: pai.io/v1
kind: ModelProvider
metadata:
  name: nvidia
spec:
  provider: openai
  endpoint: https://integrate.api.nvidia.com/v1
  apiKeySecretRef:
    name: nvidia-api-key
    key: api-key
  maxTokensPerDay: 5000000      # daily cap shared across all agents
  maxTokensPerRequest: 128000   # per-request context-window limit
  allowedModels:
    - z-ai/glm4.7
    - meta/llama-3.1-70b-instruct
```
When the daily cap is hit, the gateway returns HTTP 429 until midnight UTC. Agents that list another provider in `spec.models` automatically fail over to it.
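The cap behaves like a counter that resets at midnight UTC. A sketch of that accounting (illustrative only, not Pai's implementation; `DailyTokenBudget` and `try_spend` are hypothetical names):

```python
from datetime import datetime, timezone

class DailyTokenBudget:
    """Illustrative maxTokensPerDay-style cap with a midnight-UTC reset."""

    def __init__(self, max_tokens_per_day):
        self.cap = max_tokens_per_day
        self.used = 0
        self.day = datetime.now(timezone.utc).date()

    def try_spend(self, tokens):
        today = datetime.now(timezone.utc).date()
        if today != self.day:          # new UTC day: reset the counter
            self.day, self.used = today, 0
        if self.used + tokens > self.cap:
            return False               # gateway would answer HTTP 429 here
        self.used += tokens
        return True

budget = DailyTokenBudget(5_000_000)
print(budget.try_spend(4_999_000))  # True: within the daily cap
print(budget.try_spend(2_000))      # False: would exceed the cap
```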
## Expose via the LLM Gateway
Set `externalAccess.enabled: true` to let developers outside the cluster — laptops, CI, scripts — route their own LLM traffic through this provider. The NVIDIA API key stays inside Pai; clients authenticate with a Pai AccessKey instead.
```yaml
spec:
  externalAccess:
    enabled: true
    maxTokensPerDay: 2000000  # separate budget for external usage
```
Once enabled, developers connect with three commands:
```bash
pai login https://api.pairun.dev --access-key pak_...
eval $(pai gateway env)
# OpenAI-compatible clients now reach NIM models through Pai
```
See LLM Gateway for the full onboarding flow, AccessKey management, and per-developer rate limits.
## Access control
Narrow which models agents may call on this provider with `allowedModels` / `deniedModels`, or attach prompt-injection guards. See Security controls on the Model page for the full field list.
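For example, to keep the provider broadly available while blocking one model, a `deniedModels` entry might look like this (field shape assumed from the names above — the Model page has the authoritative schema):

```yaml
spec:
  provider: openai
  endpoint: https://integrate.api.nvidia.com/v1
  deniedModels:
    - deepseek-ai/deepseek-v3  # keep this model off-limits for agents
```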