Safety checklists¶

Last verified: 2026-05-06 · Drift risk: low

Use these checklists at the stages where they appear. They are designed to be actionable: each item is a concrete thing you can verify or do, not a vague principle. Adapt them to your context — add items specific to your domain, remove items that do not apply.

For a copy-paste version suitable for embedding in a PR template or runbook, see the safety checklist template.

Pre-build checklist¶

Complete this before writing a line of agent code. Its purpose is to surface scope and risk issues while they are still cheap to resolve.

Pre-deploy checklist¶

Complete this before moving any version of the agent into a production or user-facing environment.

Scope and prompt¶

System prompt includes an explicit description of what the agent will and will not do.
System prompt includes explicit refusal behavior: what the agent says when asked to do something outside scope.
System prompt names the tools the agent may use (explicit allowlist, not just "use tools as needed").
System prompt names any tools the agent may NOT use.
Tested that the agent refuses requests outside its defined scope.

Tool configuration¶

Every tool is configured with least-privilege access (read-only where write is not needed, scoped API keys, no admin credentials).
Tool argument validation is in place server-side, not just in the prompt.
Outbound URL calls are restricted to an allowlist of trusted hosts.
There is a hard cap on the number of tool calls per agent session.

Evals¶

A golden eval set exists with at least 5 cases covering core tasks.
Adversarial eval cases cover prompt injection, jailbreak attempts, and out-of-scope requests.
All eval cases pass on the version being deployed.
Evals are committed to version control and tagged to match the deployment version.

Human-in-the-loop¶

Every irreversible action has a HITL confirmation gate in place.
The confirmation gate presents the user with a plain-language description of what will happen.
A dry-run mode exists for operations that write to external systems.

Auth and secrets¶

All API keys and secrets are stored in a secrets manager, not in the prompt, code, or environment files committed to version control.
API keys are scoped to the minimum required permissions.
There is a documented process for rotating keys.
The agent cannot read or output its own API keys.

Logging¶

All tool calls are logged with full argument values and response payloads.
Logs are retained for at least 30 days (or longer if your compliance requirements demand it).
Logs are stored in a system accessible to the security team, not only to the agent operator.
PII in logs is redacted or access-controlled.

Post-deploy checklist¶

Complete this within the first two weeks of a production deployment, and revisit on a regular cadence (monthly is reasonable for most agents).

Evals are running automatically on every deployment (CI/CD integration).
Production logs are being reviewed at least weekly for anomalies.
A cost alert is configured to fire if token or API call spend exceeds 2x the expected baseline.
A user feedback channel exists and is monitored.
At least one person (not the original builder) has reviewed the agent's behavior in production.
A red-team session has been completed since the last major change to the prompt, tools, or model version.
The agent spec is up to date and matches the deployed configuration.
Owner contact information is documented and reachable.

Incident checklist¶

Use this if you suspect or confirm that the agent has behaved in an unintended way. See also Incident response for a fuller workflow.