Last verified: 2026-05-06 · Drift risk: low
Agent Factory — section overview¶
The Agent Factory is a repeatable, human-gated workflow that takes an agent from vague request to production-ready artifact. It does not automate deployment. Every artifact the factory produces is a candidate: inert until a human owner reviews it, approves it, and deliberately activates it inside a system that is already governed.
The factory exists because agent-building without a shared process produces waste in predictable ways: teams spec the same agent twice, nobody owns maintenance, safety review happens at the last moment (or not at all), and agents drift quietly after launch until something breaks. The factory addresses each of those failure modes with a corresponding stage.
What the factory is — and is not¶
The factory is a process, not a platform. It runs on whichever tools your team already uses for documentation, code review, and quality assurance. The only hard requirements are that every candidate passes a minimum eval and red-team threshold before a human approves it, and that every active agent has a named owner who has agreed to a maintenance cadence.
The factory is not an autonomous pipeline that ships agents without review. It is not a registry of live agents. It is not a substitute for your organization's existing change-management or security processes. It adds a structured on-ramp in front of those processes.
The nine lifecycle stages¶
| Stage | One-line description |
|---|---|
| Intake | Capture what the requester actually needs and what "failure" means to them |
| Rank | Score candidate agents on value, feasibility, safety, maintenance burden, and platform fit |
| Spec | Write a tight job statement, define inputs/outputs/tools, and establish stop conditions |
| Build | Generate system prompts, n-shot exemplars, refusal prompts, and error-recovery scaffolding |
| Eval | Produce at least 20 golden test cases and run them against the candidate |
| Red-team | Produce at least 20 adversarial cases and confirm the agent handles all critical failures |
| Port | Translate the approved prompt pack to every platform your team needs to support |
| Launch | Verify all gate criteria are met; the owner signs the launch-readiness checklist |
| Maintain | Re-evaluate on a cadence; retire agents that no longer meet the standard |
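The ordering in the table can be sketched as a simple state machine. The stage names come from the table; the strict no-skipping transition helper is an illustrative assumption about how a team might enforce the sequence, not part of the factory definition.

```python
from enum import Enum


class Stage(Enum):
    """The nine lifecycle stages, in factory order."""
    INTAKE = 1
    RANK = 2
    SPEC = 3
    BUILD = 4
    EVAL = 5
    RED_TEAM = 6
    PORT = 7
    LAUNCH = 8
    MAINTAIN = 9


def advance(current: Stage) -> Stage:
    """Move a candidate to the next stage; stages cannot be skipped.

    Maintain is terminal: agents stay there until retired.
    """
    if current is Stage.MAINTAIN:
        raise ValueError("Maintain is the final stage; retire instead.")
    return Stage(current.value + 1)
```

For example, `advance(Stage.EVAL)` yields `Stage.RED_TEAM`, matching the table's ordering of eval before red-teaming.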
Sub-pages in this section¶
| Page | What it covers |
|---|---|
| Factory operating model | Roles, lifecycle stages, and decision rights in RACI form |
| Agent portfolio design | How to manage a collection of agents as a portfolio, including retirement |
| Requirements intake | A short intake form with a worked example |
| Candidate ranking | The 5-factor rubric and a worked ranking of 5 examples |
| Prompt-pack generation | From a spec to a complete prompt pack, with a worked example |
| Eval generation | Producing 20 golden cases from a spec or workflow logs |
| Red-team generation | Producing 20 adversarial cases and reviewing them with a red-team agent |
| Cross-platform porting | Translating one agent across Claude, OpenAI, Gemini, and Copilot |
| Launch readiness | The gate criteria with concrete numerical thresholds |
| Maintenance and drift | Keeping active agents healthy and retiring them when the time comes |
| Worked example | A complete factory run from 25 candidates to 5 launch-ready agents |
How this section connects to the rest of the guide¶
The factory stages draw heavily on three other sections of the guide.
Recipes provide ready-made prompt packs for common agent types. When the Build stage produces a prompt pack, the recipes section is the first place to look for a starting point rather than writing from scratch.
Evals provide ready-to-use golden and adversarial test cases organized by agent category. The Eval and Red-team stages draw directly from those libraries rather than generating every case from scratch.
Template library holds the canonical spec template, intake form template, launch-readiness checklist, and maintenance log format. Every factory artifact should start from the template-library version so that outputs are consistent across teams.
How to use this section¶
If you are new to the factory, read Factory operating model first to understand who does what at each stage. Then read Agent portfolio design to understand why the factory gates matter even for small portfolios. After that, work through the remaining pages in order the first time you run a factory cycle. On subsequent runs, jump directly to the stage you are at.
If you want to see everything working together before you read the individual pages, start with Worked example. It covers a full research-workflow portfolio from candidate generation through the approval gate, and references the relevant sub-pages at each step.
The non-negotiable: human approval before activation¶
Every output of the factory — prompts, evals, red-team suites, porting tables, launch checklists — is a candidate artifact. No agent produced by this workflow is active until a human owner explicitly approves it and records that approval. This constraint is not a formality. Agents that run without an identified owner have no one accountable for their outputs, no one to call when they regress, and no one to retire them when the underlying platform changes. The factory's value comes entirely from the combination of structured process and deliberate human sign-off.