Last verified: 2026-05-06 · Drift risk: medium · Official sources: MCP Specification 2025-06-18, OpenAI Agents SDK
Multi-Agent Patterns¶
Multi-agent systems use more than one model instance to complete a task. Each agent typically has a specialized role, a tailored system prompt, and a defined interface for receiving inputs and returning outputs. The agents are coordinated by a pattern — supervisor, debate, planner/executor, or handoff — that determines who talks to whom and in what order.
Use multi-agent architecture when the task genuinely cannot be completed by a single-agent loop: when parallelism would reduce latency significantly, when the task requires role-switching incompatible with a single system prompt, when sub-task outputs need independent verification, or when the context required exceeds a single window.
Pattern 1: Supervisor + Workers¶
The most common multi-agent pattern. A supervisor agent breaks the task into sub-tasks and dispatches them to worker agents. Workers execute their assigned sub-task and return results. The supervisor aggregates results and decides whether to run additional workers or return the final answer.
              +------------------+
              |    Supervisor    |
              | (planning agent) |
              +--------+---------+
                       |
            [dispatches sub-tasks]
                       |
              +--------+--------+
              |        |        |
          +---+--+ +---+--+ +---+--+
          |Worker| |Worker| |Worker|
          |  A   | |  B   | |  C   |
          +------+ +------+ +------+
              |        |        |
              +--------+--------+
                       |
       [results returned to supervisor]
                       |
                +------+------+
                |  Supervisor |
                | (synthesis) |
                +-------------+
Implementation sketch:
def supervisor_loop(task: str, worker_agents: dict) -> str:
    # Supervisor receives the task and decides which workers to invoke
    plan = call_supervisor(task)
    results = {}
    for sub_task in plan.sub_tasks:
        worker = worker_agents[sub_task.worker_type]
        # Workers can run in parallel if they are independent
        results[sub_task.id] = worker.run(sub_task.instructions)
    # Supervisor synthesizes the results
    return call_supervisor_synthesis(task, results)
Strengths: workers can be specialized (a "code writer" agent vs. a "code reviewer" agent vs. a "test writer" agent); independent sub-tasks can run in parallel; the supervisor can retry failed sub-tasks without re-running the whole pipeline.
Weaknesses: the supervisor itself can fail or produce a poor plan; each worker invocation adds latency and token cost; error propagation from worker to supervisor requires careful handling.
Pattern 2: Debate / Ensemble¶
Two or more agents independently solve the same problem. A judge agent (or a voting mechanism) selects the best answer or synthesizes across answers.
 +----------+      +----------+
 | Agent A  |      | Agent B  |
 | (draft 1)|      | (draft 2)|
 +----+-----+      +-----+----+
      |                  |
      +--------+---------+
               |
         +-----+------+
         |   Judge    |
         | (evaluator)|
         +------------+
               |
         [final answer]
Use this pattern when:

- The task has a verifiable correct answer (code that passes tests, a calculation that can be checked)
- You want to reduce the variance of model outputs
- The cost of a wrong answer is high enough to justify multiple model calls
Weaknesses: doubles or triples the API cost and latency; the judge can itself be wrong; for creative or open-ended tasks, "better" is subjective.
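The debate flow above can be sketched with plain callables standing in for model invocations. The solver lambdas and the length-based judge below are toy stand-ins for illustration, not a real evaluator:

```python
def debate(task: str, solvers: list, judge) -> str:
    """Run each solver independently on the same task, then let a
    judge select (or synthesize) the best draft."""
    drafts = [solve(task) for solve in solvers]
    return judge(task, drafts)

# Toy stand-ins: two "agents" draft answers; the judge picks the longest.
solver_a = lambda task: f"Answer A to: {task}"
solver_b = lambda task: f"A more detailed answer B to: {task}"
pick_longest = lambda task, drafts: max(drafts, key=len)

best = debate("2+2?", [solver_a, solver_b], pick_longest)
```

In practice the judge would be another model call that scores each draft against the task, or a test harness when the output is mechanically verifiable.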
Pattern 3: Planner + Executor¶
The planner produces a complete plan before any execution begins. The executor follows the plan step by step, checking in with the planner only if it encounters a step it cannot complete.
+----------+
| Planner  |
|(full plan|
| upfront) |
+----+-----+
     |
[step list]
     |
+----v-----+
| Executor |
|(runs each|
|step, logs|
| results) |
+----+-----+
     |
[failure? check in]
     |
+----v-----+
| Planner  |
|(re-plans |
|from here)|
+----------+
This pattern works well when:

- The task structure is predictable enough that a plan can be made upfront
- The executor is reliable at following explicit instructions
- Re-planning is cheaper than the cost of unplanned exploration
Weaknesses: a bad initial plan propagates through all execution steps; tasks that require information discovered during execution cannot be fully planned upfront; the executor's failures may not map cleanly back to planner-understandable descriptions.
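The planner/executor loop, including the check-in on failure, can be sketched as follows. `planner` and `executor` are stand-in callables for real agent invocations, and the re-plan budget is an illustrative safeguard:

```python
def plan_and_execute(task: str, planner, executor, max_replans: int = 2):
    """Planner emits the full step list upfront; the executor runs steps
    in order and only returns to the planner when a step fails."""
    steps = planner(task, completed=[])
    completed = []
    replans = 0
    while steps:
        ok, result = executor(steps[0])
        if ok:
            completed.append((steps[0], result))
            steps = steps[1:]
        elif replans < max_replans:
            replans += 1
            # Re-plan from the failure point, given what already succeeded
            steps = planner(task, completed=completed)
        else:
            raise RuntimeError(f"step still failing after {max_replans} re-plans: {steps[0]}")
    return completed
```

Passing the `completed` list back to the planner is what lets it re-plan "from here" rather than starting over.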
Pattern 4: Handoffs (sequential pipeline)¶
Agents run in sequence: Agent A produces an output that becomes the input to Agent B. Each agent adds value and passes a refined artifact downstream.
+----------+     +----------+     +----------+
| Agent A  | --> | Agent B  | --> | Agent C  |
| (gather) |     | (analyze)|     | (format) |
+----------+     +----------+     +----------+
The OpenAI Agents SDK formalizes this pattern with explicit handoff definitions. An agent can declare: "when my task is complete, hand off to Agent B with this context." The SDK handles the message routing between agents.
Implementation using the OpenAI Agents SDK pattern:
# Pseudocode reflecting the Agents SDK handoff concept
from agents import Agent, Runner, handoff

# Define agents from the end of the pipeline backwards, so each can
# name its handoff target at construction time
formatter = Agent(
    name="formatter",
    instructions="Format the analyst's output as a professional report.",
)

analyst = Agent(
    name="analyst",
    instructions="Analyze the data provided. Write a structured summary. Hand off to the formatter.",
    handoffs=[handoff(formatter)],
)

researcher = Agent(
    name="researcher",
    instructions="Gather relevant data for the given topic. When done, hand off to the analyst.",
    tools=[web_search_tool],
    handoffs=[handoff(analyst)],
)

result = await Runner.run(researcher, "Research the current state of fusion energy.")
Strengths: each agent has a clear, narrow responsibility; system prompts can be highly specialized; the pipeline is easy to reason about and test in isolation.
Weaknesses: errors at stage N affect all downstream stages; the pipeline is inherently sequential unless you introduce parallel branches; handoff context needs to be carefully formatted so the receiving agent has what it needs without the full upstream history.
Coordination: what to pass in a handoff¶
When one agent hands off to another, the receiving agent needs context without being overwhelmed by the full upstream conversation history. Common approaches:
Structured summary — the handing-off agent produces a machine-readable summary (JSON or a defined template) that the receiving agent consumes as input. This is the most reliable approach.
Excerpt — pass the last N messages plus a task description. Cheap to implement but risks dropping important earlier context.
Full transcript — pass everything. Simple but expensive and can cause context overflow on long pipelines.
External state — write intermediate results to a shared store (file, database, vector store). The receiving agent reads what it needs. See State, memory, and handoffs for this approach.
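The structured-summary approach above amounts to defining a small schema that both sides of the handoff agree on. A minimal sketch, with hypothetical field names that are illustrative rather than part of any SDK:

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class HandoffPayload:
    """Hypothetical structured-summary contract for a handoff."""
    from_agent: str
    task: str
    summary: str                                   # condensed findings, not the full transcript
    artifacts: list = field(default_factory=list)  # pointers to external state (paths, IDs)

def serialize_handoff(payload: HandoffPayload) -> str:
    # JSON keeps the contract machine-checkable on the receiving side
    return json.dumps(asdict(payload))

def parse_handoff(raw: str) -> HandoffPayload:
    return HandoffPayload(**json.loads(raw))
```

Because the receiving agent parses a fixed schema rather than free text, a malformed handoff fails loudly at the boundary instead of silently degrading the downstream prompt.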
Parallelism¶
Independent sub-tasks in the supervisor pattern can run concurrently. In Python, use asyncio.gather:
import asyncio

async def run_parallel_workers(sub_tasks: list, workers: dict) -> list:
    tasks = [
        workers[st.worker_type].run_async(st.instructions)
        for st in sub_tasks
    ]
    return await asyncio.gather(*tasks)
Be aware of rate limits. Multiple parallel agents hitting the same API endpoint simultaneously may trigger throttling. Implement exponential backoff or use a token-bucket rate limiter in front of parallel agent invocations.
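One way to combine both safeguards is a semaphore to cap concurrency plus exponential backoff with jitter on throttling errors. A sketch, assuming `worker` is an awaitable agent call; the `RuntimeError` catch stands in for your API client's rate-limit exception class:

```python
import asyncio
import random

async def call_with_backoff(worker, instructions, sem, retries=3):
    """Bound concurrency with a semaphore; on a throttling error, retry
    with exponential backoff plus jitter."""
    async with sem:
        for attempt in range(retries):
            try:
                return await worker(instructions)
            except RuntimeError:  # substitute your client's rate-limit error
                await asyncio.sleep(2 ** attempt + random.random())
        raise RuntimeError(f"still rate-limited after {retries} attempts")

async def run_parallel_limited(sub_tasks, worker, max_concurrency=4):
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(
        *(call_with_backoff(worker, st, sem) for st in sub_tasks)
    )
```

The semaphore bounds how many requests are in flight at once, which is often enough to stay under a rate limit; the backoff handles the bursts that slip through.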
Error handling across agent boundaries¶
Each agent boundary is a potential failure point. Define an error contract for each agent:
- What does the agent return when it cannot complete the task?
- Does it return a partial result or raise an exception?
- How does the supervisor / downstream agent handle a failure?
A common pattern: agents return a result object with a success flag and an error_message field. The supervisor checks the flag before proceeding and can retry the sub-task, skip it, or abort the whole pipeline.
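That result-object contract can be sketched as a small dataclass plus a supervisor-side check. The names here (`WorkerResult`, `check_and_retry`) are illustrative, not from any library:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkerResult:
    """Hypothetical error contract: workers always return this shape
    instead of raising across the agent boundary."""
    success: bool
    output: Optional[str] = None
    error_message: Optional[str] = None

def check_and_retry(result: WorkerResult, retry_fn, max_retries: int = 1) -> WorkerResult:
    """Supervisor-side check: retry a failed sub-task up to max_retries
    times, then surface the failure rather than aborting blindly."""
    attempts = 0
    while not result.success and attempts < max_retries:
        attempts += 1
        result = retry_fn()
    return result
```

Keeping failures as data rather than exceptions lets the supervisor decide per sub-task whether to retry, skip, or abort the pipeline.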
When not to use multi-agent¶
Multi-agent systems are not inherently better than single-agent systems. They are appropriate when:

- Parallelism provides meaningful latency reduction
- Specialization (different system prompts) produces meaningfully better outputs
- The task genuinely exceeds a single context window
They add complexity, cost, and failure modes. If a well-crafted single-agent loop can solve the problem, prefer it.