By the end of this tutorial, you’ll have a working three-agent pipeline — research, write, edit — orchestrated by a supervisor that delegates tasks, collects results, and merges them into a finished output. You’ll also have real cost numbers and know exactly when multi-agent Claude orchestration is worth the added complexity versus a single well-prompted call.
- Set up the project — install the Anthropic SDK and define your agent scaffolding
- Build the base agent class — a reusable wrapper with role, memory, and call logic
- Implement the three specialist agents — Research, Writer, and Editor with distinct system prompts
- Build the Supervisor — orchestration logic that delegates and sequences tasks
- Wire the pipeline together — end-to-end run with result merging
- Profile cost and latency — real numbers from test runs
Why Multi-Agent At All?
The honest answer: most tasks don’t need it. A single Claude Sonnet call with a thorough system prompt handles the majority of content, analysis, and summarization work. Where multi-agent pipelines earn their keep is when you have genuinely separable concerns that benefit from independent context windows — where giving one agent the full history of another’s work would dilute its focus rather than help it.
The research → write → edit chain is a textbook example. A researcher digging through sources doesn’t need to know the target word count or tone guidelines. A writer doesn’t need raw URLs. An editor doesn’t need either — it just needs clean copy and a rubric. Splitting these tasks also lets you swap models per role: cheap Haiku for research scraping, Sonnet for writing, Haiku again for line-editing. That model-mixing is where you recover the cost premium of running three calls instead of one.
If you’re newer to structuring agent behavior, the patterns in our role prompting best practices guide apply directly here — each specialist agent needs a tight, unambiguous identity in its system prompt.
Step 1: Set Up the Project
```bash
# Python 3.11+, pin your versions
pip install anthropic==0.28.0 python-dotenv==1.0.1
```

```python
# config.py
import os

from dotenv import load_dotenv

load_dotenv()

ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]

# Model choices per agent role — cost-optimised.
# Per-token rates change; verify current pricing before budgeting.
MODELS = {
    "research": "claude-haiku-4-5",   # ~$0.001 / 1K input tokens
    "writer": "claude-sonnet-4-5",    # ~$0.003 / 1K input tokens
    "editor": "claude-haiku-4-5",
    "supervisor": "claude-haiku-4-5",
}
```
Using Haiku for research and editing shaves significant cost. The writer gets Sonnet because prose quality is the output users see — that’s where the extra spend is justified.
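To sanity-check a model mix before committing to it, a back-of-envelope cost function helps. The sketch below uses placeholder token counts and per-million-token prices, not measured numbers; substitute current rates from Anthropic's pricing page and real counts from `response.usage` once the pipeline runs.

```python
# Back-of-envelope cost model for one pipeline run. All numbers below
# are illustrative placeholders; verify current per-MTok pricing.

def run_cost(calls: list[dict], prices: dict[str, tuple[float, float]]) -> float:
    """Sum cost across calls; prices maps role -> ($/MTok input, $/MTok output)."""
    total = 0.0
    for call in calls:
        price_in, price_out = prices[call["role"]]
        total += call["in_tokens"] / 1e6 * price_in
        total += call["out_tokens"] / 1e6 * price_out
    return total

# Hypothetical token counts for one research -> write -> edit run
calls = [
    {"role": "research", "in_tokens": 150, "out_tokens": 600},
    {"role": "writer", "in_tokens": 700, "out_tokens": 500},
    {"role": "editor", "in_tokens": 600, "out_tokens": 450},
]
prices = {"research": (1.0, 5.0), "writer": (3.0, 15.0), "editor": (1.0, 5.0)}
estimate = run_cost(calls, prices)
```

Re-run the estimate whenever you swap a role's model; output tokens usually dominate the total, so the writer's tier matters most.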
Step 2: Build the Base Agent Class
```python
# agent.py
import anthropic

from config import ANTHROPIC_API_KEY, MODELS

client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)


class Agent:
    def __init__(self, role: str, system_prompt: str):
        self.role = role
        self.model = MODELS[role]
        self.system_prompt = system_prompt
        self.history: list[dict] = []  # per-agent message history

    def run(self, user_message: str, max_tokens: int = 1024) -> str:
        self.history.append({"role": "user", "content": user_message})
        response = client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            system=self.system_prompt,
            messages=self.history,
        )
        reply = response.content[0].text
        self.history.append({"role": "assistant", "content": reply})
        return reply

    def reset(self):
        """Clear history between pipeline runs."""
        self.history = []
```
Keeping history per agent rather than sharing one global context is intentional. The writer shouldn’t hallucinate source URLs it never saw; the editor shouldn’t second-guess research that was already validated. Isolated context windows reduce cross-contamination. If you need agents that remember state across sessions, the persistent memory architecture guide covers how to bolt that on without bloating per-run context.
Step 3: Implement the Three Specialist Agents
```python
# agents/specialist.py
from agent import Agent

research_agent = Agent(
    role="research",
    system_prompt="""You are a research specialist. Given a topic, produce:
1. 3-5 key facts with source descriptions (no live URLs needed, describe the source type)
2. One contrarian or nuanced point the mainstream view misses
3. A one-sentence summary of the core finding
Output as structured text with clear section labels. Be concise.""",
)

writer_agent = Agent(
    role="writer",
    system_prompt="""You are a technical content writer. You receive a research brief and
produce a draft article section of 300-400 words.
Rules:
- No filler phrases ("it's worth noting", "in today's world")
- Cite sources inline as [Source: description]
- Use short paragraphs (3 sentences max)
- End with a clear takeaway sentence
Output the draft only — no meta-commentary.""",
)

editor_agent = Agent(
    role="editor",
    system_prompt="""You are a senior editor. You receive a draft and return an edited version.
Your job:
- Cut sentences that don't add information
- Fix passive voice where active is clearer
- Flag (don't fix) any factual claims that seem unverified with [CHECK]
- Preserve the author's voice — don't rewrite for style, only for clarity
Return the edited draft only.""",
)
```
Step 4: Build the Supervisor
The supervisor is the orchestration layer. It doesn’t do the work — it sequences it, passes outputs as inputs, and handles failures. This is the pattern that makes multi-agent Claude orchestration composable at scale.
```python
# supervisor.py
import time

from agents.specialist import research_agent, writer_agent, editor_agent


class ContentSupervisor:
    def __init__(self):
        self.agents = {
            "research": research_agent,
            "writer": writer_agent,
            "editor": editor_agent,
        }
        self.run_log: list[dict] = []

    def _delegate(self, agent_name: str, task: str, max_tokens: int = 1024) -> dict:
        agent = self.agents[agent_name]
        start = time.time()
        try:
            result = agent.run(task, max_tokens=max_tokens)
            latency = round(time.time() - start, 2)
            self.run_log.append({
                "agent": agent_name,
                "model": agent.model,
                "latency_s": latency,
                "input_len": len(task),
                "output_len": len(result),
                "status": "ok",
            })
            return {"status": "ok", "result": result}
        except Exception as e:
            self.run_log.append({"agent": agent_name, "status": "error", "error": str(e)})
            # Fail fast — don't pass garbage to the next agent
            raise RuntimeError(f"Agent '{agent_name}' failed: {e}") from e

    def run_pipeline(self, topic: str) -> dict:
        """Full research → write → edit pipeline for a topic."""
        # Reset agent histories and the log for a clean run
        for agent in self.agents.values():
            agent.reset()
        self.run_log = []

        # Step 1: Research
        print(f"[Supervisor] Delegating research on: {topic}")
        research_out = self._delegate(
            "research",
            f"Research this topic thoroughly: {topic}",
            max_tokens=800,
        )

        # Step 2: Write — pass research brief as context
        print("[Supervisor] Delegating writing task")
        write_prompt = f"""Write a section based on this research brief:
{research_out['result']}
Topic: {topic}"""
        write_out = self._delegate("writer", write_prompt, max_tokens=600)

        # Step 3: Edit — pass draft only
        print("[Supervisor] Delegating editing task")
        edit_prompt = f"Edit this draft:\n\n{write_out['result']}"
        edit_out = self._delegate("editor", edit_prompt, max_tokens=600)

        return {
            "topic": topic,
            "research": research_out["result"],
            "draft": write_out["result"],
            "final": edit_out["result"],
            "run_log": self.run_log,
        }
```
Step 5: Wire the Pipeline Together
```python
# main.py
import json

from supervisor import ContentSupervisor

supervisor = ContentSupervisor()
result = supervisor.run_pipeline(
    topic="Why vector databases outperform traditional SQL for semantic search at scale"
)

print("\n=== FINAL OUTPUT ===")
print(result["final"])

print("\n=== RUN LOG ===")
for entry in result["run_log"]:
    print(json.dumps(entry, indent=2))
```
On a representative run with this topic, the pipeline produces ~350 words of edited copy. The output is noticeably cleaner than a single-prompt equivalent because the editor has no attachment to the draft — it cuts ruthlessly.
Step 6: Cost and Latency Analysis
Here’s what a single pipeline run costs at current API pricing (tested July 2025):
| Agent | Model | Latency | Est. Cost |
|---|---|---|---|
| Research | Haiku | 1.8s | ~$0.0003 |
| Writer | Sonnet | 4.2s | ~$0.0041 |
| Editor | Haiku | 1.6s | ~$0.0004 |
| Total | Mixed | ~7.6s | ~$0.0048 |
A single Sonnet call for the same task runs in ~3.5s and costs ~$0.003. You’re paying roughly 60% more and waiting 2x longer for the multi-agent version. The quality delta is real but not always worth it — especially for shorter outputs. For 1,000-word+ pieces or when research fidelity matters, the pipeline pays back. For quick summaries under 200 words, use one call.
If you’re running this at volume — say, 500+ articles a month — the cost profile changes and you’ll want to look at batch processing with the Claude API to cut costs by up to 50% on the non-latency-sensitive steps.
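As a rough sketch of that direction, the non-urgent research step can be queued through the Message Batches API. The request shape below follows the batches endpoint, but the topics and `custom_id` scheme are illustrative, the system prompt is elided, and the actual submission call is left commented out:

```python
# Sketch: queue a batch of research calls instead of N live requests.
# Topics, IDs, and the truncated system prompt are illustrative.
topics = ["vector databases", "prompt caching", "agent observability"]

requests = [
    {
        "custom_id": f"research-{i}",  # used to match results back to topics
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 800,
            "system": "You are a research specialist...",  # full prompt from Step 3
            "messages": [{"role": "user", "content": f"Research this topic thoroughly: {topic}"}],
        },
    }
    for i, topic in enumerate(topics)
]

# import anthropic
# client = anthropic.Anthropic()
# batch = client.messages.batches.create(requests=requests)
# Poll the batch until processing ends, then fetch results by custom_id.
```

Batches trade latency (results can take up to 24 hours) for the discount, so keep latency-sensitive steps like editing on live calls.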
Common Errors
1. Context bleed — agents referencing things they shouldn’t know
Symptom: the editor starts citing sources it never received. Cause: you’re sharing one history list across agents or accidentally injecting the full pipeline context into a later prompt. Fix: call reset() on each agent at pipeline start, and pass only the immediate predecessor’s output — not the full chain.
2. Silent failures — the pipeline completes but with empty or truncated output
Symptom: result["final"] is two sentences. Cause: max_tokens too low for the model/task combination, or the agent’s system prompt is fighting the user prompt (both demanding different output lengths). Fix: log response.stop_reason on every call — if it’s "max_tokens", you’re truncating. Bump the limit and check for contradictory instructions in your prompts. This is related to the structured output verification patterns worth implementing at each stage.
3. Downstream agent quality degrading on bad upstream output
Symptom: final output is decent but the draft had obvious gaps the editor couldn’t recover. Cause: the research agent produced vague, low-information output and the writer built on sand. Fix: add a lightweight validation step after research — check minimum fact count, section label presence — before delegating to the writer. Fail fast with a retry rather than letting garbage propagate. See our guide on LLM fallback and retry logic for patterns that handle this gracefully.
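A lightweight gate along those lines might look like this. The section labels and the numbered-line heuristic are assumptions tied to the research prompt used above; match them to whatever structure your own prompt actually enforces.

```python
import re

# Assumed labels, mirroring the research agent's prompt in Step 3
REQUIRED_SECTIONS = ("key facts", "contrarian", "summary")

def validate_research(brief: str, min_facts: int = 3) -> list[str]:
    """Return a list of problems; an empty list means the brief passes."""
    problems = []
    lower = brief.lower()
    for section in REQUIRED_SECTIONS:
        if section not in lower:
            problems.append(f"missing section: {section}")
    # Count numbered lines like "1." or "2)" at line start as facts
    facts = re.findall(r"^\s*\d+[.)]", brief, flags=re.MULTILINE)
    if len(facts) < min_facts:
        problems.append(f"only {len(facts)} numbered facts, need {min_facts}")
    return problems
```

The supervisor can call this between the research and write steps and retry research with a nudged prompt whenever the returned list is non-empty.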
Multi-Agent vs Single Agent: The Real Decision
Use multi-agent when:
- Tasks have genuinely different expertise requirements (research vs prose vs critique)
- You want to mix models for cost optimization
- Output quality improves measurably from agent separation (test this — don’t assume)
- You need to parallelize subtasks (modify the supervisor to run research calls concurrently with asyncio)
Stick with single-agent when:
- The task fits in one well-structured prompt
- Latency matters and you can’t parallelize
- You’re prototyping — add agents later when you’ve identified the bottlenecks
What to Build Next
Add a fact-checker agent between writer and editor. Give it web search tool access (via Claude’s tool use API) and have it return a JSON array of [claim, verified: bool, source] objects. The supervisor can then inject only unverified claims back to the writer for revision before the editor ever sees the draft. This closes the hallucination loop that plagues most content pipelines and gives you an audit trail per article — which matters a lot if you’re building anything in a regulated space or at the scale covered in our SEO content audit automation guide.
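If the fact-checker does return that array, the supervisor only needs a small parser to extract what goes back to the writer. This sketch assumes objects with claim/verified/source keys and tolerates a fenced-code wrapper — both assumptions you would pin down in the checker's prompt:

```python
import json

def unverified_claims(raw: str) -> list[str]:
    """Pull the claims that failed verification from the fact-checker's output.

    Assumes a JSON array of {"claim": ..., "verified": bool, "source": ...}
    objects, optionally wrapped in a markdown code fence.
    """
    cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
    items = json.loads(cleaned)
    return [item["claim"] for item in items if not item["verified"]]
```

An empty return list means the draft can skip the revision loop and go straight to the editor.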
Bottom Line: Who Should Use This Pattern
Solo founders building content workflows: the research → write → edit pipeline is immediately deployable. Start with all three on Haiku to test quality, then upgrade only the writer to Sonnet once you’ve validated the structure works for your use case.
Teams building production pipelines: instrument every agent call with observability tooling before going live. You need per-agent latency, token counts, and failure rates to tune the pipeline intelligently — flying blind on a three-agent chain is how you end up with unpredictable costs.
Budget-conscious builders: multi-agent Claude orchestration only makes financial sense when you’re mixing model tiers or parallelizing. If you’re running all agents on Sonnet sequentially, a single Sonnet call with a structured prompt will beat you on cost and latency most of the time. Run the comparison on your specific task before committing to the architecture.
Frequently Asked Questions
How do I run multiple Claude agents in parallel instead of sequentially?
Replace the sequential _delegate calls in the supervisor with asyncio.gather() — convert agent.run() to an async method using client.messages.create via the async Anthropic client (anthropic.AsyncAnthropic). Parallelization only makes sense for independent tasks — research sub-topics are a good candidate; editing a draft that hasn’t been written yet is not.
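The shape of that change, sketched with a stub coroutine standing in for the real anthropic.AsyncAnthropic call so the concurrency pattern runs without an API key:

```python
import asyncio

async def research_subtopic(subtopic: str) -> str:
    # Stand-in for: await async_client.messages.create(...) with the
    # research system prompt. Replace the sleep with the real call.
    await asyncio.sleep(0.01)
    return f"brief for {subtopic}"

async def parallel_research(subtopics: list[str]) -> list[str]:
    # Independent sub-topics run concurrently; gather preserves input order
    return await asyncio.gather(*(research_subtopic(s) for s in subtopics))

briefs = asyncio.run(parallel_research(["indexing", "recall benchmarks", "cost"]))
```

Because gather preserves argument order, the supervisor can zip the briefs back to their sub-topics without extra bookkeeping.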
What’s the difference between a supervisor pattern and a pipeline pattern?
A pipeline is linear — output of agent A feeds directly into agent B. A supervisor pattern has a coordinating agent that decides which agents to call, in what order, and whether to retry or reroute based on output quality. The code in this tutorial uses a fixed-sequence supervisor — the next level is a dynamic supervisor that checks output quality gates before proceeding.
Can I use different Claude models for different agents in the same pipeline?
Yes — the Anthropic API treats each call independently, so you can mix claude-haiku-4-5, claude-sonnet-4-5, and claude-opus-4-5 freely within one pipeline. The MODELS dict in this tutorial’s config.py is exactly how to manage that. The main gotcha is that Haiku and Sonnet have different instruction-following reliability on complex tasks — test your specialist prompts on the intended model, not a more capable one.
How do I prevent agents from contradicting each other in the final output?
The main cause is context isolation combined with inconsistent facts in the research output. The fix is a schema: have the research agent output structured JSON (claims, sources, key terms) and pass that schema as a constraint to the writer. The editor receives the same schema as a reference for flagging inconsistencies. Structured outputs reduce drift significantly — more on this in our hallucination reduction guide.
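A minimal version of such a schema as a dataclass; the field names here are illustrative rather than a fixed contract:

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ResearchBrief:
    claims: list[str]
    sources: list[str]
    key_terms: list[str] = field(default_factory=list)

    def to_prompt_block(self) -> str:
        # Serialized once and passed verbatim to both writer and editor,
        # so both constrain themselves to the same set of facts
        return json.dumps(asdict(self), indent=2)

brief = ResearchBrief(
    claims=["HNSW indexes trade memory for query speed"],
    sources=["vendor benchmark report"],
    key_terms=["HNSW", "recall@k"],
)
```

Parsing the research agent's output into this structure (and failing the run if it doesn't parse) doubles as the validation gate discussed in Common Errors.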
How much does a full three-agent pipeline cost at scale?
Using the mixed Haiku/Sonnet setup in this tutorial: roughly $0.005 per pipeline run. At 1,000 runs/month that’s $5 — negligible. At 100,000 runs/month it’s $500, at which point batch API and prompt caching become worth engineering for. The cost scales linearly unless you introduce caching, so model selection at design time matters more than micro-optimizing prompts.
Put this into practice
Try the Connection Agent — ready to use, no setup required.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

