Single agents break on complex tasks. Not because Claude isn’t capable, but because you’re asking one context window to hold too much state, make too many decisions, and execute too many steps without error. The solution isn’t a bigger prompt — it’s Claude multi-agent systems that distribute work the way a well-run team does: an orchestrator that plans and delegates, specialists that execute, and coordination mechanisms that keep everything coherent.
This article walks through hierarchical orchestration in practice. You’ll build a system where a top-level Claude agent breaks down a complex request, spawns specialized subagents to handle subtasks in parallel, and synthesizes the results. We’ll cover the architecture, the code, and the failure modes that documentation doesn’t warn you about.
Why Hierarchical Agents Beat Single-Agent Loops
The temptation is to build one agent with a long system prompt and a big tool list. It works fine for simple tasks. For anything non-trivial — say, researching a market, writing a technical report, or running a multi-step data pipeline — a single-agent loop degrades fast. You get context pollution, where earlier tool outputs crowd out relevant instructions. You get compounding errors, where a wrong intermediate result propagates silently. And you get bottlenecks, because everything runs sequentially.
Hierarchical orchestration solves all three. The orchestrator maintains a clean, high-level view. Subagents work in isolated contexts, so their mistakes don’t contaminate each other. And when subagents run in parallel, the wall-clock time drops dramatically.
The tradeoff is real: you’re adding coordination overhead, more API calls, and more moving parts to debug. For a task that takes three sequential steps, a single agent is almost certainly the right call. For anything involving more than five distinct subtasks, different skill domains, or parallel workstreams, the hierarchy pays for itself.
The Architecture: Orchestrator + Specialist Subagents
Here’s the model I use in production. An orchestrator agent receives the top-level goal. It doesn’t execute anything directly — it plans, decomposes, delegates, and synthesizes. Subagents each own a narrow capability: web research, code writing, data analysis, document formatting. The orchestrator dispatches tasks to subagents, waits for results (either sequentially or asynchronously), and assembles the final output.
Communication between layers happens through structured messages — typically JSON. The orchestrator passes a task spec to a subagent; the subagent returns a result spec. This keeps interfaces clean and makes the whole system inspectable and debuggable.
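To make that concrete, here is a minimal sketch of the two message shapes. The field names are illustrative, not a fixed schema; pick whatever contract works for your system, as long as both sides agree on it.

```python
import json

# Hypothetical task spec (orchestrator -> subagent) and result spec
# (subagent -> orchestrator). Field names here are illustrative.
task_spec = {
    "agent": "researcher",
    "task": "Summarize recent funding rounds in the battery-storage market",
    "depends_on": [],
}

result_spec = {
    "task_index": 0,
    "agent": "researcher",
    "status": "ok",  # or "error", so the orchestrator can retry or replan
    "output": "Three major rounds closed in Q1 ...",
}

# Both sides serialize to JSON, so every hop is loggable and replayable.
wire = json.dumps(task_spec)
assert json.loads(wire)["agent"] == "researcher"
```

Because every hop is plain JSON, you can log each task spec and result spec as-is and replay a failed run step by step.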
Defining the Orchestrator
The orchestrator’s system prompt is where most of the design work lives. It needs to know: what subagents are available, how to describe tasks for them, and when to use each one. Keep it minimal and explicit.
```python
import anthropic
import json
from typing import Any

client = anthropic.Anthropic()

ORCHESTRATOR_SYSTEM = """
You are a planning orchestrator. Your job is to decompose complex tasks into subtasks
and delegate them to specialist agents. You do not execute tasks yourself.

Available specialists:
- researcher: finds information, summarizes sources, answers factual questions
- analyst: processes data, identifies patterns, produces structured insights
- writer: drafts documents, reports, summaries from provided content

When delegating, output a JSON array of task objects:
[
  {"agent": "researcher", "task": "...", "depends_on": []},
  {"agent": "analyst", "task": "...", "depends_on": [0]},
  {"agent": "writer", "task": "...", "depends_on": [0, 1]}
]

depends_on lists the indices of tasks that must complete before this one starts.
After all results are returned, synthesize them into a final response.
"""

def orchestrate(goal: str) -> list[dict]:
    # Step 1: Ask the orchestrator to plan
    plan_response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system=ORCHESTRATOR_SYSTEM,
        messages=[{"role": "user", "content": f"Goal: {goal}\nOutput a delegation plan as JSON."}],
    )
    plan_text = plan_response.content[0].text

    # Extract the JSON plan — real implementations should add error handling here
    start = plan_text.find("[")
    end = plan_text.rfind("]") + 1
    tasks = json.loads(plan_text[start:end])
    return tasks
```
Note the depends_on field — this is how you encode the dependency graph. Tasks with empty dependency lists can run in parallel. Tasks that depend on index 0 wait for index 0 to complete. It’s simple and it works.
Implementing Specialist Subagents
Each subagent is just a Claude call with a focused system prompt and, optionally, tools. The key discipline: give each subagent exactly the context it needs and no more. Don’t pass the full conversation history — pass the task spec and any direct inputs it needs.
```python
SUBAGENT_SYSTEMS = {
    "researcher": """
You are a research specialist. Given a research task, provide a thorough,
factual response with clear sourcing notes. Be concise — your output feeds
into downstream analysis.
""",
    "analyst": """
You are a data and content analyst. You receive research output and identify
key patterns, gaps, and structured insights. Output JSON where possible.
""",
    "writer": """
You are a professional writer. You receive research and analysis, then produce
polished, structured documents. Match the format requested in the task.
""",
}

def run_subagent(agent_type: str, task: str, context: dict[int, str] | None = None) -> str:
    """
    Runs a specialist subagent with the given task.
    context contains results from dependency tasks (keyed by index).
    """
    user_message = f"Task: {task}"
    if context:
        context_str = "\n\n".join(
            f"Input from task {idx}:\n{result}"
            for idx, result in context.items()
        )
        user_message += f"\n\nContext from prior tasks:\n{context_str}"

    response = client.messages.create(
        model="claude-haiku-4-5",  # Cheaper model for leaf-node work
        max_tokens=2048,
        system=SUBAGENT_SYSTEMS[agent_type],
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text
```
Notice I’m using claude-haiku-4-5 for the subagents and claude-opus-4-5 for the orchestrator. The orchestrator needs strong reasoning to decompose problems and synthesize results — that’s worth the cost. Subagents are doing more bounded, execution-focused work where Haiku performs well and costs roughly $0.00025 per 1K input tokens vs Opus at $0.015. On a five-task pipeline, this model routing cuts costs by 80%+ compared to running everything on Opus.
Running the Dependency Graph: Sequential and Parallel Execution
Once you have the task graph, you need an executor that respects dependencies and runs independent tasks concurrently. Python’s concurrent.futures is the pragmatic choice here — async/await works too but adds complexity without much benefit at this scale.
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def execute_plan(tasks: list[dict]) -> dict[int, str]:
    results = {}
    completed = set()

    # Simple topological execution — not optimized, but correct
    while len(completed) < len(tasks):
        # Find tasks whose dependencies are all satisfied
        ready = [
            i for i, task in enumerate(tasks)
            if i not in completed
            and all(dep in completed for dep in task.get("depends_on", []))
        ]
        if not ready:
            # Circular dependency or bug in the plan
            raise ValueError("No runnable tasks — check dependency graph")

        # Run ready tasks in parallel
        with ThreadPoolExecutor(max_workers=len(ready)) as executor:
            futures = {
                executor.submit(
                    run_subagent,
                    tasks[i]["agent"],
                    tasks[i]["task"],
                    {dep: results[dep] for dep in tasks[i].get("depends_on", [])},
                ): i
                for i in ready
            }
            for future in as_completed(futures):
                idx = futures[future]
                results[idx] = future.result()
                completed.add(idx)

    return results
```
This executor finds tasks with satisfied dependencies, runs them in parallel, then repeats. It’s not the most elegant topological sort, but it’s easy to debug and handles most real-world graphs correctly.
Synthesis: Bringing Results Back to the Orchestrator
After all subagents complete, route their outputs back to the orchestrator for synthesis. This is where the orchestrator earns its cost — it needs to reconcile potentially conflicting outputs, identify gaps, and produce a coherent final response.
```python
def synthesize(goal: str, task_plan: list[dict], results: dict[int, str]) -> str:
    results_text = "\n\n".join(
        f"--- Task {i} ({task_plan[i]['agent']}): {task_plan[i]['task']} ---\n{result}"
        for i, result in results.items()
    )
    synthesis_response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=4096,
        system=ORCHESTRATOR_SYSTEM,
        messages=[
            {"role": "user", "content": f"Original goal: {goal}\nOutput a delegation plan as JSON."},
            # Replay the plan as the assistant turn so the orchestrator
            # sees its own decomposition before synthesizing
            {"role": "assistant", "content": json.dumps(task_plan)},
            {"role": "user", "content": f"All tasks complete. Here are the results:\n\n{results_text}\n\nNow synthesize a final response to the original goal."},
        ],
    )
    return synthesis_response.content[0].text

# Putting it all together
def run_multi_agent(goal: str) -> str:
    print(f"Planning for: {goal}")
    task_plan = orchestrate(goal)
    print(f"Generated {len(task_plan)} tasks")
    results = execute_plan(task_plan)
    print("All tasks complete")
    return synthesize(goal, task_plan, results)
```
Failure Modes You’ll Actually Hit
The architecture looks clean on a diagram. Here’s what breaks in practice:
The Orchestrator Writes Bad JSON
Even with a tight system prompt, Claude will occasionally produce malformed plans — especially for ambiguous or unusual goals. Always validate the plan before executing it. Check that depends_on indices are valid, that agent types exist, and that you don’t have circular dependencies. Log the raw plan text so you can diagnose failures.
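A minimal validation sketch, assuming plans list tasks in execution order, so a dependency must point at an earlier index; that assumption also rules out cycles for free:

```python
import json

VALID_AGENTS = {"researcher", "analyst", "writer"}

def validate_plan(plan_text: str) -> list[dict]:
    """Parse and sanity-check an orchestrator plan before executing it.
    Raises ValueError with a specific message so failures are diagnosable."""
    start, end = plan_text.find("["), plan_text.rfind("]") + 1
    if start == -1 or end == 0:
        raise ValueError(f"No JSON array found in plan: {plan_text!r}")
    tasks = json.loads(plan_text[start:end])
    for i, task in enumerate(tasks):
        if task.get("agent") not in VALID_AGENTS:
            raise ValueError(f"Task {i}: unknown agent {task.get('agent')!r}")
        deps = task.get("depends_on", [])
        # Dependencies must point at strictly earlier tasks, which also
        # rules out cycles: no index can depend on itself or a later one.
        if any(not isinstance(d, int) or d < 0 or d >= i for d in deps):
            raise ValueError(f"Task {i}: invalid depends_on {deps!r}")
    return tasks
```

Call this in place of the raw `json.loads` in `orchestrate`, and log `plan_text` before validation so you keep the evidence when a plan is rejected.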
Subagent Context Gets Too Long
If dependency chains are deep, a subagent at the end of the chain receives a massive context block. This causes slower responses, higher costs, and sometimes degraded quality. Summarize intermediate results before passing them downstream. Even a one-sentence summary at the end of each subagent’s output, prefixed with “SUMMARY:”, gives you a cheap way to compress context in the executor.
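One sketch of that compression step; the character threshold here is arbitrary and something you would tune per pipeline:

```python
def compress_result(result: str, max_chars: int = 1500) -> str:
    """Prefer the subagent's own 'SUMMARY:' line when passing results
    downstream; fall back to truncation if the subagent forgot to emit one.
    The 1500-char cap is an illustrative threshold, not a recommendation."""
    if len(result) <= max_chars:
        return result
    # Scan from the end, since the summary convention puts it last
    for line in reversed(result.splitlines()):
        if line.strip().startswith("SUMMARY:"):
            return line.strip()
    return result[:max_chars] + " [truncated]"
```

In the executor, apply `compress_result` to each dependency result before building the context dict, so deep chains stay cheap.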
Parallelism Hits Rate Limits
Spawning ten subagents simultaneously will immediately hit Anthropic’s rate limits if you’re on a tier-1 account. Add a semaphore to the executor to cap concurrent calls. Start with 3-5 concurrent workers and increase based on your actual tier limits. The Anthropic dashboard shows your rate limit tier under usage settings.
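A minimal sketch of that cap using a plain threading semaphore; the limit of 4 is illustrative, not tuned to any particular tier:

```python
import threading

class Throttle:
    """Caps how many calls run concurrently, regardless of how many
    tasks are ready. Wrap the API call, not the whole task."""

    def __init__(self, max_concurrent: int = 4):
        self._sem = threading.Semaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Blocks while max_concurrent calls are already in flight
        with self._sem:
            return fn(*args, **kwargs)

throttle = Throttle(max_concurrent=4)
# In the executor, replace the direct submit with:
#   executor.submit(throttle.call, run_subagent, agent, task, context)
```

This keeps the executor logic unchanged: futures for throttled tasks simply block until a slot frees up.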
The Orchestrator Hallucinates Capabilities
If your system prompt lists available agents, Claude will occasionally invent new agent types that don’t exist (“database_query_agent”). Validate agent types in the plan against your registered specialists and either reject invalid plans or reroute them to the closest valid agent.
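A sketch of the rerouting fallback using stdlib `difflib`; the cutoff value is an arbitrary starting point, and you may well prefer rejecting the plan outright:

```python
import difflib

SPECIALISTS = ["researcher", "analyst", "writer"]

def resolve_agent(requested: str) -> str:
    """Map an orchestrator-invented agent type to the closest registered
    specialist, or raise if nothing is remotely similar. A sketch only --
    rejecting the plan and re-prompting is the stricter alternative."""
    if requested in SPECIALISTS:
        return requested
    matches = difflib.get_close_matches(requested, SPECIALISTS, n=1, cutoff=0.4)
    if not matches:
        raise ValueError(f"Unknown agent type: {requested!r}")
    return matches[0]
```

Run this over every task in the plan during validation, so a hallucinated agent type surfaces before any tokens are spent executing.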
When to Use This Pattern
This architecture earns its complexity when:
- You have parallel workstreams — tasks that can run independently and then merge
- Different subtasks need different capabilities — web access, code execution, structured data output
- Subtask quality benefits from focused context — a writer subagent shouldn’t be wading through raw research data
- You need auditability — the per-task result log makes debugging and QA far easier than a single-agent trace
Skip it when the task is linear, when you have fewer than four steps, or when you’re still prototyping. A well-structured single-agent loop is faster to build and easier to maintain at small scale.
Bottom Line for Different Builder Types
Solo founders building internal tools: Start with the orchestrator + two subagents pattern. Keep the subagent roster small and add specialists only when you have a concrete bottleneck. Your time cost of debugging a multi-agent system is real — don’t over-engineer early.
Teams building production AI products: The hierarchical pattern scales well. Invest in a proper task queue (Redis + Celery, or a job service) instead of the in-process ThreadPoolExecutor shown here. Add structured logging per task — you’ll need it for debugging and for cost attribution by task type.
Cost-sensitive builders: Model routing is your biggest lever. Run your orchestrator on Opus or Sonnet, run specialists on Haiku. Profile which agent types consume the most tokens — writers are usually the biggest spenders — and tune max_tokens per agent type. A five-agent pipeline with smart model routing should run well under $0.05 per complex task at current pricing.
Claude multi-agent systems aren’t magic — they’re software architecture applied to LLM calls. The same principles that make microservices maintainable (clear interfaces, single responsibility, explicit dependencies) make multi-agent systems workable. Get those right and the complexity becomes manageable.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

