If you’ve spent any time building agents with Claude, you’ve probably run into the terminology confusion: tool use, function calling, tool definitions — are these the same thing? Different things? The Anthropic docs use the terms somewhat interchangeably, which doesn’t help. Getting this straight matters because the tool use pattern you choose has real architectural implications for how you structure your agents, handle errors, and control costs. This article cuts through the confusion with working code and honest tradeoffs.
What Anthropic Actually Means by “Tool Use”
Claude doesn’t have “function calling” in the way OpenAI frames it. Anthropic’s term is tool use, and it describes the entire mechanism by which Claude can request the execution of external operations during a conversation turn. When you see developers say “function calling” in the context of Claude, they almost always mean the same thing — it’s a legacy term borrowed from the OpenAI ecosystem.
Here’s the core mechanic: you define a set of tools in your API request. Claude decides whether to use one. If it does, the API response contains a tool_use content block instead of (or alongside) a text block. Your application executes the tool, returns the result, and Claude continues. It’s a multi-turn handshake, not a one-shot call.
The distinction worth preserving isn’t “tool use vs function calling” — it’s the difference between two implementation patterns within Claude’s tool use system:
- Single-tool, single-turn: Claude picks one tool, you run it, the conversation completes.
- Multi-tool, agentic loop: Claude chains tool calls across multiple turns to accomplish a complex goal.
These look similar on the surface but behave very differently in production, and the pattern you choose shapes everything from latency to cost to failure handling.
How Tool Definitions Work Under the Hood
Before comparing patterns, you need to understand what a tool definition actually is. It’s a JSON schema passed in the tools array of your API request. Claude reads this schema the same way it reads your system prompt — it’s part of the context window and contributes to your token count.
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_stock_price",
        "description": "Retrieves the current stock price for a given ticker symbol. Use this when the user asks about stock prices or market data.",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "The stock ticker symbol, e.g. AAPL, TSLA"
                },
                "currency": {
                    "type": "string",
                    "enum": ["USD", "EUR", "GBP"],
                    "description": "The currency for the price. Defaults to USD."
                }
            },
            "required": ["ticker"]
        }
    }
]

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's Apple's stock price right now?"}
    ]
)

# Check if Claude wants to use a tool
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    print(f"Tool: {tool_call.name}")
    print(f"Input: {tool_call.input}")  # {'ticker': 'AAPL'}
```
A few things to notice here. The description field on each tool is not boilerplate — it’s the primary signal Claude uses to decide whether to invoke the tool. Vague descriptions cause misuse or missed opportunities. The input_schema enforces structure on what Claude can send you, which is valuable for input validation downstream.
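Because the input_schema travels with every request, you can reuse it to validate what Claude sends back before your tool logic ever runs. Here is a minimal, stdlib-only checker covering just the schema subset used above; in production you would reach for a real validator such as the jsonschema package or pydantic. The helper name validate_tool_input is illustrative, not part of any SDK.

```python
def validate_tool_input(tool_input: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the input passed."""
    type_map = {"string": str, "number": (int, float), "object": dict}
    errors = []
    # Required fields must be present
    for field in schema.get("required", []):
        if field not in tool_input:
            errors.append(f"missing required field: {field}")
    # Each supplied field must exist in the schema and match its type/enum
    for field, value in tool_input.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None:
            errors.append(f"unexpected field: {field}")
            continue
        expected = type_map.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"{field}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: not one of {spec['enum']}")
    return errors

stock_schema = {
    "type": "object",
    "properties": {
        "ticker": {"type": "string"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
    },
    "required": ["ticker"],
}

print(validate_tool_input({"ticker": "AAPL"}, stock_schema))   # []
print(validate_tool_input({"currency": "JPY"}, stock_schema))  # two problems
```

Rejecting bad input here, and returning the error list to Claude as a tool_result, usually lets the model self-correct on the next turn.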
Token Cost of Tool Definitions
Every tool definition you include is burned as input tokens. A single moderately complex tool definition typically runs 100–250 tokens. If you’re building an agent with 20 tools, you’re adding 2,000–5,000 tokens to every single request before your user has typed a word. At current Claude Haiku 3.5 pricing (~$0.80/M input tokens), that’s roughly $0.004 per request just for tool context — negligible per call, but it compounds fast at scale. On Sonnet or Opus, multiply by 3–15x. This is why tool selection strategy matters.
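A quick way to sanity-check that overhead is to estimate it from the serialized definitions. The sketch below uses the common rule of thumb of roughly four characters per token, which is an approximation, not an exact count; the Anthropic SDK exposes a token-counting endpoint (client.messages.count_tokens) when you need real numbers. Both function names are illustrative.

```python
import json

def estimate_tool_tokens(tools: list) -> int:
    """Rough token estimate for a tools array (~4 chars/token heuristic)."""
    return sum(len(json.dumps(tool)) // 4 for tool in tools)

def tool_context_cost_usd(tools: list, requests: int, usd_per_mtok: float) -> float:
    """Dollar cost of shipping these tool definitions on every request."""
    return estimate_tool_tokens(tools) * requests * usd_per_mtok / 1_000_000

# Example: 20 tools at ~150 tokens each, 100k requests/day, $0.80/M input tokens
mock_tools = [{"name": f"tool_{i}", "description": "x" * 550} for i in range(20)]
print(estimate_tool_tokens(mock_tools), "tokens per request")
print(f"${tool_context_cost_usd(mock_tools, 100_000, 0.80):.2f} per day")
```

Run this against your real tools array before and after trimming descriptions; it makes the cost of each extra tool concrete.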
Pattern 1: Structured Extraction (The Underrated Use Case)
Most developers think of tool use as “letting Claude call APIs.” But one of the most reliable production patterns is using a tool definition purely to force structured output — no external API call ever happens.
```python
import anthropic
import json

client = anthropic.Anthropic()

# Define a "tool" that's really just a structured output schema
extract_tool = {
    "name": "extract_invoice_data",
    "description": "Extract structured data from an invoice document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor_name": {"type": "string"},
            "invoice_number": {"type": "string"},
            "total_amount": {"type": "number"},
            "currency": {"type": "string"},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "amount": {"type": "number"}
                    }
                }
            }
        },
        "required": ["vendor_name", "invoice_number", "total_amount"]
    }
}

response = client.messages.create(
    model="claude-3-5-haiku-latest",
    max_tokens=1024,
    tools=[extract_tool],
    # Force tool use — Claude won't try to answer in prose
    tool_choice={"type": "tool", "name": "extract_invoice_data"},
    messages=[
        {"role": "user", "content": "Invoice #INV-2024-0392 from Acme Corp. Total: $4,230.00. Items: SaaS license $3,500, Setup fee $730."}
    ]
)

tool_call = next(b for b in response.content if b.type == "tool_use")
structured_data = tool_call.input  # JSON shaped by your schema
print(json.dumps(structured_data, indent=2))
```
The key here is tool_choice: {"type": "tool", "name": "..."}. This forces Claude to always invoke the named tool, turning tool use into a reliable structured extraction mechanism. I’d use this over prompt-engineering JSON output every time — it’s more reliable, easier to validate, and the schema doubles as documentation.
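One caveat: a forced tool guarantees the shape of the output, not its business-level correctness. It's worth a cheap sanity pass before the data enters your pipeline. The helper below is a hypothetical check for the invoice schema above, not part of any SDK.

```python
def check_invoice(data: dict) -> list:
    """Return a list of problems with extracted invoice data; empty means OK."""
    problems = []
    if data["total_amount"] <= 0:
        problems.append("total_amount must be positive")
    items = data.get("line_items", [])
    if items:
        # Line items, when present, should sum to the stated total
        line_sum = sum(item.get("amount", 0) for item in items)
        if abs(line_sum - data["total_amount"]) > 0.01:
            problems.append(
                f"line items sum to {line_sum}, expected {data['total_amount']}"
            )
    return problems

print(check_invoice({
    "vendor_name": "Acme Corp",
    "invoice_number": "INV-2024-0392",
    "total_amount": 4230.00,
    "line_items": [
        {"description": "SaaS license", "amount": 3500.0},
        {"description": "Setup fee", "amount": 730.0},
    ],
}))  # []
```

When a check fails, you can retry the extraction with the error message appended to the prompt, which is usually enough for the model to fix arithmetic-level mistakes.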
Pattern 2: The Agentic Loop (Where Things Get Complex)
The agentic loop is where Claude’s tool use earns its reputation — and where most production bugs live. The pattern: Claude calls a tool, gets a result, decides whether to call another tool, and repeats until it can answer the original question.
```python
import anthropic

client = anthropic.Anthropic()

def run_tool(name: str, tool_input: dict) -> str:
    """Your actual tool execution logic goes here."""
    if name == "search_web":
        return f"Search results for: {tool_input['query']} — [mock results]"
    if name == "read_file":
        return f"File contents of {tool_input['path']}: [mock content]"
    return "Tool not found"

def agentic_loop(user_message: str, tools: list, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # No tool use — Claude is done
        if response.stop_reason == "end_turn":
            return next(
                (b.text for b in response.content if hasattr(b, "text")), ""
            )

        # Process tool calls
        if response.stop_reason == "tool_use":
            # Add Claude's response (including tool_use blocks) to history
            messages.append({"role": "assistant", "content": response.content})

            # Execute all tool calls and collect results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = run_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,  # Must match the tool_use block ID
                        "content": result
                    })

            # Return tool results to Claude
            messages.append({"role": "user", "content": tool_results})

    raise RuntimeError(f"Agent exceeded {max_iterations} iterations — possible loop")
```
The tool_use_id matching in the tool_result blocks above is where a lot of developers get tripped up. Every tool_use block in Claude’s response has a unique ID. Your tool_result block must reference that exact ID, or the API will reject your request with a validation error. The docs mention this, but it’s easy to miss when you’re iterating fast.
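If you build or debug conversation histories by hand, a small checker can catch ID mismatches before the API rejects the request. This sketch assumes the history holds plain dicts (SDK content objects would need converting, e.g. via each block's model_dump()); find_orphaned_tool_calls is a hypothetical helper, not part of the SDK.

```python
def find_orphaned_tool_calls(messages: list) -> list:
    """Return IDs of tool_use blocks with no matching tool_result in the next turn."""
    orphans = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant":
            continue
        content = msg["content"]
        if isinstance(content, str):
            continue  # plain text turn, no tool calls
        use_ids = {b["id"] for b in content if b.get("type") == "tool_use"}
        if not use_ids:
            continue
        # Collect tool_result IDs from the immediately following user turn
        result_ids = set()
        if i + 1 < len(messages) and messages[i + 1]["role"] == "user":
            nxt = messages[i + 1]["content"]
            if isinstance(nxt, list):
                result_ids = {b.get("tool_use_id") for b in nxt
                              if b.get("type") == "tool_result"}
        orphans.extend(sorted(use_ids - result_ids))
    return orphans
```

Running this in a debug assertion before each API call turns a cryptic 400 error into a readable list of missing result IDs.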
The Iteration Limit Is Not Optional
Always set a maximum iteration count. Claude can and does get into loops — especially when tools return ambiguous results or when the task is underspecified. A runaway agent at Sonnet pricing ($3/M input tokens) processing a 10k-token context window across 50 iterations will cost you roughly $1.50 per stuck request. Add a hard cap and surface a clear error to the user.
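An iteration cap bounds turns, but a dollar cap is often the guard you actually want. The Messages API response reports usage.input_tokens and usage.output_tokens, which you can accumulate per request. The class below is a sketch with illustrative prices; verify current per-model rates before relying on the numbers.

```python
class TokenBudget:
    """Accumulate per-request token usage and abort past a dollar cap."""

    def __init__(self, max_usd: float,
                 input_usd_per_mtok: float = 3.0,    # assumed Sonnet-class input rate
                 output_usd_per_mtok: float = 15.0):  # assumed Sonnet-class output rate
        self.max_usd = max_usd
        self.spent_usd = 0.0
        self.input_usd_per_mtok = input_usd_per_mtok
        self.output_usd_per_mtok = output_usd_per_mtok

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.spent_usd += (input_tokens * self.input_usd_per_mtok
                           + output_tokens * self.output_usd_per_mtok) / 1_000_000
        if self.spent_usd > self.max_usd:
            raise RuntimeError(f"Agent budget exceeded: ${self.spent_usd:.2f} spent")
```

Inside the loop, call budget.record(response.usage.input_tokens, response.usage.output_tokens) right after each messages.create, so a runaway agent stops on the first request that crosses the cap rather than fifty requests later.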
Parallel Tool Calls: When Claude Uses Multiple Tools at Once
Claude 3.5 and later models support parallel tool use — Claude can return multiple tool_use blocks in a single response when it determines they’re independent. This is a significant latency win for agents that would otherwise serialize calls unnecessarily.
```python
# Claude might return something like this when asked to compare two stocks:
# response.content = [
#     TextBlock(text="I'll look up both stocks simultaneously."),
#     ToolUseBlock(id="toolu_01", name="get_stock_price", input={"ticker": "AAPL"}),
#     ToolUseBlock(id="toolu_02", name="get_stock_price", input={"ticker": "MSFT"})
# ]

# Execute in parallel with asyncio or ThreadPoolExecutor
import concurrent.futures

tool_calls = [b for b in response.content if b.type == "tool_use"]

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = {
        executor.submit(run_tool, tc.name, tc.input): tc.id
        for tc in tool_calls
    }
    tool_results = [
        {
            "type": "tool_result",
            "tool_use_id": futures[f],
            "content": f.result()
        }
        for f in concurrent.futures.as_completed(futures)
    ]
```
You don’t need to do anything special to enable this — just handle the case where response.content contains multiple tool_use blocks. If your loop only processes the first one, you’ll silently drop tool calls. Always iterate over all content blocks, not just the first.
tool_choice: Controlling Claude’s Tool Selection Behavior
The tool_choice parameter is underused and genuinely useful. Three modes:
- {"type": "auto"} — Default. Claude decides whether to use a tool or respond directly.
- {"type": "any"} — Claude must use at least one tool from the list. Useful when you need structured output but don’t care which tool.
- {"type": "tool", "name": "tool_name"} — Forces a specific tool. Use for guaranteed structured extraction.
“Auto” is right for conversational agents where tool use is optional. “Any” and forced-tool modes are right for pipelines where downstream code expects structured data. Using “auto” in a data pipeline and then pattern-matching on whether Claude happened to use a tool is a reliability footgun I’ve seen in more than a few production systems.
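To make the distinction concrete, here is a sketch of the two call styles side by side. The function names are illustrative, client is whatever anthropic.Anthropic() instance you already hold, and the only meaningful difference is the tool_choice value.

```python
def chat_call(client, tools, messages):
    # Conversational agent: Claude may answer in prose or pick a tool.
    return client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=tools,
        tool_choice={"type": "auto"},
        messages=messages,
    )

def pipeline_call(client, tools, messages):
    # Data pipeline: downstream code requires a tool_use block, so demand one.
    return client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=tools,
        tool_choice={"type": "any"},
        messages=messages,
    )
```

With pipeline_call, your downstream code can assume a tool_use block exists and skip the "did Claude happen to use a tool?" branching entirely.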
When to Use Which Pattern
Use Structured Extraction (Forced Tool) When:
- You’re building a document processing pipeline and need reliable JSON output
- You’re replacing brittle regex or JSON-parsing prompts
- The task is classification, extraction, or transformation — not reasoning
- Cost matters: Haiku with forced tool use beats Sonnet with loose prompting for pure extraction tasks
Use the Agentic Loop When:
- The agent needs to gather information before it knows what to do next
- Tasks involve multiple heterogeneous tools (search + code execution + file I/O)
- You’re building a general-purpose assistant, not a single-task pipeline
- You can afford Sonnet or Opus — Haiku struggles with multi-step reasoning in my experience
Bottom Line: Pick Your Pattern Before You Write a Line of Code
Most agent failures I’ve debugged come from picking the wrong pattern early and then fighting the framework. Claude tool use — function calling, if you prefer the OpenAI term — is a single mechanism with meaningfully different usage patterns, and the pattern shapes your error handling, your latency profile, and your per-request cost.
Solo founders building automation pipelines: Start with forced-tool structured extraction on Haiku. It’s cheap, reliable, and easier to test. Add the agentic loop only when the task genuinely requires multi-step reasoning.
Teams building production agents: Invest in good tool descriptions — they’re your primary reliability lever. Set iteration limits. Handle parallel tool calls explicitly. Run Sonnet for complex reasoning, route simpler extraction tasks to Haiku, and measure your token costs per workflow, not per API call.
Anyone integrating with n8n or Make: These platforms wrap the API in ways that hide the conversation history management. Make sure tool results are actually being passed back to Claude in the right format — silent failures where tool results get dropped are common in no-code wrappers and produce confusing behavior that looks like model errors but isn’t.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes.

