By the end of this tutorial, you’ll have a fully functional Claude skill — a typed, error-handled Python function that Claude can reliably call as a tool — wired up from a raw API function to a working agent loop. If you’ve ever tried to build a Claude skill integration and ended up with a brittle mess of string parsing and silent failures, this is the guide that fixes that.
- Install dependencies — Set up the Anthropic SDK and supporting libraries
- Define your skill schema — Write a JSON schema Claude will use to understand and invoke your function
- Implement the skill function — Build the actual Python function with type safety and error handling
- Wire up the agent loop — Connect skill to Claude and handle the tool_use/tool_result cycle
- Test skill invocation — Validate that Claude calls the skill correctly under different prompts
- Add production guards — Timeouts, retries, and structured error responses
What a Claude Skill Actually Is
Anthropic uses the term “tool” in their API; the broader ecosystem calls them “skills.” They’re the same thing: a JSON schema definition paired with a callable function. When Claude decides it needs data or needs to perform an action, it emits a tool_use content block with the tool name and arguments. Your code runs the function, returns a tool_result, and Claude continues reasoning from there.
The important thing to understand is that Claude doesn’t call your function directly. It outputs structured JSON saying “I want to call this function with these arguments.” You interpret that output, run the function, and feed the result back. This indirection is both the power and the pitfall — it means you control validation, error handling, and rate limiting at every step.
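Concretely, the two halves of that exchange are plain JSON. Here is a sketch of the shapes as Python dicts (the `id` value is made up; real ids are opaque strings the API generates):

```python
# What Claude emits when it wants your function run (a tool_use content block):
tool_use_block = {
    "type": "tool_use",
    "id": "toolu_01A2B3",           # unique per invocation; echo it back
    "name": "get_current_weather",  # must match a name in your tools list
    "input": {"city": "London", "units": "metric"},
}

# What you send back on the next user turn (a tool_result content block):
tool_result_block = {
    "type": "tool_result",
    "tool_use_id": tool_use_block["id"],  # links the result to the invocation
    "content": '{"temperature": 18.5, "description": "partly cloudy"}',
}
```

Your code sits between these two blocks: parse the first, produce the second.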
If you haven’t seen Claude’s tool use mechanics before, the deep dive on Claude tool use with Python covers the underlying protocol well. This tutorial focuses on building a production-quality skill from scratch, not just the happy path.
Step 1: Install Dependencies
```bash
pip install anthropic pydantic httpx tenacity

# Pin versions to avoid breaking changes:
# anthropic==0.30.0 pydantic==2.7.0 httpx==0.27.0 tenacity==8.3.0
```
You need anthropic for the API client, pydantic for input validation (don’t skip this — Claude occasionally sends slightly malformed arguments), httpx if your skill calls external APIs, and tenacity for retry logic.
Step 2: Define Your Skill Schema
The schema is what Claude reads to understand what your function does and what arguments it accepts. Bad schemas produce bad invocations. Spend time here.
```python
WEATHER_SKILL = {
    "name": "get_current_weather",
    "description": (
        "Retrieve current weather conditions for a given city. "
        "Returns temperature in Celsius, weather description, humidity percentage, "
        "and wind speed in km/h. Use this when the user asks about current weather."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name, e.g. 'London' or 'New York'. Do not include country codes."
            },
            "units": {
                "type": "string",
                "enum": ["metric", "imperial"],
                "description": "Temperature units. Defaults to metric if not specified.",
                "default": "metric"
            }
        },
        "required": ["city"]
    }
}
```
A few things that actually matter here: the description field on the tool itself is critical — Claude uses it to decide when to call the skill. Vague descriptions lead to missed invocations or wrong invocations. The description on each property tells Claude what format to pass. If you say “City name, e.g. 'London'”, Claude will follow that pattern consistently.
Step 3: Implement the Skill Function
```python
import httpx
from pydantic import BaseModel, ValidationError
from typing import Any

# Pydantic model mirrors your schema — catches bad input before it reaches your API
class WeatherInput(BaseModel):
    city: str
    units: str = "metric"

class WeatherResult(BaseModel):
    temperature: float
    description: str
    humidity: int
    wind_speed: float
    city: str
    error: str | None = None  # Structured error field — never raise exceptions outward

def get_current_weather(raw_input: dict[str, Any]) -> dict[str, Any]:
    """
    Validate input, call the weather API, return a structured result.
    Always returns a dict — never raises. Claude needs a tool_result, not a traceback.
    """
    try:
        params = WeatherInput(**raw_input)
    except ValidationError as e:
        # Return a structured error Claude can reason about
        return {"error": f"Invalid input: {e.errors()[0]['msg']}", "city": raw_input.get("city", "unknown")}
    try:
        # Replace with your actual API key and endpoint
        response = httpx.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"q": params.city, "units": params.units, "appid": "YOUR_API_KEY"},
            timeout=5.0  # Always set a timeout on external calls
        )
        response.raise_for_status()
        data = response.json()
        return WeatherResult(
            temperature=data["main"]["temp"],
            description=data["weather"][0]["description"],
            humidity=data["main"]["humidity"],
            wind_speed=data["wind"]["speed"],
            city=data["name"]
        ).model_dump()
    except httpx.TimeoutException:
        return {"error": "Weather API timed out after 5 seconds", "city": params.city}
    except httpx.HTTPStatusError as e:
        return {"error": f"Weather API returned {e.response.status_code}", "city": params.city}
    except Exception as e:
        return {"error": f"Unexpected error: {str(e)}", "city": params.city}
```
Critical pattern: never let your skill function raise an exception. If it does, your agent loop crashes. Instead, return a dict with an error key. Claude will read that, understand something went wrong, and can either retry with different arguments or tell the user what happened gracefully.
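Once you have several skills, you can enforce that never-raise contract in one place with a small decorator instead of repeating try/except in every handler. This is a sketch; `safe_skill` is our own helper, not part of any SDK:

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

def safe_skill(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Guarantee a skill returns an error dict instead of raising."""
    @functools.wraps(fn)
    def wrapper(raw_input: dict[str, Any]) -> dict[str, Any]:
        try:
            return fn(raw_input)
        except Exception as e:
            # Log the traceback for yourself; return a clean message for Claude
            logger.exception("Skill %s raised unexpectedly", fn.__name__)
            return {"error": f"{type(e).__name__}: {e}"}
    return wrapper

@safe_skill
def flaky_skill(raw_input: dict[str, Any]) -> dict[str, Any]:
    raise RuntimeError("boom")
```

Any skill registered with this wrapper can fail internally without taking down the agent loop.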
Step 4: Wire Up the Agent Loop
This is where the skill actually connects to Claude. The loop runs until Claude either returns a final text response or hits your max-turn limit.
```python
import anthropic
import json
from typing import Any, Callable

client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_KEY")

# Map tool names to handler functions — add all your skills here
SKILL_REGISTRY: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "get_current_weather": get_current_weather,
}

def run_agent(user_message: str, max_turns: int = 5) -> str:
    """
    Run the Claude agent loop with skill invocation.
    Returns the final text response.
    """
    messages = [{"role": "user", "content": user_message}]
    for turn in range(max_turns):
        response = client.messages.create(
            model="claude-opus-4-5",  # Use Haiku for cheaper/faster dev testing: claude-haiku-4-5
            max_tokens=1024,
            tools=[WEATHER_SKILL],  # Pass all registered skills
            messages=messages
        )
        # Append Claude's full response to the conversation
        messages.append({"role": "assistant", "content": response.content})
        # If Claude is done, return the text
        if response.stop_reason == "end_turn":
            # Extract text from content blocks
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return ""
        # Handle tool_use stop reason
        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                handler = SKILL_REGISTRY.get(block.name)
                if not handler:
                    # Unknown tool — return an error result
                    result = {"error": f"No handler registered for skill '{block.name}'"}
                else:
                    result = handler(block.input)  # block.input is already a dict
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result)  # Serialize back to string for the API
                })
            # Feed all tool results back as a user turn
            messages.append({"role": "user", "content": tool_results})
            continue
        # Any other stop_reason (e.g. max_tokens): stop rather than send a malformed conversation
        break
    return "Max turns reached without a final response."
```
One thing the docs understate: you must append Claude’s full response content (including tool_use blocks) before appending tool results. If you skip that step or only append the text blocks, the API will throw a validation error about mismatched tool_use IDs. This trips up almost everyone on their first build.
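Spelled out as plain dicts, the conversation after one tool round must have exactly this shape (all values illustrative):

```python
messages = [
    {"role": "user", "content": "What's the weather in London?"},
    # Claude's FULL response content, including the tool_use block:
    {"role": "assistant", "content": [
        {"type": "text", "text": "I'll check the weather."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_current_weather",
         "input": {"city": "London"}},
    ]},
    # Your tool results go back as the next *user* turn, ids matching 1:1:
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": '{"temperature": 18.5}'},
    ]},
]
```

Drop the `tool_use` block from the assistant turn and the API rejects the request, because the `tool_use_id` in your result no longer refers to anything.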
Step 5: Test Skill Invocation
```python
import unittest
from unittest.mock import patch, MagicMock

import httpx  # needed for the timeout test below

class TestWeatherSkill(unittest.TestCase):
    def test_valid_input(self):
        with patch("httpx.get") as mock_get:
            mock_get.return_value = MagicMock(
                status_code=200,
                json=lambda: {
                    "main": {"temp": 18.5, "humidity": 65},
                    "weather": [{"description": "partly cloudy"}],
                    "wind": {"speed": 12.0},
                    "name": "London"
                }
            )
            result = get_current_weather({"city": "London"})
            self.assertEqual(result["city"], "London")
            self.assertIsNone(result.get("error"))

    def test_missing_required_field(self):
        result = get_current_weather({})  # No city provided
        self.assertIn("error", result)

    def test_api_timeout(self):
        with patch("httpx.get", side_effect=httpx.TimeoutException("timeout")):
            result = get_current_weather({"city": "Berlin"})
            self.assertIn("timed out", result["error"])

if __name__ == "__main__":
    unittest.main()
```
Test against the actual Claude agent loop too — use claude-haiku-4-5 during development to keep costs low. A full agent loop test with Haiku costs roughly $0.001–$0.003 per run depending on context length. Run 50 test cases and you’re looking at under $0.15.
Step 6: Add Production Guards
For anything beyond a prototype, you need retries with backoff, per-skill timeouts, and logging. This is especially true if your skills call external APIs that are occasionally flaky.
```python
import logging
from tenacity import RetryError, retry, stop_after_attempt, wait_exponential, retry_if_result

logger = logging.getLogger(__name__)

def _is_transient(result: dict) -> bool:
    # Retry only errors that can self-heal (timeouts, upstream 5xx);
    # validation errors won't change on a second attempt.
    # Keys off the error strings our skills return in Step 3.
    error = result.get("error") or ""
    return "timed out" in error or "returned 5" in error

def with_retry(skill_fn, raw_input: dict, max_attempts: int = 3) -> dict:
    """
    Wrap any skill function with retry logic for transient failures.
    Skills never raise by design, so we retry on the *result*, not on exceptions.
    """
    @retry(
        stop=stop_after_attempt(max_attempts),
        wait=wait_exponential(multiplier=1, min=1, max=10),
        retry=retry_if_result(_is_transient),
    )
    def _run():
        return skill_fn(raw_input)

    try:
        return _run()
    except RetryError as e:
        # All attempts returned a transient error; surface the last one
        result = e.last_attempt.result()
        logger.error(f"Skill {skill_fn.__name__} failed after {max_attempts} attempts: {result.get('error')}")
        return result
    except Exception as e:
        logger.error(f"Skill {skill_fn.__name__} raised unexpectedly: {e}")
        return {"error": f"Skill failed unexpectedly: {str(e)}"}
```

Note that the retry condition inspects the returned error dict rather than catching exceptions — since skill functions are built to never raise, an exception-based retry condition would never fire.
For more comprehensive patterns on handling transient failures in LLM pipelines, the article on LLM fallback and retry logic for production covers the broader picture including model-level fallbacks.
Common Errors
Error 1: “messages: roles must alternate between user and assistant”
This usually means you forgot to append Claude’s assistant message before appending tool results. The conversation structure must be user → assistant (with tool_use blocks) → user (with tool_results). Double-check that your messages.append({"role": "assistant", "content": response.content}) runs before you build and append the tool_results list.
Error 2: Claude calls the wrong skill or skips calling it entirely
Almost always a schema description problem. If your tool description is vague or overlaps with another tool, Claude hedges. Make the description explicit about when to use this tool vs alternatives. Also check that your required fields in the schema match what you actually need — if Claude is omitting an argument, it may not appear in required.
This is also where investing in good system prompts pays off. The guide on system prompts that actually work has patterns for guiding tool selection behavior specifically.
Error 3: Pydantic validation errors surfacing as 500s
Claude occasionally passes arguments that technically match the schema type but fail your business logic — a city name with special characters, a number outside your expected range. If your Pydantic model uses strict validators, these surface as ValidationError exceptions. The fix is wrapping the Pydantic instantiation in a try/except (as shown in Step 3) and returning a structured error. Never let validation exceptions propagate up to the agent loop. For patterns around preventing this class of failures more broadly, the article on reducing LLM hallucinations with structured outputs is worth reading alongside this one.
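One way to keep business-logic checks inside the same structured-error pattern is a Pydantic field validator. A sketch, assuming the city-name rule below is the business logic you care about (it's illustrative, not a real-world requirement):

```python
from pydantic import BaseModel, ValidationError, field_validator

class StrictWeatherInput(BaseModel):
    city: str
    units: str = "metric"

    @field_validator("city")
    @classmethod
    def city_must_be_plain(cls, v: str) -> str:
        # Illustrative business rule: reject obviously malformed city names
        if not v.replace(" ", "").replace("-", "").isalpha():
            raise ValueError("city must contain only letters, spaces, or hyphens")
        return v.strip()

def validate_input(raw: dict) -> dict:
    """Return validated params, or a structured error dict Claude can read."""
    try:
        return {"params": StrictWeatherInput(**raw).model_dump()}
    except ValidationError as e:
        return {"error": f"Invalid input: {e.errors()[0]['msg']}"}
```

The validator raises inside Pydantic, the wrapper converts it to an error dict, and nothing propagates to the agent loop.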
What to Build Next
Add skill chaining: build a second skill — say, get_forecast — and watch Claude decide to call get_current_weather first for context before calling the forecast. This multi-step planning behavior is where the tool use architecture starts to feel genuinely powerful. The natural extension after that is giving your agent persistent memory so it can remember which cities a user cares about across sessions — the persistent memory architecture guide covers exactly how to wire that up.
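A minimal second-skill schema to try chaining with might look like this (the `days` parameter and its bounds are assumptions — shape them to whatever forecast API you actually call):

```python
FORECAST_SKILL = {
    "name": "get_forecast",
    "description": (
        "Retrieve a multi-day weather forecast for a given city. "
        "Use this when the user asks about future weather, not current conditions."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'London'."},
            "days": {"type": "integer", "minimum": 1, "maximum": 7,
                     "description": "Number of days to forecast. Defaults to 3."},
        },
        "required": ["city"],
    },
}
```

Register its handler in `SKILL_REGISTRY` and pass both schemas in the `tools` list; the loop from Step 4 needs no other changes.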
Bottom Line: Who Should Build This Now
Solo founders and small teams: start with this exact pattern — one skill, one agent loop, Haiku for testing. Get it working end-to-end before adding complexity. The skill registry pattern in Step 4 scales cleanly to 10+ skills without architectural changes.
Teams with existing APIs: your existing internal APIs are the best candidates for skills. Wrap them in Pydantic models, add the JSON schema definition, and drop them into the registry. The main investment is writing good tool descriptions — budget an hour per skill to iterate on those.
Production systems: add the tenacity retry wrapper, proper logging with tool invocation metadata (tool name, latency, success/failure), and a circuit breaker for skills that call external APIs. The skill integration pattern shown here handles all of that cleanly when you layer on observability from the start.
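A minimal circuit breaker for a flaky external-API skill might look like this (a sketch; the thresholds and in-memory state are assumptions — use a shared store if you run multiple workers):

```python
import time

class CircuitBreaker:
    """Open the circuit after N consecutive failures; allow a retry after a cooldown."""
    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, skill_fn, raw_input: dict) -> dict:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_seconds:
                # Circuit open: fail fast with a structured error Claude can relay
                return {"error": "Skill temporarily disabled after repeated failures"}
            self.opened_at = None  # cooldown elapsed: allow one probe attempt
        result = skill_fn(raw_input)
        if result.get("error"):
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
        else:
            self.failures = 0
        return result
```

Because skills return error dicts rather than raising, the breaker trips on the result, consistent with the never-raise pattern from Step 3.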
Frequently Asked Questions
How do I pass authentication credentials to a Claude skill?
Never pass credentials through Claude’s tool arguments — Claude sees everything in the tool input, and you don’t want API keys in your conversation history or logs. Instead, load credentials from environment variables inside the skill function itself, or use a closure to inject them at registration time. Your skill function closes over the API key; Claude only sees sanitized input parameters.
Can Claude call multiple skills in a single turn?
Yes. When Claude decides it needs multiple tools, it can emit multiple tool_use blocks in a single response. Your loop needs to iterate over all blocks in response.content, execute each skill, and return all results in a single tool_result user message. The code in Step 4 already handles this — the tool_results list collects all results before appending.
What’s the difference between Claude tools and MCP skills?
Claude tools (what this tutorial covers) are defined inline in your API call — you own the execution loop and the transport. MCP (Model Context Protocol) is a standardized protocol where skills live in separate servers that Claude can discover and invoke. MCP is better for shared, reusable skills across multiple agents; inline tools are simpler for single-agent use cases where you control everything.
How many tools can I register before performance degrades?
Anthropic doesn’t publish a hard limit, but in practice beyond 20-30 tools you’ll see Claude start making less reliable tool selection decisions — there’s too much for it to reason about efficiently. If you need more, group related skills behind a dispatcher skill, or use a dynamic tool retrieval system that only surfaces the 5-10 most relevant skills per query based on semantic similarity.
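A naive version of that retrieval step, with keyword overlap standing in for real semantic similarity (in practice you would replace `score` with embedding-based cosine similarity):

```python
def score(query: str, description: str) -> int:
    # Crude relevance proxy: count query words that appear in the tool description
    return len(set(query.lower().split()) & set(description.lower().split()))

def select_tools(query: str, all_tools: list[dict], k: int = 5) -> list[dict]:
    """Surface only the k most relevant tool schemas for this query."""
    ranked = sorted(all_tools, key=lambda t: score(query, t["description"]), reverse=True)
    return ranked[:k]

tools = [
    {"name": "get_current_weather", "description": "current weather for a city"},
    {"name": "get_stock_price", "description": "latest stock price for a ticker"},
]
top = select_tools("what is the weather in Paris", tools, k=1)
```

Pass `top` as the `tools` argument instead of the full registry and Claude only has to choose among plausible candidates.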
How do I test that Claude is calling my skill with the right arguments?
Log every tool_use block before executing it — capture block.name, block.input, and the result. Run your test prompts and inspect those logs. For automated testing, use Claude Haiku (roughly $0.001 per call) to run a suite of prompts and assert that the expected skill was invoked with arguments matching your expected patterns. Unit test the skill functions independently using mocks.
What happens if my skill takes too long and the agent times out?
The Anthropic API call itself will wait for your tool result — there’s no server-side timeout on your execution. The risk is your own infrastructure timing out. Always set explicit timeouts on any I/O inside your skill (as shown in Step 3 with timeout=5.0), and return an error dict when they trigger. If a skill legitimately takes 30+ seconds, consider making it async and returning a job ID that Claude can poll with a second skill call.
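A sketch of that submit-and-poll pattern with two skills and an in-memory job store (everything here is an assumption — swap the dict and thread for a real queue and worker in production):

```python
import threading
import time
import uuid

JOBS: dict[str, dict] = {}  # job_id -> {"status": ..., "result": ...}

def slow_work(city: str) -> dict:
    time.sleep(0.1)  # stand-in for a 30s+ computation
    return {"city": city, "temperature": 18.5}

def start_weather_job(raw_input: dict) -> dict:
    """Skill 1: kick off the work in the background, return a job id immediately."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "running", "result": None}

    def runner():
        JOBS[job_id] = {"status": "done", "result": slow_work(raw_input["city"])}

    threading.Thread(target=runner, daemon=True).start()
    return {"job_id": job_id, "status": "running"}

def check_weather_job(raw_input: dict) -> dict:
    """Skill 2: Claude polls this with the job_id from skill 1."""
    job = JOBS.get(raw_input.get("job_id", ""))
    if job is None:
        return {"error": "Unknown job_id"}
    return job
```

Register both skills and describe the polling relationship in their schemas; Claude will call the second one with the job id from the first on its own.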
Put this into practice
Try the MCP Integration Engineer agent — ready to use, no setup required.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

