By the end of this tutorial, you’ll have a working Python service that ingests raw prospect data, asks Claude to score fit against your ICP, and routes high-quality leads directly into your CRM — with a full audit trail. If you’ve been trying to automate lead qualification with AI without drowning in brittle keyword rules or expensive sales ops headcount, this is the implementation to follow.
- Install dependencies — Set up the Python environment with Anthropic SDK, requests, and Pydantic
- Define your ICP scoring schema — Build a structured output model Claude will populate on every lead
- Write the qualification prompt — Craft a system prompt that encodes your real sales criteria
- Score a lead with Claude — Call the API and parse a validated score object
- Add CRM integration — Push scored leads to HubSpot (or swap for your CRM) with routing logic
- Wire up the webhook endpoint — Accept inbound leads from web forms, n8n, or Make
Why Rule-Based Scoring Keeps Failing
Most teams start with a spreadsheet: company size > 50 employees = +10 points, uses Salesforce = +5 points, submitted a demo request = +20 points. It works until your ICP shifts, or you start getting leads from a new channel that doesn’t map cleanly to any existing field. Then someone manually maintains the rules forever, or the system quietly degrades.
Claude doesn’t replace judgment — it encodes it. You write your ICP in plain English once, and the model applies it consistently across thousands of leads. The scoring is explainable (Claude returns a rationale string alongside the score), it handles missing fields gracefully, and you can update your criteria by editing a system prompt rather than refactoring a scoring function. The tradeoff: you’re adding ~$0.002–0.004 per lead at Claude Haiku pricing, and you’re adding an async step to your pipeline. Both are acceptable for any B2B product where a qualified lead is worth more than a few dollars.
Step 1: Install Dependencies
```bash
pip install anthropic pydantic requests fastapi uvicorn python-dotenv
```
We’re using `anthropic` for the API, `pydantic` for validated output schemas, `fastapi` for the webhook endpoint, and `requests` for CRM API calls. Pin these in a `requirements.txt` — the Anthropic SDK especially has breaking changes between minor versions.
Step 2: Define Your ICP Scoring Schema
Pydantic gives you structured, validated output. Define exactly what you want Claude to return — score, tier, rationale, and recommended next action.
```python
from pydantic import BaseModel, Field
from typing import Literal


class LeadScore(BaseModel):
    score: int = Field(ge=0, le=100, description="Fit score from 0 to 100")
    tier: Literal["hot", "warm", "cold", "disqualified"]
    rationale: str = Field(description="2-3 sentence explanation of the score")
    next_action: Literal["route_to_ae", "enroll_nurture", "disqualify"]
    confidence: Literal["high", "medium", "low"]
    missing_data: list[str] = Field(
        default_factory=list,
        description="Fields that would improve scoring accuracy"
    )


class LeadInput(BaseModel):
    company_name: str
    industry: str | None = None
    employee_count: int | None = None
    annual_revenue: str | None = None
    job_title: str | None = None
    use_case: str | None = None
    tech_stack: list[str] = Field(default_factory=list)
    source: str | None = None
    message: str | None = None
```
The missing_data field is underrated. When Claude flags that employee count is missing, you can trigger a data enrichment step before routing. This turns your qualifier into a feedback loop rather than a one-shot filter. For more on getting reliable structured output from Claude, see reducing LLM hallucinations in production — the verification patterns there apply directly here.
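As a sketch of that feedback loop, here is one way to gate enrichment on the flagged fields so you only pay for the enrichment call when it can actually fill a gap. The `enrich` callable is injected — it stands in for your Clearbit/Apollo wrapper and is not a real library API:

```python
from typing import Callable

# Fields an enrichment provider can realistically fill in.
ENRICHABLE_FIELDS = {"employee_count", "industry", "annual_revenue", "tech_stack"}


def maybe_enrich(
    lead: dict,
    missing_data: list[str],
    enrich: Callable[[str, set[str]], dict],  # your provider call, injected
) -> tuple[dict, bool]:
    """Fill flagged gaps via the enrichment provider; skip the paid call if nothing is fillable."""
    gaps = ENRICHABLE_FIELDS & set(missing_data)
    if not gaps:
        return lead, False  # nothing we can fill — no API call
    enriched = dict(lead)
    for field, value in enrich(lead["company_name"], gaps).items():
        if enriched.get(field) is None:
            enriched[field] = value  # never overwrite prospect-supplied data
    return enriched, True
```

After an enrichment run, re-score the lead: the second pass usually comes back with higher confidence and an empty `missing_data` list.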
Step 3: Write the Qualification Prompt
This is where you actually encode your ICP. Be specific. “Enterprise SaaS companies” is useless — “Series A+ SaaS companies with 50–500 employees, buying for an engineering or data team, with a technical buyer in the loop” is what Claude needs.
```python
SYSTEM_PROMPT = """You are a senior sales qualification specialist for [Company Name].

Your job is to score inbound leads against our Ideal Customer Profile (ICP) and return a structured assessment.

## Our ICP
- **Company size**: 50–500 employees (sweet spot: 100–250)
- **Industry**: B2B SaaS, fintech, e-commerce platforms, or dev tools
- **Buyer persona**: Engineering managers, VPs of Engineering, CTOs, or technical founders
- **Use case fit**: Teams automating internal workflows, building data pipelines, or integrating third-party APIs
- **Budget signal**: Series A or above, or profitable SMB with $5M+ ARR
- **Anti-ICP**: Agencies, solo freelancers, non-technical buyers, consumer apps

## Scoring guidance
- 80-100: Strong ICP match, all key criteria met → route_to_ae
- 60-79: Partial match, 1-2 gaps → enroll_nurture
- 40-59: Weak signal, significant gaps → enroll_nurture with low priority
- 0-39: Not a fit → disqualify

## Output format
Return ONLY valid JSON matching the LeadScore schema. No markdown, no preamble.
If a field is missing, estimate based on available context and flag it in missing_data.
Always write the rationale in plain English that a sales rep can quote directly in their outreach."""
```
The instruction to write rationale “that a sales rep can quote directly” is intentional — it forces Claude to produce something useful downstream rather than internal ML-speak. Your role prompting approach matters here: framing Claude as a “senior sales qualification specialist” consistently produces more calibrated scores than a generic assistant framing.
Step 4: Score a Lead with Claude
```python
import anthropic
import json
import os

from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))


def score_lead(lead: LeadInput) -> LeadScore:
    # Serialize lead to a clean string for the prompt
    lead_data = lead.model_dump(exclude_none=False)
    lead_text = json.dumps(lead_data, indent=2)

    response = client.messages.create(
        model="claude-haiku-4-5",  # Haiku is fast and cheap enough for this task
        max_tokens=512,
        system=SYSTEM_PROMPT,
        messages=[
            {
                "role": "user",
                "content": f"Score this inbound lead:\n\n{lead_text}"
            }
        ]
    )

    raw_output = response.content[0].text

    # Strip any accidental markdown fencing Claude adds
    if raw_output.startswith("```"):
        raw_output = raw_output.split("```")[1]
        if raw_output.startswith("json"):
            raw_output = raw_output[4:]

    parsed = json.loads(raw_output)
    return LeadScore(**parsed)  # Pydantic validates and raises if schema is wrong
```
I’m using Claude Haiku here deliberately. Sonnet produces marginally better reasoning on edge cases, but for lead scoring where the criteria are well-defined, Haiku is accurate enough and runs in under two seconds per lead. At current pricing that’s roughly $0.0008–0.002 per lead depending on input length — essentially free at B2B lead volumes.
One thing the docs don’t warn you about: Claude occasionally wraps JSON in markdown fences even when you say not to. The stripping logic above handles that. For production, add a retry with a more explicit reminder if parsing fails — see the LLM fallback and retry logic patterns we’ve covered for the full graceful degradation approach.
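If you want something sturdier than string slicing, a small helper that pulls the first top-level JSON object out of the response works regardless of how the fences are arranged. This is a generic sketch, not part of the Anthropic SDK:

```python
import json
import re


def extract_json(raw: str) -> dict:
    """Parse the first JSON object in a model response, tolerating markdown fences."""
    # Fast path: the response is already bare JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fallback: grab everything from the first '{' to the last '}'.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise json.JSONDecodeError("no JSON object found", raw, 0)
    return json.loads(match.group(0))
```

Swap `json.loads(raw_output)` in `score_lead()` for `extract_json(raw_output)` and the fence-stripping branch becomes unnecessary.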
Step 5: Add CRM Integration
Here’s a HubSpot integration that creates a contact and sets custom properties based on the score. Swap the property names for your actual HubSpot schema.
```python
import requests

HUBSPOT_API_KEY = os.getenv("HUBSPOT_API_KEY")
HUBSPOT_BASE = "https://api.hubapi.com"


def push_to_hubspot(lead: LeadInput, score: LeadScore) -> dict:
    headers = {
        "Authorization": f"Bearer {HUBSPOT_API_KEY}",
        "Content-Type": "application/json"
    }

    # Map tier to HubSpot lifecycle stage
    lifecycle_map = {
        "hot": "salesqualifiedlead",
        "warm": "marketingqualifiedlead",
        "cold": "lead",
        "disqualified": "other"
    }

    contact_payload = {
        "properties": {
            "company": lead.company_name,
            "jobtitle": lead.job_title or "",
            "industry": lead.industry or "",
            "ai_lead_score": score.score,
            "ai_lead_tier": score.tier,
            "ai_lead_rationale": score.rationale,
            "ai_next_action": score.next_action,
            "lifecyclestage": lifecycle_map[score.tier],
            "lead_source": lead.source or "web"
        }
    }

    resp = requests.post(
        f"{HUBSPOT_BASE}/crm/v3/objects/contacts",
        headers=headers,
        json=contact_payload,
        timeout=10
    )
    resp.raise_for_status()
    contact = resp.json()

    # Route hot leads to the sales rep queue via task creation
    if score.next_action == "route_to_ae":
        task_payload = {
            "properties": {
                "hs_task_subject": f"Hot lead: {lead.company_name}",
                "hs_task_body": score.rationale,
                "hs_task_priority": "HIGH",
                "hs_task_type": "TODO"
            },
            "associations": [
                {
                    "to": {"id": contact["id"]},
                    "types": [{"associationCategory": "HUBSPOT_DEFINED", "associationTypeId": 204}]
                }
            ]
        }
        requests.post(
            f"{HUBSPOT_BASE}/crm/v3/objects/tasks",
            headers=headers,
            json=task_payload,
            timeout=10
        )

    return contact
```
Step 6: Wire Up the Webhook Endpoint
```python
import json
import logging

import requests
from fastapi import FastAPI, HTTPException
from pydantic import ValidationError

app = FastAPI()
logger = logging.getLogger(__name__)


@app.post("/qualify-lead")
async def qualify_lead(lead: LeadInput):
    try:
        score = score_lead(lead)
        logger.info(f"Scored {lead.company_name}: {score.tier} ({score.score})")
        crm_contact = push_to_hubspot(lead, score)
        return {
            "status": "processed",
            "contact_id": crm_contact["id"],
            "score": score.model_dump()
        }
    except (json.JSONDecodeError, ValidationError) as e:
        # Covers both unparseable output and schema-violating output from Claude
        logger.error(f"Claude returned invalid output: {e}")
        raise HTTPException(status_code=502, detail="Scoring service returned invalid output")
    except requests.HTTPError as e:
        logger.error(f"HubSpot API error: {e}")
        raise HTTPException(status_code=502, detail="CRM update failed")
```
Run with `uvicorn app:app --port 8000`. Point your web form, n8n webhook node, or Make HTTP module at `POST /qualify-lead`. If you’re choosing between automation platforms for the surrounding workflow, the n8n vs Make vs Zapier cost and architecture comparison is worth reading before you commit.
Common Errors
1. Pydantic validation fails on Claude’s output
Claude occasionally returns a tier value outside your literal set (e.g., “medium” instead of “warm”) or omits a required field. Fix this with a retry wrapper that appends the validation error to the next API call: “Your previous response failed validation with: {error}. Return corrected JSON.” Two retries resolve this in the vast majority of cases.
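A minimal version of that retry wrapper might look like the sketch below. `call_model` stands in for a single API call that returns the parsed dict, so the wrapper stays schema-agnostic and testable:

```python
from pydantic import BaseModel, ValidationError


def validate_with_retry(call_model, schema: type[BaseModel], lead_text: str,
                        max_retries: int = 2) -> BaseModel:
    """call_model(prompt: str) -> dict. On schema failure, re-ask with the
    validation error appended, up to max_retries extra attempts."""
    prompt = f"Score this inbound lead:\n\n{lead_text}"
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return schema(**call_model(prompt))
        except ValidationError as e:
            last_error = e
            prompt = (
                f"Score this inbound lead:\n\n{lead_text}\n\n"
                f"Your previous response failed validation with: {e}. "
                "Return corrected JSON."
            )
    raise last_error
```

Call it as `validate_with_retry(lambda p: extract_raw_dict(p), LeadScore, lead_text)`, where the lambda wraps your actual Claude call.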
2. HubSpot 409 conflict on duplicate contacts
If the same email hits your endpoint twice, HubSpot returns a 409. Use the search API first to check for an existing contact by email, then PATCH instead of POST. This is especially common if you’re processing form submissions that fire twice on slow connections.
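A sketch of that search-then-upsert pattern, using HubSpot’s v3 contact search endpoint (verify the property names against your portal before relying on this):

```python
import requests

HUBSPOT_BASE = "https://api.hubapi.com"


def build_email_search_payload(email: str) -> dict:
    """Payload for POST /crm/v3/objects/contacts/search, filtering by email."""
    return {
        "filterGroups": [{
            "filters": [{
                "propertyName": "email",
                "operator": "EQ",
                "value": email,
            }]
        }],
        "properties": ["email"],
        "limit": 1,
    }


def upsert_contact(headers: dict, properties: dict, email: str) -> dict:
    """Create the contact, or PATCH the existing one if the email already exists."""
    search = requests.post(
        f"{HUBSPOT_BASE}/crm/v3/objects/contacts/search",
        headers=headers, json=build_email_search_payload(email), timeout=10,
    )
    search.raise_for_status()
    results = search.json().get("results", [])
    if results:
        contact_id = results[0]["id"]
        resp = requests.patch(
            f"{HUBSPOT_BASE}/crm/v3/objects/contacts/{contact_id}",
            headers=headers, json={"properties": properties}, timeout=10,
        )
    else:
        resp = requests.post(
            f"{HUBSPOT_BASE}/crm/v3/objects/contacts",
            headers=headers, json={"properties": properties}, timeout=10,
        )
    resp.raise_for_status()
    return resp.json()
```

Swapping the bare POST in `push_to_hubspot()` for `upsert_contact()` also makes the endpoint idempotent, which matters if your form retries on timeout.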
3. Scores are consistently too high or too low
This is a prompt calibration problem, not an API problem. Add 5–10 historical examples (with your human scores) as few-shot examples in the system prompt. The model anchors to your examples and the distribution tightens immediately. A 10% deviation from your manual scores is a reasonable target.
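One way to append those calibration examples to the system prompt — the leads and scores below are placeholders, to be replaced with leads your team has actually scored:

```python
import json

# Placeholder calibration data — replace with real, human-scored leads.
CALIBRATION_EXAMPLES = [
    {
        "lead": {"company_name": "ExampleCo", "employee_count": 180,
                 "industry": "B2B SaaS", "job_title": "VP Engineering"},
        "human_score": 88,
        "tier": "hot",
    },
    {
        "lead": {"company_name": "Solo Design Studio", "employee_count": 2,
                 "industry": "agency", "job_title": "Owner"},
        "human_score": 12,
        "tier": "disqualified",
    },
]


def build_calibrated_prompt(base_prompt: str, examples: list[dict]) -> str:
    """Append human-scored examples so the model anchors to your distribution."""
    lines = ["\n## Calibration examples (scored by our sales team)"]
    for ex in examples:
        lines.append(f"- Lead: {json.dumps(ex['lead'])}")
        lines.append(f"  Correct score: {ex['human_score']} ({ex['tier']})")
    return base_prompt + "\n".join(lines)
```

Pass `build_calibrated_prompt(SYSTEM_PROMPT, CALIBRATION_EXAMPLES)` as the `system` argument instead of the bare prompt.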
What to Build Next
Add enrichment before scoring. Integrate Clearbit or Apollo to automatically populate missing fields (employee count, funding stage, tech stack) before the Claude call. You’ll get cleaner scores and fewer missing_data flags — and you can use those flags to trigger enrichment only when needed, keeping API costs low. Pair this with a more advanced lead scoring pipeline that incorporates behavioral signals like page visits and email opens alongside the demographic scoring built here.
Bottom Line: Who Should Build This
Solo founders running outbound or content-driven inbound should deploy this immediately — even a simple Haiku-based qualifier beats manual triage, and the whole pipeline runs for under $5/month at typical early-stage lead volumes.
Growth-stage teams with a dedicated SDR function will get the most value from the routing logic: hot leads hitting a rep’s task queue within seconds of form submission is a measurable lift on speed-to-lead, which directly impacts close rates.
Enterprise teams should treat this as a prototype and plan to add enrichment, a confidence threshold that triggers human review, and a feedback loop where sales reps can correct scores — that correction data becomes training signal for prompt refinement over time.
The core of automating lead qualification with AI is straightforward: encode your ICP clearly, trust Claude to apply it consistently, validate the output with Pydantic, and route with simple conditional logic. The model doesn’t need to be clever — it needs your criteria to be specific.
Frequently Asked Questions
How much does it cost to automate lead qualification with Claude?
At Claude Haiku pricing, scoring a single lead with a moderately detailed prompt costs roughly $0.001–0.003 per call. For a team processing 1,000 leads per month, that’s $1–3 in API costs. Sonnet is about 5x more expensive but rarely necessary for well-defined qualification criteria — save it for edge-case resolution or where you need more nuanced reasoning.
Can I use this with a CRM other than HubSpot?
Yes. The scoring logic is CRM-agnostic — only the push_to_hubspot() function changes. Salesforce uses the REST API with OAuth2; replace the contact creation call with a POST to /services/data/v57.0/sobjects/Lead/. Pipedrive, Attio, and Close all have REST APIs with similar create-and-update patterns. The scored LeadScore object gives you everything you need regardless of destination.
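For example, a minimal Salesforce swap might look like this sketch. Field names like `AI_Lead_Score__c` are hypothetical custom fields, `contact_name` is an assumed extra input, and the endpoint path depends on the API version you target:

```python
import requests


def build_salesforce_lead(lead: dict, score: dict) -> dict:
    """Map the scored lead onto Salesforce Lead fields.
    AI_Lead_Score__c / AI_Lead_Tier__c are hypothetical custom fields."""
    return {
        "Company": lead["company_name"],
        "LastName": lead.get("contact_name", "Unknown"),  # Lead requires LastName
        "Title": lead.get("job_title") or "",
        "LeadSource": lead.get("source") or "Web",
        "AI_Lead_Score__c": score["score"],
        "AI_Lead_Tier__c": score["tier"],
        "Description": score["rationale"],
    }


def push_to_salesforce(instance_url: str, access_token: str,
                       lead_fields: dict) -> dict:
    """Create a Lead via the Salesforce REST API (assumes an OAuth2 access token)."""
    resp = requests.post(
        f"{instance_url}/services/data/v57.0/sobjects/Lead/",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        json=lead_fields,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```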
What happens when Claude returns an invalid score or the API is down?
The Pydantic validation in Step 4 will raise a ValidationError, which you should catch and retry with the error message appended to the prompt. For API outages, implement exponential backoff and queue the lead for retry — don’t silently drop it. The FastAPI endpoint returns a 502 so the caller (your form, n8n workflow, etc.) knows to retry.
How do I keep the ICP scoring criteria up to date?
Store the system prompt in a database or config file rather than hardcoding it. Build a simple admin UI or even a Notion doc that your sales team can edit, then sync it to the prompt on deploy. Run a monthly calibration: pull 20 recent scored leads, have a rep rate them manually, and compare — if your model’s accuracy drops below 80%, update the ICP description.
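A minimal version of the config-file approach — re-reading the prompt only when the file changes on disk, so sales-team edits take effect without a redeploy (sketch; the filename is an assumption):

```python
from pathlib import Path

# Cache keyed on mtime so we only hit the disk when the file actually changes.
_PROMPT_CACHE: dict = {"mtime": None, "text": None}


def load_system_prompt(path: str = "icp_prompt.txt") -> str:
    """Re-read the prompt file only when its modification time changes."""
    p = Path(path)
    mtime = p.stat().st_mtime
    if _PROMPT_CACHE["mtime"] != mtime:
        _PROMPT_CACHE["text"] = p.read_text(encoding="utf-8")
        _PROMPT_CACHE["mtime"] = mtime
    return _PROMPT_CACHE["text"]
```

Call `load_system_prompt()` inside `score_lead()` instead of referencing the module-level constant, and prompt updates go live on the next request.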
Is Claude better than GPT-4 for lead qualification specifically?
For structured output with explicit criteria, both perform similarly on well-formed prompts. Claude tends to follow schema constraints more reliably out of the box, which matters when you need JSON that validates on the first try. For a detailed comparison in a code-heavy context, see our Claude vs GPT-4 benchmark — the reliability findings there generalize to structured generation tasks like scoring.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

