Sunday, April 5

Polling is a tax on your infrastructure and your patience. If you’ve built AI agents that check for new data every 30 seconds, you already know the pain: wasted API calls, delayed responses, and a cron job graveyard that’s slowly eating your compute budget. Webhook-triggered AI agents solve this cleanly — instead of your agent asking “anything new?”, external systems push events to your agent the moment something happens. No lag, no wasted cycles, no artificial delay before Claude starts working.

This article covers the full architecture: receiving webhook payloads, validating and routing events, processing with Claude in the background, and maintaining state across multiple triggers. All code is production-relevant — I’ve used variations of this pattern in real deployments handling Stripe payment events, GitHub PR reviews, and CRM contact updates.

Why Event-Driven Architecture Changes How Agents Behave

Most tutorial agents are stateless request-response systems. You send a message, Claude replies, done. That works for chatbots. It breaks down the moment you want agents that react to the world — a new Stripe charge, a GitHub comment, a form submission, a status change in your CRM.

The alternative most people reach for first is polling: a scheduled job that checks an API every N minutes and fires off Claude if something changed. This has real costs. At a 1-minute polling interval hitting a typical REST API, you’re making 1,440 requests per day per integration — most returning nothing. You’re also accepting up to 60 seconds of latency before your agent knows something happened.

Webhooks flip this. The external system makes an HTTP POST to your endpoint the moment an event fires. Your agent receives it in milliseconds. The architecture looks like this:


# Simplified event flow
# 1. External system (Stripe, GitHub, etc.) fires POST to your endpoint
# 2. Your webhook receiver validates the payload and acknowledges with 200
# 3. Background worker picks up the event and runs Claude
# 4. Claude's output triggers downstream actions (DB write, API call, notification)

The critical design constraint: you must return a 200 response within a few seconds or the sender will retry — and you’ll end up processing the same event multiple times. This means Claude cannot run synchronously inside the webhook handler. Background processing is mandatory, not optional.

Building the Webhook Receiver: FastAPI + Redis Queue

Here’s a working FastAPI receiver that validates an incoming webhook, queues it for background processing, and returns immediately. I’m using Redis and RQ (Redis Queue) for the job queue — it’s simpler than Celery for this use case and the operational overhead is low.


import hmac
import hashlib
import json
import os
from fastapi import FastAPI, Request, HTTPException
from redis import Redis
from rq import Queue
import anthropic

app = FastAPI()
redis_conn = Redis(host="localhost", port=6379)
task_queue = Queue(connection=redis_conn)

WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]  # never hardcode secrets

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    """Verify HMAC-SHA256 webhook signature (GitHub/Stripe style)."""
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    # Use compare_digest to prevent timing attacks
    return hmac.compare_digest(f"sha256={expected}", signature)

@app.post("/webhook/events")
async def receive_webhook(request: Request):
    body = await request.body()
    signature = request.headers.get("X-Hub-Signature-256", "")

    if not verify_signature(body, signature, WEBHOOK_SECRET):
        raise HTTPException(status_code=401, detail="Invalid signature")

    payload = json.loads(body)
    event_type = payload.get("event_type", "unknown")

    # Enqueue — do NOT call Claude here
    job = task_queue.enqueue(
        process_event_with_claude,
        payload,
        event_type,
        job_timeout=120  # 2 min max for Claude processing
    )

    # Return immediately — this is what prevents retries
    return {"status": "queued", "job_id": job.id}

The signature verification is non-negotiable. An unprotected webhook endpoint is an open API for anyone who finds your URL. Always validate before touching the payload.
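Before wiring this to a real sender, it’s worth sanity-checking the verifier in isolation. The sketch below plays both sides: it computes the `sha256=<hexdigest>` header the way a GitHub-style sender would, then confirms the receiver’s check accepts it and rejects tampering (the payload and secret here are made up for the demo):

```python
import hmac
import hashlib

def sign_payload(payload: bytes, secret: str) -> str:
    """Compute the signature header a GitHub-style sender would attach."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return f"sha256={digest}"

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    """Mirror of the receiver's check: constant-time comparison."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)

body = b'{"event_type": "payment.succeeded", "id": "evt_123"}'
header = sign_payload(body, "test_secret")

assert verify_signature(body, header, "test_secret")             # valid round trip
assert not verify_signature(body, header, "wrong_secret")        # wrong secret fails
assert not verify_signature(body + b" ", header, "test_secret")  # tampered body fails
```

Running this round trip in a unit test catches the most common integration bug: signing the parsed-and-reserialized JSON instead of the raw request bytes.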

Processing Events with Claude in the Background

The worker function that RQ calls can now take as long as it needs. Here’s where Claude actually runs:


def process_event_with_claude(payload: dict, event_type: str):
    """Background worker — called by RQ, not the HTTP handler."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

    # Build a dynamic system prompt based on event type
    system_prompts = {
        "payment.succeeded": "You are a payment analyst. Summarize this transaction and flag anything unusual.",
        "pull_request.opened": "You are a code reviewer. Review this PR description and suggest questions for the author.",
        "contact.created": "You are a CRM analyst. Score this lead and suggest the best follow-up action.",
    }

    system = system_prompts.get(event_type, "Analyze this event and summarize what happened.")

    message = client.messages.create(
        model="claude-haiku-4-5",  # ~$0.00025 per 1K input tokens — use Haiku for high-volume events
        max_tokens=512,
        system=system,
        messages=[
            {
                "role": "user",
                "content": f"Event type: {event_type}\n\nPayload:\n{json.dumps(payload, indent=2)}"
            }
        ]
    )

    result = message.content[0].text

    # Store result — options: DB, Redis key, send to another API
    store_result(event_type, payload.get("id"), result)
    return result

def store_result(event_type: str, event_id: str, analysis: str):
    """Persist Claude's output — implement based on your stack."""
    # Example: write to Postgres, send to Slack, call another webhook
    print(f"[{event_type}] {event_id}: {analysis[:100]}...")

Model choice matters here. For webhook-triggered agents processing high volumes of events, Claude Haiku 4.5 runs about $1 per million input tokens at the time of writing, so a 500-token payload analysis costs a fraction of a cent on input. At 10,000 events per day that’s roughly $5/day in input tokens, plus output tokens (priced higher per token). Sonnet costs several times more with meaningfully better reasoning; use it for events where analysis quality directly drives revenue decisions, and verify current rates on Anthropic’s pricing page before committing to a volume estimate.
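To keep budget conversations concrete, a small estimator helps. The per-million-token rates below are placeholder assumptions, not official prices; substitute whatever Anthropic’s pricing page currently lists for your model:

```python
def estimate_daily_cost(events_per_day: int,
                        input_tokens_per_event: int,
                        output_tokens_per_event: int,
                        input_rate_per_mtok: float,
                        output_rate_per_mtok: float) -> float:
    """Rough daily spend in dollars. Rates are $ per million tokens."""
    input_cost = events_per_day * input_tokens_per_event * input_rate_per_mtok / 1_000_000
    output_cost = events_per_day * output_tokens_per_event * output_rate_per_mtok / 1_000_000
    return input_cost + output_cost

# Hypothetical rates ($1/MTok in, $5/MTok out) — verify against the pricing page
cost = estimate_daily_cost(10_000, 500, 200,
                           input_rate_per_mtok=1.0, output_rate_per_mtok=5.0)
print(f"${cost:.2f}/day")  # → $15.00/day at these assumed rates
```

Note how output tokens dominate even at a modest 200 tokens per response; capping `max_tokens` is a cost control, not just a safety rail.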

State Management Across Triggers

Stateless event processing works for isolated events. But real agents often need context from previous events — “this is the third failed payment for this customer” or “this PR author has never submitted code here before.” You need a state layer.

Redis for Short-Term Event Context


import json
import redis
from datetime import timedelta

state_store = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_customer_context(customer_id: str) -> dict:
    """Retrieve recent event history for this customer."""
    key = f"customer_events:{customer_id}"
    history = state_store.lrange(key, 0, 9)  # last 10 events
    return {"recent_events": [json.loads(e) for e in history]}

def append_event_to_history(customer_id: str, event: dict):
    """Add this event to the customer's rolling history."""
    key = f"customer_events:{customer_id}"
    state_store.lpush(key, json.dumps(event))
    state_store.ltrim(key, 0, 49)       # keep last 50 events
    state_store.expire(key, timedelta(days=30))  # auto-expire

def process_event_with_context(payload: dict, event_type: str):
    """Version with stateful context injected into Claude's prompt."""
    customer_id = payload.get("customer_id")
    context = get_customer_context(customer_id) if customer_id else {}

    context_str = ""
    if context.get("recent_events"):
        context_str = f"\n\nCustomer history (last {len(context['recent_events'])} events):\n"
        context_str += json.dumps(context["recent_events"], indent=2)

    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"New event: {event_type}\n{json.dumps(payload)}{context_str}"
        }]
    )

    # Store this event for future context
    if customer_id:
        append_event_to_history(customer_id, {"type": event_type, "ts": payload.get("created")})

    return message.content[0].text

This pattern keeps your token costs predictable. Instead of sending unbounded history, you cap it at the last N events. Adjust the cap based on what your analysis needs — for fraud detection, 10 events is usually enough; for churn prediction, you might want 30 days of touchpoints.

Handling Duplicate Events and Failures

Webhook senders retry on failure. Stripe, for example, retries for up to 72 hours with exponential backoff; other senders have their own retry windows, and GitHub lets you redeliver failed deliveries from its UI or API. If your handler crashes or returns a 5xx, you will process the same event multiple times. This is the at-least-once delivery problem, and it will bite you if you ignore it.

Idempotency with Event ID Tracking


def is_duplicate_event(event_id: str) -> bool:
    """Check if we've already processed this event."""
    key = f"processed_event:{event_id}"
    # SET NX (only set if not exists) — atomic check-and-set
    result = state_store.set(key, "1", nx=True, ex=86400)  # 24h TTL
    return result is None  # None means key already existed = duplicate

@app.post("/webhook/events")
async def receive_webhook(request: Request):
    body = await request.body()
    # ... signature verification ...

    payload = json.loads(body)
    event_id = payload.get("id")  # Stripe sends the event ID in 'id'; GitHub's is in the X-GitHub-Delivery header

    if event_id and is_duplicate_event(event_id):
        return {"status": "duplicate", "skipped": True}  # 200 to stop retries

    task_queue.enqueue(process_event_with_claude, payload, payload.get("event_type"))
    return {"status": "queued"}

The SET NX operation is atomic — two simultaneous requests for the same event ID cannot both get through. This matters when Stripe retries while your first processing job is still running.
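If you want to convince yourself of the `result is None` logic without a live Redis, here’s a minimal in-memory stand-in mimicking redis-py’s SET NX return values (illustrative only; unlike Redis, it isn’t shared or atomic across processes):

```python
class FakeNXStore:
    """Mimics redis-py SET NX returns: True on first set, None if the key exists."""
    def __init__(self):
        self._data = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self._data:
            return None  # matches redis-py: None when NX prevents the write
        self._data[key] = value
        return True

store = FakeNXStore()

def is_duplicate_event(event_id: str) -> bool:
    return store.set(f"processed_event:{event_id}", "1", nx=True, ex=86400) is None

assert is_duplicate_event("evt_123") is False  # first delivery: process it
assert is_duplicate_event("evt_123") is True   # retry: skip it
```

The check-and-mark happens in one call; a separate GET followed by SET would open a race window between the two round trips.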

Testing Webhooks Locally Without Exposing Ports

The biggest friction with webhook development is that your local server isn’t publicly accessible. Three options, ranked by preference:

  • ngrok — `ngrok http 8000` gives you a public HTTPS URL in seconds. Free tier works fine for development. Costs $10/month if you need a stable domain.
  • Stripe CLI — `stripe listen --forward-to localhost:8000/webhook/events` proxies Stripe webhooks directly. Best option if you’re only dealing with Stripe events.
  • Cloudflare Tunnel — Free, no account needed for basic use, and works for any HTTP service. `cloudflared tunnel --url http://localhost:8000`.

For automated testing without any tunnel, record real webhook payloads to JSON files and replay them against your local endpoint. This is faster than waiting for real events and lets you test edge cases:


# save_webhook.py — run this once to capture a real payload
import json
from fastapi import Request

@app.post("/webhook/capture")
async def capture_webhook(request: Request):
    payload = await request.json()
    with open(f"fixtures/{payload.get('type', 'unknown')}.json", "w") as f:
        json.dump(payload, f, indent=2)
    return {"captured": True}

What Breaks in Production (And How to Handle It)

Running webhook-triggered AI agents at scale surfaces a few consistent failure modes:

Claude API rate limits — Anthropic’s rate limits are per-organization and scale with your usage tier. If you’re processing a burst of 500 events simultaneously, you’ll hit rate limits. Add exponential backoff in your worker and set RQ’s job concurrency to something sane (start at 5 concurrent workers, increase based on your tier).
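A backoff wrapper for the worker can stay generic. This sketch retries on whichever exception types you pass in; in the real worker you’d pass the SDK’s rate-limit exception (e.g. `anthropic.RateLimitError` — check your SDK version’s exception names). It’s demonstrated here with a plain `ConnectionError` so it runs standalone:

```python
import random
import time

def retry_with_backoff(fn, retryable=(Exception,), max_attempts=5,
                       base_delay=1.0, max_delay=30.0):
    """Call fn(), retrying retryable exceptions with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts — let RQ mark the job failed
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay * 0.1))

# Demonstration: fails twice, then succeeds on the third attempt
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"

result = retry_with_backoff(flaky, retryable=(ConnectionError,), base_delay=0.01)
print(result)  # → ok
```

The jitter matters: without it, a burst of workers that all hit the rate limit together will all retry together and hit it again.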

Redis memory under sustained load — If your queue grows faster than workers can drain it, Redis fills up. Monitor queue depth and alert on it. Set Redis’s `maxmemory` limit deliberately, and choose the eviction policy with care: a `maxmemory-policy` like `allkeys-lru` can silently evict queued jobs, so for a job queue `noeviction` plus alerting on depth is usually the safer choice.

Payload schema changes — External APIs change their webhook schemas without always warning you. Defensive parsing (`payload.get("field")`, not `payload["field"]`) and structured logging of raw payloads let you reconstruct what happened when a new field breaks your prompt template.

Long-tail processing time — Claude’s p99 latency is higher than p50. Set your RQ job timeout to at least 2-3x your expected processing time. A job that times out gets marked failed and may be retried.

When to Use This Pattern

Use webhook-triggered AI agents when: you need sub-second reaction to external events, you’re integrating with systems that support webhooks (Stripe, GitHub, Slack, HubSpot, Shopify), or you’re replacing a polling loop that’s running more than a few times per minute.

Stick with polling when: the source system doesn’t support webhooks (many legacy APIs), you need to backfill historical data, or your event volume is low enough that the polling overhead is negligible (less than once per hour).

For solo founders building their first event-driven integration: start with n8n’s webhook node and an HTTP request to Claude’s API — no code required, deploys in 20 minutes, handles moderate volume easily. Move to the custom FastAPI + RQ stack when you need custom validation logic, state management, or you’re hitting n8n’s execution limits.

For teams building production systems: the pattern above is your baseline. Add a proper job queue dashboard (RQ Dashboard or Flower), structured logging with correlation IDs linking webhook receipt to Claude’s response, and alerting on queue depth. The architecture scales horizontally — add more RQ workers behind the same Redis instance when volume grows.

The shift to webhook-triggered AI agents isn’t just about latency — it fundamentally changes what’s possible. Agents that react in real time can intervene before problems escalate, personalize at the moment of action, and chain events together in ways that polling-based systems can never match cleanly.

Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.
