If you’re running extraction pipelines, content classification, or document analysis at scale, you’ve probably already felt the pain: standard API calls get expensive fast, rate limits cause headaches, and managing thousands of concurrent requests turns into its own engineering problem. Claude batch API processing sidesteps most of this by letting you submit large jobs asynchronously and get results back within 24 hours — at exactly 50% of standard API pricing. For workloads that don’t need real-time responses, this is one of the most practical cost optimizations available right now.
This article walks through a complete implementation: structuring your batch jobs, submitting them correctly, polling for results, handling failures, and calculating real costs. The examples use Anthropic’s Message Batches API with claude-3-5-haiku and claude-sonnet-4-5, but the architecture applies across model tiers.
What the Batch API Actually Does (and Doesn’t Do)
The Anthropic Message Batches API accepts up to 10,000 requests in a single batch, processes them asynchronously, and returns results within 24 hours. In practice, most jobs complete in 1–4 hours depending on load. You get the same model quality as synchronous calls — this isn’t a degraded endpoint, just a different delivery mechanism.
What you give up: latency. If your use case requires responses in under a few seconds, the batch API is the wrong tool. But for overnight document processing, weekly classification runs, bulk content generation, or any pipeline where you can afford to wait, it’s a straightforward win.
Current pricing at time of writing
- claude-3-5-haiku: $0.0004 input / $0.002 output per 1K tokens (standard), batch cuts this to $0.0002 / $0.001
- claude-sonnet-4-5: $0.003 input / $0.015 output per 1K tokens (standard), batch cuts this to $0.0015 / $0.0075
- claude-opus-4: $0.015 input / $0.075 output (standard), batch at $0.0075 / $0.0375
Run 10,000 documents through Haiku for extraction — say 500 tokens in, 200 tokens out per doc — and you’re looking at roughly $6 standard vs $3 batch. Swap that for Sonnet and it’s $45 vs $22.50. The savings compound quickly on real workloads.
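That arithmetic is worth scripting before you commit to a run. Here’s a small helper that reproduces the figures above — a sketch with prices passed in rather than hardcoded, since rates change and you should plug in whatever is current:

```python
def estimate_batch_cost(
    n_docs: int,
    in_tokens: int,
    out_tokens: int,
    in_per_1k: float,
    out_per_1k: float,
    batch_discount: float = 0.5,
) -> dict:
    """Estimate standard vs. batch cost for a uniform workload.
    Prices are standard per-1K-token rates; batch applies the discount."""
    standard = n_docs * (in_tokens / 1000 * in_per_1k + out_tokens / 1000 * out_per_1k)
    return {"standard": round(standard, 2), "batch": round(standard * batch_discount, 2)}

# 10K docs through Haiku at the rates listed above
print(estimate_batch_cost(10_000, 500, 200, 0.0004, 0.002))
# → {'standard': 6.0, 'batch': 3.0}
```

Swapping in the Sonnet rates gives the $45 / $22.50 split from above; run it against your own token averages before trusting any back-of-envelope number.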
Structuring Your Batch Job: The Request Format
Each batch is a list of request objects. Every object needs a unique custom_id (your identifier for matching results later), plus a standard messages payload. Here’s the core structure:
import anthropic
import json

client = anthropic.Anthropic(api_key="your-api-key")

def build_batch_requests(documents: list[dict]) -> list[dict]:
    """
    documents: list of {"id": str, "text": str}
    Returns batch-formatted request list
    """
    requests = []
    for doc in documents:
        requests.append({
            "custom_id": doc["id"],  # must be unique within the batch
            "params": {
                "model": "claude-3-5-haiku-latest",
                "max_tokens": 300,
                "messages": [
                    {
                        "role": "user",
                        "content": f"""Extract the following from this document and return as JSON:
- main_topic (string)
- sentiment (positive/negative/neutral)
- key_entities (list of up to 5 named entities)

Document:
{doc["text"]}

Return only valid JSON, no explanation."""
                    }
                ]
            }
        })
    return requests
A few things worth noting: custom_id can be up to 64 characters and may contain only alphanumeric characters, hyphens, and underscores. If you use database IDs, you might need to sanitize them. The params object accepts the same fields as a standard Messages API call — system prompts, temperature, tool use — everything works.
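A sanitizer like the following keeps arbitrary database IDs inside the allowed charset (the function name is mine, not part of the SDK):

```python
import re

def sanitize_custom_id(raw_id: str, max_len: int = 64) -> str:
    """Replace anything outside [A-Za-z0-9_-] with an underscore,
    then truncate to the 64-character limit."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", raw_id)
    return cleaned[:max_len]
```

Note that sanitizing can collide two distinct IDs (e.g. `doc.1` and `doc:1` both become `doc_1`); if that’s possible in your data, append a short hash of the raw ID before truncating.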
Submitting and Monitoring Your Batch
Submission is a single API call. The tricky part is what comes after: you need to poll for completion, handle partial failures, and parse the results file correctly.
import time

def submit_batch(requests: list[dict]) -> str:
    """Submit batch and return batch ID"""
    batch = client.messages.batches.create(requests=requests)
    print(f"Batch submitted: {batch.id}")
    print(f"Status: {batch.processing_status}")
    return batch.id

def poll_until_complete(batch_id: str, poll_interval: int = 60):
    """
    Poll batch status until it ends; returns the final batch object.
    poll_interval: seconds between checks (60s is reasonable; don't hammer the API)
    """
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        status = batch.processing_status
        counts = batch.request_counts
        print(
            f"Status: {status} | "
            f"Processing: {counts.processing} | "
            f"Succeeded: {counts.succeeded} | "
            f"Errored: {counts.errored}"
        )
        if status == "ended":
            return batch
        time.sleep(poll_interval)
The processing_status field moves through in_progress → ended. There’s no “completed successfully” vs “completed with errors” distinction at the top level — you get ended either way, and the error detail lives in the individual results. This catches people out the first time.
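You can derive the distinction yourself from request_counts. A sketch, shown on a plain dict of counts for clarity — the SDK returns an object carrying the same fields as attributes:

```python
def summarize_batch_outcome(counts: dict) -> str:
    """Classify a batch from its request counts, e.g.
    {"processing": 0, "succeeded": 9950, "errored": 50, "canceled": 0, "expired": 0}.
    The API reports 'ended' either way; derive your own success signal."""
    if counts.get("processing", 0) > 0:
        return "still_processing"
    if counts.get("errored", 0) or counts.get("canceled", 0) or counts.get("expired", 0):
        return "ended_with_failures"
    return "ended_clean"
```

Anything other than `ended_clean` means you should walk the individual results to find out which custom_ids need a retry.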
Parsing results and handling failures
Results stream back as JSONL. Each line is one result object containing your custom_id and either a successful response or an error. Always process the stream — don’t try to load the entire results payload into memory for large batches.
def process_batch_results(batch_id: str) -> tuple[list, list]:
    """
    Returns (successes, failures)
    successes: list of {"id": str, "result": dict}
    failures: list of {"id": str, "error": str}
    """
    successes = []
    failures = []

    # Stream results — more memory-efficient than loading all at once
    for result in client.messages.batches.results(batch_id):
        custom_id = result.custom_id
        if result.result.type == "succeeded":
            # Extract text content from the response
            content = result.result.message.content[0].text
            try:
                parsed = json.loads(content)
                successes.append({"id": custom_id, "result": parsed})
            except json.JSONDecodeError:
                # Model returned something that isn't valid JSON
                # Log it but don't crash the whole result set
                failures.append({
                    "id": custom_id,
                    "error": f"JSON parse error: {content[:200]}"
                })
        elif result.result.type == "errored":
            error = result.result.error
            failures.append({
                "id": custom_id,
                "error": f"{error.type}: {error.message}"
            })
        else:
            # "canceled" and "expired" results land here
            failures.append({"id": custom_id, "error": result.result.type})

    return successes, failures
The most common error type you’ll see is overloaded_error — the model was busy at processing time. Anthropic’s docs say they retry these internally to some extent, but in practice you’ll still get a small failure rate on large batches. Build retry logic: collect the failed IDs, rebuild those as a new batch, resubmit. For a 10K document run I’ve seen failure rates of 0.1–0.5%, so plan for it but don’t over-engineer it.
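A minimal version of that retry step: filter the original documents down to the failed IDs, then feed them back through the same request builder (the helper name is mine):

```python
def collect_retry_documents(documents: list[dict], failures: list[dict]) -> list[dict]:
    """Return the subset of the original documents whose IDs appear in
    the failure list, ready to rebuild into a fresh batch."""
    failed_ids = {f["id"] for f in failures}
    return [doc for doc in documents if doc["id"] in failed_ids]

# Typical usage, reusing the functions defined earlier:
# retry_docs = collect_retry_documents(documents, failures)
# if retry_docs:
#     submit_batch(build_batch_requests(retry_docs))
```

Cap this at two or three rounds; a document that fails repeatedly usually has a content problem, not a capacity one, and belongs in a dead-letter list instead.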
Putting It All Together: End-to-End Pipeline
Here’s a complete wrapper that handles the full cycle for a document processing job:
def run_batch_pipeline(
    documents: list[dict],
    batch_size: int = 10_000,
    output_path: str = "results.jsonl"
) -> dict:
    """
    Full batch pipeline with automatic chunking for >10K documents.

    documents: list of {"id": str, "text": str}
    batch_size: max requests per batch (API limit is 10,000)
    output_path: where to write results as JSONL

    Returns summary dict with counts and cost estimate
    """
    import time

    all_successes = []
    all_failures = []
    batch_ids = []

    # Chunk documents into batches if over the 10K limit
    chunks = [documents[i:i+batch_size] for i in range(0, len(documents), batch_size)]
    print(f"Processing {len(documents)} documents in {len(chunks)} batch(es)")

    # Submit all batches first — don't wait for each one sequentially
    for chunk in chunks:
        requests = build_batch_requests(chunk)
        batch_id = submit_batch(requests)
        batch_ids.append(batch_id)
        time.sleep(2)  # small gap between submissions to avoid rate limit on batch creation

    # Now poll all batches to completion
    for batch_id in batch_ids:
        print(f"\nWaiting on batch {batch_id}...")
        poll_until_complete(batch_id)
        successes, failures = process_batch_results(batch_id)
        all_successes.extend(successes)
        all_failures.extend(failures)

    # Write results to JSONL
    with open(output_path, "w") as f:
        for item in all_successes:
            f.write(json.dumps(item) + "\n")

    # Rough cost estimate at Haiku batch rates (500 input tokens, 200 output per doc)
    total_docs = len(all_successes)
    est_cost = total_docs * ((500 * 0.0000002) + (200 * 0.000001))

    summary = {
        "total_submitted": len(documents),
        "succeeded": len(all_successes),
        "failed": len(all_failures),
        "failure_rate": f"{len(all_failures)/len(documents)*100:.1f}%",
        "estimated_cost_usd": round(est_cost, 4),
        "failures": all_failures[:10]  # first 10 for inspection
    }
    print(f"\nDone. {summary}")
    return summary
Submit all your batches first, then poll them in parallel. Don’t submit-wait-submit-wait — that’s leaving throughput on the table. Multiple batches process concurrently, and the API documentation has allowed on the order of 100 batches awaiting processing at a time; check the current limits before planning around that number.
What Breaks in Production (Honest Assessment)
JSON output reliability: Even with explicit instructions to return JSON, Haiku will occasionally add preamble like “Here is the extracted JSON:” before the actual object. Add a post-processing step that strips everything before the first { character. For critical pipelines, use claude-sonnet-4-5 instead — it’s significantly more reliable at structured output, and the batch discount still makes it affordable.
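One way to sketch that post-processing step: take the span from the first `{` to the last `}` and parse it, returning None rather than raising on hopeless output (the function name is mine):

```python
import json

def extract_json(text: str):
    """Strip any preamble or trailer around the first JSON object
    in the model's output. Returns a dict, or None if nothing parses."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
```

Route the None cases into your failure list alongside the API errors so they get retried or inspected, not silently dropped.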
Token estimation: You can’t get a cost estimate from the API before submitting. Count tokens yourself before committing to a large run — the Anthropic SDK exposes a counting endpoint (client.messages.count_tokens); note that tiktoken is OpenAI’s tokenizer and only approximates Claude’s counts. Discovering your documents average 3,000 tokens instead of 500 changes your cost by 6x.
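For a quick pre-flight sanity check that needs no API calls at all, a characters-per-token heuristic is enough to catch a 6x surprise — roughly 4 characters per token for English prose. This is an approximation, not the real tokenizer, and both function names here are mine:

```python
def rough_token_estimate(text: str) -> int:
    """Crude estimate: English prose averages ~4 characters per token.
    Use the real tokenizer for billing-grade numbers."""
    return max(1, len(text) // 4)

def flag_oversized(documents: list[dict], expected_tokens: int = 500) -> list[str]:
    """Return IDs of documents whose estimate exceeds 2x the expected size."""
    return [
        doc["id"] for doc in documents
        if rough_token_estimate(doc["text"]) > 2 * expected_tokens
    ]
```

Run this over the corpus first; if more than a handful of IDs come back, recompute your cost estimate before submitting anything.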
Batch expiry: Batch results are available for 29 days after creation, then they’re gone. If you’re building long-running pipelines, make sure you’re pulling results before they disappear. Store the batch ID and submission timestamp persistently — don’t rely on in-memory state.
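An append-only manifest file is the lightest way to persist that state. A sketch, assuming the 29-day result window described above (function names are mine):

```python
import json
import time
from pathlib import Path

def record_batch(manifest_path: str, batch_id: str) -> None:
    """Append batch ID + submission time so results can be pulled
    before expiry even if the submitting process restarts."""
    entry = {"batch_id": batch_id, "submitted_at": time.time()}
    with open(manifest_path, "a") as f:
        f.write(json.dumps(entry) + "\n")

def expiring_soon(manifest_path: str, days_left: float = 3.0) -> list[str]:
    """Return batch IDs whose 29-day result window closes within days_left."""
    now = time.time()
    cutoff = (29 - days_left) * 86400
    ids = []
    for line in Path(manifest_path).read_text().splitlines():
        entry = json.loads(line)
        if now - entry["submitted_at"] >= cutoff:
            ids.append(entry["batch_id"])
    return ids
```

A daily cron that downloads results for anything `expiring_soon` returns is all the safety net most pipelines need; you only need a database once several services share the manifest.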
No streaming: Results only become available after the batch fully ends. You can’t peek at partial results mid-run. For 24-hour jobs this is fine; for 4-hour jobs it’s a workflow consideration.
Rate limits on batch creation: You’re still subject to API rate limits on the batch creation calls themselves. Space your batch submissions 2–5 seconds apart if submitting many batches in quick succession.
When to Use Batch Processing vs Standard API
Use the batch API when: you have 500+ documents to process, results can wait hours, you’re running scheduled jobs (nightly classification, weekly report generation), or you’re doing initial data processing on a static corpus.
Stick with standard synchronous API when: you need responses in under 30 seconds, you’re serving user-facing features, you need to chain outputs immediately into subsequent calls, or you’re running fewer than a few hundred requests where the operational overhead isn’t worth it.
The sweet spot for Claude batch API processing is any ETL-style workload: pulling documents from S3 or a database, enriching them with AI-extracted metadata, writing back to a data warehouse. This pattern runs overnight, costs half as much, and requires zero infrastructure beyond a simple polling script.
Choosing the Right Model for Batch Jobs
For extraction and classification: Haiku is your default. It’s fast, cheap, and handles well-structured prompts reliably. At batch pricing, 10K documents with 500-token average input costs about $1–3 depending on output length.
For summarisation or analysis requiring reasoning: Sonnet at batch pricing is genuinely compelling — half of what you’d pay for real-time Sonnet, and the quality gap over Haiku is real for nuanced tasks. I’d use Sonnet for anything that needs multi-step reasoning, code analysis, or long-document summarisation.
For the highest-stakes content where quality is non-negotiable: Opus batch pricing makes large-scale Opus usage feasible for the first time. If you’re processing legal documents, medical records, or financial filings where errors are costly, the 50% batch discount on Opus brings it into budget for many use cases.
Bottom line for different reader types: If you’re a solo founder with a document-heavy product — contract analysis, invoice processing, content moderation — the batch API is the most immediate infrastructure win available. You can cut your AI costs in half overnight with 2–3 hours of implementation work. If you’re on a team running production pipelines, wrap this in a proper job queue (SQS, Redis, whatever you already use), add dead-letter handling for the failed IDs, and you have an enterprise-grade document processing pipeline for almost nothing. The code above is production-ready with minor additions; don’t let anyone sell you a complex architecture when a polling script and a JSONL file will handle 99% of batch workloads.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

