Sunday, April 5

Most developers I talk to are running Claude one-off — paste some text, get a response, done. But the moment you need to process 500 customer records every night, generate daily reports at 6am, or batch-classify incoming data while you sleep, you need something more disciplined. Scheduling AI workflows with cron and a properly structured Claude agent is one of those setups that takes an afternoon to get right and then just runs — reliably, cheaply, and without babysitting.

This article covers exactly that: wiring Claude agents into Linux cron jobs for batch processing, handling state so reruns don’t double-process records, and getting failure notifications before your users do. All code is tested and production-ready (or close enough that the gaps are clearly flagged).

Why Cron Still Makes Sense for AI Batch Jobs

You could reach for Airflow, Prefect, or a cloud scheduler. Sometimes that’s the right call. But for most teams running Claude batch jobs — daily reports, weekly summaries, nightly data enrichment — cron on a $20/month VPS is perfectly sufficient and dramatically simpler to operate.

Cron gives you zero infrastructure overhead, no dashboard to maintain, and no third-party dependency in your execution path. The failure modes are also well-understood: the job either runs or it doesn’t, and you can check /var/log/syslog immediately. Compare that to debugging a flaky Prefect agent at 2am.

What cron genuinely doesn’t give you: built-in retries, dependency chaining, and per-task status visibility. If you need those, step up to a proper scheduler. For linear batch jobs against the Claude API, cron is fine.

Project Structure for a Schedulable Claude Agent

Before writing a single line of cron syntax, structure your Python project so it behaves correctly when run non-interactively. A few things matter immediately:

  • Absolute paths everywhere — cron doesn’t inherit your shell’s PATH or working directory
  • A virtualenv the cron job explicitly activates
  • A state file or database to track what’s already been processed
  • Logging to a file, not stdout (cron tries to email stdout to the crontab owner, and that mail usually goes nowhere anyone looks)
  • Exit codes that mean something — 0 for success, non-zero for failure
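The last point is worth making concrete. A minimal sketch of the exit-code convention — the `run` helper here is illustrative, not part of the project layout below:

```python
import logging
import sys

def run(job) -> int:
    """Run a job callable; success maps to exit code 0, any exception to 1."""
    try:
        job()
        return 0
    except Exception:
        logging.exception("job failed")  # full traceback lands in the log
        return 1

if __name__ == "__main__":
    sys.exit(run(lambda: None))
```

The point is that cron (and any wrapper around it) only ever sees the exit code, so make sure every failure path produces a non-zero one.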

Here’s a minimal but production-honest directory layout:

/opt/claude-batch/
├── venv/
├── jobs/
│   ├── daily_report.py
│   ├── data_enrichment.py
│   └── bulk_classifier.py
├── state/
│   └── processed_ids.json
├── logs/
│   └── daily_report.log
├── .env
└── run_job.sh          # wrapper that handles venv + env vars

The wrapper script is what cron actually calls. It handles activation and environment loading so your Python scripts stay clean:

#!/bin/bash
# run_job.sh — cron-safe wrapper for Claude batch jobs
set -euo pipefail

JOB_NAME="${1:?usage: run_job.sh <job_name>}"
PROJECT_DIR="/opt/claude-batch"
LOG_FILE="$PROJECT_DIR/logs/${JOB_NAME}.log"

# Rotate log if it exceeds 50MB
if [ -f "$LOG_FILE" ] && [ "$(stat -c%s "$LOG_FILE")" -gt 52428800 ]; then
  mv "$LOG_FILE" "${LOG_FILE}.$(date +%Y%m%d)"
fi

# Activate virtualenv and load env vars
source "$PROJECT_DIR/venv/bin/activate"
set -a; source "$PROJECT_DIR/.env"; set +a

# Run the job, appending output to the log. With set -e, a non-zero exit
# from Python aborts this script with the same code, so cron (or a notify
# wrapper) sees the job's real exit status — no trailing `exit $?` needed.
python "$PROJECT_DIR/jobs/${JOB_NAME}.py" >> "$LOG_FILE" 2>&1

A Real Claude Batch Job: Nightly Report Generation

Let’s build something concrete — a nightly job that pulls the previous day’s records from a SQLite database, sends them to Claude for summarisation, and writes the output to a markdown report.

State Management: Don’t Process Records Twice

The most common failure mode I see in scheduled AI jobs is reprocessing. If the job crashes halfway through and reruns, you’ve now billed for the same records twice and possibly written duplicate output. Use a state file or a processed flag in your database. Here’s the SQLite approach, which is simpler to reason about:

import sqlite3
import anthropic
import logging
import sys
from datetime import date, timedelta
from pathlib import Path

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    stream=sys.stdout  # run_job.sh redirects this to the log file
)
log = logging.getLogger(__name__)

DB_PATH = Path("/opt/claude-batch/data/records.db")
REPORT_DIR = Path("/opt/claude-batch/reports")
REPORT_DIR.mkdir(parents=True, exist_ok=True)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

def get_unprocessed_records(conn: sqlite3.Connection, target_date: date) -> list[dict]:
    """Fetch records for target_date that haven't been summarised yet."""
    cursor = conn.execute(
        """
        SELECT id, content, created_at
        FROM events
        WHERE DATE(created_at) = ?
          AND summarised = 0
        ORDER BY created_at
        """,
        (target_date.isoformat(),)
    )
    cols = [d[0] for d in cursor.description]
    return [dict(zip(cols, row)) for row in cursor.fetchall()]

def mark_processed(conn: sqlite3.Connection, record_ids: list[int]) -> None:
    conn.executemany(
        "UPDATE events SET summarised = 1 WHERE id = ?",
        [(rid,) for rid in record_ids]
    )
    conn.commit()

def summarise_batch(records: list[dict]) -> str:
    """Send a batch of records to Claude and return the summary."""
    # Build a structured prompt — don't just dump raw data
    records_text = "\n\n".join(
        f"[Record {r['id']} @ {r['created_at']}]\n{r['content']}"
        for r in records
    )

    message = client.messages.create(
        model="claude-haiku-4-5",  # Haiku keeps bulk summarisation cheap — verify current rates
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                f"Summarise the following {len(records)} records from {records[0]['created_at'][:10]}. "
                "Identify key themes, anomalies, and any action items. "
                "Output as structured markdown with sections.\n\n"
                f"{records_text}"
            )
        }]
    )
    return message.content[0].text

def run_nightly_report():
    target_date = date.today() - timedelta(days=1)
    log.info(f"Starting nightly report for {target_date}")

    with sqlite3.connect(DB_PATH) as conn:
        records = get_unprocessed_records(conn, target_date)

        if not records:
            log.info("No unprocessed records found — exiting cleanly")
            sys.exit(0)

        log.info(f"Processing {len(records)} records")

        # Process in batches of 50 to avoid hitting context limits
        # and to get partial results if something fails mid-run
        batch_size = 50
        summaries = []

        for i in range(0, len(records), batch_size):
            batch = records[i:i + batch_size]
            log.info(f"Summarising batch {i//batch_size + 1} ({len(batch)} records)")
            summary = summarise_batch(batch)
            summaries.append(summary)
            mark_processed(conn, [r["id"] for r in batch])  # commit after each batch

        # Write the final report
        report_path = REPORT_DIR / f"report_{target_date}.md"
        report_path.write_text(
            f"# Daily Report — {target_date}\n\n" + "\n\n---\n\n".join(summaries)
        )
        log.info(f"Report written to {report_path}")

if __name__ == "__main__":
    run_nightly_report()

A few things worth calling out: batching at 50 records keeps individual requests well under Claude’s context window, and committing state after each batch means a mid-run crash loses at most one batch of progress. At Haiku’s input pricing (roughly $1 per million input tokens at the time of writing — check the vendor’s pricing page), processing 500 records of ~200 tokens each costs on the order of $0.10 per nightly run, plus output tokens. That’s sustainable.

Wiring It Into Cron

Edit your crontab with crontab -e as the user who owns the project directory (not root unless you have a good reason):

# Run nightly report at 02:00 every day
# Failures will be captured in the log; email alerting handled separately
0 2 * * * /opt/claude-batch/run_job.sh daily_report

# Weekly bulk classifier, every Sunday at 03:30
30 3 * * 0 /opt/claude-batch/run_job.sh bulk_classifier

# Data enrichment every 6 hours
0 */6 * * * /opt/claude-batch/run_job.sh data_enrichment

One thing the documentation consistently undersells: cron runs with a minimal environment. Your ~/.bashrc aliases don’t exist, $HOME may differ, and Python’s ssl module will sometimes misbehave if system CA certs aren’t in the expected location. The wrapper script handles most of this, but if you hit SSL errors, check that your virtualenv’s Python can reach anthropic.com with python -c "import anthropic; anthropic.Anthropic()" from a plain shell (not your login shell).
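You can reproduce that sparse environment without waiting for the next scheduled run — `env -i` starts from an empty environment, which is a close stand-in for what cron provides:

```shell
# Simulate cron's near-empty environment: env -i wipes everything, then we
# add back only the variables cron itself sets before running a command.
env -i HOME="$HOME" SHELL=/bin/sh /bin/sh -c 'echo "PATH is: ${PATH:-unset}"'
```

Swap the echo for your actual wrapper invocation (e.g. the run_job.sh call from your crontab) and you’ll catch missing-PATH and missing-env-var failures interactively instead of at 2am.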

Failure Notifications That Actually Work

Logging to a file is necessary but not sufficient — you need to know when a job fails before someone emails you asking why the report didn’t arrive. The simplest approach that doesn’t require another service: a small notification wrapper that calls a webhook on non-zero exit.

#!/bin/bash
# notify_on_failure.sh — wrap any job and ping a webhook if it fails
set -uo pipefail  # deliberately no -e: we need to capture a non-zero exit below

JOB_NAME="${1:?usage: notify_on_failure.sh <job_name>}"

# .env is not inherited under cron, so load it here for the webhook URL
set -a; source /opt/claude-batch/.env; set +a
WEBHOOK_URL="${SLACK_WEBHOOK_URL:?SLACK_WEBHOOK_URL must be set in .env or the environment}"

EXIT_CODE=0
/opt/claude-batch/run_job.sh "$JOB_NAME" || EXIT_CODE=$?

if [ $EXIT_CODE -ne 0 ]; then
  HOSTNAME=$(hostname)
  TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
  curl -s -X POST "$WEBHOOK_URL" \
    -H "Content-Type: application/json" \
    -d "{\"text\": \"🔴 Claude batch job *${JOB_NAME}* failed on \`${HOSTNAME}\` at ${TIMESTAMP} (exit code ${EXIT_CODE}). Check logs at /opt/claude-batch/logs/${JOB_NAME}.log\"}"
fi

exit $EXIT_CODE

Then update your crontab to call this wrapper instead of run_job.sh directly. You get Slack pings on failures with no external monitoring service needed. If you’d rather use PagerDuty or email, swap the curl call — the pattern is the same.
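In crontab terms that’s a one-line change per job — for example:

```shell
# Nightly report, now routed through the failure notifier
0 2 * * * /opt/claude-batch/notify_on_failure.sh daily_report
```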

Handling Claude API Rate Limits and Transient Failures

The Anthropic API will occasionally return 529 Overloaded or timeout. Your batch job needs to handle this gracefully rather than crashing the entire run. Add exponential backoff to your API calls:

import time
import anthropic
from anthropic import RateLimitError, APIStatusError

def summarise_with_retry(records: list[dict], max_retries: int = 3) -> str:
    """Call Claude with exponential backoff on rate limit or server errors."""
    for attempt in range(max_retries):
        try:
            return summarise_batch(records)
        except RateLimitError:
            wait = 2 ** attempt * 10  # 10s, 20s, 40s
            log.warning(f"Rate limited — waiting {wait}s before retry {attempt + 1}/{max_retries}")
            time.sleep(wait)
        except APIStatusError as e:
            if e.status_code >= 500:
                wait = 2 ** attempt * 5
                log.warning(f"Server error {e.status_code} — retrying in {wait}s")
                time.sleep(wait)
            else:
                raise  # 4xx errors are your fault, don't retry
    raise RuntimeError(f"Failed after {max_retries} retries")

Don’t retry on 400-series errors — those indicate a malformed request or auth issue and retrying just wastes quota. Do retry on 429 (rate limit) and 5xx (server-side problems).
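One refinement worth considering if several cron jobs share an API key: add jitter so simultaneous retries don’t re-collide on the same second. A sketch of the same schedule with jitter — this is my addition, not part of the retry code above:

```python
import random

def backoff_delay(attempt: int, base: float = 10.0) -> float:
    """Exponential backoff (10s, 20s, 40s, ...) plus up to 25% random jitter."""
    delay = base * (2 ** attempt)
    return delay * random.uniform(1.0, 1.25)
```

Replace the `wait = 2 ** attempt * 10` lines with `wait = backoff_delay(attempt)` and concurrent jobs will naturally de-synchronise their retries.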

When to Upgrade Beyond Cron

Cron stops being sufficient when you need:

  • Job dependencies — “run job B only after job A succeeds”
  • Dynamic parallelism — spawning N workers based on queue depth
  • Per-task status visibility — knowing which of 1,000 records failed individually
  • Cross-machine coordination — jobs distributed across multiple servers

For those requirements, look at Celery with Redis for Python-native task queues, or n8n if you want a visual workflow editor with built-in scheduling and error handling. n8n’s Claude integration is good enough that you can build the entire pipeline without writing Python — though you lose the fine-grained control over batching and retry logic.

Prefect and Dagster are solid if your team already uses them, but the operational overhead is real. I’ve watched teams spend more time maintaining their Prefect deployment than writing the actual jobs it runs. For scheduling AI workflows at the scale of “a few jobs per day,” that’s almost never worth it.

Bottom Line: Who Should Use This Setup

Solo founders and small teams — this is your stack. One VPS, cron, a SQLite state DB, and Slack webhooks for alerts. You can have a nightly Claude batch job running reliably in an afternoon, and the total infrastructure cost is zero beyond the VPS you probably already have.

Teams with existing infrastructure — if you already run Postgres and have a job runner, skip the SQLite state file and write your processed flags to your existing DB. The rest of the pattern holds.

High-volume or mission-critical jobs — step up to Celery or a managed queue. Cron’s lack of built-in retries and dead-letter queues will hurt you at scale. But get the cron version working first — it’ll clarify your actual requirements before you over-engineer.

Scheduling AI workflows doesn’t have to mean a complex orchestration platform. Cron, a well-structured Python script, and proper state management will handle the vast majority of batch AI use cases cleanly — and when you outgrow it, the migration path is straightforward because your job logic was always cleanly separated from the scheduler anyway.

Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.
