Building a Claude Email Agent: Read, Summarize, and Respond Automatically

Email is where productivity goes to die. If you’re running a business or managing a product, you know the drill: 200 unread messages, half of them noise, a handful that actually need a thoughtful reply. A Claude email agent can cut through that — not by auto-sending replies (please don’t do that in production on day one), but by reading your inbox, understanding what each email actually needs, and drafting responses you can review and send with one click. That’s the thing you’ll be able to build by the end of this article.

We’re going to wire up the Gmail API to pull emails, use Claude to classify and summarize them, then generate draft replies that land in your Drafts folder ready to go. Full working code, real error handling, honest notes on where this breaks.

What We’re Actually Building

The agent does three things in sequence:

Fetch unread emails from Gmail using the Gmail API
Triage them — Claude reads subject + body and assigns a category and urgency score
Draft a reply for emails that need one, and push it back to Gmail as a draft

We’re not using LangChain or any agent framework here. The “agent” pattern here is simple: a structured prompt loop with tool calls. Claude’s tool use API handles the decision logic. I’d actually recommend starting without a framework for anything email-related — you want tight control over what gets sent, and framework abstractions tend to hide that.

Stack: Python 3.11+, anthropic SDK, google-api-python-client, and google-auth-oauthlib. No LangChain, no vector database needed.

Gmail API Setup: The Part the Docs Get Wrong

The Gmail API docs will tell you to create a project, enable the API, and download credentials.json. What they don’t tell you clearly: if you’re running this as a personal tool (not a published app), you need to add your Gmail address as a test user in the OAuth consent screen, or the auth flow will fail with a cryptic access_denied error after 100 successful runs.

Also: the default OAuth scope you’ll see in examples is gmail.readonly. You need gmail.modify to create drafts. Use the minimal scope that covers what you need.

from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from googleapiclient.discovery import build
import os
import pickle

SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.compose"  # needed for creating drafts
]

def get_gmail_service():
    creds = None
    # token.pickle stores the user's access/refresh tokens
    if os.path.exists("token.pickle"):
        with open("token.pickle", "rb") as token:
            creds = pickle.load(token)

    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                "credentials.json", SCOPES
            )
            creds = flow.run_local_server(port=0)
        with open("token.pickle", "wb") as token:
            pickle.dump(creds, token)

    return build("gmail", "v1", credentials=creds)

Run this once interactively to get your token.pickle. After that, the token auto-refreshes. Don’t commit either file to git.

Fetching and Parsing Emails

Gmail’s API returns emails in a nested MIME structure. The part that trips people up is that the body is base64url-encoded, and multipart messages need recursive traversal to get the plain text content.

import base64
from email import message_from_bytes

def fetch_unread_emails(service, max_results=10):
    results = service.users().messages().list(
        userId="me",
        q="is:unread",
        maxResults=max_results
    ).execute()

    messages = results.get("messages", [])
    emails = []

    for msg in messages:
        full_msg = service.users().messages().get(
            userId="me",
            id=msg["id"],
            format="full"
        ).execute()
        emails.append(parse_email(full_msg))

    return emails

def parse_email(msg):
    headers = {h["name"]: h["value"] for h in msg["payload"]["headers"]}
    body = extract_body(msg["payload"])

    return {
        "id": msg["id"],
        "thread_id": msg["threadId"],
        "subject": headers.get("Subject", "(no subject)"),
        "from": headers.get("From", ""),
        "to": headers.get("To", ""),
        "date": headers.get("Date", ""),
        "body": body[:3000]  # truncate to avoid token blow-out on long newsletters
    }

def extract_body(payload):
    """Recursively extract plain text from MIME parts."""
    if payload.get("mimeType") == "text/plain":
        data = payload.get("body", {}).get("data", "")
        return base64.urlsafe_b64decode(data + "==").decode("utf-8", errors="replace")

    for part in payload.get("parts", []):
        result = extract_body(part)
        if result:
            return result

    return ""

The + "==" on the base64 decode is not a typo — base64url strings from Gmail are often missing padding, and Python’s decoder is strict about it.

The Triage and Drafting Logic with Claude

Here’s where the actual intelligence lives. We’re using Claude to do two things in one call: classify the email and draft a reply if needed. Doing this in a single structured prompt is cheaper and faster than two separate API calls.

I’m using claude-haiku-3-5 for triage (it’s fast and costs roughly $0.001–0.002 per email at current pricing) and claude-sonnet-4 for drafting replies when the email requires something substantive. You could use Haiku for everything and save money — the quality difference only matters for nuanced or high-stakes replies.

import anthropic
import json

client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY from env

TRIAGE_PROMPT = """You are an email triage assistant. Analyze this email and respond with a JSON object containing:
- category: one of [support, sales, newsletter, internal, urgent, spam, other]
- urgency: integer 1-5 (5 = needs reply today)
- needs_reply: boolean
- summary: one sentence max
- suggested_action: brief string

Email:
From: {from_addr}
Subject: {subject}
Body: {body}

Respond with valid JSON only."""

DRAFT_PROMPT = """You are a professional email assistant writing on behalf of the recipient.
Write a reply to the following email. Be concise, helpful, and match the tone of the original.
Do not include a subject line. Do not add sign-off placeholder text — end naturally.

Original email:
From: {from_addr}
Subject: {subject}
Body: {body}

Context about the recipient: {context}

Write the reply body only."""

def triage_email(email: dict) -> dict:
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": TRIAGE_PROMPT.format(
                from_addr=email["from"],
                subject=email["subject"],
                body=email["body"]
            )
        }]
    )
    try:
        return json.loads(response.content[0].text)
    except json.JSONDecodeError:
        # Claude occasionally adds a markdown code fence — strip it
        raw = response.content[0].text.strip().strip("```json").strip("```")
        return json.loads(raw)

def draft_reply(email: dict, context: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": DRAFT_PROMPT.format(
                from_addr=email["from"],
                subject=email["subject"],
                body=email["body"],
                context=context
            )
        }]
    )
    return response.content[0].text.strip()

Pushing Drafts Back to Gmail

Creating a draft via the Gmail API requires encoding the full RFC 2822 message as base64url. The email standard library handles the formatting; the tricky bit is threading — if you want your draft to show up in the right conversation, you need to pass the correct threadId.

from email.mime.text import MIMEText

def create_draft(service, to: str, subject: str, body: str, thread_id: str) -> str:
    message = MIMEText(body)
    message["to"] = to
    message["subject"] = f"Re: {subject}" if not subject.startswith("Re:") else subject

    raw = base64.urlsafe_b64encode(message.as_bytes()).decode("utf-8")

    draft = service.users().drafts().create(
        userId="me",
        body={
            "message": {
                "raw": raw,
                "threadId": thread_id  # keeps it in the same conversation
            }
        }
    ).execute()

    return draft["id"]

Wiring It All Together

The main loop is straightforward. The context string in draft_reply is where you inject persona information — who you are, your role, standard policies, anything Claude should know when writing as you.

MY_CONTEXT = """
I'm Alex, founder of a B2B SaaS tool for logistics companies.
Standard reply time: 24 hours. We offer a 14-day free trial.
For support issues, always ask for their account email and error message first.
"""

def run_agent():
    service = get_gmail_service()
    emails = fetch_unread_emails(service, max_results=15)

    for email in emails:
        print(f"Processing: {email['subject'][:60]}")

        triage = triage_email(email)
        print(f"  → {triage['category']} | urgency {triage['urgency']} | needs_reply: {triage['needs_reply']}")
        print(f"  → {triage['summary']}")

        if triage["needs_reply"] and triage["urgency"] >= 2:
            draft_body = draft_reply(email, MY_CONTEXT)
            draft_id = create_draft(
                service,
                to=email["from"],
                subject=email["subject"],
                body=draft_body,
                thread_id=email["thread_id"]
            )
            print(f"  → Draft created: {draft_id}")

if __name__ == "__main__":
    run_agent()

What Breaks in Production

Rate limits. Gmail API has a quota of 250 units per second (listing messages costs 5 units, getting a full message costs 5 units). With 15 emails you’re fine. At 100+ you’ll hit quota errors — add exponential backoff with tenacity.

Long emails. We truncate to 3000 characters. That breaks for multi-page contracts or long threads. A better approach: summarize the thread first with a cheap Haiku call, then use the summary as context for drafting.

JSON parsing failures. Claude Haiku occasionally wraps JSON in a markdown code fence despite being told not to. The strip trick in triage_email handles most cases, but add a fallback that returns a safe default dict rather than crashing.

Hallucinated context. If you don’t give Claude enough context about who you are, it will invent plausible-sounding but wrong details in drafts. Your MY_CONTEXT string is load-bearing — keep it updated.

Duplicate drafts. If you run the agent twice without marking emails as read or tracking processed IDs, you’ll create duplicate drafts. Add a simple processed-IDs set persisted to a local JSON file, or mark emails as read after triaging.

Extending This: Scheduling and Filtering

For a production setup, I’d run this on a cron job every 30 minutes using a simple systemd timer or GitHub Actions scheduled workflow. Add a filter in the Gmail query string — q="is:unread -from:noreply -from:no-reply" — to skip transactional noise before it even hits Claude, which cuts your API costs significantly.

If you want to add a human-in-the-loop approval step, the cleanest approach is to push candidate drafts to a Notion database or a Slack message with approve/reject buttons, rather than directly to Gmail Drafts. That gives you a review queue without building a full UI.

Who Should Build This and How Far to Take It

Solo founders and consultants get the most value here. If you’re handling sales, support, and operations yourself, a Claude email agent that pre-drafts your replies saves 30–60 minutes daily. Keep the urgency threshold at 3+ for drafting so you’re only reviewing replies that actually matter.

Small teams should add a shared context file per team member and route emails by category — support tickets go to one queue, sales leads to another. This is two extra lines of logic in the main loop.

Don’t auto-send. I mean it. Even with good prompts, Claude will occasionally draft a reply that’s technically correct but tonally wrong for a specific relationship. The value of this system is speed of review, not elimination of review. The draft workflow is the right production pattern.

At current pricing, running this agent on 50 emails a day costs under $0.10/day using Haiku for triage and Sonnet for drafting. That’s the economics of a Claude email agent that actually fits into a real workflow — cheap enough to run continuously, good enough to trust as a first draft.

Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

Building a Claude Email Agent: Read, Summarize, and Respond Automatically

Context Window Comparison 2025: Claude 200K vs GPT-4 Turbo vs Gemini 2 Million Tokens

Activepieces vs n8n vs Zapier: Building AI Automation Workflows Compared

Mistral Large vs Claude 3.5 Sonnet: Summarization and Compression Benchmark

Role Prompting vs Chain-of-Thought vs Constitutional AI: Best Prompt Technique for Agents

Claude Haiku vs GPT-4o Mini: Small Model Showdown for Cost-Conscious Agents

Helicone vs LangSmith vs Langfuse: LLM Observability Platform Comparison

Building a Claude Email Agent: Read, Summarize, and Respond Automatically

What We’re Actually Building

Gmail API Setup: The Part the Docs Get Wrong

Fetching and Parsing Emails

The Triage and Drafting Logic with Claude

Pushing Drafts Back to Gmail

Wiring It All Together

What Breaks in Production

Extending This: Scheduling and Filtering

Who Should Build This and How Far to Take It

Related Posts

Context Window Comparison 2025: Claude 200K vs GPT-4 Turbo vs Gemini 2 Million Tokens

Activepieces vs n8n vs Zapier: Building AI Automation Workflows Compared

Mistral Large vs Claude 3.5 Sonnet: Summarization and Compression Benchmark

Role Prompting vs Chain-of-Thought vs Constitutional AI: Best Prompt Technique for Agents

Claude Haiku vs GPT-4o Mini: Small Model Showdown for Cost-Conscious Agents

Helicone vs LangSmith vs Langfuse: LLM Observability Platform Comparison