By the end of this tutorial, you’ll have a working Python script that scrapes competitor websites on a schedule, detects meaningful changes in pricing pages and job postings, and uses Claude to generate actionable intelligence summaries — delivered to your Slack or email inbox automatically. This is exactly the kind of AI competitor monitoring automation that used to require a full-time analyst.
- Install dependencies — set up requests, BeautifulSoup, and the Anthropic SDK
- Build the scraper — fetch and hash competitor pages to detect changes
- Extract structured signals — parse pricing, job postings, and feature announcements
- Analyze with Claude — generate intelligence summaries with competitive context
- Store and diff snapshots — SQLite-backed change detection with timestamps
- Schedule and alert — run on a cron schedule and push digests to Slack
Why Most Competitor Monitoring Falls Apart
Tools like Visualping or Distill watch for visual changes, but they tell you that something changed, not what it means. Did that pricing page rewrite signal a package consolidation or a price increase? Do three new backend engineering job postings mean they're building a new product vertical, or just replacing departed engineers? You need an LLM in the loop to answer those questions, and you need it running automatically.
The architecture here is deliberately simple: scrape → diff → analyze → alert. No vector database, no complex orchestration. It runs in under 30 seconds per competitor, and with Claude Haiku each analysis run costs a fraction of a cent (the exact figure depends on page length and current API pricing). For 10 competitors checked daily, that's on the order of $1/month in API costs.
Step 1: Install Dependencies
```shell
pip install anthropic requests beautifulsoup4 lxml python-dotenv schedule slack-sdk
```
You’ll also need a SQLite database (stdlib, no install needed) for storing snapshots. If you’re planning to run this at scale across hundreds of URLs, check out the batch processing patterns with the Claude API — the same diffing logic applies but you’d queue jobs rather than run them synchronously.
Step 2: Build the Scraper and Change Detector
The core problem with web scraping for change detection is noise. Navigation menus, footer dates, ad slots — they all change constantly and are meaningless. We extract only the main content block and hash it.
```python
import hashlib
import sqlite3
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timezone

DB_PATH = "competitor_snapshots.db"

def init_db():
    conn = sqlite3.connect(DB_PATH)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT NOT NULL,
            content_hash TEXT NOT NULL,
            raw_text TEXT NOT NULL,
            captured_at TEXT NOT NULL
        )
    """)
    conn.commit()
    conn.close()

def fetch_page_text(url: str) -> str:
    """Fetch URL and return cleaned body text, stripping nav/footer noise."""
    headers = {"User-Agent": "Mozilla/5.0 (compatible; CompetitorBot/1.0)"}
    resp = requests.get(url, headers=headers, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "lxml")
    # Remove noisy elements
    for tag in soup(["nav", "footer", "script", "style", "header"]):
        tag.decompose()
    # Try to get main content first, fall back to body
    main = soup.find("main") or soup.find("article") or soup.body
    # Newline separator keeps one block element per line, which the
    # signal extractors in Step 3 rely on
    return main.get_text(separator="\n", strip=True) if main else ""

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def get_last_snapshot(url: str) -> dict | None:
    conn = sqlite3.connect(DB_PATH)
    row = conn.execute(
        "SELECT content_hash, raw_text, captured_at FROM snapshots "
        "WHERE url = ? ORDER BY captured_at DESC LIMIT 1",
        (url,)
    ).fetchone()
    conn.close()
    if row:
        return {"hash": row[0], "text": row[1], "captured_at": row[2]}
    return None

def save_snapshot(url: str, text: str, hash_val: str):
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "INSERT INTO snapshots (url, content_hash, raw_text, captured_at) VALUES (?, ?, ?, ?)",
        # datetime.utcnow() is deprecated in Python 3.12+; use an aware timestamp
        (url, hash_val, text, datetime.now(timezone.utc).isoformat())
    )
    conn.commit()
    conn.close()
```
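Before wiring the scraper into a pipeline, it's worth a quick sanity check that the hash behaves as a change detector: stable for identical text, different for any edit.

```python
import hashlib

def content_hash(text: str) -> str:
    # Same helper as in the scraper, repeated so this snippet runs standalone
    return hashlib.sha256(text.encode()).hexdigest()

before = "Pro plan: $49/mo. Enterprise: contact sales."
after = "Pro plan: $59/mo. Enterprise: contact sales."

assert content_hash(before) == content_hash(before)  # deterministic across runs
assert content_hash(before) != content_hash(after)   # a $10 price change flips the hash
```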
Step 3: Extract Structured Signals from Pricing and Job Pages
Generic change detection is useful, but the real value comes from watching specific signals. Pricing pages and job postings are the two highest-signal sources for competitive intelligence.
```python
import re

def extract_pricing_signals(text: str) -> str:
    """Pull out lines likely containing price info."""
    # Split on newlines or sentence boundaries so either text shape works
    lines = re.split(r"\n|(?<=\w)\. ", text)
    price_pattern = re.compile(
        r"\$[\d,]+|per month|per year|annually|free tier|enterprise|seats", re.I
    )
    signals = [line.strip() for line in lines if price_pattern.search(line)]
    return "\n".join(signals[:30])  # Cap at 30 lines to control token usage

def extract_job_signals(text: str) -> str:
    """Pull out job title patterns and hiring signals."""
    lines = re.split(r"\n|(?<=\w)\. ", text)
    job_pattern = re.compile(
        r"engineer|developer|product manager|designer|sales|marketing|director|vp of|head of",
        re.I,
    )
    signals = [line.strip() for line in lines if job_pattern.search(line) and len(line) > 10]
    return "\n".join(signals[:40])
```
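Before pointing the extractors at live pages, you can sanity-check the pricing regex on a synthetic sample. This sketch inlines the same pattern on made-up page text:

```python
import re

# Synthetic page text, one candidate line per entry
sample = "\n".join([
    "Home",
    "Starter: $29 per month, billed annually",
    "Our mission is to delight customers",
    "Enterprise: custom pricing, unlimited seats",
])

# Same pattern used in extract_pricing_signals
price_pattern = re.compile(r"\$[\d,]+|per month|per year|annually|free tier|enterprise|seats", re.I)
signals = [line for line in sample.split("\n") if price_pattern.search(line)]
print(signals)  # keeps the two pricing lines, drops navigation and mission copy
```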
Step 4: Analyze Changes with Claude
This is where the system goes from “change detector” to actual intelligence. We send both the old and new content to Claude with a structured prompt that asks for specific competitive insights, not just a summary.
```python
import anthropic
import os
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

SYSTEM_PROMPT = """You are a competitive intelligence analyst. When given a before/after diff
of a competitor's website, you identify:
1. Strategic signals (new products, pivots, packaging changes)
2. Pricing changes (increases, decreases, new tiers, removed plans)
3. Hiring signals (what new roles suggest about roadmap or team growth)
4. Threats (things that directly compete with our current offerings)
5. Opportunities (gaps they're not addressing)

Be specific. "They added a $29/mo starter plan" is useful. "Things changed" is not.
If the change is cosmetic or trivial (rewording, typo fixes), say so and don't oversell it.
Keep your response under 300 words."""

def analyze_change_with_claude(
    url: str,
    old_text: str,
    new_text: str,
    page_type: str = "general",  # "pricing", "jobs", "homepage", "general"
) -> str:
    # Truncate to control cost; 3,000 characters per side is plenty for a
    # diff summary, even though the model's context window is far larger
    old_snippet = old_text[:3000]
    new_snippet = new_text[:3000]
    user_message = f"""Competitor page: {url}
Page type: {page_type}

BEFORE:
{old_snippet}

AFTER:
{new_snippet}

What changed and what does it mean competitively?"""
    response = client.messages.create(
        model="claude-haiku-4-5",  # cheapest tier; verify current model names and pricing
        max_tokens=500,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text
```
One thing worth noting: the prompt explicitly asks Claude to call out trivial changes. Without that instruction, Claude will hallucinate significance in noise. I learned this the hard way after getting a “major strategic shift” alert about a competitor changing their footer copyright year.
For prompting patterns that reliably produce structured, non-hallucinatory output, the guide on getting consistent JSON from Claude is worth reading — you can extend this to output structured JSON instead of prose if you’re feeding downstream systems.
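If you do switch the prompt to request JSON, parse defensively: models sometimes wrap the object in a markdown fence. A minimal sketch (the schema keys here are illustrative, not part of the tutorial's prompt):

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Extract a JSON object from a model reply, tolerating ```json fences."""
    match = re.search(r"\{.*\}", raw, re.S)  # grab the outermost braces
    if not match:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

reply = '```json\n{"significance": 4, "summary": "New $29/mo starter tier"}\n```'
data = parse_model_json(reply)
print(data["significance"])  # → 4
```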
Step 5: Store Snapshots and Orchestrate the Full Pipeline
```python
COMPETITORS = [
    {"url": "https://competitor-a.com/pricing", "type": "pricing"},
    {"url": "https://competitor-a.com/careers", "type": "jobs"},
    {"url": "https://competitor-b.com/pricing", "type": "pricing"},
    {"url": "https://competitor-c.com/features", "type": "general"},
]

def check_competitor(entry: dict) -> dict | None:
    """
    Returns a change report dict if something meaningful changed, else None.
    """
    url = entry["url"]
    page_type = entry["type"]
    try:
        new_text = fetch_page_text(url)
    except Exception as e:
        return {"url": url, "page_type": page_type, "error": str(e), "analysis": None}
    new_hash = content_hash(new_text)
    last = get_last_snapshot(url)
    # Always save the new snapshot
    save_snapshot(url, new_text, new_hash)
    # No previous snapshot — first run, just store
    if last is None:
        return None
    # No change detected
    if last["hash"] == new_hash:
        return None
    # Change detected: narrow the diff for high-signal page types (Step 3)
    old_view, new_view = last["text"], new_text
    if page_type == "pricing":
        old_view, new_view = extract_pricing_signals(last["text"]), extract_pricing_signals(new_text)
    elif page_type == "jobs":
        old_view, new_view = extract_job_signals(last["text"]), extract_job_signals(new_text)
    analysis = analyze_change_with_claude(url, old_view, new_view, page_type)
    return {
        "url": url,
        "page_type": page_type,
        "last_checked": last["captured_at"],
        "analysis": analysis,
    }

def run_monitoring_cycle():
    init_db()
    reports = []
    for entry in COMPETITORS:
        result = check_competitor(entry)
        if result:
            reports.append(result)
            print(f"[CHANGE] {result['url']}")
    return reports
```
Step 6: Schedule Runs and Send Slack Alerts
```python
from datetime import datetime, timezone
from slack_sdk.webhook import WebhookClient

SLACK_WEBHOOK_URL = os.environ.get("SLACK_WEBHOOK_URL", "")

def send_slack_digest(reports: list[dict]):
    if not reports or not SLACK_WEBHOOK_URL:
        return
    webhook = WebhookClient(SLACK_WEBHOOK_URL)
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    blocks = [
        {
            "type": "header",
            "text": {"type": "plain_text", "text": f"🕵️ Competitor Intelligence — {today}"},
        }
    ]
    for r in reports:
        if r.get("error"):
            text = f"⚠️ Failed to fetch {r['url']}: {r['error']}"
        else:
            text = f"*<{r['url']}|{r['url']}>* ({r['page_type']})\n{r['analysis']}"
        blocks.append({"type": "section", "text": {"type": "mrkdwn", "text": text}})
        blocks.append({"type": "divider"})
    webhook.send(blocks=blocks)

# Scheduling with cron is cleaner than the `schedule` library for production.
# See: https://www.unpromptedmind.com/cron-scheduled-claude-agents/
if __name__ == "__main__":
    reports = run_monitoring_cycle()
    send_slack_digest(reports)
    print(f"Done. {len(reports)} changes detected.")
```
For production scheduling, I’d run this as a cron job rather than using the Python schedule library — the guide on scheduling Claude agents with cron on Linux covers the setup in detail, including retry logic and log rotation.
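As a concrete example, a daily 7 AM run might look like this in `crontab -e` (the paths are placeholders for your own setup):

```shell
# m h dom mon dow  command
0 7 * * *  cd /opt/competitor-monitor && /usr/bin/env python3 monitor.py >> monitor.log 2>&1
```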
Common Errors and How to Fix Them
1. Hash churn from dynamic content
Some sites inject session tokens, timestamps, or A/B test IDs directly into the HTML body. Your hash will always differ even when nothing meaningful changed. Fix: after extracting with BeautifulSoup, strip any text matching patterns like UUIDs, session tokens, or timestamps before hashing. Alternatively, hash only the pricing table or specific CSS selectors using soup.select(".pricing-grid") instead of the full body.
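One way to implement that normalization step, a sketch that strips UUIDs and ISO timestamps before hashing (extend the patterns for whatever churn your targets inject):

```python
import hashlib
import re

NOISE_PATTERNS = [
    # UUIDs (session tokens, A/B test IDs)
    re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I),
    # ISO-8601 timestamps
    re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[\d.:+Z-]*"),
]

def stable_hash(text: str) -> str:
    """Hash page text after removing known dynamic noise."""
    for pattern in NOISE_PATTERNS:
        text = pattern.sub("", text)
    return hashlib.sha256(text.encode()).hexdigest()

a = "Pro $49/mo session=3f2b8c9e-1a2b-4c3d-8e9f-0a1b2c3d4e5f"
b = "Pro $49/mo session=9e8d7c6b-5a4f-4e3d-2c1b-0f9e8d7c6b5a"
assert stable_hash(a) == stable_hash(b)  # session churn no longer flips the hash
```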
2. Claude over-explaining trivial changes
Without tight instructions, Claude will find significance in a button text change from “Get Started” to “Start Free Trial.” The fix is in the system prompt — explicitly instruct it to rate change significance (1–5) and only elaborate on changes rated 3+. You can also filter Claude’s output before alerting: if the response contains phrases like “minor wording” or “cosmetic”, skip the Slack notification.
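The output-side filter can be as simple as a phrase check before the alert goes out. A sketch (the phrase list is a starting point; tune it on your own false positives):

```python
TRIVIAL_MARKERS = ("minor wording", "cosmetic", "typo fix", "no substantive change")

def is_worth_alerting(analysis: str) -> bool:
    """Suppress alerts when Claude itself flags the change as trivial."""
    lowered = analysis.lower()
    return not any(marker in lowered for marker in TRIVIAL_MARKERS)

assert is_worth_alerting("They added a $29/mo starter plan targeting SMBs.")
assert not is_worth_alerting("This appears to be a cosmetic rewording of the hero copy.")
```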
3. Request blocking (403s, CAPTCHAs)
Rotating user agents helps for basic blocks, but some sites use bot detection that catches even polished scrapers. For those, consider using a service like Browserless or Playwright with stealth mode. For sites that completely block scraping, job postings are often mirrored on LinkedIn or Greenhouse — scrape those instead. Don’t try to get clever with headers to bypass security measures on sites that clearly prohibit scraping; check the robots.txt and ToS first.
What to Build Next
The natural extension is adding a weekly narrative digest — instead of per-change alerts, run a second Claude call at the end of each week that synthesizes all the changes detected across all competitors into a single “what’s happening in the market” summary. Feed it the last 7 days of change reports and ask it to identify patterns: “Is competitor X consistently hiring for AI roles?” or “Have three competitors all raised prices this month?” This is closer to actual analyst output than individual change alerts.
If you want to add lead generation signals to this workflow — like monitoring competitor customers who might be receptive to switching — the AI lead generation email agent tutorial covers how to combine intelligence signals with outreach automation.
Bottom Line: Who Should Build This
Solo founders: Run this on a $5 VPS with a SQLite backend. Total cost is under $3/month for 15 competitors checked daily. The Slack alert alone is worth it.
Small teams: Add a lightweight web UI (Flask or FastAPI) so non-technical team members can browse the change history. Swap SQLite for Postgres if you’re storing more than 90 days of snapshots.
Enterprise or agencies: The architecture here is solid but you’ll want to add proper authentication, multi-tenant support if managing clients, and more robust scraping infrastructure. Consider Playwright for JS-heavy sites and a queue system like Redis for managing concurrent checks at scale.
This AI competitor monitoring automation approach beats any SaaS tool I’ve seen in the $50–200/month range because it actually tells you what the changes mean, not just that pixels moved. The Claude analysis layer is the entire value proposition — everything else is plumbing.
Frequently Asked Questions
How often should I scrape competitor websites to avoid getting blocked?
Once per day is safe for most public-facing pages — it’s consistent with normal browsing behavior. Checking every hour is where you’ll start triggering rate limits or bot detection. For high-value pages like pricing, daily checks are sufficient since most companies don’t change pricing more than a few times per year. If you need higher frequency, add jitter: randomize the check time ±2 hours rather than running at exactly the same time every day.
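The jitter idea is simple to implement: schedule the cron entry a couple of hours before your target time, then sleep a random offset at the top of the script. A sketch:

```python
import random

def jittered_delay(max_minutes: int = 240) -> int:
    """Random offset in seconds so checks never land at the same time daily."""
    return random.randint(0, max_minutes * 60)

delay = jittered_delay()
assert 0 <= delay <= 240 * 60
# In the real script: time.sleep(delay) before run_monitoring_cycle()
```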
What’s the difference between using Claude Haiku vs Sonnet for competitor analysis?
Haiku handles this task well for 95% of use cases: it's fast, it's the cheapest Claude tier (check current per-token pricing before committing), and it produces clear, structured competitive summaries. Sonnet makes sense if you're feeding it longer documents (full annual reports, lengthy product pages) or need more nuanced reasoning about strategic implications. For the page-diff-and-summarize pattern described here, Haiku gives you most of Sonnet's quality at a small fraction of the cost.
Can I use this to monitor competitor pricing changes automatically?
Yes, and pricing pages are the highest-signal target for this system. The key is the structured extraction step — pulling lines containing price patterns before sending to Claude, which reduces token usage and focuses the analysis. One caveat: many SaaS companies hide pricing behind “Contact Sales” for enterprise tiers, so you’ll only catch what’s publicly listed. Combine this with monitoring their G2 or Capterra reviews, where users sometimes mention what they’re paying.
How do I monitor competitor job postings at scale?
The most reliable approach is scraping their dedicated careers page (usually /careers or /jobs) plus checking their Greenhouse, Lever, or Workday job board subdomain if they use one. Job board pages are generally more bot-tolerant than marketing pages. For a company with 10+ competitors, building a map of which ATS each uses (Greenhouse, Lever, Ashby) and scraping those directly gives you cleaner, more structured data than trying to parse bespoke careers pages.
Is scraping competitors’ websites legal?
Scraping publicly available information is generally legal in most jurisdictions, but always check the site’s robots.txt and Terms of Service before scraping. The 2022 hiQ v. LinkedIn ruling in the US affirmed that scraping public data generally doesn’t violate the Computer Fraud and Abuse Act, but individual ToS agreements and GDPR in Europe add complexity. Don’t scrape behind login walls, don’t circumvent technical access controls, and respect crawl-delay directives. When in doubt, consult a lawyer for your specific use case.
Put this into practice
Try the Monitoring Specialist agent — ready to use, no setup required.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

