Build a Competitive Intelligence Agent with Browserbeam + GPT-5.4

April 11, 2026 · 22 min read

By the end of this guide, you'll have a working competitive intelligence agent that monitors competitor websites, detects pricing and content changes, and sends you a summary every morning. We'll build it step by step with Python, Browserbeam, and GPT-5.4.

Most teams handle competitive intelligence the same way: someone opens a competitor's website, scrolls through the pricing page, and copies numbers into a spreadsheet. Maybe once a quarter. Maybe when a sales deal forces it. The problem isn't that people don't care about competitor data. The problem is that checking five websites manually every week takes an hour, and nobody has that hour.

We're going to automate the entire process. Our agent will visit competitor sites on a schedule, extract structured data (prices, features, blog posts), compare the current snapshot to the previous one, ask GPT-5.4 to flag what actually changed, and send a Slack message with the results. No manual checks. No stale spreadsheets.

What you'll build in this guide:

  • A price extraction module that pulls product names and prices from competitor sites
  • A content tracker that detects new blog posts and changelog entries
  • A GPT-5.4 analysis layer that compares snapshots and explains what changed
  • A diff detection system that separates real changes from noise
  • A Slack notification pipeline for daily or weekly alerts
  • A scheduling system that runs the whole pipeline automatically
  • Historical snapshot storage for trend analysis over time

TL;DR: Competitive intelligence automation combines Browserbeam (for extracting structured data from competitor websites), GPT-5.4 (for analyzing changes and generating summaries), and a simple scheduling pipeline. This guide walks through building each piece with working Python code against real sites. You'll have a complete CI agent in about 200 lines of code.


What Is Competitive Intelligence Automation?

Competitive intelligence (CI) automation replaces manual competitor research with software that monitors competitor websites, extracts data, and surfaces changes. Instead of assigning someone to check pricing pages every week, an automated CI agent does it on a schedule and tells you only when something changed.

Manual vs. Automated CI

Aspect | Manual CI | Automated CI Agent
Frequency | Quarterly or ad-hoc | Daily or hourly
Coverage | 2-3 competitors, main pages only | 10+ competitors, multiple page types
Consistency | Depends on who checks | Same extraction every run
Speed to insight | Hours to days | Minutes
Cost | Analyst time ($50-150/hr) | API costs ($5-20/month)
Historical data | Spreadsheet snapshots | Structured JSON with diffs
Change detection | "I think the price went up" | "Product X price changed from $29 to $39 on April 8"

The table tells the story. Manual CI is slow, expensive, and inconsistent. Automated CI is fast, cheap, and catches changes you'd miss.

Where Browser Automation Fits

Simple HTTP requests (like Python's requests library) work for static pages. But most modern pricing pages, feature comparison tables, and SaaS marketing sites render content with JavaScript. A raw GET request returns an empty shell.

Browser automation solves this by running a real browser that executes JavaScript, waits for content to load, and returns the fully rendered page. Browserbeam handles the browser infrastructure in the cloud, so we don't need to install or manage Chromium locally. We send API calls, and Browserbeam returns structured data.

For a deeper look at scraping methods and when to use HTTP vs. browser approaches, see the web scraping guide for 2026.


Architecture of a CI Agent

Before we start coding, let's look at how the pieces fit together.

Components Overview

Our CI agent has four layers:

Layer | Tool | Responsibility
Data Collection | Browserbeam Python SDK | Visit competitor sites, extract structured data
Analysis | OpenAI GPT-5.4 | Compare snapshots, detect meaningful changes, generate summaries
Storage | JSON files (or a database) | Store historical snapshots for comparison
Alerting | Slack webhooks / SMTP email | Notify the team when something changes

Each layer is a Python module. We'll build them one at a time, then wire them together.

Data Flow: Websites to Insights

Scheduler (cron)
    ↓ triggers run
Browserbeam: Extract Data
    ↓ structured JSON (current snapshot)
Diff Detection  ←  previous snapshot
    ↓ changes found?
GPT-5.4 Analysis
    ↓ summary
Alert (Slack / Email)
    ↓ archive
Store Snapshot

The flow is linear: extract, compare, alert, store. Each run produces a snapshot that becomes the baseline for the next comparison.

Choosing What to Monitor

Not every page on a competitor's site is worth tracking. Focus on pages that change in ways your team can act on:

  • Pricing pages: Price increases, new tiers, removed features from plans
  • Feature comparison tables: New features, changed positioning
  • Blog and changelog: Product announcements, new integrations, direction signals
  • Job postings: Hiring patterns reveal strategic priorities (10 new ML engineers = they're building AI features)
  • Landing pages: Messaging changes, new target audiences

Start with pricing and blog/changelog. These pages have the highest signal-to-noise ratio.


Setting Up the Project

Dependencies and API Keys

We need two Python packages:

pip install browserbeam openai

Set both API keys as environment variables:

export BROWSERBEAM_API_KEY="bb_live_..."
export OPENAI_API_KEY="sk-..."

Browserbeam reads BROWSERBEAM_API_KEY automatically when you don't pass api_key to the constructor. You can get a key by signing up for a free account.
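Both client libraries only fail at request time if a key is missing, which can surface hours later in a scheduled run. A small fail-fast check at startup (a sketch, not part of either SDK) catches the problem immediately:

```python
import os

def require_env(names):
    """Fail fast if any required environment variable is missing or empty."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}

# Call once at startup, before creating any clients:
# require_env(["BROWSERBEAM_API_KEY", "OPENAI_API_KEY"])
```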

Target Site Selection

For this tutorial, we'll use real, scraping-friendly websites as stand-ins for competitor sites:

Demo Site | Simulates | What We Extract
books.toscrape.com | Competitor pricing page | Product names, prices, stock status
quotes.toscrape.com | Competitor blog/content | Posts, authors, tags
news.ycombinator.com | Industry news feed | Headlines, URLs, rankings

In production, replace these with your actual competitor URLs and adjust the extraction schemas.

Project Structure

ci-agent/
  extract.py      # Browserbeam extraction functions
  analyze.py      # GPT-5.4 analysis and diff detection
  notify.py       # Slack and email alerting
  pipeline.py     # Main pipeline that ties everything together
  snapshots/      # JSON snapshot storage
  requirements.txt

We'll build each file separately, then combine them in pipeline.py.


Extracting Competitor Data with Browserbeam

This is where the CI agent gets its data. Each extraction function visits a target site and returns structured JSON.

Pricing Page Extraction

Let's start with the most common CI target: competitor pricing. We'll extract product names, prices, and availability from books.toscrape.com:

from browserbeam import Browserbeam
import json

def extract_pricing(url="https://books.toscrape.com"):
    """Extract product pricing data from a competitor site."""
    bb = Browserbeam()
    session = bb.sessions.create(url=url, timeout=60)
    try:
        result = session.extract(
            products=[{
                "_parent": "article.product_pod",
                "name": "h3 a >> text",
                "price": ".price_color >> text",
                "stock": ".instock.availability >> text"
            }]
        )
        products = result.extraction.get("products", [])
        return {
            "source": url,
            "type": "pricing",
            "products": products,
            "count": len(products)
        }
    finally:
        session.close()

The extract method takes a schema as keyword arguments. The _parent selector tells Browserbeam to find all matching elements and extract the specified fields from each one. The result comes back as clean JSON.

Notice the try/finally block. Always close your sessions, even if extraction fails. Open sessions consume resources until they time out. For more on session management, see the Browserbeam API docs.

Product Feature Monitoring

Feature pages are trickier because every competitor structures them differently. Here's a general approach using observe to get the page content and then extract for specific elements:

def extract_features(url):
    """Extract feature descriptions from a competitor's feature page."""
    bb = Browserbeam()
    session = bb.sessions.create(url=url, timeout=60)
    try:
        session.observe()
        content = ""
        if session.page and session.page.markdown:
            content = session.page.markdown.content
        return {
            "source": url,
            "type": "features",
            "content": content[:8000],
            "title": session.page.title if session.page else ""
        }
    finally:
        session.close()

We truncate the content to 8,000 characters. That's enough for GPT-5.4 to analyze, and it keeps the token cost down. For pages with lazy-loaded content, use session.scroll_collect() instead of session.observe() to capture content that loads on scroll.
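A hard character cut can slice through the middle of a sentence. Trimming at the last paragraph break under the limit keeps the excerpt coherent for the model; here's a small helper (our own, not part of the SDK):

```python
def truncate_at_paragraph(text, limit=8000):
    """Truncate text to `limit` chars, preferring the last paragraph break."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    break_at = cut.rfind("\n\n")
    # Only back off to the paragraph break if it keeps most of the budget.
    if break_at > limit // 2:
        return cut[:break_at]
    return cut
```

Swap `content[:8000]` for `truncate_at_paragraph(content)` in extract_features to use it.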

Blog and Content Tracking

Blog posts and changelogs signal product direction. Let's extract headlines from Hacker News as a stand-in for a competitor's blog:

def extract_content(url="https://news.ycombinator.com"):
    """Extract headlines and links from a content page."""
    bb = Browserbeam()
    session = bb.sessions.create(url=url, timeout=60)
    try:
        result = session.extract(
            articles=[{
                "_parent": ".athing",
                "headline": ".titleline > a >> text",
                "url": ".titleline > a >> href",
                "rank": ".rank >> text"
            }]
        )
        articles = result.extraction.get("articles", [])
        return {
            "source": url,
            "type": "content",
            "articles": articles[:20],
            "count": len(articles)
        }
    finally:
        session.close()

We cap at 20 articles to keep the payload manageable. In production, you'd track competitor blog RSS feeds or changelog pages with selectors tailored to their specific HTML structure.
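When a competitor publishes an RSS feed, parsing it is often more robust than scraping the rendered blog, and the standard library handles it without extra dependencies. A minimal sketch for a basic RSS 2.0 feed (real feeds may use namespaces or Atom, which need extra handling):

```python
import xml.etree.ElementTree as ET

def parse_rss_titles(rss_xml):
    """Extract headline/url pairs from a basic RSS 2.0 feed string."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "headline": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
        })
    return items

sample = """<rss version="2.0"><channel>
  <item><title>Acme launches v2</title><link>https://example.com/v2</link></item>
  <item><title>Pricing update</title><link>https://example.com/pricing</link></item>
</channel></rss>"""
```

In production you'd fetch the feed URL first, then pass the response body to this function.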

For a full reference on Browserbeam's extraction capabilities, see the structured web scraping guide.


Analyzing Data with GPT-5.4

Raw extraction data is just numbers and text. GPT-5.4 turns it into competitive insights.

Structuring Prompts for Comparison

The key to useful GPT-5.4 analysis is structured output. We use JSON mode to get consistent, parseable responses:

from openai import OpenAI
import json

openai_client = OpenAI()

def analyze_snapshot(current_data, previous_data=None):
    """Ask GPT-5.4 to analyze extracted data and compare with previous snapshot."""
    messages = [
        {
            "role": "system",
            "content": """You are a competitive intelligence analyst. Analyze the provided data
and return a JSON object with these fields:
- "summary": A 2-3 sentence overview of what you see
- "changes": Array of specific changes detected (empty if no previous data)
- "notable": Array of items worth flagging to the team
- "risk_level": "low", "medium", or "high" based on competitive impact"""
        },
        {
            "role": "user",
            "content": f"Current data:\n{json.dumps(current_data, indent=2)}"
        }
    ]

    if previous_data:
        messages.append({
            "role": "user",
            "content": f"Previous snapshot for comparison:\n{json.dumps(previous_data, indent=2)}"
        })

    response = openai_client.chat.completions.create(
        model="gpt-5.4",
        messages=messages,
        response_format={"type": "json_object"},
        temperature=0.2
    )

    return json.loads(response.choices[0].message.content)

Setting temperature=0.2 keeps the analysis consistent between runs. Higher temperatures produce more creative (and less reliable) summaries.
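JSON mode makes malformed output rare, but `json.loads` on a bad response would still crash an unattended nightly run. A defensive parse (our own wrapper, matching the schema from the system prompt above) degrades gracefully instead:

```python
import json

def safe_parse_analysis(raw_text):
    """Parse a model response as JSON, falling back to a flagged error object."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        return {
            "summary": "Analysis response was not valid JSON.",
            "changes": [],
            "notable": [],
            "risk_level": "medium",  # surface the failure rather than hiding it
            "parse_error": True,
        }
```

In analyze_snapshot, the final line would become `return safe_parse_analysis(response.choices[0].message.content)`.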

Generating Competitive Summaries

For a weekly digest, we want GPT-5.4 to summarize multiple days of changes into one actionable report:

def generate_weekly_digest(weekly_analyses):
    """Generate a weekly competitive intelligence summary from daily analyses."""
    response = openai_client.chat.completions.create(
        model="gpt-5.4",
        messages=[
            {
                "role": "system",
                "content": """You are a competitive intelligence analyst writing a weekly brief
for the product team. Be concise and focus on actionable insights. Return JSON with:
- "headline": One sentence summarizing the week
- "key_changes": Top 3-5 most important changes
- "trends": Any patterns across the week
- "recommended_actions": 1-3 things the team should consider doing"""
            },
            {
                "role": "user",
                "content": f"This week's daily analyses:\n{json.dumps(weekly_analyses, indent=2)}"
            }
        ],
        response_format={"type": "json_object"},
        temperature=0.3
    )
    return json.loads(response.choices[0].message.content)

Detecting Meaningful Changes

Not every change matters. A product going from $29.00 to $29.01 might be a rounding difference. A product going from $29 to $49 is a pricing strategy shift. GPT-5.4 is good at this kind of judgment:

def detect_changes(current_products, previous_products):
    """Use GPT-5.4 to identify meaningful changes between snapshots."""
    response = openai_client.chat.completions.create(
        model="gpt-5.4",
        messages=[
            {
                "role": "system",
                "content": """Compare two product snapshots and identify meaningful changes.
Ignore minor formatting differences (whitespace, capitalization).
Flag: price changes > 5%, new products, removed products, stock status changes.
Return JSON with:
- "meaningful_changes": Array of {"item", "field", "old_value", "new_value", "significance"}
- "noise": Count of minor/irrelevant differences filtered out"""
            },
            {
                "role": "user",
                "content": f"Current:\n{json.dumps(current_products)}\n\nPrevious:\n{json.dumps(previous_products)}"
            }
        ],
        response_format={"type": "json_object"},
        temperature=0.1
    )
    return json.loads(response.choices[0].message.content)

Low temperature (0.1) makes the change detection deterministic. We want consistent results, not creative interpretations.
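The prompt's 5% threshold can also be enforced locally before spending tokens. This helper assumes prices arrive as strings like "$29" or "£51.77", which is what sites such as books.toscrape.com return:

```python
import re

def parse_price(text):
    """Pull the first numeric value out of a price string like "£51.77"."""
    match = re.search(r"(\d+(?:\.\d+)?)", text.replace(",", ""))
    return float(match.group(1)) if match else None

def significant_price_change(old, new, threshold=0.05):
    """True when the relative change between two price strings exceeds threshold."""
    old_val, new_val = parse_price(old), parse_price(new)
    if old_val is None or new_val is None or old_val == 0:
        return old != new  # unparseable or zero baseline: treat any difference as a change
    return abs(new_val - old_val) / old_val > threshold

# significant_price_change("$29.00", "$29.01") -> False (rounding noise)
# significant_price_change("$29", "$49") -> True (real pricing move)
```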


Building the Monitoring Pipeline

Now let's wire the extraction and analysis into a pipeline that runs on a schedule.

Scheduling Recurring Checks

For a simple CI agent, a cron job works well. Here's a pipeline script that runs the full cycle:

import json
from datetime import datetime, date
from pathlib import Path

# The modules we built earlier in this guide:
from extract import extract_pricing, extract_content
from analyze import analyze_snapshot
from notify import send_slack_alert

SNAPSHOT_DIR = Path("snapshots")
SNAPSHOT_DIR.mkdir(exist_ok=True)

TARGETS = [
    {"name": "competitor_pricing", "url": "https://books.toscrape.com", "extractor": extract_pricing},
    {"name": "industry_news", "url": "https://news.ycombinator.com", "extractor": extract_content},
]

def run_pipeline():
    """Run one full CI cycle: extract, compare, analyze, alert, store."""
    today = date.today().isoformat()
    results = []

    for target in TARGETS:
        print(f"Extracting: {target['name']}")
        current = target["extractor"](target["url"])
        previous = load_previous_snapshot(target["name"])
        analysis = analyze_snapshot(current, previous)
        save_snapshot(target["name"], today, current)
        results.append({
            "target": target["name"],
            "analysis": analysis,
            "extracted_at": datetime.now().isoformat()
        })

    changes = [r for r in results if r["analysis"].get("risk_level") != "low"]
    if changes:
        send_slack_alert(changes)

    return results

Schedule this with cron (daily at 7 AM):

0 7 * * * cd /path/to/ci-agent && python pipeline.py

For Python-native scheduling, APScheduler is a solid alternative:

from apscheduler.schedulers.blocking import BlockingScheduler

scheduler = BlockingScheduler()
scheduler.add_job(run_pipeline, "cron", hour=7)
scheduler.start()

Storing Historical Snapshots

We store each snapshot as a JSON file, organized by target and date:

def save_snapshot(target_name, date_str, data):
    """Save a snapshot to the snapshots directory."""
    target_dir = SNAPSHOT_DIR / target_name
    target_dir.mkdir(exist_ok=True)
    filepath = target_dir / f"{date_str}.json"
    with open(filepath, "w") as f:
        json.dump(data, f, indent=2)

def load_previous_snapshot(target_name):
    """Load the most recent snapshot for a target."""
    target_dir = SNAPSHOT_DIR / target_name
    if not target_dir.exists():
        return None
    files = sorted(target_dir.glob("*.json"), reverse=True)
    if not files:
        return None
    with open(files[0]) as f:
        return json.load(f)

This produces a directory structure like:

snapshots/
  competitor_pricing/
    2026-04-12.json
    2026-04-11.json
    2026-04-10.json
  industry_news/
    2026-04-12.json

For production, consider PostgreSQL or SQLite for better querying and retention policies. JSON files work for getting started and prototyping.

Diff Detection Between Runs

Before sending data to GPT-5.4, a quick programmatic diff reduces noise and API costs:

def quick_diff(current, previous):
    """Fast programmatic diff before sending to GPT-5.4.
    Returns True if there are changes worth analyzing."""
    if previous is None:
        return True

    if current.get("type") == "pricing":
        curr_prices = {p["name"]: p["price"] for p in current.get("products", [])}
        prev_prices = {p["name"]: p["price"] for p in previous.get("products", [])}
        # Dict inequality covers changed prices and added/removed products alike.
        if curr_prices != prev_prices:
            return True

    if current.get("type") == "content":
        curr_headlines = {a["headline"] for a in current.get("articles", [])}
        prev_headlines = {a["headline"] for a in previous.get("articles", [])}
        if curr_headlines != prev_headlines:
            return True

    return False

This check runs locally (zero cost) and skips the GPT-5.4 call when nothing changed. On quiet days, that saves you both time and money.
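An even cheaper pre-check is a content fingerprint: hash the canonical JSON of each snapshot and compare hashes before doing any field-level diffing. A stdlib-only sketch:

```python
import hashlib
import json

def snapshot_fingerprint(data):
    """Stable SHA-256 fingerprint of a snapshot (key order does not matter)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing the fingerprint alongside each snapshot lets you skip both quick_diff and the GPT-5.4 call with a single string comparison on unchanged days.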


Alerting and Reporting

The agent finds changes. Now it needs to tell someone.

Slack and Email Notifications

A Slack webhook is the simplest alerting channel. Here's how to send a formatted CI alert:

import json
import os

import requests

SLACK_WEBHOOK_URL = os.environ.get("SLACK_WEBHOOK_URL", "")

def send_slack_alert(changes):
    """Send competitive intelligence alerts to Slack."""
    if not SLACK_WEBHOOK_URL:
        print("No Slack webhook configured. Printing to console.")
        for change in changes:
            print(json.dumps(change, indent=2))
        return

    blocks = [{"type": "header", "text": {"type": "plain_text", "text": "CI Agent Alert"}}]
    for change in changes:
        target = change["target"]
        analysis = change["analysis"]
        risk = analysis.get("risk_level", "unknown")
        summary = analysis.get("summary", "No summary available.")
        blocks.append({
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"*{target}* (risk: {risk})\n{summary}"
            }
        })

    requests.post(SLACK_WEBHOOK_URL, json={"blocks": blocks})

For email, use Python's built-in smtplib:

import smtplib
from email.mime.text import MIMEText

def send_email_alert(changes, recipient="team@yourcompany.com"):
    """Send CI alerts via email."""
    body = "Competitive Intelligence Update\n\n"
    for change in changes:
        analysis = change["analysis"]
        body += f"Target: {change['target']}\n"
        body += f"Risk: {analysis.get('risk_level', 'unknown')}\n"
        body += f"Summary: {analysis.get('summary', '')}\n\n"

    msg = MIMEText(body)
    msg["Subject"] = "CI Agent: Changes Detected"
    msg["From"] = "ci-agent@yourcompany.com"
    msg["To"] = recipient

    with smtplib.SMTP("smtp.yourcompany.com", 587) as server:
        server.starttls()
        server.login("ci-agent@yourcompany.com", os.environ["SMTP_PASSWORD"])
        server.send_message(msg)

Weekly Digest Generation

Daily alerts catch individual changes. A weekly digest reveals patterns. Here's how to generate one:

def generate_and_send_digest():
    """Compile a week of snapshots and generate a digest."""
    weekly_analyses = []
    for target in TARGETS:
        target_dir = SNAPSHOT_DIR / target["name"]
        if not target_dir.exists():
            continue
        files = sorted(target_dir.glob("*.json"), reverse=True)[:7]
        snapshots = []
        for f in files:
            with open(f) as fh:
                snapshots.append(json.load(fh))
        if len(snapshots) >= 2:
            analysis = analyze_snapshot(snapshots[0], snapshots[-1])
            weekly_analyses.append({
                "target": target["name"],
                "days_covered": len(snapshots),
                "analysis": analysis
            })

    if weekly_analyses:
        digest = generate_weekly_digest(weekly_analyses)
        send_slack_alert([{
            "target": "Weekly Digest",
            "analysis": {
                "summary": digest.get("headline", ""),
                "risk_level": "medium"
            }
        }])
        return digest
    return None

Schedule the digest on Monday mornings:

0 8 * * 1 cd /path/to/ci-agent && python -c "from pipeline import generate_and_send_digest; generate_and_send_digest()"

CI Tool Comparison: Build vs Buy

Before committing to building your own CI agent, consider the alternatives. Several SaaS platforms offer competitive intelligence as a service.

Feature | Custom Agent (this guide) | Klue | Crayon | Kompyte
Setup time | 2-4 hours | Days to weeks | Days to weeks | Days to weeks
Monthly cost | $5-20 (API costs) | $10,000+ | $10,000+ | $5,000+
Customization | Full control | Limited to templates | Limited to templates | Moderate
Data sources | Any website | Pre-built competitor profiles | Pre-built competitor profiles | Pre-built profiles
Analysis | GPT-5.4 (your prompts) | Built-in AI | Built-in AI | Built-in AI
Alerting | Your channels | In-app + integrations | In-app + integrations | In-app + email
Maintenance | You manage it | Managed SaaS | Managed SaaS | Managed SaaS
Data ownership | Full | Vendor-controlled | Vendor-controlled | Vendor-controlled

When to Build Your Own

Build a custom CI agent when:

  • You monitor fewer than 20 competitors and need specific data points
  • Your team has Python developers who can maintain the pipeline
  • You want full control over extraction schemas and analysis prompts
  • Budget is under $500/month for the entire CI function
  • You need data from non-standard sources (login-protected pages, internal tools)

When SaaS Makes Sense

Buy a CI platform when:

  • You track 50+ competitors across many dimensions
  • Your team doesn't have development capacity to maintain automation
  • You need built-in collaboration features (battle cards, sales enablement)
  • The annual cost ($60,000-120,000) fits within your competitive intelligence budget
  • You need enterprise compliance and audit trails

Most developer tools companies and startups tracking 3-10 competitors will get more value from the custom approach. The cost difference alone (hundreds vs. tens of thousands per year) justifies the build time.

If you already use the OpenAI Agents SDK, you can wrap these extraction functions as agent tools for a more conversational interface. See the OpenAI Agents SDK + Browserbeam guide for that approach.


Common Mistakes

Five patterns that break CI agents in production. All of them are avoidable.

Monitoring Too Many Pages

Starting with 50 URLs sounds thorough. In practice, most of those pages rarely change, and the noise buries the signal. Start with 3-5 high-value pages per competitor (pricing, changelog, main landing page). Add more only when the baseline is stable and you've confirmed the extraction works reliably.

Not Handling Site Redesigns

Competitors redesign their websites. When they do, your CSS selectors break and the extraction returns empty data. Build a check into your pipeline:

def validate_extraction(data, expected_type):
    """Check that extraction returned usable data."""
    if expected_type == "pricing":
        products = data.get("products", [])
        if len(products) == 0:
            return False, "No products extracted. Selectors may be broken."
        if any(not p.get("price") for p in products):
            return False, "Some products missing prices."
    return True, "OK"

When validation fails, alert your team to update the selectors instead of sending empty analysis reports.

Alerting on Noise Instead of Signal

Every website has minor variations between loads: timestamps change, session IDs rotate, cache headers differ. If your diff detection compares raw page content, you'll get an alert every single run.

The fix: extract structured data (prices, names, dates) rather than comparing raw page content. Use the quick_diff function from earlier to filter out non-changes before sending to GPT-5.4.

Skipping Data Validation

GPT-5.4 will analyze whatever you send it, including empty data, malformed JSON, and error messages. If your extraction fails silently and returns an empty list, GPT-5.4 will happily report "no products found on this competitor's site" instead of flagging the error.

Validate extraction results before passing them to analysis. Check for minimum expected counts, required fields, and reasonable values.

Running Without Rate Limits

Hitting a competitor's website 100 times per hour will get your IP blocked and might raise legal concerns. Set reasonable intervals:

  • Pricing pages: Once or twice daily
  • Blog/changelog: Once daily
  • Landing pages: Weekly

Add a delay between requests to different pages on the same domain. Two seconds between requests is a reasonable starting point. For more on responsible automation practices, see the scaling web automation guide.
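That per-domain spacing can be enforced automatically rather than remembered. A minimal throttle sketch (single-threaded; a concurrent pipeline would need a lock around the timestamp dict):

```python
import time
from urllib.parse import urlparse

_last_request = {}  # domain -> monotonic timestamp of last request

def polite_delay(url, min_interval=2.0):
    """Sleep so requests to the same domain are at least min_interval seconds apart."""
    domain = urlparse(url).netloc
    elapsed = time.monotonic() - _last_request.get(domain, 0.0)
    if elapsed < min_interval:
        time.sleep(min_interval - elapsed)
    _last_request[domain] = time.monotonic()
```

Call `polite_delay(target["url"])` at the top of the extraction loop; different domains never wait on each other.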


Frequently Asked Questions

What are the best competitive intelligence tools?

For developer teams with Python experience, building a custom CI agent with Browserbeam and GPT-5.4 costs $5-20/month and provides full control over extraction and analysis. SaaS options like Klue, Crayon, and Kompyte cost $5,000-10,000+/month but include built-in collaboration and sales enablement features. The right choice depends on team size, budget, and how many competitors you track.

How do I monitor competitor pricing automatically?

Use Browserbeam to visit competitor pricing pages on a schedule, extract product names and prices with schema-based extraction, store each snapshot as JSON, and compare the current snapshot to the previous one. GPT-5.4 can flag meaningful price changes (new tiers, price increases, removed plans) and filter out noise.

Can AI do market research?

Yes. AI-powered market research combines browser automation (to collect data from competitor websites, review sites, and industry publications) with LLM analysis (to summarize findings, detect trends, and generate reports). The approach in this guide covers the data collection and analysis pipeline. For broader market research, add data sources like G2, Capterra, and industry forums.

How often should I check competitor websites?

Pricing pages and product feature pages work well with daily checks. Blog and changelog pages are fine with daily or every-other-day monitoring. Landing pages and marketing copy change less frequently, so weekly checks are sufficient. Start with daily for critical pages and reduce frequency for pages that rarely change.

Is competitor website scraping legal?

Scraping publicly available information from competitor websites is generally legal in the United States under the 2022 hiQ Labs v. LinkedIn ruling by the Ninth Circuit Court of Appeals. That said, always respect robots.txt, avoid scraping behind login walls without permission, and don't overload servers with aggressive request rates. Consult your company's legal team for specific guidance.

What data should a competitive intelligence agent track?

Focus on data your team can act on: product pricing and packaging changes, new feature announcements, blog posts signaling product direction, job postings revealing strategic priorities, and landing page messaging changes. Avoid tracking vanity metrics or data that doesn't connect to business decisions.

How do I detect meaningful changes vs noise?

Use a two-layer approach. First, run a programmatic diff that compares structured fields (prices, product names, headlines) and ignores formatting differences. Second, send detected changes to GPT-5.4 with a prompt that asks it to classify each change by significance. This combination filters out false positives while catching real competitive moves.

Can I use Browserbeam with the OpenAI Agents SDK for CI?

Yes. Wrap the extraction functions from this guide as @function_tool decorated functions in the OpenAI Agents SDK, and let the agent decide which competitors to check and what data to extract. See the OpenAI Agents SDK + Browserbeam guide for the setup. This guide uses the raw openai library instead to keep the focus on the CI pipeline itself.


What to Build Next

You've got a working CI agent: extraction, analysis, diff detection, and alerting. The next step is making it your own.

Try adding a new competitor. Swap books.toscrape.com for your actual competitor's pricing page and adjust the extraction schema. The structure stays the same. Only the selectors change.

Try adding a new data source. Job postings (LinkedIn, company careers pages) reveal hiring patterns that signal strategic direction. Product review sites (G2, Capterra) reveal customer sentiment trends. Each new source is another extract_ function and another entry in TARGETS.

For LLM-powered approaches, explore the browser automation tutorial for raw function calling patterns. For production hardening, read the scaling web automation guide and the security best practices guide. And check the Browserbeam API docs for the full extraction reference.

What competitor will your agent track first?
