Browserbeam & LangChain: How to Build AI Agents with Browser Access

April 07, 2026 · 20 min read

Your LangChain agent can browse the web, fill forms, and extract structured data from any page. No Playwright process to manage, no HTML to parse, no custom browser driver. You wrap five Browserbeam functions as LangChain tools, hand them to a ReAct agent, and the LLM decides when to open a browser, what to click, and what to extract.

This guide walks through the full integration. You'll create custom LangChain tools backed by Browserbeam's Python SDK, wire them into a LangGraph ReAct agent, and build real agent workflows that browse, extract, and act on live web data.

In this guide, you'll learn:

  • How to wrap Browserbeam's Python SDK methods as LangChain tools
  • How to configure a LangGraph ReAct agent with browser capabilities
  • Agent architecture patterns: ReAct, plan-and-execute, multi-agent
  • Real-world use cases: competitive intelligence, lead research, content monitoring
  • Common mistakes that waste tokens or leak sessions
  • How LangChain compares to CrewAI and AutoGen for browser tasks
  • Performance patterns: caching, streaming, and token budgets

TL;DR: Wrap Browserbeam's SDK methods (create session, observe, click, extract, close) as @tool-decorated functions. Pass them to LangGraph's create_react_agent. The LLM gets structured markdown and element refs instead of raw HTML, which cuts token usage by 90%+ and gives the agent clean data to reason about. Full working code below.


What LangChain Is and Why It Needs a Browser

LangChain is a Python framework for building applications powered by language models. It provides abstractions for tool calling, agent loops, memory, and output parsing. LangGraph extends it with a stateful graph runtime that handles more complex agent workflows.

The problem: LangChain agents can call APIs, query databases, and search the web. But they can't interact with web pages. They can't fill a login form, click through a multi-step checkout, or extract structured data from a JavaScript-rendered dashboard. For that, they need a browser.

Why Not Just Use Playwright Directly?

You could spin up Playwright inside a LangChain tool. Teams do this, and it works for simple cases. But it creates problems at scale:

| Challenge | Playwright directly | Browserbeam |
| --- | --- | --- |
| Browser process management | You manage Chrome instances | Managed cloud sessions |
| Output format | Raw HTML (50,000+ tokens) | Structured markdown (500-2,000 tokens) |
| Element targeting | CSS selectors you build | Element refs the API provides |
| Cookie banners, popups | Manual handling | Auto-dismissed |
| Session cleanup | Easy to leak processes | Auto-expires with timeout |
| Scaling | One browser per machine | Concurrent cloud sessions |

The token cost alone makes the case. A raw HTML page averages 50,000 tokens. Browserbeam's structured markdown output of the same page is 500-2,000 tokens. For a LangChain agent that browses 10 pages per task, that's the difference between $0.50 and $0.02 in LLM costs.
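
To make the arithmetic concrete, here's a quick sketch. The $1-per-million-token input rate is the illustrative figure implied by the numbers above, not a quoted price; substitute your model's actual pricing:

```python
def llm_input_cost(pages: int, tokens_per_page: int,
                   price_per_million: float = 1.0) -> float:
    """Estimated LLM input cost in dollars for a browse workflow."""
    return pages * tokens_per_page * price_per_million / 1_000_000

# 10 pages of raw HTML vs. 10 pages of structured markdown
raw_html_cost = llm_input_cost(10, 50_000)   # about $0.50
markdown_cost = llm_input_cost(10, 2_000)    # about $0.02
```

The gap widens with every additional page the agent visits, which is why the output format matters more than any other integration decision.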

What the Integration Looks Like

The integration is straightforward. Browserbeam's Python SDK provides methods for every browser action: create session, observe, click, fill, extract, close. You wrap each method in a function with a @tool decorator. LangChain's agent runtime discovers these tools through their docstrings and calls them when the LLM decides it needs browser access.


Setting Up Browserbeam in a LangChain Agent

Installing Dependencies

You need three packages: Browserbeam's Python SDK, LangChain's OpenAI integration, and LangGraph for the agent runtime.

pip install browserbeam langchain-openai langgraph

Set your API keys as environment variables:

export BROWSERBEAM_API_KEY="your_browserbeam_key"
export OPENAI_API_KEY="your_openai_key"

Creating Browserbeam Tools for LangChain

The @tool decorator from langchain_core.tools converts a Python function into a tool the agent can call. The function's docstring tells the LLM when and how to use it.

Here's the full set of browser tools:

from langchain_core.tools import tool
from browserbeam import Browserbeam

bb_client = Browserbeam()
active_session = None


@tool
def open_browser(url: str) -> str:
    """Open a browser session and navigate to a URL. Returns the page title
    and a markdown summary of the page content. Use this as the first step
    when you need to visit a web page."""
    global active_session
    if active_session:
        active_session.close()
    active_session = bb_client.sessions.create(url=url)
    page = active_session.page
    content = page.markdown.content if page.markdown else ""
    return f"Title: {page.title}\n\n{content[:3000]}"


@tool
def observe_page() -> str:
    """Get the current page content as structured markdown. Use this after
    navigation or clicks to see what changed on the page."""
    if not active_session:
        return "Error: No browser session open. Call open_browser first."
    active_session.observe()  # refreshes the page snapshot
    page = active_session.page
    content = page.markdown.content if page.markdown else ""
    return f"Title: {page.title}\nStable: {page.stable}\n\n{content[:3000]}"


@tool
def click_element(ref: str) -> str:
    """Click an element on the page by its ref ID (e.g., 'e5'). Use refs
    from the interactive elements list returned by observe_page."""
    if not active_session:
        return "Error: No browser session open."
    active_session.click(ref=ref)
    page = active_session.page
    return f"Clicked {ref}. Page now: {page.title}"


@tool
def extract_data(schema_json: str) -> str:
    """Extract structured data from the current page. Pass a JSON schema
    string like: {"title": "h1 >> text", "items": [{"_parent": ".card",
    "name": "h2 >> text", "price": ".price >> text"}]}"""
    import json
    if not active_session:
        return "Error: No browser session open."
    schema = json.loads(schema_json)
    result = active_session.extract(**schema)
    return json.dumps(result.extraction, indent=2)


@tool
def close_browser() -> str:
    """Close the current browser session. Always call this when done
    browsing to free resources."""
    global active_session
    if active_session:
        active_session.close()
        active_session = None
        return "Browser session closed."
    return "No session to close."

A few things to notice about these tool definitions:

  1. Docstrings are instructions. The LLM reads them to decide which tool to call. Write them like you're explaining the tool to a junior developer.
  2. Return strings, not objects. LangChain tools must return strings. Serialize extraction results as JSON.
  3. Truncate page content. The [:3000] slice prevents flooding the LLM context window. Adjust based on your model's token budget.
  4. Global session state. The simplest pattern uses a module-level session variable. For production, use a session manager (covered in the Performance section).
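
For that production pattern, a context manager is the simplest upgrade over a global variable. This is a sketch of my own, not part of the Browserbeam SDK; it only assumes the session shape used throughout this guide (a `sessions.create(url=...)` method whose return value has a `close()` method):

```python
class BrowserSession:
    """Context manager that guarantees a session is closed, even on errors."""

    def __init__(self, client, url: str):
        self.client = client
        self.url = url
        self.session = None

    def __enter__(self):
        # Create the session lazily, on entry
        self.session = self.client.sessions.create(url=self.url)
        return self.session

    def __exit__(self, exc_type, exc, tb):
        # Always close, whether the body succeeded or raised
        if self.session:
            self.session.close()
            self.session = None
        return False  # don't swallow exceptions
```

Usage: `with BrowserSession(bb_client, "https://example.com") as session: ...` closes the session even if a tool call inside the block raises.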

Configuring the Agent Executor

With the tools defined, creating the agent takes four lines:

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

model = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [open_browser, observe_page, click_element, extract_data, close_browser]

agent = create_react_agent(model, tools)

result = agent.invoke({
    "messages": [("user",
        "Go to https://books.toscrape.com and extract the titles and prices "
        "of the first 5 books. Then close the browser.")]
})

for msg in result["messages"]:
    print(f"{msg.type}: {msg.content[:200]}")

The agent will:
1. Call open_browser with the URL
2. Read the page content from the return value
3. Call extract_data with a schema it constructs from the page structure
4. Call close_browser to clean up
5. Return the extracted data in its final response


Agent Architecture Patterns with LangChain

The ReAct agent above is the simplest pattern. LangChain and LangGraph support more sophisticated architectures for complex browser workflows.

ReAct Agent with Browser Tools

The ReAct (Reason + Act) pattern is the default. The LLM alternates between thinking ("I need to find the pricing page") and acting ("Call click_element with ref e12"). It works well for straightforward tasks with 3-8 steps.

from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
react_agent = create_react_agent(model, tools, checkpointer=memory)

config = {"configurable": {"thread_id": "pricing-research"}}
result = react_agent.invoke({
    "messages": [("user",
        "Go to https://news.ycombinator.com, find the top 5 stories, "
        "and summarize each headline in one sentence.")]
}, config=config)

The MemorySaver checkpointer gives the agent conversation memory. If you invoke it again with the same thread_id, the agent remembers what it did before. This is useful for multi-turn browser workflows where a human reviews results between steps.

Plan-and-Execute Agent Pattern

For tasks with 10+ steps, the ReAct loop can lose track of progress. The plan-and-execute pattern splits the work into two LLM calls: one plans the steps, the other executes them one at a time.

from langchain_openai import ChatOpenAI

planner = ChatOpenAI(model="gpt-4o", temperature=0)
executor_agent = create_react_agent(model, tools)

plan_prompt = """
You are a browser automation planner. Given a task, output a numbered list
of browser steps. Each step should be one action (open page, click, extract).

Task: {task}
"""

def plan_and_execute(task: str):
    plan_response = planner.invoke(plan_prompt.format(task=task))
    steps = plan_response.content.strip().split("\n")

    results = []
    for step in steps:
        if not step.strip():
            continue
        result = executor_agent.invoke({
            "messages": [("user", f"Execute this browser step: {step}")]
        })
        last_msg = result["messages"][-1].content
        results.append({"step": step, "result": last_msg})

    return results

The planner uses a cheaper, faster model call (no tools needed). The executor handles each step with full browser access. If a step fails, you can retry just that step instead of restarting the whole workflow.
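
One wrinkle: the planner returns numbered lines ("1. Open the page"), and leftover numbering or blank lines can confuse the executor. A small parsing helper (my addition, not a LangChain utility) normalizes the plan before the loop:

```python
import re

def parse_plan(plan_text: str) -> list[str]:
    """Split a numbered plan into clean step strings.

    Strips leading numbering like '1.', '2)', or '-' and drops blank lines.
    """
    steps = []
    for line in plan_text.splitlines():
        step = re.sub(r"^\s*(?:\d+[\.\)]|[-*])\s*", "", line).strip()
        if step:
            steps.append(step)
    return steps
```

Swap `steps = plan_response.content.strip().split("\n")` for `steps = parse_plan(plan_response.content)` and the blank-line check in the loop becomes unnecessary.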

Multi-Agent Browser Workflows

LangGraph lets you build workflows where multiple agents collaborate. A common pattern for browser tasks: one agent navigates and extracts data, another analyzes the results, a third writes reports.

from langgraph.graph import StateGraph, MessagesState, START, END

browser_agent = create_react_agent(model, tools)
analyst = ChatOpenAI(model="gpt-4o", temperature=0)

def browse_node(state: MessagesState):
    result = browser_agent.invoke(state)
    return {"messages": result["messages"]}

def analyze_node(state: MessagesState):
    last_content = state["messages"][-1].content
    analysis = analyst.invoke([
        ("system", "You analyze web data and produce concise reports."),
        ("user", f"Analyze this data and give me the key takeaways:\n{last_content}")
    ])
    return {"messages": [analysis]}

graph = StateGraph(MessagesState)
graph.add_node("browse", browse_node)
graph.add_node("analyze", analyze_node)
graph.add_edge(START, "browse")
graph.add_edge("browse", "analyze")
graph.add_edge("analyze", END)

app = graph.compile()
result = app.invoke({
    "messages": [("user",
        "Go to https://books.toscrape.com and extract all book titles "
        "and prices from the first page.")]
})

The browser agent handles the messy work of navigating and extracting. The analyst agent gets clean data and produces a summary. Neither agent needs to know how the other works.


Full Working Example: Price Comparison Agent

Here's a complete, runnable agent that compares book prices across categories on books.toscrape.com. It navigates to a category, extracts prices, moves to the next category, then compares the averages.

import json
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from browserbeam import Browserbeam

bb = Browserbeam()
session = None

@tool
def browse(url: str) -> str:
    """Open a browser to a URL and return the page content as markdown."""
    global session
    if session:
        session.close()
    session = bb.sessions.create(url=url)
    page = session.page
    md = page.markdown.content if page.markdown else ""
    elements = page.interactive_elements
    el_list = "\n".join(
        f"  {e.ref}: [{e.tag}] {e.label}" for e in elements[:20]
    )
    return f"Title: {page.title}\n\nInteractive elements:\n{el_list}\n\nContent:\n{md[:2000]}"

@tool
def click(ref: str) -> str:
    """Click an interactive element by ref ID. Returns updated page info."""
    if not session:
        return "No session open."
    session.click(ref=ref)
    page = session.page
    elements = page.interactive_elements
    el_list = "\n".join(
        f"  {e.ref}: [{e.tag}] {e.label}" for e in elements[:20]
    )
    return f"Clicked {ref}. Now on: {page.title}\n\nElements:\n{el_list}"

@tool
def extract(schema_json: str) -> str:
    """Extract structured data. Pass JSON schema string, e.g.:
    {"books": [{"_parent": "article.product_pod", "_limit": 5,
    "title": "h3 a >> text", "price": ".price_color >> text"}]}"""
    if not session:
        return "No session open."
    schema = json.loads(schema_json)
    result = session.extract(**schema)
    return json.dumps(result.extraction, indent=2)

@tool
def done() -> str:
    """Close the browser session. Call when finished browsing."""
    global session
    if session:
        session.close()
        session = None
    return "Session closed."

model = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(model, [browse, click, extract, done])

result = agent.invoke({
    "messages": [("user",
        "Go to https://books.toscrape.com. Extract the titles and prices of "
        "the first 5 books from the 'Travel' category and the 'Mystery' "
        "category. Compare the average prices and tell me which category "
        "is cheaper. Close the browser when done.")]
})

print(result["messages"][-1].content)

The agent typically completes this in 6-8 tool calls: open the site, navigate to Travel, extract, navigate to Mystery, extract, close, then reason about the averages in its final response.


Real-World Integration Use Cases

Competitive Intelligence Agent

Build an agent that monitors competitor pricing daily. The agent opens a competitor's product page, extracts current prices using a declarative schema, compares them to yesterday's snapshot, and flags changes.

@tool
def check_competitor_prices(url: str) -> str:
    """Visit a competitor product page and extract pricing data."""
    global active_session
    active_session = bb_client.sessions.create(url=url)
    result = active_session.extract(
        products=[{
            "_parent": "article.product_pod",
            "_limit": 10,
            "title": "h3 a >> text",
            "price": ".price_color >> text"
        }]
    )
    active_session.close()
    active_session = None
    return json.dumps(result.extraction, indent=2)

Pair this with a storage tool that saves results to a database, and you have an automated price monitoring pipeline.
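
The comparison step itself needs no LLM. A plain function can diff today's extraction against yesterday's snapshot; the `{title: price}` dict shape here is an example of what you'd derive from the extraction above:

```python
def diff_prices(yesterday: dict[str, str], today: dict[str, str]) -> list[str]:
    """Flag changes between two {title: price} snapshots."""
    alerts = []
    for title, price in today.items():
        old = yesterday.get(title)
        if old is None:
            alerts.append(f"NEW: {title} at {price}")
        elif old != price:
            alerts.append(f"CHANGED: {title} {old} -> {price}")
    # Anything in yesterday's snapshot that vanished today
    for title in yesterday:
        if title not in today:
            alerts.append(f"REMOVED: {title}")
    return alerts
```

Run the diff in plain Python and hand only the alert list to the LLM for summarization; that keeps the daily token spend near zero on days when nothing changed.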

Automated Lead Research

An agent receives a company name, finds their website, navigates to the team page, and extracts contact information. The key here is structured web scraping rather than generic page reading. The agent uses extract with a schema tailored to team pages.

Content Monitoring and Alerting

Set up an agent that checks a list of URLs on a schedule, extracts specific content (documentation changes, blog updates, API changelog entries), and sends alerts when content changes. Browserbeam's diff tracking detects what changed between visits, so the agent only processes new content.


Why This Integration Saves Tokens and Time

The numbers tell the story. Here's what a typical 10-page browse-and-extract workflow looks like with and without Browserbeam:

| Metric | Raw HTML + Playwright | Browserbeam |
| --- | --- | --- |
| Tokens per page (input to LLM) | 40,000-80,000 | 500-2,000 |
| Tokens for 10 pages | 400,000-800,000 | 5,000-20,000 |
| LLM cost (GPT-4o, input) | $1.00-$2.00 | $0.01-$0.05 |
| Agent steps to extract data | 3-5 per page (parse, find, extract) | 1 per page (extract with schema) |
| Browser process management | Your responsibility | Managed |
| Session cleanup on crash | Manual (process leaks) | Auto-expires |

The token reduction is the biggest win. When your agent reads structured markdown instead of raw HTML, the LLM can reason about page content in a single step. With raw HTML, the agent often needs multiple rounds: one to parse the structure, one to find the data, one to extract it. That's three LLM calls instead of one.

Session management matters too. If your LangChain agent crashes mid-workflow, a local Playwright browser keeps running until you kill it manually. Browserbeam sessions have a configurable timeout (default: 5 minutes) and auto-expire. No orphaned processes.


Common Integration Mistakes

Not Wrapping Browser Actions as LangChain Tools

The most common mistake is calling Browserbeam directly inside the agent's main function instead of exposing it as tools. When you do this, the LLM can't decide when to use browser actions. It always runs the same fixed sequence regardless of what it finds on the page.

The fix: wrap every browser action as a @tool function. Let the LLM decide the order. That's what makes it an agent instead of a script.

Sending Full Page HTML to the LLM

If you use Playwright inside a LangChain tool and return page.content(), you're sending 50,000+ tokens of raw HTML to the LLM. The model will struggle to find the relevant data and burn through your token budget.

The fix: use Browserbeam's structured output. The observe method returns clean markdown. The extract method returns typed JSON. Both are 10-100x smaller than raw HTML.

Missing Session Cleanup in Agent Chains

LangChain agents can fail mid-execution. If the LLM makes a bad tool call, or the chain hits a timeout, your browser session stays open. Over time, leaked sessions consume your session quota.

The fix: wrap agent execution in a try/finally block:

try:
    result = agent.invoke({"messages": [("user", task)]})
finally:
    if active_session:
        active_session.close()
        active_session = None

Overloading the Agent with Too Many Tools

Giving the agent 15 browser tools (observe, observe with scope, observe with page map, click by ref, click by text, click by label...) confuses the LLM. It spends tokens deliberating between similar options.

The fix: consolidate into 4-6 tools. One tool to open a page, one to observe, one to click, one to extract, one to close. If you need advanced features like scroll_collect or fill_form, add them as separate tools only when the use case requires them.

Not Setting Token Limits on Browser Output

Browserbeam's observe endpoint can return thousands of characters of markdown for content-heavy pages. If you pass all of it to the LLM, you waste tokens on content the agent doesn't need.

The fix: truncate tool output or use the max_text_length parameter:

@tool
def observe_page(max_length: int = 2000) -> str:
    """Get the current page content. Optionally limit content length."""
    if not active_session:
        return "Error: No browser session open."
    active_session.observe(max_text_length=max_length)  # refreshes the page snapshot
    page = active_session.page
    content = page.markdown.content if page.markdown else ""
    return f"Title: {page.title}\n\n{content}"

LangChain vs Other Agent Frameworks

LangChain isn't the only option for building browser agents. Here's how the main frameworks compare when paired with Browserbeam.

LangChain vs CrewAI for Browser Tasks

CrewAI focuses on multi-agent collaboration. You define "crews" of agents with specific roles (researcher, writer, reviewer) and they work together on a task. For browser automation, you'd create a browser agent that other agents delegate to.

The trade-off: CrewAI is simpler for multi-agent setups but less flexible for custom agent logic. LangGraph gives you more control over the execution graph, error handling, and state management.

LangChain vs AutoGen

AutoGen (from Microsoft) uses a conversational approach where agents talk to each other. For browser tasks, you'd create a browser agent that other agents request actions from through messages.

AutoGen's strength is multi-turn conversations between agents. Its weakness for browser automation is that the message-passing overhead adds latency. Each browser action requires a full round of agent conversation instead of a direct tool call.

Framework Comparison Table

| Feature | LangChain + LangGraph | CrewAI | AutoGen |
| --- | --- | --- | --- |
| Tool definition | @tool decorator | Custom tool classes | Function registration |
| Agent loop | ReAct, plan-and-execute, custom graphs | Role-based crews | Conversational |
| Memory | MemorySaver, SqliteSaver, PostgresSaver | Built-in crew memory | Chat history |
| Browser integration | Custom tools (this guide) | Custom tools | Function tools |
| Multi-agent | LangGraph StateGraph | Native (crews) | Native (groups) |
| Streaming | Built-in | Limited | Built-in |
| Production readiness | High (LangSmith observability) | Medium | Medium |
| Learning curve | Moderate | Low | Low |

For most browser automation use cases, LangChain + LangGraph gives you the best combination of flexibility and production tooling. CrewAI is worth considering if your primary need is multi-agent collaboration with minimal setup. AutoGen fits best when agents need extended back-and-forth conversations.


Performance Optimization

Caching Browser Results

If your agent visits the same URL multiple times, cache the results. A simple in-memory cache avoids redundant browser sessions:

import hashlib

page_cache = {}

@tool
def browse_cached(url: str) -> str:
    """Open a URL or return cached content if recently visited."""
    cache_key = hashlib.md5(url.encode()).hexdigest()
    if cache_key in page_cache:
        return page_cache[cache_key]

    global session
    if session:
        session.close()
    session = bb.sessions.create(url=url)
    page = session.page
    md = page.markdown.content if page.markdown else ""
    content = f"Title: {page.title}\n\n{md[:3000]}"
    page_cache[cache_key] = content
    return content

For production, replace the dict with a Redis cache with TTL expiration. Pages change, so cache entries should expire after 5-30 minutes depending on how frequently the content updates.
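
The expiry logic is easy to get right with a small wrapper. This is a minimal sketch using only the standard library; the injectable clock exists purely to make it testable, and in production you'd still reach for Redis once multiple workers share the cache:

```python
import time

class TTLCache:
    """Dict-like cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 300.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # evict stale entry
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)
```

Replace `page_cache = {}` with `page_cache = TTLCache(ttl=600)` and swap the dict access for `get`/`set` calls.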

Streaming vs Batch Agent Execution

LangGraph supports streaming, which lets you see the agent's reasoning as it happens:

for event in agent.stream(
    {"messages": [("user", "Extract book prices from books.toscrape.com")]},
    stream_mode="updates"
):
    for node, values in event.items():
        print(f"Node: {node}")
        if "messages" in values:
            for msg in values["messages"]:
                print(f"  {msg.type}: {msg.content[:100]}")

Streaming is useful for long-running browser tasks where you want to show progress to a user. For batch processing (running the same task across many URLs), use Python's asyncio with Browserbeam's async client:

import asyncio
from browserbeam import AsyncBrowserbeam

async def extract_from_url(client, url, schema):
    session = await client.sessions.create(url=url)
    result = await session.extract(**schema)
    await session.close()
    return result.extraction

async def batch_extract(urls, schema):
    client = AsyncBrowserbeam()
    tasks = [extract_from_url(client, url, schema) for url in urls]
    return await asyncio.gather(*tasks)

urls = [
    "https://books.toscrape.com/catalogue/page-1.html",
    "https://books.toscrape.com/catalogue/page-2.html",
    "https://books.toscrape.com/catalogue/page-3.html",
]
schema = {"books": [{"_parent": "article.product_pod", "_limit": 5,
                     "title": "h3 a >> text", "price": ".price_color >> text"}]}

results = asyncio.run(batch_extract(urls, schema))

Reducing LangChain Overhead

LangChain adds overhead: tool schema serialization, prompt assembly, and message history tracking. For browser agents, three optimizations make the biggest difference:

  1. Use GPT-4o-mini for simple navigation. Reserve GPT-4o for reasoning-heavy steps. A click or form fill doesn't need a large model.
  2. Limit message history. After 20+ messages, trim older tool results from the conversation. The agent doesn't need the full content of page 1 when it's working on page 5.
  3. Batch browser actions. Instead of one tool call per action, Browserbeam's act method accepts multiple steps. A single tool call can navigate, click, and extract in one round trip. See the LLM-powered browser automation guide for step batching patterns.
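
The history-trimming step (item 2) can be a one-liner helper. This sketch is my own, not a LangChain utility (LangChain also ships its own trimming helpers); messages here are the `(role, content)` tuples used throughout this guide:

```python
def trim_history(messages: list[tuple[str, str]],
                 keep_last: int = 10) -> list[tuple[str, str]]:
    """Keep all system messages plus the most recent `keep_last` others."""
    system = [m for m in messages if m[0] == "system"]
    rest = [m for m in messages if m[0] != "system"]
    return system + rest[-keep_last:]
```

Call it on the message list before each `invoke` once the conversation grows past your budget; the system prompt survives while stale tool outputs fall away.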

Frequently Asked Questions

Do I need a LangChain-specific Browserbeam package?

No. Browserbeam's standard Python SDK works directly with LangChain. You wrap the SDK methods as @tool-decorated functions. There's no separate LangChain integration package to install.

Can I use Browserbeam with LangChain's JavaScript/TypeScript SDK?

Yes. Browserbeam has a TypeScript SDK (@browserbeam/sdk). The LangChain.js tool function works the same way as the Python @tool decorator. Wrap Browserbeam's TypeScript methods and pass them to createReactAgent.

How do I handle authentication in a LangChain browser agent?

Use Browserbeam's fill_form method inside a login tool. The agent calls open_browser on the login page, then fill_form with credentials. Session cookies persist across subsequent tool calls within the same Browserbeam session. See the web scraping agent guide for login examples.

What happens if the agent makes a bad tool call?

LangGraph's ReAct loop handles errors gracefully. If a tool returns an error string (like "No browser session open"), the LLM reads the error and tries a different approach. Always return descriptive error messages from your tools instead of raising exceptions.
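
You can enforce "return errors, don't raise" across all your tools with a small decorator. This is a sketch, not a LangChain feature; place it beneath `@tool` so it wraps the underlying function:

```python
import functools

def errors_as_strings(fn):
    """Convert exceptions into error strings the LLM can read and recover from."""
    @functools.wraps(fn)  # preserve the name and docstring @tool relies on
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            return f"Error in {fn.__name__}: {exc}"
    return wrapper
```

With `@tool` above and `@errors_as_strings` below, a JSON parse failure in `extract_data` becomes a readable message the agent can react to instead of an exception that kills the run.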

Can I use a different LLM besides OpenAI?

Yes. Replace ChatOpenAI with any LangChain-compatible chat model: ChatAnthropic for Claude, ChatGoogleGenerativeAI for Gemini, or any model that supports tool calling. The browser tools work the same regardless of which LLM drives the agent.

How do I prevent the agent from visiting malicious sites?

Add URL validation inside your open_browser tool. Check the URL against an allowlist of trusted domains before creating the session. For defense-in-depth, Browserbeam's sessions run in isolated browser contexts with no access to your local machine.
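
A sketch of that allowlist check, using only the standard library (the domain list is an example; the subdomain handling also rejects lookalike hosts such as `trusted.com.evil.com`):

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"books.toscrape.com", "news.ycombinator.com"}

def is_allowed(url: str) -> bool:
    """Allow only https URLs on trusted domains (including subdomains)."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

Call `is_allowed(url)` at the top of `open_browser` and return an error string when it fails, so the agent learns the URL was rejected rather than silently getting no session.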

Is LangGraph required, or can I use the older AgentExecutor?

LangGraph's create_react_agent is the recommended approach in 2026. The older AgentExecutor from langchain.agents still works but is no longer actively developed. LangGraph gives you memory, streaming, checkpointing, and custom graph architectures that AgentExecutor doesn't support.

How many concurrent browser sessions can a LangChain agent use?

Each Browserbeam plan has a concurrency limit (5 sessions on Starter, 20 on Pro). A single LangChain agent typically uses one session at a time. For multi-agent setups or batch processing, each concurrent agent needs its own session. See scaling web automation for concurrency patterns.


What to Build Next

You now have everything you need to give your LangChain agent a browser. The integration is five tool functions and a create_react_agent call. From here, the question is what to point the agent at.

Start with a simple extraction task: open a page, extract data, close the session. Once that works, add multi-step navigation. Then try multi-agent workflows where one agent browses and another analyzes. The patterns in this guide scale from a weekend project to a production pipeline.

If you want to skip the LangChain integration entirely and connect your AI coding assistant directly to a browser, the MCP integration guide covers the Browserbeam MCP server for Cursor, Claude Desktop, and Windsurf.

Sign up for Browserbeam to get an API key and start building. The free tier gives you 1 hour of browser runtime to test your LangChain agent.
