OpenAI Agents SDK + Browserbeam: Give Your Agent Eyes on the Web

April 10, 2026 · 22 min read

Your OpenAI agent can browse the web now. Not by scraping raw HTML with requests. By controlling a real browser, reading structured page content, and extracting exactly the data it needs. The OpenAI Agents SDK handles the agent loop, tool dispatch, and guardrails. Browserbeam handles the browser.

Before this combination existed, giving an agent browser access meant writing a manual tool dispatch loop with the raw openai library, managing Playwright processes, and parsing HTML into something an LLM could understand without burning through the context window. That's a lot of code for "go to this page and tell me what's on it."

The OpenAI Agents SDK (pip install openai-agents) changes the equation. You define tools with a decorator, wire them into an agent, and let the SDK handle the rest. Add Browserbeam as the browser layer, and your agent gets structured web access through four or five tool functions and zero browser infrastructure.

What you'll build in this guide:

  • A set of browser tools (navigate, observe, extract, click, fill) using the @function_tool decorator
  • A research agent that browses real websites and returns structured data
  • Input guardrails that restrict which URLs the agent can visit
  • Output guardrails that validate extraction results
  • A multi-agent handoff pattern where a researcher agent passes work to an extractor agent
  • Streaming agent responses for real-time feedback
  • Error handling patterns that keep your agent from crashing on bad pages

TL;DR: The OpenAI Agents SDK provides the agent loop, tool dispatch, guardrails, and multi-agent handoffs. Browserbeam provides the browser. Together, you get an agent that browses live websites, extracts structured data, and stays within the boundaries you set. This guide walks through the full implementation with GPT-5.4 and working code against real sites.


What Is the OpenAI Agents SDK?

The OpenAI Agents SDK is a Python framework for building agents that use OpenAI models. It handles the parts of agent development that every team rebuilds from scratch: the tool dispatch loop, conversation history, multi-agent coordination, and safety guardrails.

Architecture Overview

The SDK has three core concepts:

  • Agent — defines the LLM, instructions, and available tools (class: Agent)
  • Runner — executes the agent loop: LLM call, tool dispatch, repeat (class: Runner)
  • Tools — functions the agent can call to interact with the world (defined with @function_tool)

The agent loop works like this: the Runner sends the user's input to the LLM. The LLM decides whether to respond directly or call a tool. If it calls a tool, the Runner executes it and sends the result back to the LLM. This repeats until the LLM produces a final text response.

User Input → Agent (GPT-5.4) → Tool Call → Browserbeam → result back to the agent (↺ repeat until done) → Final Output

You don't write the loop. The SDK runs it for you. Your job is defining the tools and the agent's instructions.
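If it helps to see what the Runner is doing, here is a conceptual sketch of that loop with a fake model standing in for GPT-5.4. This is illustrative only, not the SDK's actual implementation; every name in it is made up for the sketch.

```python
# A simplified, self-contained sketch of the agent loop the SDK runs for you.
# `fake_model` stands in for the LLM: it either requests a tool call or
# returns a final answer. Nothing here is real SDK code.

def fake_model(messages, tools):
    # Pretend the model asks for a tool on the first turn, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "navigate", "args": {"url": "https://example.com"}}}
    return {"final": "The page title is Example Domain."}

def run_agent(user_input, tools):
    messages = [{"role": "user", "content": user_input}]
    while True:
        response = fake_model(messages, tools)
        if "final" in response:                       # model produced a final text response
            return response["final"]
        call = response["tool_call"]                  # model asked for a tool instead
        result = tools[call["name"]](**call["args"])  # dispatch to your function
        messages.append({"role": "tool", "content": result})

tools = {"navigate": lambda url: f"Navigated to: Example Domain ({url})"}
print(run_agent("What's on example.com?", tools))
# -> The page title is Example Domain.
```

The SDK's real loop adds conversation history, guardrail checks, and handoff routing, but the shape is the same: call the model, dispatch tools, repeat until there's a final answer.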

Tools, Guardrails, and Handoffs

Tools are Python functions decorated with @function_tool. The SDK automatically generates the JSON schema from the function's type hints and docstring, then passes it to the model. When the model calls a tool, the SDK invokes your function and feeds the result back.

Guardrails run checks on the agent's input or output. Input guardrails can block dangerous requests before the agent processes them. Output guardrails can validate the agent's final response before returning it to the user. Both use a tripwire pattern: if the guardrail triggers, the run stops with an error instead of returning bad output.

Handoffs let one agent transfer work to another. A research agent can hand off to an extraction agent, which can hand off to a formatting agent. Each agent has its own instructions, tools, and model. The SDK manages the conversation history across handoffs.

The Responses API vs Chat Completions

The Agents SDK uses OpenAI's Responses API by default, not the older Chat Completions API. The Responses API is OpenAI's newer interface, designed for tool use and agent workflows. It supports features like tool search and deferred tool loading that Chat Completions doesn't.

If you're coming from the raw openai library where you used client.chat.completions.create(), the Agents SDK abstracts this away. You don't call either API directly. The SDK handles it.

For non-OpenAI models, you can switch to Chat Completions mode with set_default_openai_api("chat_completions"). But for OpenAI models with GPT-5.4, stick with the default Responses API.


Why Agents Need Browser Access

LLMs know what was in their training data. They don't know what's on a website right now. For any task that requires current information (prices, stock availability, news, job listings, competitor analysis), the agent needs to go look.

The Gap Between LLMs and Live Data

GPT-5.4's knowledge cutoff is August 2025. Anything after that date, or any data that changes frequently (prices, availability, schedules), requires real-time access. Without browser tools, your agent can only guess.

The same applies to private or authenticated content. Login-protected dashboards, internal tools, and gated content don't exist in the training data. The agent needs a browser to access them.

Function Calling as the Bridge

OpenAI function calling (also called tool calling) is what connects an LLM to external systems. The model doesn't execute code. It outputs a structured request ("call function X with argument Y"), and your code executes it. The Agents SDK wraps this pattern so you don't manage the back-and-forth yourself.

For browser access, this means defining functions like navigate(url), extract(schema), and click(ref). The agent decides when to call each function. Browserbeam executes the browser actions and returns structured results. The agent processes the results and decides what to do next.

If you've built this pattern manually with the raw openai library, you know the boilerplate: parse tool_calls, match function names, execute, format results, append to messages, loop. The Agents SDK eliminates all of it. For a comparison of the manual approach, see our LLM-powered browser automation tutorial.


Setting Up Your Environment

Installing Dependencies

You need two packages: the OpenAI Agents SDK and the Browserbeam Python SDK.

pip install openai-agents browserbeam

This installs the agents module (from openai-agents) and the browserbeam module. Python 3.10 or later is required.

API Key Configuration

Set both API keys as environment variables:

export OPENAI_API_KEY="sk-..."
export BROWSERBEAM_API_KEY="bb_live_..."

The Agents SDK reads OPENAI_API_KEY automatically. The Browserbeam SDK reads BROWSERBEAM_API_KEY when you don't pass api_key explicitly. You can get a Browserbeam key by signing up for a free account.
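A quick sanity check that both keys are set saves a confusing first failure. This is a small helper of our own, not part of either SDK:

```python
import os

# Both SDKs read their keys from the environment; check them up front.
REQUIRED_KEYS = ("OPENAI_API_KEY", "BROWSERBEAM_API_KEY")

def missing_keys(required=REQUIRED_KEYS):
    """Return the names of any required environment variables that are unset."""
    return [key for key in required if not os.environ.get(key)]

missing = missing_keys()
if missing:
    print(f"Set these environment variables first: {', '.join(missing)}")
```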

Project Structure

A minimal project looks like this:

browser-agent/
  tools.py        # Browser tool definitions
  agent.py        # Agent configuration and runner
  guardrails.py   # Input/output guardrails
  requirements.txt

For this guide, we'll keep everything in a single file to make the examples self-contained. In production, split tools, agents, and guardrails into separate modules.


Defining Browser Tools for the Agent

Each browser action becomes a tool function decorated with @function_tool. The Agents SDK reads the function signature and docstring to generate the tool's JSON schema automatically. Type hints matter here: they tell the model what arguments each tool expects.
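To see why type hints matter, here is a toy reconstruction of the kind of schema that can be derived from a signature using only the standard library. The real SDK's schema generation is richer; this just illustrates the principle.

```python
import inspect
from typing import get_type_hints

# Toy version of schema generation: map Python type hints to JSON Schema
# types the way an agent framework might. Illustrative only, not SDK code.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn):
    """Build a minimal tool schema from a function's hints and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    params = {name: {"type": PY_TO_JSON.get(t, "string")} for name, t in hints.items()}
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": params, "required": list(params)},
    }

def navigate(url: str) -> str:
    """Navigate the browser to a URL and return the page title."""
    ...

print(tool_schema(navigate)["parameters"]["properties"])
# -> {'url': {'type': 'string'}}
```

An untyped parameter would fall through to a default here; in the real SDK, missing hints mean the model gets a weaker description of what your tool expects.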

The navigate tool opens a URL in the browser and returns the page title and URL for confirmation:

from agents import function_tool
from browserbeam import Browserbeam

bb = Browserbeam()
session = None  # shared browser session, reused by every tool

@function_tool
def navigate(url: str) -> str:
    """Navigate the browser to a URL and return the page title.

    Args:
        url: The full URL to navigate to (must start with https://).
    """
    global session
    session = bb.sessions.create(url=url, timeout=60)
    title = session.page.title if session.page else "Unknown"
    return f"Navigated to: {title} ({session.page.url})"

The docstring becomes the tool description that the model sees. Be specific about what the function does and what the arguments mean. GPT-5.4 uses this description to decide when and how to call the tool.

Extract Tool

The extract tool pulls structured data from the current page. This is where Browserbeam's schema-based extraction shines: the agent describes what data it wants, and the API returns clean JSON.

import json

@function_tool
def extract_data(parent_selector: str, fields: str) -> str:
    """Extract structured data from elements on the current page.

    Args:
        parent_selector: CSS selector for the parent elements (e.g., "article.product_pod").
        fields: JSON string mapping field names to selectors (e.g., '{"title": "h3 a >> text", "price": ".price_color >> text"}').
    """
    if not session:
        return "No browser session open. Call navigate() first."
    field_map = json.loads(fields)
    schema_item = {"_parent": parent_selector, **field_map}
    result = session.extract(items=[schema_item])
    return json.dumps(result.extraction.get("items", []), indent=2)

The agent calls this tool with a selector and field mapping. Browserbeam finds all matching elements and extracts the specified fields. The result comes back as a JSON array that fits comfortably in the context window.

Observe Tool

The observe tool returns a markdown snapshot of the current page. This gives the agent a high-level view of the page content without overwhelming the context window with raw HTML:

@function_tool
def observe_page() -> str:
    """Get a markdown snapshot of the current page content. Use this to understand what's on the page before extracting specific data."""
    session.observe()  # refresh the page state before reading the snapshot
    if session.page and session.page.markdown:
        content = session.page.markdown.content
        if len(content) > 4000:
            content = content[:4000] + "\n\n[Content truncated at 4000 chars]"
        return content
    return "No page content available."

Truncating to 4,000 characters is a practical guard. A full page snapshot can be 20,000+ characters. The agent usually needs the first few sections to decide what to extract, not the entire page.

Click and Fill Tools

For interactive pages that require clicking buttons or filling forms:

@function_tool
def click_element(ref: str) -> str:
    """Click an element on the page by its ref identifier.

    Args:
        ref: The element ref (e.g., "e5") from the page's interactive elements.
    """
    session.click(ref=ref)
    return f"Clicked element {ref}. Page may have updated."

@function_tool
def fill_field(ref: str, value: str) -> str:
    """Fill a form field with a value.

    Args:
        ref: The element ref for the input field.
        value: The text to enter.
    """
    session.fill(value=value, ref=ref)
    return f"Filled {ref} with '{value}'."

These tools give the agent the ability to interact with pages, not just read them. Combined with observe_page, the agent can see the page, decide what to click, and fill forms.

Tools Summary

  • navigate — sessions.create() / goto(); returns the page title and URL; used when opening a new page or navigating to a different URL
  • observe_page — observe(); returns truncated markdown content; used to understand page structure before extracting data
  • extract_data — extract(**schema); returns a JSON array of matching items; used to pull structured data from known page elements
  • click_element — click(ref=...); returns a confirmation message; used to interact with buttons, links, and pagination
  • fill_field — fill(value=..., ref=...); returns a confirmation message; used to enter text into search boxes, forms, and filters

Building a Research Agent

Now let's wire the tools into a full agent. This example builds a research agent that navigates to books.toscrape.com, extracts book data, and returns a structured summary.

Agent Definition

from agents import Agent, Runner, function_tool
from browserbeam import Browserbeam

bb = Browserbeam()
session = None

@function_tool
def browse(url: str) -> str:
    """Navigate to a URL and return a markdown snapshot of the page.

    Args:
        url: The full URL to visit.
    """
    global session
    if session:
        session.goto(url)
    else:
        session = bb.sessions.create(url=url, timeout=120)
    if session.page and session.page.markdown:
        content = session.page.markdown.content
        return content[:5000] if len(content) > 5000 else content
    return "Page loaded but no content available."

@function_tool
def extract_books() -> str:
    """Extract all book titles and prices from the current page on books.toscrape.com."""
    import json
    if not session:
        return "No browser session open. Call browse() first."
    result = session.extract(
        books=[{
            "_parent": "article.product_pod",
            "title": "h3 a >> text",
            "price": ".price_color >> text",
            "stock": ".instock.availability >> text"
        }]
    )
    return json.dumps(result.extraction.get("books", []), indent=2)

@function_tool
def close_browser() -> str:
    """Close the browser session. Always call this when done browsing."""
    global session
    if session:
        session.close()
        session = None
        return "Browser session closed."
    return "No session to close."

research_agent = Agent(
    name="Book Researcher",
    instructions="""You are a research agent that browses websites to collect book data.
When asked about books:
1. Browse to the given URL (or https://books.toscrape.com if none specified)
2. Extract book titles and prices using extract_books()
3. Summarize what you found
4. Always close the browser when done

Return the data in a clear, structured format.""",
    tools=[browse, extract_books, close_browser],
    model="gpt-5.4",
)

The instructions field is your agent's system prompt. Be specific about the workflow you want. GPT-5.4 follows numbered steps reliably.

The Agent Loop

Running the agent takes one line:

result = Runner.run_sync(research_agent, "Find the first 5 books and their prices on books.toscrape.com")
print(result.final_output)

Runner.run_sync blocks until the agent completes. Under the hood, the SDK:

  1. Sends your input to GPT-5.4
  2. GPT-5.4 calls browse("https://books.toscrape.com")
  3. The SDK executes the tool and sends the result back
  4. GPT-5.4 calls extract_books()
  5. The SDK executes it and sends the result back
  6. GPT-5.4 calls close_browser()
  7. GPT-5.4 produces the final output

You didn't write any of that dispatch logic. The SDK handled it.

Running the Agent

Here's the complete, runnable script:

import json
from agents import Agent, Runner, function_tool
from browserbeam import Browserbeam

bb = Browserbeam()
session = None

@function_tool
def browse(url: str) -> str:
    """Navigate to a URL and return a markdown snapshot of the page.

    Args:
        url: The full URL to visit.
    """
    global session
    if session:
        session.goto(url)
    else:
        session = bb.sessions.create(url=url, timeout=120)
    if session.page and session.page.markdown:
        content = session.page.markdown.content
        return content[:5000] if len(content) > 5000 else content
    return "Page loaded but no content available."

@function_tool
def extract_books() -> str:
    """Extract all book titles and prices from the current page."""
    if not session:
        return "No browser session open. Call browse() first."
    result = session.extract(
        books=[{
            "_parent": "article.product_pod",
            "title": "h3 a >> text",
            "price": ".price_color >> text"
        }]
    )
    return json.dumps(result.extraction.get("books", []), indent=2)

@function_tool
def close_browser() -> str:
    """Close the browser session. Always call this when done."""
    global session
    if session:
        session.close()
        session = None
        return "Browser session closed."
    return "No session to close."

agent = Agent(
    name="Book Researcher",
    instructions="You research books on websites. Browse the URL, extract book data, summarize findings, and always close the browser when done.",
    tools=[browse, extract_books, close_browser],
    model="gpt-5.4",
)

result = Runner.run_sync(agent, "What are the top 5 cheapest books on books.toscrape.com?")
print(result.final_output)

Save this as agent.py, set your API keys, and run it. The agent will browse the real site, extract real data, and return a formatted answer.

For a deeper dive into Browserbeam's Python SDK, see the getting started guide.


Adding Guardrails

Guardrails are safety checks that run before or after the agent processes a request. For a browser agent, two guardrails are critical: restricting which URLs the agent can visit, and validating that the output contains real data.

URL Allowlisting

An input guardrail that blocks the agent from visiting URLs outside an approved list:

import re
from urllib.parse import urlparse

from agents import input_guardrail, GuardrailFunctionOutput, Agent, Runner

ALLOWED_DOMAINS = {"books.toscrape.com", "quotes.toscrape.com", "news.ycombinator.com"}

@input_guardrail
async def url_allowlist_check(ctx, agent, input):
    """Block requests that ask the agent to visit unapproved domains."""
    input_text = input if isinstance(input, str) else str(input)
    urls = re.findall(r'https?://[^\s<>"]+', input_text)
    for url in urls:
        domain = urlparse(url).netloc
        if domain and domain not in ALLOWED_DOMAINS:
            return GuardrailFunctionOutput(
                tripwire_triggered=True,
                output_info={"blocked_domain": domain},
            )
    return GuardrailFunctionOutput(tripwire_triggered=False, output_info=None)

safe_agent = Agent(
    name="Safe Researcher",
    instructions="You research books and quotes on approved websites only.",
    tools=[browse, extract_books, close_browser],
    model="gpt-5.4",
    input_guardrails=[url_allowlist_check],
)

When a user asks the agent to visit a domain not in ALLOWED_DOMAINS, the guardrail fires a tripwire and the run stops immediately. The agent never sees the request.

This matters for production agents. Without URL restrictions, a user could ask your agent to visit any site, including competitors, internal tools, or malicious URLs. For more on securing browser agents, see the security best practices guide.
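Note that input guardrails inspect the user's request, but the model can still come up with URLs mid-run on its own. A defense-in-depth check inside the browse tool itself closes that gap. This is a sketch using the same ALLOWED_DOMAINS set; the session-handling details are omitted:

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"books.toscrape.com", "quotes.toscrape.com", "news.ycombinator.com"}

def is_allowed(url: str) -> bool:
    """Return True only if the URL's domain is on the approved list."""
    return urlparse(url).netloc in ALLOWED_DOMAINS

# Inside a browse tool, refuse disallowed URLs before touching the browser:
#     if not is_allowed(url):
#         return f"Refused: {urlparse(url).netloc} is not an approved domain."
```

Returning a refusal string (rather than raising) lets the model see why the call failed and pick an approved URL instead.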

Rate Limiting Browser Calls

Browser sessions are more expensive than text generation. A rate-limiting guard prevents runaway tool calls:

import time

call_log = []

@function_tool
def rate_limited_browse(url: str) -> str:
    """Navigate to a URL with rate limiting (max 5 calls per minute).

    Args:
        url: The URL to visit.
    """
    now = time.time()
    recent = [t for t in call_log if now - t < 60]
    if len(recent) >= 5:
        return "Rate limit reached. Wait before making more browser requests."
    call_log[:] = recent + [now]  # prune old entries and record this call

    global session
    if session:
        session.goto(url)
    else:
        session = bb.sessions.create(url=url, timeout=120)
    if session.page and session.page.markdown:
        content = session.page.markdown.content
        return content[:5000] if len(content) > 5000 else content
    return "Page loaded but no content available."

Output Validation

An output guardrail that checks whether the agent's response contains actual data, not a hallucinated summary:

from agents import output_guardrail, GuardrailFunctionOutput

@output_guardrail
async def data_quality_check(ctx, agent, output):
    """Verify the agent's output contains structured data, not just prose."""
    text = output.output if hasattr(output, 'output') else str(output)
    if len(text) < 50:
        return GuardrailFunctionOutput(
            tripwire_triggered=True,
            output_info={"reason": "Output too short, likely missing data"},
        )
    return GuardrailFunctionOutput(tripwire_triggered=False, output_info=None)

validated_agent = Agent(
    name="Validated Researcher",
    instructions="You research books on websites. Always include specific titles and prices in your response.",
    tools=[browse, extract_books, close_browser],
    model="gpt-5.4",
    input_guardrails=[url_allowlist_check],
    output_guardrails=[data_quality_check],
)

Input and output guardrails stack. Use input guardrails for access control and safety, output guardrails for data quality.

Guardrail Types at a Glance

  • Input (@input_guardrail) — runs before the agent processes the request; best for URL allowlisting, content filtering, rate limiting
  • Output (@output_guardrail) — runs after the agent produces a response; best for data quality checks, format validation, PII detection
  • Tool input (@tool_input_guardrail) — runs before a specific tool executes; best for blocking dangerous tool arguments
  • Tool output (@tool_output_guardrail) — runs after a specific tool executes; best for redacting sensitive data from tool results

Advanced Patterns

Multi-Agent Handoffs

The Agents SDK supports multi-agent workflows where one agent hands off work to another. For browser tasks, a common pattern is splitting the work into a researcher (decides what to browse) and an extractor (pulls specific data).

from agents import Agent, Runner

extractor_agent = Agent(
    name="Data Extractor",
    instructions="""You are a data extraction specialist. When given a URL and extraction instructions:
1. Browse to the URL
2. Extract the requested data
3. Return it as a clean JSON array
4. Close the browser""",
    tools=[browse, extract_books, close_browser],
    model="gpt-5.4",
)

researcher_agent = Agent(
    name="Research Coordinator",
    instructions="""You coordinate research tasks. When asked to gather data:
1. Figure out which URLs to visit
2. Hand off to the Data Extractor for each URL
3. Compile the results into a summary""",
    handoffs=[extractor_agent],
    model="gpt-5.4",
)

result = Runner.run_sync(
    researcher_agent,
    "Compare the book prices on pages 1 and 2 of books.toscrape.com"
)
print(result.final_output)

The researcher agent has no browser tools. It coordinates the work and hands off to the extractor agent when it needs data from a page. The SDK manages the conversation handoff, including passing context between agents.

Session Reuse Across Turns

For multi-turn conversations, reuse the Browserbeam session instead of creating a new one for each tool call. The agent can browse, ask a follow-up question, browse more, and the session stays active:

@function_tool
def browse_or_continue(url: str = "") -> str:
    """Navigate to a URL or continue browsing the current page.

    Args:
        url: The URL to visit. Leave empty to get the current page content.
    """
    global session
    if url and session:
        session.goto(url)
    elif url:
        session = bb.sessions.create(url=url, timeout=300)
    elif not session:
        return "No page open. Provide a URL."

    if session.page and session.page.markdown:
        return session.page.markdown.content[:5000]
    return "No content available."

Set a longer timeout (300 seconds here) for multi-turn sessions. Browserbeam auto-closes sessions that exceed the timeout, which is your safety net against orphaned sessions. Close sessions explicitly when done. Open sessions keep the billing clock running.
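One way to guarantee that cleanup happens around a whole agent run is a small context manager. This sketch assumes only that sessions expose close(); it's demonstrated with a stand-in object rather than a live Browserbeam session:

```python
from contextlib import contextmanager

@contextmanager
def browser_session(create_session):
    """Yield a session and guarantee close() runs, even if the run raises."""
    session = create_session()
    try:
        yield session
    finally:
        session.close()

# Demonstration with a stand-in session object:
class FakeSession:
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

fake = FakeSession()
try:
    with browser_session(lambda: fake) as s:
        raise RuntimeError("agent crashed mid-run")
except RuntimeError:
    pass
print(fake.closed)  # -> True
```

With the real SDK you would pass something like `lambda: bb.sessions.create(url=url, timeout=300)` and run the agent inside the with block, so a crash mid-run still closes the session.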

Streaming Agent Responses

For real-time feedback, use Runner.run_streamed instead of Runner.run_sync. This returns events as the agent works, so you can show progress to the user:

import asyncio
from agents import Agent, Runner

async def stream_agent():
    agent = Agent(
        name="Streaming Researcher",
        instructions="Research books and stream your findings as you go.",
        tools=[browse, extract_books, close_browser],
        model="gpt-5.4",
    )

    result = Runner.run_streamed(agent, "What books are on books.toscrape.com?")
    async for event in result.stream_events():
        # Text deltas arrive as raw response events; event.data.delta is the new text
        if event.type == "raw_response_event" and getattr(event.data, "type", "") == "response.output_text.delta":
            print(event.data.delta, end="", flush=True)
    print()

asyncio.run(stream_agent())

Streaming is useful for long-running research tasks where the agent visits multiple pages. The user sees progress instead of waiting for the full result.


OpenAI Agents SDK vs Raw Function Calling vs LangChain

Three ways to build an AI agent with browser access. Each has different tradeoffs:

Each line compares the raw openai library / the OpenAI Agents SDK / LangChain + LangGraph:

  • Tool dispatch loop: manual (you write it) / automatic / automatic
  • Tool schema generation: manual JSON schema / auto from type hints / auto from @tool decorator
  • Guardrails: build your own / built-in (@input_guardrail) / LangGraph interrupts
  • Multi-agent handoffs: build your own / built-in (handoffs=) / LangGraph StateGraph
  • Streaming: manual SSE parsing / built-in (run_streamed) / LangGraph streaming
  • Model support: OpenAI only (native) / OpenAI + any via adapters / any via ChatModel classes
  • Framework dependency: openai only / openai-agents / langchain + langgraph
  • Lines of code (browser agent): ~150 / ~60 / ~80

When to Use Each

Raw openai library: You want full control over the agent loop, or you need custom behavior that the SDK doesn't support. Good for production systems where you've already built the infrastructure. See the LLM-powered browser automation tutorial for this approach.

OpenAI Agents SDK: You want the fastest path from idea to working agent with OpenAI models. Guardrails, handoffs, and streaming work out of the box. Good for new projects and teams already using OpenAI. This is what this guide covers.

LangChain + LangGraph: You need to swap between model providers (OpenAI, Anthropic, Google) or want LangGraph's state machine patterns for complex multi-step workflows. Good for teams that need vendor flexibility. See the Browserbeam + LangChain integration guide for this approach.

Migrating from Raw Function Calling

If you built a browser agent with the raw openai library, migrating to the Agents SDK means:

  1. Replace your tool dispatch loop with Runner.run_sync()
  2. Convert your tool functions to use @function_tool (add type hints and docstrings)
  3. Move your system prompt to Agent(instructions=...)
  4. Replace manual history management with the SDK's built-in conversation tracking
  5. Add guardrails where you previously had manual input validation

The code shrinks by about 60%. The behavior stays the same.


Common Mistakes

Five patterns that trip up teams building browser agents with the OpenAI Agents SDK. All avoidable.

Not Closing Browser Sessions

Every Browserbeam session consumes resources until it's closed or times out. If your tool functions create sessions but don't close them (especially on error paths), you'll accumulate orphaned sessions.

The fix: always include a close_browser tool and instruct the agent to call it. Set a timeout on every session as a safety net.

session = bb.sessions.create(url=url, timeout=60)
try:
    result = session.extract(books=[{"_parent": "article.product_pod", "title": "h3 a >> text"}])
    return json.dumps(result.extraction.get("books", []))
finally:
    session.close()

Passing Raw HTML to the LLM

Browserbeam returns structured markdown, not raw HTML. But if you're tempted to pass session.page.markdown.content without truncation, a large page will consume your entire context window.

The fix: truncate observe results to 3,000-5,000 characters. Use extract for structured data instead of asking the LLM to parse page content.
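A slightly smarter truncation than a hard slice cuts at the last paragraph break before the limit, so the model never sees a sentence chopped mid-word. A small helper of our own, not part of either SDK:

```python
def truncate_markdown(content: str, limit: int = 4000) -> str:
    """Truncate page content at the last paragraph break before `limit`."""
    if len(content) <= limit:
        return content
    cut = content.rfind("\n\n", 0, limit)
    if cut == -1:  # no paragraph break found; fall back to a hard cut
        cut = limit
    return content[:cut] + f"\n\n[Content truncated at ~{limit} chars]"
```

Use it in place of the plain slice inside observe_page or browse: `return truncate_markdown(session.page.markdown.content)`.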

Missing Error Handling in Tool Functions

When a tool function throws an exception, the Agents SDK returns the error as a tool result. The model sees the error and may retry or give up. Unhandled exceptions produce ugly error messages that confuse the model.

The fix: wrap tool functions in try/except and return clear error messages:

@function_tool
def safe_browse(url: str) -> str:
    """Navigate to a URL safely.

    Args:
        url: The URL to visit.
    """
    global session
    try:
        session = bb.sessions.create(url=url, timeout=60)
        if session.page and session.page.markdown:
            return session.page.markdown.content[:5000]
        return "Page loaded but no content available."
    except Exception as e:
        return f"Error browsing {url}: {str(e)}"

Letting the Agent Navigate Unrestricted

Without URL guardrails, the agent can visit any site. In testing, this seems fine. In production, a user could direct the agent to visit internal tools, leak data to external sites, or trigger security alerts.

The fix: always add an input guardrail that restricts the agent to an approved domain list (see the "Adding Guardrails" section above).

Forgetting to Set Timeouts

Browser sessions without timeouts stay alive until you close them or the system eventually reclaims them. If your agent crashes mid-run, those sessions become orphans that consume resources.

The fix: set timeout on every sessions.create() call. Use a value that's 3-5x your expected session duration. For a quick extraction (5 seconds), set timeout=30. For a multi-step workflow (30 seconds), set timeout=120.
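That rule of thumb is easy to encode. A trivial helper reflecting the guidance above; the multiplier and floor are assumptions, so tune them to your workload:

```python
def session_timeout(expected_seconds: float, multiplier: float = 4, floor: int = 30) -> int:
    """Pick a session timeout: a multiple of the expected duration, with a floor."""
    return max(floor, int(expected_seconds * multiplier))

print(session_timeout(5))   # quick extraction    -> 30
print(session_timeout(30))  # multi-step workflow -> 120
```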


Frequently Asked Questions

What is the OpenAI Agents SDK?

The OpenAI Agents SDK is a Python framework (pip install openai-agents) for building AI agents with OpenAI models. It handles the tool dispatch loop, multi-agent handoffs, guardrails, and streaming. You define agents and tools; the SDK runs the agent loop. It uses the Responses API by default and supports GPT-5.4 as the recommended model.

How does OpenAI function calling work with browser tools?

You define Python functions that wrap Browserbeam operations (navigate, extract, click, fill) and decorate them with @function_tool. The SDK generates JSON schemas from your type hints and passes them to the model. When the model decides to call a tool, the SDK executes your function and returns the result. The model never touches the browser directly.

Can I use the OpenAI Agents SDK with models other than GPT?

Yes. The SDK supports non-OpenAI models through the Chat Completions adapter. Call set_default_openai_api("chat_completions") and configure any OpenAI-compatible endpoint. Some SDK features (tool search, deferred tools) are Responses API-only and won't work with third-party models.

What is the difference between the Responses API and Chat Completions?

The Responses API is OpenAI's newer interface, designed for agent and tool use workflows. It supports features like tool search and deferred tool loading. Chat Completions is the older interface that most developers know from client.chat.completions.create(). The Agents SDK uses Responses by default but supports both.

How do I add custom tools to an OpenAI agent?

Decorate a Python function with @function_tool from the agents module. Add type hints to all parameters and a docstring that describes what the tool does. Pass the function in the tools=[] list when creating your Agent. The SDK handles schema generation and tool dispatch automatically.

Do I need to manage a browser to give my agent web access?

No. Browserbeam runs browsers in the cloud. Your agent sends API calls through the Browserbeam Python SDK, and Browserbeam handles browser lifecycle, crash recovery, and scaling. No Chromium to install, no WebDriver to configure. See the Browserbeam API docs for the full reference.

How do guardrails work in the OpenAI Agents SDK?

Guardrails are async functions decorated with @input_guardrail or @output_guardrail. Input guardrails check the user's request before the agent processes it. Output guardrails check the agent's response before returning it. Both return GuardrailFunctionOutput with a tripwire_triggered flag. If triggered, the run stops with an error.

Can I use Browserbeam with LangChain instead?

Yes. Browserbeam works with any Python framework. For LangChain, wrap Browserbeam methods as LangChain @tool functions and use them with create_react_agent from LangGraph. See the Browserbeam + LangChain integration guide for a full tutorial.


What to Build Next

You have the pieces: browser tools, a research agent, guardrails, multi-agent handoffs, and streaming. The next step is combining them into something useful.

A competitive intelligence agent that monitors competitor pricing across ten sites. A lead enrichment agent that visits company websites and extracts contact information. A content monitoring agent that checks for changes on pages your team cares about. Each of these is the same pattern: define the tools, write the agent instructions, add guardrails, run it.

Start with the Browserbeam API docs for the full API reference. Scale your agent with the patterns in the scaling web automation guide. Lock down access with the security best practices guide. Build your first agent with the intelligent web agents tutorial.

What will your agent browse first?
