Control real browsers through a simple REST API. Get structured page data, stable element refs, and change diffs instead of raw HTML.
Six steps showing the full lifecycle: create a session, observe the page, fill a search form, extract results with AI selectors, scroll for more, and screenshot.
{
  "url": "https://browserbeam.com/blog/",
  "viewport": {
    "width": 1280,
    "height": 720
  },
  "auto_dismiss_blockers": true
}
{
  "session_id": "ses_abc123def456",
  "expires_at": "2026-04-14T14:05:00.000Z",
  "request_id": "req_8f3a2bc1d4e5",
  "completed": 0,
  "page": {
    "url": "https://browserbeam.com/blog/",
    "title": "Browserbeam Blog",
    "stable": true,
    "markdown": {
      "content": "### Puppeteer vs Playwright vs Browserbeam...\n\n### Build a Competitive Intelligence Agent..."
    },
    "interactive_elements": [
      { "ref": "e1", "tag": "input", "label": "Search articles...", "in": "form", "form": "f1" }
    ],
    "forms": [
      { "ref": "f1", "action": "/blog/", "method": "GET", "fields": ["e1"] }
    ]
  },
  "blockers_dismissed": ["cookie_consent"]
}
Launch Puppeteer, set viewport, navigate, wait for networkidle, detect and dismiss cookie banner (2-3 extra actions), call page.content(), parse 15,000+ character HTML with cheerio, manually extract form fields.
~25 lines of code. A wall of raw HTML for your LLM to parse.
One POST request. Navigate, auto-dismiss the cookie banner, return markdown content, element refs, and form structures. The page is ready for your agent to read and act on.
1 API call. Markdown + refs + forms. Compact and LLM-ready.
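As a sketch of the one-call flow above (stdlib only; the API host, the Bearer-token auth header, and the function names here are assumptions, not documented specifics):

```python
import json
import urllib.request

BASE = "https://api.browserbeam.com"  # hypothetical host; only the POST /v1/sessions path is documented

def create_session_payload(url: str) -> dict:
    # Body from the example above: navigate, set viewport, auto-dismiss blockers.
    return {
        "url": url,
        "viewport": {"width": 1280, "height": 720},
        "auto_dismiss_blockers": True,
    }

def create_session(api_key: str, url: str) -> dict:
    # One POST returns markdown, element refs, and form structures.
    req = urllib.request.Request(
        f"{BASE}/v1/sessions",
        data=json.dumps(create_session_payload(url)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```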
{
  "steps": [
    {
      "observe": {
        "scope": "main",
        "format": "markdown"
      }
    }
  ]
}
{
  "session_id": "ses_abc123def456",
  "request_id": "req_c7d41e8f2a09",
  "completed": 1,
  "page": {
    "url": "https://browserbeam.com/blog/",
    "stable": true,
    "markdown": {
      "content": "### Puppeteer vs Playwright vs Browserbeam: An Honest Comparison (2026)\n\n22 min read\n\n### Build a Competitive Intelligence Agent...",
      "length": 4820
    },
    "interactive_elements": [
      { "ref": "e1", "tag": "input", "label": "Search articles...", "in": "form", "form": "f1" }
    ]
  }
}
Call page.content() for the full DOM (15,000+ characters), then use cheerio or regex to extract just the section you need. Convert HTML to markdown yourself. No way to detect what changed since your last read.
A bloated payload for a single page read. No scoping. No diff.
Scope to a CSS selector, get back clean markdown and element refs for just that section. The changes field shows what shifted since the last observation, so your agent never re-reads stale content.
Scoped and compact. Markdown built in. Diff tracking automatic.
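A small sketch of how an agent might build the observe step and consume the documented response shape (the helper names are ours, not SDK functions):

```python
def observe_step(scope: str = "main", fmt: str = "markdown") -> dict:
    # A scoped observe: clean markdown + refs for just one section of the page.
    return {"steps": [{"observe": {"scope": scope, "format": fmt}}]}

def read_observation(response: dict) -> str:
    # Pull the markdown out of a step response; refuse to read an unstable page.
    page = response["page"]
    if not page.get("stable"):
        raise RuntimeError("page not stable yet; re-observe")
    return page["markdown"]["content"]
```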
{
  "steps": [
    {
      "fill_form": {
        "fields": {
          "Search articles...": "agent"
        },
        "submit": true
      }
    }
  ]
}
{
  "session_id": "ses_abc123def456",
  "request_id": "req_e2b8c5f19d34",
  "completed": 1,
  "page": {
    "url": "https://browserbeam.com/blog/?q=agent",
    "title": "Browserbeam Blog",
    "stable": true,
    "changes": {
      "url_changed": true,
      "content_changed": true,
      "elements_added": [
        { "ref": "e2", "tag": "a" }
      ]
    },
    "markdown": {
      "content": "### Browserbeam & LangChain: How to Build AI Agents...\n\n### OpenAI Agents SDK + Browserbeam...\n\n### The Future of AI Agents..."
    }
  }
}
Find the search input by CSS selector (breaks if markup changes), page.type() the value, find the submit button, page.click(), waitForNavigation(), then re-scrape the entire page to see results.
~15 lines. Fragile selectors. A full page re-read just to see what changed.
One step matches the search field by label, fills it, submits the form, and waits for results. The response includes the new page state with a diff showing what changed.
1 API call. 0 selectors. Change diff included automatically.
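Assuming the response shape shown above, the diff can be flattened into a few short strings for the agent to read (a sketch; `fill_form_step` and `summarize_changes` are illustrative names):

```python
def fill_form_step(fields: dict, submit: bool = True) -> dict:
    # Match fields by visible label instead of CSS selectors.
    return {"steps": [{"fill_form": {"fields": fields, "submit": submit}}]}

def summarize_changes(page: dict) -> list:
    # Turn the compact diff into short notes an agent can read in a few tokens.
    changes = page.get("changes", {})
    notes = []
    if changes.get("url_changed"):
        notes.append(f"url -> {page['url']}")
    if changes.get("content_changed"):
        notes.append("content changed")
    for el in changes.get("elements_added", []):
        notes.append(f"added {el['tag']} ({el['ref']})")
    return notes
```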
{
  "steps": [
    {
      "extract": {
        "posts": [
          {
            "_parent": "ai >> a blog post card",
            "title": "ai >> the post title",
            "url": "ai >> the link to the full post",
            "excerpt": "ai >> the short description"
          }
        ]
      }
    }
  ]
}
{
  "session_id": "ses_abc123def456",
  "request_id": "req_a1d9f3c72b60",
  "completed": 1,
  "extraction": {
    "posts": [
      {
        "title": "Browserbeam & LangChain: How to Build AI Agents with Browser Access",
        "url": "https://browserbeam.com/blog/browserbeam-langchain-integration/",
        "excerpt": "Build LangChain agents with browser access using Browserbeam..."
      },
      {
        "title": "OpenAI Agents SDK + Browserbeam: Give Your Agent Eyes on the Web",
        "url": "https://browserbeam.com/blog/openai-agents-sdk-browserbeam/",
        "excerpt": "Build OpenAI Agents SDK tools that browse, click, and extract..."
      }
      // ... 10 more posts
    ]
  }
}
Open DevTools, hunt for the right selector for each field, paste them into your code, ship. Then watch them break the next time the site author renames a class. Maintain a CSS map per site, forever.
~20 lines of in-page JavaScript. Fragile selectors. No caching.
Use ai >> to say what you want. The engine resolves the selector, caches it per domain, and every scrape after that is free. Works on SPAs, hashed class names, and markup that changes between deploys.
1 declarative schema. AI-resolved selectors. Cached for free reuse.
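The extract schema is plain JSON, so it can be assembled in code; a sketch mirroring the request above (the function name is ours):

```python
def extract_posts_step() -> dict:
    # Declarative extract schema: "ai >>" prompts instead of CSS selectors.
    return {"steps": [{"extract": {"posts": [{
        "_parent": "ai >> a blog post card",
        "title": "ai >> the post title",
        "url": "ai >> the link to the full post",
        "excerpt": "ai >> the short description",
    }]}}]}
```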
{
  "steps": [
    {
      "scroll_collect": {
        "max_text_length": 50000,
        "max_scrolls": 10
      }
    }
  ]
}
{
  "session_id": "ses_abc123def456",
  "request_id": "req_f4a82d1c0e57",
  "completed": 1,
  "page": {
    "url": "https://browserbeam.com/blog/?q=agent",
    "title": "Browserbeam Blog",
    "stable": true,
    "markdown": {
      "content": "### Browserbeam & LangChain...\n\n### OpenAI Agents SDK + Browserbeam...\n\n### The Future of AI Agents...",
      "length": 12480
    },
    "scroll": {
      "y": 8400,
      "height": 8400,
      "percent": 100
    }
  }
}
Write a scroll loop: scroll down, wait for lazy content to load, check if you've reached the bottom, repeat. Handle race conditions with loading spinners and infinite scroll triggers. Collect content at each position, deduplicate.
~35 lines. Fragile timing. Content deduplication is your problem.
One step. Scrolls through the entire page, waits for lazy-loaded content at each position, deduplicates, and returns a single unified markdown observation. Handles infinite scroll, loading spinners, and content gates.
1 step. Full page content in one response. Up to 50,000 characters.
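A sketch of building the scroll_collect step and checking the scroll telemetry in the response (helper names are illustrative):

```python
def scroll_collect_step(max_text_length: int = 50_000, max_scrolls: int = 10) -> dict:
    # One step: scroll, wait for lazy content, dedupe, return unified markdown.
    return {"steps": [{"scroll_collect": {
        "max_text_length": max_text_length,
        "max_scrolls": max_scrolls,
    }}]}

def reached_bottom(page: dict) -> bool:
    # The response reports scroll position; 100 percent means the full page was covered.
    return page.get("scroll", {}).get("percent") == 100
```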
{
  "steps": [
    {
      "screenshot": {
        "full_page": true,
        "format": "png"
      }
    },
    {
      "close": {}
    }
  ]
}
{
  "session_id": "ses_abc123def456",
  "request_id": "req_d5c83e7f1b92",
  "completed": 2,
  "media": [
    {
      "type": "screenshot",
      "format": "png",
      "data": "iVBORw0KGgo..."
    }
  ]
}
page.screenshot() to a temp file, read the file into a buffer, base64-encode it, browser.close(), handle cleanup errors if the process crashed. You manage Chrome process lifecycle yourself.
~12 lines. Must manage file I/O and process cleanup.
Screenshot and close in a single call. Base64 image data returned inline. The close step destroys the session and stops credit consumption. No cleanup code needed.
1 API call, 2 steps. Session cleaned up automatically.
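Decoding the inline base64 image data is the only client-side work left; a sketch (helper names are ours) assuming the media array shown above:

```python
import base64

def screenshot_and_close_steps() -> dict:
    # Screenshot then close in one request; close stops credit consumption.
    return {"steps": [{"screenshot": {"full_page": True, "format": "png"}},
                      {"close": {}}]}

def save_screenshots(response: dict, prefix: str = "shot") -> list:
    # Decode inline base64 image data to files; no temp files or process cleanup.
    paths = []
    media = (m for m in response.get("media", []) if m["type"] == "screenshot")
    for i, item in enumerate(media):
        path = f"{prefix}_{i}.{item['format']}"
        with open(path, "wb") as f:
            f.write(base64.b64decode(item["data"]))
        paths.append(path)
    return paths
```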
Nine capabilities that sit between your agent and the page, so the LLM spends tokens on the task, not on browser overhead.
Every response includes a stability signal that tells your agent when the page is fully loaded and ready. No more guessing wait times or burning tokens on premature reads.
Interactive elements get short, stable refs like e1, e2, e3. Your agent clicks by ref instead of constructing fragile CSS selectors.
After each action, the API returns only what changed: elements added, removed, or modified. Your agent reads a 30-token diff instead of re-parsing the entire page.
Cookie banners, newsletter popups, and chat widgets are detected and dismissed automatically. Your agent never wastes actions on interruptions irrelevant to the task.
Pages are compressed into a structured, token-efficient representation: interactive elements, headings, and visible text. Thousands of DOM nodes become a compact JSON object.
When an action fails, you get context, not just "element not found." The API tells you if an overlay is blocking the target, if a CAPTCHA appeared, and what to do next.
Run custom JavaScript on any page when built-in steps aren't enough. Your agent writes a JS snippet and the API executes it in the browser context, returning the result as structured data.
Inject cookies at session creation to skip login flows entirely. Your agent authenticates once, saves the cookies, and resumes authenticated sessions instantly.
Wait for JavaScript expressions to become truthy, not just DOM selectors. Your agent handles complex SPAs where visibility depends on framework state, not raw DOM presence.
One API, many possibilities. From autonomous agents to data pipelines, Browserbeam gives your code a browser it can see through.
Give your AI agent a real browser it can see and control. Every response includes markdown, stable element refs, and optional context (landmark, nearby heading, and parent form), plus forms grouped with their field refs.
POST /v1/sessions
{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "auto_dismiss_blockers": true
}
{
  "session_id": "ses_abc123def456",
  "request_id": "req_8f3a2bc1d4e5",
  "page": {
    "url": "https://en.wikipedia.org/wiki/Web_scraping",
    "title": "Web scraping - Wikipedia",
    "stable": true,
    "markdown": {
      "content": "**Web scraping**, **web harvesting**, or **web data extraction** is data scraping used for extracting data from websites..."
    },
    "interactive_elements": [
      { "ref": "e3", "tag": "input", "label": "Search Wikipedia", "in": "form", "form": "f1" },
      { "ref": "e4", "tag": "button", "label": "Search", "in": "form", "form": "f1" }
    ]
  }
}
POST /v1/sessions
{
  "url": "https://books.toscrape.com",
  "steps": [
    {
      "extract": {
        "products": [
          {
            "_parent": "article.product_pod",
            "_limit": 3,
            "name": "ai >> the book title",
            "price": "ai >> the price including £",
            "stock": "ai >> the availability text"
          }
        ]
      }
    },
    { "close": {} }
  ]
}
{
  "session_id": "ses_def789abc012",
  "request_id": "req_b4e2c7d81f30",
  "completed": 2,
  "extraction": {
    "products": [
      { "name": "A Light in the Attic", "price": "£51.77", "stock": "In stock" },
      { "name": "Tipping the Velvet", "price": "£53.74", "stock": "In stock" },
      { "name": "Soumission", "price": "£50.10", "stock": "In stock" }
    ]
  }
}
POST /v1/sessions
{
  "url": "https://quotes.toscrape.com/login",
  "steps": [
    {
      "observe": {
        "scope": "form",
        "format": "markdown"
      }
    }
  ]
}
{
  "session_id": "ses_abc123def456",
  "request_id": "req_91c4a8e72d05",
  "completed": 1,
  "page": {
    "url": "https://quotes.toscrape.com/login",
    "title": "Quotes to Scrape",
    "stable": true,
    "markdown": {
      "content": "# Login\n\nUsername\nPassword"
    },
    "interactive_elements": [
      { "ref": "e1", "tag": "input", "label": "Username", "in": "form", "form": "f1" },
      { "ref": "e2", "tag": "input", "in": "form", "form": "f1" },
      { "ref": "e3", "tag": "input", "in": "form", "form": "f1" }
    ],
    "forms": [
      { "ref": "f1", "action": "/login", "method": "POST", "fields": ["e1", "e2", "e3"] }
    ]
  }
}
POST /v1/sessions
{
  "url": "https://hn.algolia.com",
  "steps": [
    {
      "fill_form": {
        "fields": {
          "Search stories by title, url or author": "browser automation"
        },
        "submit": true
      }
    },
    { "wait": { "ms": 2000 } },
    { "screenshot": { "full_page": true } },
    { "close": {} }
  ]
}
{
  "session_id": "ses_ghi012jkl345",
  "request_id": "req_73b9e4f02c18",
  "completed": 4,
  "page": {
    "url": "https://hn.algolia.com/?q=browser+automation",
    "title": "browser automation | Search powered by Algolia",
    "stable": true,
    "changes": {
      "content_changed": true,
      "url_changed": true
    }
  },
  "media": [
    {
      "type": "screenshot",
      "format": "png",
      "data": "iVBORw0KGgo..."
    }
  ]
}
Official SDKs for the languages you already use, plus an MCP server that turns Browserbeam into tools your AI coding assistant can call.
pip install, full type hints, sync and async clients.
npm install, full TypeScript types, ESM and CJS builds.
gem install, block-based sessions, Struct-based types.
Use as tools in Cursor, Claude Desktop, and Windsurf.
Browserbeam is a REST API. Any language that can make HTTP requests can use it.
One monthly credit pool covers runtime, proxies, AI selectors, and CAPTCHA solving. Start with 5,000 free credits, no card required.
For individuals and side projects
For teams and production use
For agencies and high-volume use
Drag the sliders to match your workload. We'll show you which plan fits and how many credits you'll burn each month.
Each AI selector resolution averages ~600 tokens (15 credits per 1,000 tokens).
Auto-solved CAPTCHAs — DataDome, Cloudflare, reCAPTCHA, etc. (75 credits / solve).
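At the published rates, the slider math reduces to two multiplications: one AI selector resolution costs about 600 × 15 / 1,000 = 9 credits, and each CAPTCHA solve costs a flat 75. A sketch (function names are illustrative):

```python
def ai_selector_credits(resolutions: int, avg_tokens: int = 600,
                        credits_per_1k_tokens: int = 15) -> float:
    # Credits spent resolving "ai >>" selectors at the published rate.
    return resolutions * avg_tokens * credits_per_1k_tokens / 1000

def captcha_credits(solves: int, per_solve: int = 75) -> int:
    # Flat per-solve rate for auto-solved CAPTCHAs.
    return solves * per_solve
```

For example, a month with 100 selector resolutions and 4 CAPTCHA solves burns 900 + 300 = 1,200 credits from the pool.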
Structured page data instead of raw HTML. Your agent processes less, decides faster, and costs less to run.