Your AI coding assistant can open a browser, navigate to a page, click buttons, fill forms, and extract structured data. Not through a custom plugin or a brittle shell command, but through the Model Context Protocol. One JSON config, one MCP server, and your agent in Cursor, Claude Desktop, or Windsurf gets full browser access.
The Model Context Protocol (MCP) gives AI coding assistants a standard way to call external tools. Browserbeam's MCP server turns those tool calls into real browser sessions, returning structured markdown instead of raw HTML. The result: your agent reasons about clean page content instead of burning tokens on DOM noise.
This guide covers the full stack. You'll set up the Browserbeam MCP server, connect it to your IDE, run your first browser task, and build production-ready agent patterns. By the end, your AI assistant will browse the web as naturally as it reads files.
In this guide, you'll learn:
- What the Model Context Protocol is and how MCP servers work
- How Browserbeam's MCP server translates tool calls into browser sessions
- How to configure the MCP server for Cursor, Claude Desktop, and Windsurf
- Agent patterns for observe-act-extract workflows via MCP tools
- Real-world use cases: QA testing, documentation generation, competitive research
- Common integration mistakes and how to avoid them
- How Browserbeam MCP compares to Playwright MCP and Puppeteer MCP
TL;DR: The Model Context Protocol lets AI coding assistants call external tools through a standard interface. Browserbeam's MCP server (@browserbeam/mcp-server) gives your agent 19 browser automation tools that return structured markdown, element refs, and extraction results instead of raw HTML. Set it up in Cursor, Claude Desktop, or Windsurf with a single JSON config.
Understanding the Model Context Protocol (MCP)
Before connecting your AI assistant to a browser, you need to understand what sits between them. The Model Context Protocol is that layer, and getting it right determines whether your agent can actually use browser tools or just knows they exist.
What Is an MCP Server and Why It Exists
An MCP server is a lightweight process that exposes tools, resources, and prompts to AI clients through a standardized protocol. Think of it as a USB port for AI assistants. Instead of each IDE building custom integrations for every external service, MCP provides one interface that works everywhere.
The problem MCP solves is fragmentation. Before MCP, connecting an AI coding assistant to an external tool meant writing a custom plugin for each IDE. A Cursor extension, a Claude Desktop integration, a Windsurf adapter. Three implementations of the same thing. MCP replaces all of that with one server process and one config format.
| Concept | What It Does |
|---|---|
| MCP Server | Process that exposes tools to AI clients via a standard protocol |
| MCP Client | The AI application (Cursor, Claude Desktop, Windsurf) that connects to servers |
| Tools | Functions the AI can call (e.g., browserbeam_navigate, browserbeam_extract) |
| Resources | Read-only data the server provides (files, database records, API responses) |
| Prompts | Pre-built prompt templates the server suggests to the client |
MCP vs Direct API Calls
You might wonder why your agent needs MCP at all. Why not just call the Browserbeam API directly?
Direct API calls work, but they require the AI model to know the full API surface: endpoints, authentication headers, request bodies, error codes. The model must generate valid HTTP requests and parse JSON responses. That's a lot of context window spent on plumbing.
MCP flips this. The server advertises its tools with typed schemas. The AI client discovers what's available and presents those tools to the model as callable functions. The model calls browserbeam_observe with a session ID, and the MCP infrastructure handles the HTTP request, authentication, and response formatting. Your agent never constructs a curl command or parses a raw JSON response.
The practical difference: fewer tokens spent on boilerplate, fewer errors from malformed requests, and a consistent interface across every AI client that supports MCP.
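To make the contrast concrete, here's roughly what the client sends on the agent's behalf. The method name tools/call comes from the MCP specification; the helper function and request id below are illustrative, not part of any SDK:

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 envelope an MCP client sends for a tool call.
    "tools/call" is the MCP spec's method name; this helper is a sketch."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# The model only picks the tool name and arguments; everything else is plumbing.
msg = make_tool_call(1, "browserbeam_observe", {"session_id": "ses_abc123"})
parsed = json.loads(msg)
print(parsed["params"]["name"])
```

The model never writes this envelope itself. It emits a function call, and the client handles the framing.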
The MCP Ecosystem in 2026
MCP adoption grew fast. Anthropic open-sourced the protocol specification, and the major AI coding assistants added support within months. As of 2026, MCP clients include:
- Cursor (via ~/.cursor/mcp.json)
- Claude Desktop (via claude_desktop_config.json)
- Windsurf (via mcp_config.json)
- VS Code with compatible extensions
- Custom agents built with the MCP SDK
The server ecosystem covers databases, file systems, APIs, and now browser automation. Browserbeam's MCP server is one of over 100 community and vendor-built MCP servers listed in the official registry.
How MCP Architecture Works
Understanding the architecture helps you debug connection issues and build better agent workflows. The protocol has three layers: transport, discovery, and execution.
Transport Layer: stdio vs SSE
MCP supports two transport mechanisms. The Browserbeam MCP server uses stdio (standard input/output), which is the simpler and more common option.
| Transport | How It Works | When to Use |
|---|---|---|
| stdio | Client spawns the server process and communicates via stdin/stdout | Local IDE integrations (Cursor, Claude Desktop, Windsurf) |
| SSE | Client connects to a remote HTTP endpoint with Server-Sent Events | Remote/cloud MCP servers, shared team setups |
With stdio, the IDE spawns npx -y @browserbeam/mcp-server as a child process. Messages flow through stdin and stdout as JSON-RPC. No network configuration, no firewall rules, no port management. The server starts when you open your IDE and stops when you close it.
SSE transport connects to a remote HTTP endpoint. This is useful for shared team servers or cloud-hosted MCP setups, but adds network complexity. For browser automation through Browserbeam, stdio is the right choice because the MCP server is just a thin translation layer. The actual browser runs on Browserbeam's cloud infrastructure regardless of which transport you pick.
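You can see the stdio framing in miniature by spawning a child process and exchanging one newline-delimited JSON-RPC message. Here cat stands in for the real server (in practice the IDE spawns npx -y @browserbeam/mcp-server), so the "response" is just an echo of the request:

```python
import json
import subprocess

# `cat` echoes each line back, which is enough to demonstrate the
# newline-delimited JSON-RPC framing an stdio MCP client uses.
proc = subprocess.Popen(
    ["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
)

request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
proc.stdin.write(json.dumps(request) + "\n")  # one message per line
proc.stdin.flush()

line = proc.stdout.readline()  # read one framed message back
echoed = json.loads(line)
proc.stdin.close()
proc.wait()
print(echoed["method"])
```

Swap cat for the real server command and the same write/read loop is the whole transport.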
Tool Discovery and Schema
When an MCP client connects to a server, the first thing it does is ask: "What tools do you have?" The server responds with a list of tool names, descriptions, and JSON Schema definitions for each tool's parameters.
This is what the Browserbeam MCP server returns for the browserbeam_observe tool:
```json
{
  "name": "browserbeam_observe",
  "description": "Re-read the current page state...",
  "inputSchema": {
    "type": "object",
    "properties": {
      "session_id": { "type": "string" },
      "scope": { "type": "string" },
      "format": { "type": "string", "enum": ["markdown", "html"] },
      "mode": { "type": "string", "enum": ["main", "full"] },
      "include_page_map": { "type": "boolean" },
      "max_text_length": { "type": "number" }
    },
    "required": ["session_id"]
  }
}
```
The AI client reads this schema and knows exactly how to call the tool. The model sees browserbeam_observe as a function it can invoke, just like reading a file or running a terminal command. No prompt engineering needed to teach the model about API endpoints.
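The schema also lets a client reject malformed calls before they reach the server. A minimal version of that check, covering just the required and enum keywords (not a full JSON Schema validator), might look like:

```python
def validate_args(schema: dict, args: dict) -> list[str]:
    """Check tool arguments against an MCP inputSchema: required keys
    present, enum values allowed. A sketch, not a full JSON Schema validator."""
    errors = []
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    for key, value in args.items():
        allowed = schema.get("properties", {}).get(key, {}).get("enum")
        if allowed and value not in allowed:
            errors.append(f"{key} must be one of {allowed}")
    return errors

observe_schema = {
    "type": "object",
    "properties": {
        "session_id": {"type": "string"},
        "format": {"type": "string", "enum": ["markdown", "html"]},
    },
    "required": ["session_id"],
}

print(validate_args(observe_schema, {"format": "pdf"}))
```

A valid call like {"session_id": "ses_1"} produces no errors; the call above is flagged for both the missing session_id and the out-of-enum format.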
Resource and Prompt Primitives
Beyond tools, MCP defines two other primitives:
Resources are read-only data the server exposes. A database MCP server might expose table schemas as resources. A file system server exposes directory listings. The Browserbeam MCP server focuses on tools rather than resources, because browser interactions are inherently action-oriented.
Prompts are pre-built prompt templates the server suggests. A code review MCP server might offer a "review this PR" prompt template. These are optional and most browser automation workflows don't need them since the AI model already knows how to compose browser tasks from tool descriptions.
Browserbeam as an MCP Backend for Browser Tasks
The Browserbeam MCP server bridges MCP tool calls to the Browserbeam API. Your AI coding assistant calls MCP tools. The server translates those calls into HTTPS requests. Real browsers run on Browserbeam's infrastructure. Structured results flow back through the MCP protocol.
Request path: the AI client (Cursor / Claude / Windsurf) issues an MCP tool call such as browserbeam_observe; the MCP server turns it into POST /v1/sessions/:id/act; Chrome on cloud infra executes it and waits for DOM + network idle. Response path: the API's JSON response comes back as markdown + refs + diffs, delivered to the agent as MCP text content.
What the Browserbeam MCP Server Exposes
The @browserbeam/mcp-server package (v0.4.0) exposes 19 MCP tools. Each tool maps to a Browserbeam API operation:
| Tool | What It Does |
|---|---|
| browserbeam_create_session | Create a browser session, optionally navigate to a URL |
| browserbeam_navigate | Navigate to a new URL in an existing session |
| browserbeam_observe | Get page content as markdown with interactive element refs |
| browserbeam_click | Click an element by ref, text, or label |
| browserbeam_fill | Fill form fields or an entire form at once |
| browserbeam_type | Type text character-by-character with real keyboard events |
| browserbeam_select | Select an option from a dropdown |
| browserbeam_check | Check or uncheck a checkbox or radio button |
| browserbeam_scroll | Scroll the page or scroll an element into view |
| browserbeam_scroll_collect | Scroll to load lazy content, then return the full page |
| browserbeam_wait | Wait for a selector, text, JS expression, or fixed delay |
| browserbeam_extract | Extract structured data using a declarative JSON schema |
| browserbeam_execute_js | Run custom JavaScript in the browser page context |
| browserbeam_screenshot | Take a screenshot of the current page |
| browserbeam_pdf | Generate a PDF of the current page |
| browserbeam_upload | Upload files to a file input element |
| browserbeam_list_sessions | List your active browser sessions |
| browserbeam_get_session | Get the status and metadata of a session |
| browserbeam_close | Close a session and release resources |
That's a full browser automation toolkit available as MCP tools. Your AI assistant can do anything a developer does in a browser, without writing Puppeteer scripts or managing Chrome processes.
Why Browser Tasks Need an MCP Server
Browser automation is the missing piece for AI coding assistants. Your agent can read files, run terminal commands, and search codebases. But when it needs to check a deployed app, test a form submission, scrape reference data, or verify a UI change, it has no eyes.
An MCP server for browser automation solves this by treating browser actions as first-class tools. The agent doesn't need to know about HTTP endpoints, authentication tokens, or response parsing. It calls browserbeam_create_session, gets back structured markdown, and makes decisions.
The alternative is running a local Playwright or Puppeteer instance through shell commands. That means managing browser binaries, handling crashes, and parsing raw HTML output. The MCP approach is cleaner: the complexity lives in the server, and the agent gets structured responses.
The Tool-to-API Translation Layer
The Browserbeam MCP server is a thin Node.js process. It receives JSON-RPC messages over stdio, maps them to Browserbeam API endpoints, and formats the responses for the AI client.
Here's what happens when the agent calls browserbeam_click:
- The MCP client sends a JSON-RPC tools/call message with name: "browserbeam_click" and arguments { session_id: "ses_abc123", ref: "e5" }
- The MCP server sends POST /v1/sessions/ses_abc123/act with body { steps: [{ action: "click", ref: "e5" }] } to the Browserbeam API
- Browserbeam executes the click in a real browser and returns the updated page state
- The MCP server formats the response as text content (markdown, element refs, changes) and sends it back through stdio
The agent never sees HTTP. It calls tools and reads results.
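The translation step above can be sketched as a pure function from tool-call params to an HTTP request description. The field names follow the steps in this article; the base URL and the shape of the server's internals are assumptions for illustration:

```python
def translate_click(params: dict,
                    base_url: str = "https://api.browserbeam.com") -> dict:
    """Map a browserbeam_click tools/call onto the /act endpoint described
    above. base_url and the return shape are illustrative assumptions."""
    args = params["arguments"]
    return {
        "method": "POST",
        "url": f"{base_url}/v1/sessions/{args['session_id']}/act",
        "json": {"steps": [{"action": "click", "ref": args["ref"]}]},
    }

req = translate_click({
    "name": "browserbeam_click",
    "arguments": {"session_id": "ses_abc123", "ref": "e5"},
})
print(req["url"])
```

The real server also handles authentication headers and response formatting, but the core of the layer is exactly this mapping.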
Setting Up the Browserbeam MCP Server
Getting the MCP server running takes under two minutes. You need a Browserbeam API key and one JSON config file.
Configuring the MCP Server
The Browserbeam MCP server runs through npx, so there's nothing to install globally. Your IDE spawns it automatically when it reads the config.
First, sign up for a free Browserbeam account to get your API key. Then add the server config to your IDE.
Cursor (~/.cursor/mcp.json):
```json
{
  "mcpServers": {
    "browserbeam": {
      "command": "npx",
      "args": ["-y", "@browserbeam/mcp-server"],
      "env": {
        "BROWSERBEAM_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
```json
{
  "mcpServers": {
    "browserbeam": {
      "command": "npx",
      "args": ["-y", "@browserbeam/mcp-server"],
      "env": {
        "BROWSERBEAM_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
Windsurf (~/.codeium/windsurf/mcp_config.json):
```json
{
  "mcpServers": {
    "browserbeam": {
      "command": "npx",
      "args": ["-y", "@browserbeam/mcp-server"],
      "env": {
        "BROWSERBEAM_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
The config is identical across all three clients. The command and args tell the IDE to spawn the MCP server via npx. The -y flag auto-confirms the package install. The env block passes your API key to the server process.
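Since a silent config mistake is the most common failure mode, a quick self-check script can pay for itself. This sketch validates the two things that matter: the -y flag and a real (non-placeholder) API key. The SAMPLE blob mirrors the configs above:

```python
import json

SAMPLE = """{
  "mcpServers": {
    "browserbeam": {
      "command": "npx",
      "args": ["-y", "@browserbeam/mcp-server"],
      "env": { "BROWSERBEAM_API_KEY": "your_api_key_here" }
    }
  }
}"""

def check_mcp_config(text: str) -> list[str]:
    """Sanity-check an mcpServers config blob: valid JSON, a browserbeam
    entry, the -y flag, and a non-placeholder API key."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    server = cfg.get("mcpServers", {}).get("browserbeam")
    if server is None:
        return ["no mcpServers.browserbeam entry"]
    problems = []
    if "-y" not in server.get("args", []):
        problems.append("missing -y flag (npx will hang waiting for confirmation)")
    key = server.get("env", {}).get("BROWSERBEAM_API_KEY", "")
    if not key or key == "your_api_key_here":
        problems.append("BROWSERBEAM_API_KEY is unset or still the placeholder")
    return problems

problems = check_mcp_config(SAMPLE)
print(problems)
```

Run it against the actual file for your IDE (e.g. ~/.cursor/mcp.json) before restarting the editor.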
Connecting to Cursor, Claude Desktop, or Windsurf
After saving the config file, restart your IDE. The MCP client detects the new server and spawns the process automatically.
In Cursor, you'll see browserbeam listed in the MCP servers panel. The agent gains access to all 19 browser tools immediately. Ask your Cursor agent to "open https://books.toscrape.com and tell me what's on the page," and it will call browserbeam_create_session behind the scenes.
In Claude Desktop, the tools appear in the tool list when you start a new conversation. Claude can use them just like its built-in computer use tools, but with structured markdown output instead of screenshots.
In Windsurf, the integration works the same way. The agent discovers the tools through MCP's standard discovery protocol and presents them as available actions.
If the server fails to start, check these common causes:
- Node.js not installed: The server requires Node.js 18 or later
- Missing API key: The server exits immediately if BROWSERBEAM_API_KEY is not set
- npx not in PATH: Make sure npx is available in the shell your IDE uses
Running Your First Browser Task via MCP
With the server connected, ask your AI assistant to perform a browser task. Here's a simple example you can try right now:
"Go to https://news.ycombinator.com and extract the top 5 story titles with their URLs."
Behind the scenes, the agent will:
- Call browserbeam_create_session with url: "https://news.ycombinator.com"
- Read the structured markdown response to understand the page
- Call browserbeam_extract with a schema like {"stories": [{"_parent": ".titleline", "_limit": 5, "title": "a >> text", "url": "a >> href"}]}
- Return the extracted data in a clean format
The agent handles all of this autonomously. You describe the goal in natural language. The MCP server and Browserbeam handle the execution.
Agent Patterns with MCP + Browserbeam
Once the MCP server is connected, building effective agent workflows comes down to patterns. These three cover most browser automation use cases.
Observe-Act-Extract via MCP Tools
The core pattern for MCP browser automation is a three-step loop:
- Observe: Call browserbeam_observe or browserbeam_create_session to get the current page state
- Act: Based on the page content, call browserbeam_click, browserbeam_fill, or browserbeam_navigate
- Extract: Call browserbeam_extract to pull structured data, or read the observation response directly
This maps naturally to how AI agents reason. The model reads the page (observe), decides what to do (reason), takes an action (act), and checks the result (observe again).
A multi-step workflow chains these loops. For example, logging into a site and pulling dashboard data:
```python
# The agent executes this sequence via MCP tool calls
# (app.example.com is a placeholder for your own login-protected app):
# 1. browserbeam_create_session(url="https://app.example.com/login")
# 2. browserbeam_observe(session_id=sid)
# 3. browserbeam_wait(session_id=sid, text="Dashboard")
# 4. browserbeam_observe(session_id=sid)
# 5. browserbeam_extract(session_id=sid, schema='{"metrics": [...]}')
# 6. browserbeam_close(session_id=sid)
```
Each step is one MCP tool call. The agent reads the response from each call and decides the next action. No Playwright scripts, no browser process management, no HTML parsing.
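Driven from code instead of prose, the loop has the same shape. In this sketch, call_tool stands in for whatever plumbing your agent framework uses to forward MCP calls, and the canned responses are invented so the example runs standalone:

```python
# call_tool stands in for the agent framework's MCP plumbing; the canned
# responses below are invented so the sketch runs without a live session.
CANNED = {
    "browserbeam_create_session": {"session_id": "ses_1", "markdown": "# Login"},
    "browserbeam_fill": {"ok": True},
    "browserbeam_observe": {"markdown": "# Dashboard", "stable": True},
    "browserbeam_extract": {"data": {"metrics": [{"name": "Signups", "value": 128}]}},
    "browserbeam_close": {"closed": True},
}

def call_tool(name: str, **args) -> dict:
    return CANNED[name]

page = call_tool("browserbeam_create_session", url="https://app.example.com/login")
sid = page["session_id"]                                   # observe: initial page
call_tool("browserbeam_fill", session_id=sid)              # act: submit the form
state = call_tool("browserbeam_observe", session_id=sid)   # observe: re-read
result = call_tool("browserbeam_extract", session_id=sid)  # extract: typed data
call_tool("browserbeam_close", session_id=sid)             # release the session
print(result["data"]["metrics"][0]["name"])
```

Replace call_tool with a real MCP client and the control flow is unchanged: read a response, decide, call the next tool.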
Multi-Step Research Workflows
Research workflows are where MCP-powered browser automation really shines. Your AI assistant can gather information across multiple pages and synthesize the results.
A typical research workflow:
- Create a session and navigate to a search engine or directory
- Extract a list of relevant URLs
- Visit each URL, observe the page content, and extract specific data
- Close the session when done
The key insight: the MCP server handles session state. The agent calls browserbeam_navigate to move between pages within the same session. Cookies, authentication state, and browsing history persist. The agent doesn't need to manage browser lifecycle. It just calls tools and reads responses.
For research across multiple sites, create one session per site and switch between them using the session IDs. The Browserbeam API supports concurrent sessions, and the MCP server tracks each one independently.
Parallel Browser Sessions from a Single Agent
When your agent needs to compare data from multiple sources, parallel sessions save time. Instead of visiting three competitor sites sequentially, create three sessions and work on them concurrently.
```python
# Create parallel sessions
# Session 1: browserbeam_create_session(url="https://competitor-a.com/pricing")
# Session 2: browserbeam_create_session(url="https://competitor-b.com/pricing")
# Session 3: browserbeam_create_session(url="https://competitor-c.com/pricing")

# Extract pricing from each
# browserbeam_extract(session_id=sid1, schema='{"plans": [...]}')
# browserbeam_extract(session_id=sid2, schema='{"plans": [...]}')
# browserbeam_extract(session_id=sid3, schema='{"plans": [...]}')

# Close all sessions
# browserbeam_close(session_id=sid1)
# browserbeam_close(session_id=sid2)
# browserbeam_close(session_id=sid3)
```
Each session runs an isolated browser instance on Browserbeam's infrastructure. No resource contention, no shared state, no browser crashes affecting other sessions. The MCP server multiplexes tool calls across all active sessions.
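If you're orchestrating sessions from your own code rather than through an agent prompt, a thread pool captures the fan-out pattern. Here extract_pricing is a stub for a create-session/extract/close sequence; the URLs are the hypothetical competitor pages from above:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_pricing(url: str) -> dict:
    # Stub: a real implementation would create a session for the URL,
    # call browserbeam_extract, and close the session. Data is invented.
    return {"url": url, "plans": [{"name": "Pro", "price": "$49"}]}

urls = [
    "https://competitor-a.com/pricing",
    "https://competitor-b.com/pricing",
    "https://competitor-c.com/pricing",
]

# One isolated cloud session per site; the only local cost is a thread.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(extract_pricing, urls))

print(len(results))
```

Because each session is a remote browser, the thread count is bounded by your plan's concurrency limit, not your machine's memory.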
Real-World MCP Use Cases
These three use cases demonstrate what becomes possible when your AI coding assistant has browser access through MCP. Each includes the MCP tool sequence so you can try them yourself.
AI-Assisted QA Testing
Your agent can test deployed web applications without a separate QA framework. Ask it to verify a user flow, and it will use Browserbeam MCP tools to walk through the steps.
Prompt: "Go to our staging site at https://staging.example.com, create a new account with test data, verify the confirmation email link works, and report any UI issues."
The agent's MCP tool sequence:
- browserbeam_create_session with the staging URL
- browserbeam_click on the "Sign Up" button
- browserbeam_fill with test data (name, email, password)
- browserbeam_screenshot to capture the confirmation page
- browserbeam_navigate to the email confirmation URL
- browserbeam_observe to verify the landing page
- browserbeam_close to clean up
The agent reports what it found in natural language: broken links, missing form validation, layout issues. No Cypress tests, no Selenium scripts, no test framework setup. You describe the test case, and the agent executes it.
For teams running browser automation at scale, this pattern lets you add exploratory testing to your CI pipeline without maintaining a separate test suite.
Automated Documentation Generation
Need to document a third-party API or a competitor's feature set? Your agent can browse their docs, extract the relevant information, and generate structured documentation.
Prompt: "Go to the Stripe API docs, extract all webhook event types with their descriptions, and format them as a markdown table."
The agent navigates to the docs site, uses browserbeam_scroll_collect to load all lazy content, then calls browserbeam_extract with a schema that pulls event names and descriptions. The result is a clean markdown table, ready to paste into your own docs.
This pattern also works for internal documentation. Point the agent at your staging environment, ask it to document every page and form in the app, and it will crawl through the UI capturing structured page content.
Competitive Research Agent
Market analysis used to mean manually visiting competitor sites, taking screenshots, and copying pricing data into spreadsheets. With MCP browser tools, your AI assistant does this in minutes.
Prompt: "Compare the pricing pages of Browserbase, Steel, and Playwright Cloud. Extract plan names, prices, included features, and rate limits. Present as a comparison table."
The agent creates parallel sessions, visits each pricing page, extracts structured data, and synthesizes the results into a comparison table. It can even visit changelogs or announcement pages to check for recent pricing changes.
This works because browserbeam_extract returns typed JSON, not screenshots. The agent gets data it can reason about, compare, and format. No OCR, no screenshot parsing, no manual data entry.
Observability and Debugging with Browserbeam + MCP
When an MCP tool call fails or returns unexpected results, you need visibility into what happened. Browserbeam provides several debugging tools that work directly through MCP.
Session inspection: Call browserbeam_get_session to check a session's status, URL, viewport size, and elapsed time. If a session expired due to timeout, this tells you immediately.
List active sessions: Use browserbeam_list_sessions with status: "active" to see all running sessions. If your agent forgot to close a session (a common mistake), you'll spot it here.
Screenshots for visual debugging: When the agent reports that a page looks wrong, call browserbeam_screenshot to capture exactly what the browser shows. The screenshot comes back as a base64-encoded image through the MCP response.
Page map for structural debugging: The first browserbeam_observe call in any session automatically includes a page map, a lightweight outline of page sections (nav, header, main, aside, footer) with CSS selectors and content hints. If the agent can't find an element, the page map reveals where it actually lives.
JavaScript execution: For advanced debugging, use browserbeam_execute_js to run diagnostic code in the browser context. Check document.readyState, inspect network requests, or validate DOM state.
Pro Tip: If an MCP tool call returns truncated content (you'll see a notice like "showing 12,000 of 45,000 chars"), call browserbeam_observe with a higher max_text_length or use browserbeam_scroll_collect which defaults to 100,000 characters.
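An agent (or wrapper script) can detect that notice mechanically and know to re-read the page. The exact notice wording here matches the example above; if the server's phrasing differs, adjust the regex:

```python
import re

# Matches notices like "showing 12,000 of 45,000 chars" (wording assumed
# from the example above; adapt the pattern to the actual server output).
TRUNCATION = re.compile(r"showing ([\d,]+) of ([\d,]+) chars")

def needs_fuller_read(observation_text: str) -> bool:
    """True if the observation was truncated, i.e. the agent should
    re-observe with a higher max_text_length or use scroll_collect."""
    m = TRUNCATION.search(observation_text)
    if not m:
        return False
    shown, total = (int(g.replace(",", "")) for g in m.groups())
    return shown < total

print(needs_fuller_read("... showing 12,000 of 45,000 chars"))
```

This keeps the truncation decision out of the model's reasoning loop entirely.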
Common Integration Mistakes
Five integration mistakes show up repeatedly when developers connect Browserbeam to their AI coding assistants via MCP. Each one wastes time or causes silent failures.
Misconfiguring Transport Settings
The most common setup error: putting the wrong path in the config file. Each IDE reads MCP config from a specific location:
| IDE | Config File Path |
|---|---|
| Cursor | ~/.cursor/mcp.json |
| Claude Desktop (macOS) | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Claude Desktop (Windows) | %APPDATA%\Claude\claude_desktop_config.json |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
If you put the config in the wrong file, the IDE never finds the server. It won't show an error. The tools just won't appear.
Another transport mistake: forgetting the -y flag in the args array. Without it, npx prompts for confirmation and the stdio pipe hangs. Always use ["-y", "@browserbeam/mcp-server"].
Not Handling Session Lifecycle
Open sessions consume resources and keep the billing clock running. If your agent creates a session but never calls browserbeam_close, the session stays active until it times out (default: 5 minutes).
This matters most in iterative workflows where the agent creates multiple sessions. Five abandoned sessions at once means five browsers running in the cloud doing nothing.
The fix: make closing sessions explicit. When you prompt the agent, include "close the session when you're done" as part of the instruction. The Browserbeam MCP server already includes lifecycle reminders in its tool descriptions, but reinforcing it in your prompt helps.
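When you drive sessions from code rather than prompts, the same guarantee comes from a try/finally wrapper. This sketch assumes a call_tool helper (stubbed here so it runs standalone); the context manager closes the session even if the body raises:

```python
from contextlib import contextmanager

closed = []

def call_tool(name, **args):
    # Stub so the sketch runs standalone; a real agent forwards this
    # through MCP. Only create/close matter for the lifecycle pattern.
    if name == "browserbeam_close":
        closed.append(args["session_id"])
    return {"session_id": "ses_demo"}

@contextmanager
def browser_session(url: str):
    sid = call_tool("browserbeam_create_session", url=url)["session_id"]
    try:
        yield sid
    finally:
        call_tool("browserbeam_close", session_id=sid)  # runs even on errors

with browser_session("https://books.toscrape.com") as sid:
    pass  # observe / act / extract here

print(closed)
```

With this pattern, an exception mid-workflow can never leave a billable browser idling in the cloud.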
Ignoring Rate Limits
Calling browserbeam_observe ten times in a row without acting on the response wastes API calls and slows your workflow. The agent should read the response from create_session or navigate first, since both already return page content. Only call observe when you need to re-read the page after an action.
Similarly, extracting data with browserbeam_extract after every click is wasteful if you only need the final result. Plan your extraction points and batch them where possible.
Passing Raw HTML Instead of Structured Output
Some developers configure the MCP server and then ask the agent to "get the HTML source code." This defeats the purpose. The browserbeam_observe tool returns structured markdown by default, which is far more token-efficient than raw HTML.
If you need HTML for building extraction selectors, use the format: "html" parameter on a scoped observation. But for reading page content, always use the default markdown format.
| Format | Tokens per Page | Best For |
|---|---|---|
| Markdown (default) | ~1,500-3,000 | Reading content, understanding page structure |
| HTML | ~15,000-25,000 | Building CSS selectors for browserbeam_extract |
Skipping Error Handling in Tool Calls
MCP tool calls can fail. The session might have expired. The element ref might be stale after a page navigation. The extraction schema might not match the page structure.
When the agent ignores errors and keeps calling tools, it enters a loop of failures. The fix: teach the agent (through your prompt) to check tool responses for errors and adjust. If a click fails because the ref is stale, observe the page again to get fresh refs. If extraction returns empty results, check the page content to verify the data exists.
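The stale-ref recovery can be encoded as a small retry helper. Everything here is illustrative: the "stale_ref" error string, the refs mapping, and the fake_call_tool stub are assumptions standing in for real tool responses:

```python
def safe_click(call_tool, session_id: str, label: str) -> dict:
    """Click by ref; on a stale-ref error, re-observe for fresh refs and
    retry once. The 'stale_ref' error shape is an illustrative assumption."""
    page = call_tool("browserbeam_observe", session_id=session_id)
    result = call_tool("browserbeam_click", session_id=session_id,
                       ref=page["refs"][label])
    if result.get("error") == "stale_ref":
        page = call_tool("browserbeam_observe", session_id=session_id)
        result = call_tool("browserbeam_click", session_id=session_id,
                           ref=page["refs"][label])
    return result

calls = {"observe": 0, "click": 0}

def fake_call_tool(name, **args):
    if name == "browserbeam_observe":
        calls["observe"] += 1
        return {"refs": {"Sign Up": f"e{calls['observe']}"}}
    calls["click"] += 1
    # first click hits a stale ref; the retry with a fresh ref succeeds
    return {"error": "stale_ref"} if calls["click"] == 1 else {"clicked": True}

outcome = safe_click(fake_call_tool, "ses_1", "Sign Up")
print(outcome)
```

Cap the retries at one or two: if a ref is still stale after a fresh observation, the page itself has changed and the agent should re-plan rather than re-click.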
Browserbeam MCP vs Other Browser MCP Servers
Browserbeam isn't the only MCP server that gives AI agents browser access. Playwright MCP and Puppeteer MCP are the main alternatives. The differences come down to what the agent receives and how much work it has to do.
Browserbeam MCP vs Playwright MCP
The Playwright MCP server runs a local Playwright instance and exposes browser actions as MCP tools. It gives the agent control over a real browser, but the output format is different.
Playwright MCP returns accessibility snapshots, which are tree structures of ARIA roles and names. This works well for navigation but falls short for content extraction. The agent sees "role: heading, name: Welcome" but doesn't get the full page text, form structures, or extraction capabilities that Browserbeam provides.
Key differences:
- Browser management: Playwright MCP runs a local browser process. Browserbeam MCP calls a cloud API. No local Chrome to install, update, or crash.
- Output format: Playwright returns accessibility trees. Browserbeam returns structured markdown with element refs, page maps, and change diffs.
- Extraction: Playwright has no built-in extraction. You must parse accessibility trees or run JavaScript. Browserbeam's browserbeam_extract takes a declarative JSON schema and returns typed data.
- Concurrency: Running multiple local browsers eats memory. Browserbeam sessions run on cloud infrastructure with no local resource impact.
Browserbeam MCP vs Puppeteer MCP
The Puppeteer MCP server is similar to Playwright MCP but uses Puppeteer under the hood. It exposes basic browser actions (navigate, click, type, screenshot) and returns screenshots as the primary output.
The screenshot-based approach means the agent must use vision capabilities to understand the page. That's expensive (image tokens cost more than text tokens) and slower than reading structured markdown.
Feature Comparison Table
| Feature | Browserbeam MCP | Playwright MCP | Puppeteer MCP |
|---|---|---|---|
| Output format | Structured markdown + element refs | Accessibility tree | Screenshots |
| Browser location | Cloud (Browserbeam API) | Local (your machine) | Local (your machine) |
| Extraction | Declarative JSON schema | Manual JS execution | Manual JS execution |
| Page map | Auto-included on first observe | Not available | Not available |
| Change diffs | Built-in (changes between observations) | Not available | Not available |
| Stability signal | stable: true when page is ready | Manual wait strategies | Manual wait strategies |
| Form handling | browserbeam_fill with label-based targeting | Click + type sequences | Click + type sequences |
| Concurrent sessions | Cloud-managed, no local resources | Limited by local memory | Limited by local memory |
| Cookie banner handling | auto_dismiss_blockers: true | Write dismissal logic | Write dismissal logic |
| Token cost per page | ~1,500-3,000 (markdown) | ~3,000-8,000 (a11y tree) | ~2,000-5,000 (image tokens) |
| Setup | npx -y @browserbeam/mcp-server + API key | npx @playwright/mcp | npx -y @modelcontextprotocol/server-puppeteer |
For AI coding assistants that need to read and reason about web content, Browserbeam MCP gives the richest structured output. Playwright MCP is a solid choice if you need accessibility testing or prefer running browsers locally. Puppeteer MCP works for visual verification tasks where screenshots are the primary output.
Frequently Asked Questions
What is an MCP server and how does it work?
An MCP server is a process that exposes tools, resources, and prompts to AI clients through the Model Context Protocol. The client (like Cursor or Claude Desktop) spawns the server, discovers its tools via JSON Schema, and lets the AI model call those tools as functions. For browser automation, the Browserbeam MCP server translates tool calls into API requests that control real cloud browsers.
What is the Model Context Protocol?
The Model Context Protocol is an open standard created by Anthropic that defines how AI applications connect to external tools and data sources. It uses JSON-RPC over stdio or SSE transport. MCP replaces custom plugin integrations with a single protocol that works across multiple AI clients.
How does MCP work with Cursor?
Add the Browserbeam MCP server config to ~/.cursor/mcp.json with your API key. Restart Cursor, and the agent gains access to 19 browser automation tools. You can then ask Cursor to browse websites, fill forms, extract data, and take screenshots through natural language prompts.
How do I build an MCP server for browser automation?
You don't need to build one from scratch. Install @browserbeam/mcp-server via npx and configure it with your Browserbeam API key. The server handles all the MCP protocol details (tool registration, JSON-RPC, stdio transport) and maps tool calls to the Browserbeam REST API. If you want to build a custom MCP server, use the @modelcontextprotocol/sdk package for TypeScript or the mcp package for Python.
How is Browserbeam MCP different from Playwright MCP?
Browserbeam MCP returns structured markdown with element refs, page maps, and extraction capabilities. Playwright MCP returns accessibility tree snapshots. Browserbeam runs browsers in the cloud so you don't manage local Chrome instances. Playwright MCP runs a local browser, which uses your machine's memory and requires browser binary management.
Can I use Browserbeam MCP with Claude Desktop?
Yes. Add the server config to ~/Library/Application Support/Claude/claude_desktop_config.json on macOS or %APPDATA%\Claude\claude_desktop_config.json on Windows. The config format is identical to Cursor. Claude Desktop discovers the tools through MCP and makes them available in conversations.
Do I need to manage Playwright or Puppeteer locally?
No. The Browserbeam MCP server calls the Browserbeam cloud API, which runs browsers on remote infrastructure. You don't install, update, or manage any browser binaries locally. The only local dependency is Node.js 18+ for running the MCP server process itself.
How much does Browserbeam MCP cost?
The MCP server itself is free and open-source (@browserbeam/mcp-server on npm). You pay for Browserbeam API usage, which is based on session runtime. New accounts get 1 hour of free runtime with no credit card required. Visit browserbeam.com to sign up.
What to Build Next
You now have a bridge between your AI coding assistant and real browsers. The MCP server handles the protocol. Browserbeam handles the browser. Your agent handles the reasoning.
Start with something concrete. Ask your AI coding assistant to test your staging environment after a deploy. Have it extract API documentation from a vendor's site. Build a competitive research workflow that runs every Monday morning.
The patterns scale. The same observe-act-extract loop that works for a single page works for multi-step workflows across dozens of sites. Parallel sessions let your agent gather data from multiple sources simultaneously. And because everything runs through MCP, switching between Cursor, Claude Desktop, or Windsurf is a config file change, not a code rewrite.
If you're building intelligent web agents, the Python SDK gives you programmatic control for production pipelines. For security best practices, check the guide on securing AI browser agents. And if you're scaling beyond a few concurrent sessions, the infrastructure guide covers queue architecture and capacity planning.
Your AI assistant already reads files, runs commands, and writes code. Now it browses the web. What will you build with that?