Your AI coding assistant can open a browser, navigate to a page, click buttons, fill forms, and extract structured data. Not through a custom plugin or a brittle shell command, but through the Model Context Protocol. One JSON config, one MCP server, and your agent in Cursor, Claude Desktop, or Windsurf gets full browser access.
The Model Context Protocol (MCP) gives AI coding assistants a standard way to call external tools. Browserbeam's MCP server turns those tool calls into real browser sessions, returning structured markdown instead of raw HTML. The result: your agent reasons about clean page content instead of burning tokens on DOM noise.
This guide covers the full stack. You'll set up the Browserbeam MCP server, connect it to your IDE, run your first browser task, and build production-ready agent patterns. By the end, your AI assistant will browse the web as naturally as it reads files.
In this guide, you'll learn:
- What the Model Context Protocol is and how MCP servers work
- How Browserbeam's MCP server translates tool calls into browser sessions
- How to configure the MCP server for Cursor, Claude Desktop, and Windsurf
- Agent patterns for observe-act-extract workflows via MCP tools
- Real-world use cases: QA testing, documentation generation, competitive research
- Common integration mistakes and how to avoid them
- How Browserbeam MCP compares to Playwright MCP and Puppeteer MCP
TL;DR: The Model Context Protocol lets AI coding assistants call external tools through a standard interface. Browserbeam's MCP server (@browserbeam/mcp-server) gives your agent 19 browser automation tools that return structured markdown, element refs, and extraction results instead of raw HTML. Set it up in Cursor, Claude Desktop, or Windsurf with a single JSON config.
Understanding the Model Context Protocol (MCP)
Before connecting your AI assistant to a browser, you need to understand what sits between them. The Model Context Protocol is that layer, and getting it right determines whether your agent can actually use browser tools or just knows they exist.
What Is an MCP Server and Why It Exists
An MCP server is a lightweight process that exposes tools, resources, and prompts to AI clients through a standardized protocol. Think of it as a USB port for AI assistants. Instead of each IDE building custom integrations for every external service, MCP provides one interface that works everywhere.
The problem MCP solves is fragmentation. Before MCP, connecting an AI coding assistant to an external tool meant writing a custom plugin for each IDE. A Cursor extension, a Claude Desktop integration, a Windsurf adapter. Three implementations of the same thing. MCP replaces all of that with one server process and one config format.
| Concept | What It Does |
|---|---|
| MCP Server | Process that exposes tools to AI clients via a standard protocol |
| MCP Client | The AI application (Cursor, Claude Desktop, Windsurf) that connects to servers |
| Tools | Functions the AI can call (e.g., browserbeam_navigate, browserbeam_extract) |
| Resources | Read-only data the server provides (files, database records, API responses) |
| Prompts | Pre-built prompt templates the server suggests to the client |
MCP vs Direct API Calls
You might wonder why your agent needs MCP at all. Why not just call the Browserbeam API directly?
Direct API calls work, but they require the AI model to know the full API surface: endpoints, authentication headers, request bodies, error codes. The model must generate valid HTTP requests and parse JSON responses. That's a lot of context window spent on plumbing.
MCP flips this. The server advertises its tools with typed schemas. The AI client discovers what's available and presents those tools to the model as callable functions. The model calls browserbeam_observe with a session ID, and the MCP infrastructure handles the HTTP request, authentication, and response formatting. Your agent never constructs a curl command or parses a raw JSON response.
The practical difference: fewer tokens spent on boilerplate, fewer errors from malformed requests, and a consistent interface across every AI client that supports MCP.
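To make the contrast concrete, here's roughly what the client sends on the agent's behalf. The method name tools/call comes from the MCP specification; the helper function and request id below are illustrative, not part of any SDK:

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 envelope an MCP client sends for a tool call.
    "tools/call" is the MCP spec's method name; this helper is a sketch."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# The model only picks the tool name and arguments; everything else is plumbing.
msg = make_tool_call(1, "browserbeam_observe", {"session_id": "ses_abc123"})
parsed = json.loads(msg)
print(parsed["params"]["name"])
```

The model never writes this envelope itself. It emits a function call, and the client handles the framing.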
The MCP Ecosystem in 2026
MCP adoption grew fast. Anthropic open-sourced the protocol specification, and the major AI coding assistants added support within months. As of 2026, MCP clients include:
- Cursor (via ~/.cursor/mcp.json)
- Claude Desktop (via claude_desktop_config.json)
- Windsurf (via mcp_config.json)
- VS Code with compatible extensions
- Custom agents built with the MCP SDK
The server ecosystem covers databases, file systems, APIs, and now browser automation. Browserbeam's MCP server is one of over 100 community and vendor-built MCP servers listed in the official registry.
How MCP Architecture Works
Understanding the architecture helps you debug connection issues and build better agent workflows. The protocol has three layers: transport, discovery, and execution.
Transport Layer: stdio vs SSE
MCP supports two transport mechanisms. The Browserbeam MCP server uses stdio (standard input/output), which is the simpler and more common option.
| Transport | How It Works | When to Use |
|---|---|---|
| stdio | Client spawns the server process and communicates via stdin/stdout | Local IDE integrations (Cursor, Claude Desktop, Windsurf) |
| SSE | Client connects to a remote HTTP endpoint with Server-Sent Events | Remote/cloud MCP servers, shared team setups |
With stdio, the IDE spawns npx -y @browserbeam/mcp-server as a child process. Messages flow through stdin and stdout as JSON-RPC. No network configuration, no firewall rules, no port management. The server starts when you open your IDE and stops when you close it.
SSE transport connects to a remote HTTP endpoint. This is useful for shared team servers or cloud-hosted MCP setups, but adds network complexity. For browser automation through Browserbeam, stdio is the right choice because the MCP server is just a thin translation layer. The actual browser runs on Browserbeam's cloud infrastructure regardless of which transport you pick.
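You can see the stdio framing in miniature by spawning a child process and exchanging one newline-delimited JSON-RPC message. Here cat stands in for the real server (in practice the IDE spawns npx -y @browserbeam/mcp-server), so the "response" is just an echo of the request:

```python
import json
import subprocess

# `cat` echoes each line back, which is enough to demonstrate the
# newline-delimited JSON-RPC framing an stdio MCP client uses.
proc = subprocess.Popen(
    ["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
)

request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
proc.stdin.write(json.dumps(request) + "\n")  # one message per line
proc.stdin.flush()

line = proc.stdout.readline()  # read one framed message back
echoed = json.loads(line)
proc.stdin.close()
proc.wait()
print(echoed["method"])
```

Swap cat for the real server command and the same write/read loop is the whole transport.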
Tool Discovery and Schema
When an MCP client connects to a server, the first thing it does is ask: "What tools do you have?" The server responds with a list of tool names, descriptions, and JSON Schema definitions for each tool's parameters.
This is what the Browserbeam MCP server returns for the browserbeam_observe tool:
```json
{
  "name": "browserbeam_observe",
  "description": "Re-read the current page state...",
  "inputSchema": {
    "type": "object",
    "properties": {
      "session_id": { "type": "string" },
      "scope": { "type": "string" },
      "format": { "type": "string", "enum": ["markdown", "html"] },
      "mode": { "type": "string", "enum": ["main", "full"] },
      "include_page_map": { "type": "boolean" },
      "max_text_length": { "type": "number" }
    },
    "required": ["session_id"]
  }
}
```
The AI client reads this schema and knows exactly how to call the tool. The model sees browserbeam_observe as a function it can invoke, just like reading a file or running a terminal command. No prompt engineering needed to teach the model about API endpoints.
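The schema also lets a client reject malformed calls before they reach the server. A minimal version of that check, covering just the required and enum keywords (not a full JSON Schema validator), might look like:

```python
def validate_args(schema: dict, args: dict) -> list[str]:
    """Check tool arguments against an MCP inputSchema: required keys
    present, enum values allowed. A sketch, not a full JSON Schema validator."""
    errors = []
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    for key, value in args.items():
        allowed = schema.get("properties", {}).get(key, {}).get("enum")
        if allowed and value not in allowed:
            errors.append(f"{key} must be one of {allowed}")
    return errors

observe_schema = {
    "type": "object",
    "properties": {
        "session_id": {"type": "string"},
        "format": {"type": "string", "enum": ["markdown", "html"]},
    },
    "required": ["session_id"],
}

print(validate_args(observe_schema, {"format": "pdf"}))
```

A valid call like {"session_id": "ses_1"} produces no errors; the call above is flagged for both the missing session_id and the out-of-enum format.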
Resource and Prompt Primitives
Beyond tools, MCP defines two other primitives:
Resources are read-only data the server exposes. A database MCP server might expose table schemas as resources. A file system server exposes directory listings. The Browserbeam MCP server focuses on tools rather than resources, because browser interactions are inherently action-oriented.
Prompts are pre-built prompt templates the server suggests. A code review MCP server might offer a "review this PR" prompt template. These are optional and most browser automation workflows don't need them since the AI model already knows how to compose browser tasks from tool descriptions.
Browserbeam as an MCP Backend for Browser Tasks
The Browserbeam MCP server bridges MCP tool calls to the Browserbeam API. Your AI coding assistant calls MCP tools. The server translates those calls into HTTPS requests. Real browsers run on Browserbeam's infrastructure. Structured results flow back through the MCP protocol.
Request path: the AI client (Cursor / Claude / Windsurf) issues an MCP tool call such as browserbeam_observe; the MCP server turns it into POST /v1/sessions/:id/act; Chrome on cloud infra executes it and waits for DOM + network idle. Response path: the API's JSON response comes back as markdown + refs + diffs, delivered to the agent as MCP text content.
What the Browserbeam MCP Server Exposes
The @browserbeam/mcp-server package (v0.4.0) exposes 19 MCP tools. Each tool maps to a Browserbeam API operation:
| Tool | What It Does |
|---|---|
| browserbeam_create_session | Create a browser session, optionally navigate to a URL |
| browserbeam_navigate | Navigate to a new URL in an existing session |
| browserbeam_observe | Get page content as markdown with interactive element refs |
| browserbeam_click | Click an element by ref, text, or label |
| browserbeam_fill | Fill form fields or an entire form at once |
| browserbeam_type | Type text character-by-character with real keyboard events |
| browserbeam_select | Select an option from a dropdown |
| browserbeam_check | Check or uncheck a checkbox or radio button |
| browserbeam_scroll | Scroll the page or scroll an element into view |
| browserbeam_scroll_collect | Scroll to load lazy content, then return the full page |
| browserbeam_wait | Wait for a selector, text, JS expression, or fixed delay |
| browserbeam_extract | Extract structured data using a declarative JSON schema |
| browserbeam_execute_js | Run custom JavaScript in the browser page context |
| browserbeam_screenshot | Take a screenshot of the current page |
| browserbeam_pdf | Generate a PDF of the current page |
| browserbeam_upload | Upload files to a file input element |
| browserbeam_list_sessions | List your active browser sessions |
| browserbeam_get_session | Get the status and metadata of a session |
| browserbeam_close | Close a session and release resources |
That's a full browser automation toolkit available as MCP tools. Your AI assistant can do anything a developer does in a browser, without writing Puppeteer scripts or managing Chrome processes.
Why Browser Tasks Need an MCP Server
Browser automation is the missing piece for AI coding assistants. Your agent can read files, run terminal commands, and search codebases. But when it needs to check a deployed app, test a form submission, scrape reference data, or verify a UI change, it has no eyes.
An MCP server for browser automation solves this by treating browser actions as first-class tools. The agent doesn't need to know about HTTP endpoints, authentication tokens, or response parsing. It calls browserbeam_create_session, gets back structured markdown, and makes decisions.
The alternative is running a local Playwright or Puppeteer instance through shell commands. That means managing browser binaries, handling crashes, and parsing raw HTML output. The MCP approach is cleaner: the complexity lives in the server, and the agent gets structured responses.
The Tool-to-API Translation Layer
The Browserbeam MCP server is a thin Node.js process. It receives JSON-RPC messages over stdio, maps them to Browserbeam API endpoints, and formats the responses for the AI client.
Here's what happens when the agent calls browserbeam_click:
- The MCP client sends a JSON-RPC tools/call message with name: "browserbeam_click" and arguments { session_id: "ses_abc123", ref: "e5" }
- The MCP server sends POST /v1/sessions/ses_abc123/act with body { steps: [{ action: "click", ref: "e5" }] } to the Browserbeam API
- Browserbeam executes the click in a real browser and returns the updated page state
- The MCP server formats the response as text content (markdown, element refs, changes) and sends it back through stdio
The agent never sees HTTP. It calls tools and reads results.
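The translation step above can be sketched as a pure function from tool-call params to an HTTP request description. The field names follow the steps in this article; the base URL and the shape of the server's internals are assumptions for illustration:

```python
def translate_click(params: dict,
                    base_url: str = "https://api.browserbeam.com") -> dict:
    """Map a browserbeam_click tools/call onto the /act endpoint described
    above. base_url and the return shape are illustrative assumptions."""
    args = params["arguments"]
    return {
        "method": "POST",
        "url": f"{base_url}/v1/sessions/{args['session_id']}/act",
        "json": {"steps": [{"action": "click", "ref": args["ref"]}]},
    }

req = translate_click({
    "name": "browserbeam_click",
    "arguments": {"session_id": "ses_abc123", "ref": "e5"},
})
print(req["url"])
```

The real server also handles authentication headers and response formatting, but the core of the layer is exactly this mapping.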
Setting Up the Browserbeam MCP Server
Getting the MCP server running takes under two minutes. You need a Browserbeam API key and one JSON config file.
Configuring the MCP Server
The Browserbeam MCP server runs through npx, so there's nothing to install globally. Your IDE spawns it automatically when it reads the config.
First, sign up for a free Browserbeam account to get your API key. Then add the server config to your IDE.
Cursor (~/.cursor/mcp.json):
```json
{
  "mcpServers": {
    "browserbeam": {
      "command": "npx",
      "args": ["-y", "@browserbeam/mcp-server"],
      "env": {
        "BROWSERBEAM_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
```json
{
  "mcpServers": {
    "browserbeam": {
      "command": "npx",
      "args": ["-y", "@browserbeam/mcp-server"],
      "env": {
        "BROWSERBEAM_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
Windsurf (~/.codeium/windsurf/mcp_config.json):
```json
{
  "mcpServers": {
    "browserbeam": {
      "command": "npx",
      "args": ["-y", "@browserbeam/mcp-server"],
      "env": {
        "BROWSERBEAM_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
The config is identical across all three clients. The command and args tell the IDE to spawn the MCP server via npx. The -y flag auto-confirms the package install. The env block passes your API key to the server process.
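Since a silent config mistake is the most common failure mode, a quick self-check script can pay for itself. This sketch validates the two things that matter: the -y flag and a real (non-placeholder) API key. The SAMPLE blob mirrors the configs above:

```python
import json

SAMPLE = """{
  "mcpServers": {
    "browserbeam": {
      "command": "npx",
      "args": ["-y", "@browserbeam/mcp-server"],
      "env": { "BROWSERBEAM_API_KEY": "your_api_key_here" }
    }
  }
}"""

def check_mcp_config(text: str) -> list[str]:
    """Sanity-check an mcpServers config blob: valid JSON, a browserbeam
    entry, the -y flag, and a non-placeholder API key."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    server = cfg.get("mcpServers", {}).get("browserbeam")
    if server is None:
        return ["no mcpServers.browserbeam entry"]
    problems = []
    if "-y" not in server.get("args", []):
        problems.append("missing -y flag (npx will hang waiting for confirmation)")
    key = server.get("env", {}).get("BROWSERBEAM_API_KEY", "")
    if not key or key == "your_api_key_here":
        problems.append("BROWSERBEAM_API_KEY is unset or still the placeholder")
    return problems

problems = check_mcp_config(SAMPLE)
print(problems)
```

Run it against the actual file for your IDE (e.g. ~/.cursor/mcp.json) before restarting the editor.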
Connecting to Cursor, Claude Desktop, or Windsurf
After saving the config file, restart your IDE. The MCP client detects the new server and spawns the process automatically.
In Cursor, you'll see browserbeam listed in the MCP servers panel. The agent gains access to all 19 browser tools immediately. Ask your Cursor agent to "open https://books.toscrape.com and tell me what's on the page," and it will call browserbeam_create_session behind the scenes.
In Claude Desktop, the tools appear in the tool list when you start a new conversation. Claude can use them just like its built-in computer use tools, but with structured markdown output instead of screenshots.
In Windsurf, the integration works the same way. The agent discovers the tools through MCP's standard discovery protocol and presents them as available actions.
If the server fails to start, check these common causes:
- Node.js not installed: The server requires Node.js 18 or later
- Missing API key: The server exits immediately if BROWSERBEAM_API_KEY is not set
- npx not in PATH: Make sure npx is available in the shell your IDE uses
Running Your First Browser Task via MCP
With the server connected, ask your AI assistant to perform a browser task. Here's a simple example you can try right now:
"Go to https://news.ycombinator.com and extract the top 5 story titles with their URLs."
Behind the scenes, the agent will:
- Call browserbeam_create_session with url: "https://news.ycombinator.com"
- Read the structured markdown response to understand the page
- Call browserbeam_extract with a schema like {"stories": [{"_parent": ".titleline", "_limit": 5, "title": "a >> text", "url": "a >> href"}]}
- Return the extracted data in a clean format
The agent handles all of this autonomously. You describe the goal in natural language. The MCP server and Browserbeam handle the execution.
Agent Patterns with MCP + Browserbeam
Once the MCP server is connected, building effective agent workflows comes down to patterns. These three cover most browser automation use cases.
Observe-Act-Extract via MCP Tools
The core pattern for MCP browser automation is a three-step loop:
- Observe: Call browserbeam_observe or browserbeam_create_session to get the current page state
- Act: Based on the page content, call browserbeam_click, browserbeam_fill, or browserbeam_navigate
- Extract: Call browserbeam_extract to pull structured data, or read the observation response directly
This maps naturally to how AI agents reason. The model reads the page (observe), decides what to do (reason), takes an action (act), and checks the result (observe again).
A multi-step workflow chains these loops. For example, logging into a site and pulling dashboard data:
```python
# The agent executes this sequence via MCP tool calls
# (app.example.com is a placeholder for your own login-protected app):
# 1. browserbeam_create_session(url="https://app.example.com/login")
# 2. browserbeam_observe(session_id=sid)
# 3. browserbeam_wait(session_id=sid, text="Dashboard")
# 4. browserbeam_observe(session_id=sid)
# 5. browserbeam_extract(session_id=sid, schema='{"metrics": [...]}')
# 6. browserbeam_close(session_id=sid)
```
Each step is one MCP tool call. The agent reads the response from each call and decides the next action. No Playwright scripts, no browser process management, no HTML parsing.
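Driven from code instead of prose, the loop has the same shape. In this sketch, call_tool stands in for whatever plumbing your agent framework uses to forward MCP calls, and the canned responses are invented so the example runs standalone:

```python
# call_tool stands in for the agent framework's MCP plumbing; the canned
# responses below are invented so the sketch runs without a live session.
CANNED = {
    "browserbeam_create_session": {"session_id": "ses_1", "markdown": "# Login"},
    "browserbeam_fill": {"ok": True},
    "browserbeam_observe": {"markdown": "# Dashboard", "stable": True},
    "browserbeam_extract": {"data": {"metrics": [{"name": "Signups", "value": 128}]}},
    "browserbeam_close": {"closed": True},
}

def call_tool(name: str, **args) -> dict:
    return CANNED[name]

page = call_tool("browserbeam_create_session", url="https://app.example.com/login")
sid = page["session_id"]                                   # observe: initial page
call_tool("browserbeam_fill", session_id=sid)              # act: submit the form
state = call_tool("browserbeam_observe", session_id=sid)   # observe: re-read
result = call_tool("browserbeam_extract", session_id=sid)  # extract: typed data
call_tool("browserbeam_close", session_id=sid)             # release the session
print(result["data"]["metrics"][0]["name"])
```

Replace call_tool with a real MCP client and the control flow is unchanged: read a response, decide, call the next tool.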
Multi-Step Research Workflows
Research workflows are where MCP-powered browser automation really shines. Your AI assistant can gather information across multiple pages and synthesize the results.
A typical research workflow:
- Create a session and navigate to a search engine or directory
- Extract a list of relevant URLs
- Visit each URL, observe the page content, and extract specific data
- Close the session when done
The key insight: the MCP server handles session state. The agent calls browserbeam_navigate to move between pages within the same session. Cookies, authentication state, and browsing history persist. The agent doesn't need to manage browser lifecycle. It just calls tools and reads responses.
For research across multiple sites, create one session per site and switch between them using the session IDs. The Browserbeam API supports concurrent sessions, and the MCP server tracks each one independently.
Parallel Browser Sessions from a Single Agent
When your agent needs to compare data from multiple sources, parallel sessions save time. Instead of visiting three competitor sites sequentially, create three sessions and work on them concurrently.
```python
# Create parallel sessions
# Session 1: browserbeam_create_session(url="https://competitor-a.com/pricing")
# Session 2: browserbeam_create_session(url="https://competitor-b.com/pricing")
# Session 3: browserbeam_create_session(url="https://competitor-c.com/pricing")

# Extract pricing from each
# browserbeam_extract(session_id=sid1, schema='{"plans": [...]}')
# browserbeam_extract(session_id=sid2, schema='{"plans": [...]}')
# browserbeam_extract(session_id=sid3, schema='{"plans": [...]}')

# Close all sessions
# browserbeam_close(session_id=sid1)
# browserbeam_close(session_id=sid2)
# browserbeam_close(session_id=sid3)
```
Each session runs an isolated browser instance on Browserbeam's infrastructure. No resource contention, no shared state, no browser crashes affecting other sessions. The MCP server multiplexes tool calls across all active sessions.
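If you're orchestrating sessions from your own code rather than through an agent prompt, a thread pool captures the fan-out pattern. Here extract_pricing is a stub for a create-session/extract/close sequence; the URLs are the hypothetical competitor pages from above:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_pricing(url: str) -> dict:
    # Stub: a real implementation would create a session for the URL,
    # call browserbeam_extract, and close the session. Data is invented.
    return {"url": url, "plans": [{"name": "Pro", "price": "$49"}]}

urls = [
    "https://competitor-a.com/pricing",
    "https://competitor-b.com/pricing",
    "https://competitor-c.com/pricing",
]

# One isolated cloud session per site; the only local cost is a thread.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(extract_pricing, urls))

print(len(results))
```

Because each session is a remote browser, the thread count is bounded by your plan's concurrency limit, not your machine's memory.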
Real-World MCP Use Cases
These three use cases demonstrate what becomes possible when your AI coding assistant has browser access through MCP. Each includes the MCP tool sequence so you can try them yourself.
AI-Assisted QA Testing
Your agent can test deployed web applications without a separate QA framework. Ask it to verify a user flow, and it will use Browserbeam MCP tools to walk through the steps.
Prompt: "Go to our staging site at https://staging.example.com, create a new account with test data, verify the confirmation email link works, and report any UI issues."
The agent's MCP tool sequence:
- browserbeam_create_session with the staging URL
- browserbeam_click on the "Sign Up" button
- browserbeam_fill with test data (name, email, password)
- browserbeam_screenshot to capture the confirmation page
- browserbeam_navigate to the email confirmation URL
- browserbeam_observe to verify the landing page
- browserbeam_close to clean up
The agent reports what it found in natural language: broken links, missing form validation, layout issues. No Cypress tests, no Selenium scripts, no test framework setup. You describe the test case, and the agent executes it.
For teams running browser automation at scale, this pattern lets you add exploratory testing to your CI pipeline without maintaining a separate test suite.
Automated Documentation Generation
Need to document a third-party API or a competitor's feature set? Your agent can browse their docs, extract the relevant information, and generate structured documentation.
Prompt: "Go to the Stripe API docs, extract all webhook event types with their descriptions, and format them as a markdown table."
The agent navigates to the docs site, uses browserbeam_scroll_collect to load all lazy content, then calls browserbeam_extract with a schema that pulls event names and descriptions. The result is a clean markdown table, ready to paste into your own docs.
This pattern also works for internal documentation. Point the agent at your staging environment, ask it to document every page and form in the app, and it will crawl through the UI capturing structured page content.
Competitive Research Agent
Market analysis used to mean manually visiting competitor sites, taking screenshots, and copying pricing data into spreadsheets. With MCP browser tools, your AI assistant does this in minutes.
Prompt: "Compare the pricing pages of Browserbase, Steel, and Playwright Cloud. Extract plan names, prices, included features, and rate limits. Present as a comparison table."
The agent creates parallel sessions, visits each pricing page, extracts structured data, and synthesizes the results into a comparison table. It can even visit changelogs or announcement pages to check for recent pricing changes.
This works because browserbeam_extract returns typed JSON, not screenshots. The agent gets data it can reason about, compare, and format. No OCR, no screenshot parsing, no manual data entry.
Observability and Debugging with Browserbeam + MCP
When an MCP tool call fails or returns unexpected results, you need visibility into what happened. Browserbeam provides several debugging tools that work directly through MCP.
Session inspection: Call browserbeam_get_session to check a session's status, URL, viewport size, and elapsed time. If a session expired due to timeout, this tells you immediately.
List active sessions: Use browserbeam_list_sessions with status: "active" to see all running sessions. If your agent forgot to close a session (a common mistake), you'll spot it here.
Screenshots for visual debugging: When the agent reports that a page looks wrong, call browserbeam_screenshot to capture exactly what the browser shows. The screenshot comes back as a base64-encoded image through the MCP response.
Page map for structural debugging: The first browserbeam_observe call in any session automatically includes a page map, a lightweight outline of page sections (nav, header, main, aside, footer) with CSS selectors and content hints. If the agent can't find an element, the page map reveals where it actually lives.
JavaScript execution: For advanced debugging, use browserbeam_execute_js to run diagnostic code in the browser context. Check document.readyState, inspect network requests, or validate DOM state.
Pro Tip: If an MCP tool call returns truncated content (you'll see a notice like "showing 12,000 of 45,000 chars"), call browserbeam_observe with a higher max_text_length or use browserbeam_scroll_collect which defaults to 100,000 characters.
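An agent (or wrapper script) can detect that notice mechanically and know to re-read the page. The exact notice wording here matches the example above; if the server's phrasing differs, adjust the regex:

```python
import re

# Matches notices like "showing 12,000 of 45,000 chars" (wording assumed
# from the example above; adapt the pattern to the actual server output).
TRUNCATION = re.compile(r"showing ([\d,]+) of ([\d,]+) chars")

def needs_fuller_read(observation_text: str) -> bool:
    """True if the observation was truncated, i.e. the agent should
    re-observe with a higher max_text_length or use scroll_collect."""
    m = TRUNCATION.search(observation_text)
    if not m:
        return False
    shown, total = (int(g.replace(",", "")) for g in m.groups())
    return shown < total

print(needs_fuller_read("... showing 12,000 of 45,000 chars"))
```

This keeps the truncation decision out of the model's reasoning loop entirely.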
Common Integration Mistakes
Five integration mistakes show up repeatedly when developers connect Browserbeam to their AI coding assistants via MCP. Each one wastes time or causes silent failures.
Misconfiguring Transport Settings
The most common setup error: putting the wrong path in the config file. Each IDE reads MCP config from a specific location:
| IDE | Config File Path |
|---|---|
| Cursor | ~/.cursor/mcp.json |
| Claude Desktop (macOS) | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Claude Desktop (Windows) | %APPDATA%\Claude\claude_desktop_config.json |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
If you put the config in the wrong file, the IDE never finds the server. It won't show an error. The tools just won't appear.
Another transport mistake: forgetting the -y flag in the args array. Without it, npx prompts for confirmation and the stdio pipe hangs. Always use ["-y", "@browserbeam/mcp-server"].
Not Handling Session Lifecycle
Open sessions consume resources and keep the billing clock running. If your agent creates a session but never calls browserbeam_close, the session stays active until it times out (default: 5 minutes).
This matters most in iterative workflows where the agent creates multiple sessions. Five abandoned sessions at once means five browsers running in the cloud doing nothing.
The fix: make closing sessions explicit. When you prompt the agent, include "close the session when you're done" as part of the instruction. The Browserbeam MCP server already includes lifecycle reminders in its tool descriptions, but reinforcing it in your prompt helps.
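When you drive sessions from code rather than prompts, the same guarantee comes from a try/finally wrapper. This sketch assumes a call_tool helper (stubbed here so it runs standalone); the context manager closes the session even if the body raises:

```python
from contextlib import contextmanager

closed = []

def call_tool(name, **args):
    # Stub so the sketch runs standalone; a real agent forwards this
    # through MCP. Only create/close matter for the lifecycle pattern.
    if name == "browserbeam_close":
        closed.append(args["session_id"])
    return {"session_id": "ses_demo"}

@contextmanager
def browser_session(url: str):
    sid = call_tool("browserbeam_create_session", url=url)["session_id"]
    try:
        yield sid
    finally:
        call_tool("browserbeam_close", session_id=sid)  # runs even on errors

with browser_session("https://books.toscrape.com") as sid:
    pass  # observe / act / extract here

print(closed)
```

With this pattern, an exception mid-workflow can never leave a billable browser idling in the cloud.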
Ignoring Rate Limits
Calling browserbeam_observe ten times in a row without acting on the response wastes API calls and slows your workflow. The agent should read the response from create_session or navigate first, since both already return page content. Only call observe when you need to re-read the page after an action.
Similarly, extracting data with browserbeam_extract after every click is wasteful if you only need the final result. Plan your extraction points and batch them where possible.
Passing Raw HTML Instead of Structured Output
Some developers configure the MCP server and then ask the agent to "get the HTML source code." This defeats the purpose. The browserbeam_observe tool returns structured markdown by default, which is far more token-efficient than raw HTML.
If you need HTML for building extraction selectors, use the format: "html" parameter on a scoped observation. But for reading page content, always use the default markdown format.
| Format | Tokens per Page | Best For |
|---|---|---|
| Markdown (default) | ~1,500-3,000 | Reading content, understanding page structure |
| HTML | ~15,000-25,000 | Building CSS selectors for browserbeam_extract |
Skipping Error Handling in Tool Calls
MCP tool calls can fail. The session might have expired. The element ref might be stale after a page navigation. The extraction schema might not match the page structure.
When the agent ignores errors and keeps calling tools, it enters a loop of failures. The fix: teach the agent (through your prompt) to check tool responses for errors and adjust. If a click fails because the ref is stale, observe the page again to get fresh refs. If extraction returns empty results, check the page content to verify the data exists.
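The stale-ref recovery can be encoded as a small retry helper. Everything here is illustrative: the "stale_ref" error string, the refs mapping, and the fake_call_tool stub are assumptions standing in for real tool responses:

```python
def safe_click(call_tool, session_id: str, label: str) -> dict:
    """Click by ref; on a stale-ref error, re-observe for fresh refs and
    retry once. The 'stale_ref' error shape is an illustrative assumption."""
    page = call_tool("browserbeam_observe", session_id=session_id)
    result = call_tool("browserbeam_click", session_id=session_id,
                       ref=page["refs"][label])
    if result.get("error") == "stale_ref":
        page = call_tool("browserbeam_observe", session_id=session_id)
        result = call_tool("browserbeam_click", session_id=session_id,
                           ref=page["refs"][label])
    return result

calls = {"observe": 0, "click": 0}

def fake_call_tool(name, **args):
    if name == "browserbeam_observe":
        calls["observe"] += 1
        return {"refs": {"Sign Up": f"e{calls['observe']}"}}
    calls["click"] += 1
    # first click hits a stale ref; the retry with a fresh ref succeeds
    return {"error": "stale_ref"} if calls["click"] == 1 else {"clicked": True}

outcome = safe_click(fake_call_tool, "ses_1", "Sign Up")
print(outcome)
```

Cap the retries at one or two: if a ref is still stale after a fresh observation, the page itself has changed and the agent should re-plan rather than re-click.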
Browserbeam MCP vs Other Browser MCP Servers
Browserbeam isn't the only MCP server that gives AI agents browser access. Playwright MCP and Puppeteer MCP are the main alternatives. The differences come down to what the agent receives and how much work it has to do.
Browserbeam MCP vs Playwright MCP
The Playwright MCP server runs a local Playwright instance and exposes browser actions as MCP tools. It gives the agent control over a real browser, but the output format is different.
Playwright MCP returns accessibility snapshots, which are tree structures of ARIA roles and names. This works well for navigation but falls short for content extraction. The agent sees "role: heading, name: Welcome" but doesn't get the full page text, form structures, or extraction capabilities that Browserbeam provides.
Key differences:
- Browser management: Playwright MCP runs a local browser process. Browserbeam MCP calls a cloud API. No local Chrome to install, update, or crash.
- Output format: Playwright returns accessibility trees. Browserbeam returns structured markdown with element refs, page maps, and change diffs.
- Extraction: Playwright has no built-in extraction. You must parse accessibility trees or run JavaScript. Browserbeam's browserbeam_extract takes a declarative JSON schema and returns typed data.
- Concurrency: Running multiple local browsers eats memory. Browserbeam sessions run on cloud infrastructure with no local resource impact.
Browserbeam MCP vs Puppeteer MCP
The Puppeteer MCP server is similar to Playwright MCP but uses Puppeteer under the hood. It exposes basic browser actions (navigate, click, type, screenshot) and returns screenshots as the primary output.
The screenshot-based approach means the agent must use vision capabilities to understand the page. That's expensive (image tokens cost more than text tokens) and slower than reading structured markdown.
Feature Comparison Table
| Feature | Browserbeam MCP | Playwright MCP | Puppeteer MCP |
|---|---|---|---|
| Output format | Structured markdown + element refs | Accessibility tree | Screenshots |
| Browser location | Cloud (Browserbeam API) | Local (your machine) | Local (your machine) |
| Extraction | Declarative JSON schema | Manual JS execution | Manual JS execution |
| Page map | Auto-included on first observe | Not available | Not available |
| Change diffs | Built-in (changes between observations) | Not available | Not available |
| Stability signal | stable: true when page is ready | Manual wait strategies | Manual wait strategies |
| Form handling | browserbeam_fill with label-based targeting | Click + type sequences | Click + type sequences |
| Concurrent sessions | Cloud-managed, no local resources | Limited by local memory | Limited by local memory |
| Cookie banner handling | auto_dismiss_blockers: true | Write dismissal logic | Write dismissal logic |
| Token cost per page | ~1,500-3,000 (markdown) | ~3,000-8,000 (a11y tree) | ~2,000-5,000 (image tokens) |
| Setup | npx -y @browserbeam/mcp-server + API key | npx @playwright/mcp | npx -y @modelcontextprotocol/server-puppeteer |
For AI coding assistants that need to read and reason about web content, Browserbeam MCP gives the richest structured output. Playwright MCP is a solid choice if you need accessibility testing or prefer running browsers locally. Puppeteer MCP works for visual verification tasks where screenshots are the primary output.
Frequently Asked Questions
What is an MCP server and how does it work?
An MCP server is a process that exposes tools, resources, and prompts to AI clients through the Model Context Protocol. The client (like Cursor or Claude Desktop) spawns the server, discovers its tools via JSON Schema, and lets the AI model call those tools as functions. For browser automation, the Browserbeam MCP server translates tool calls into API requests that control real cloud browsers.
What is the Model Context Protocol?
The Model Context Protocol is an open standard created by Anthropic that defines how AI applications connect to external tools and data sources. It uses JSON-RPC over stdio or SSE transport. MCP replaces custom plugin integrations with a single protocol that works across multiple AI clients.
How does MCP work with Cursor?
Add the Browserbeam MCP server config to ~/.cursor/mcp.json with your API key. Restart Cursor, and the agent gains access to 19 browser automation tools. You can then ask Cursor to browse websites, fill forms, extract data, and take screenshots through natural language prompts.
How do I build an MCP server for browser automation?
You don't need to build one from scratch. Install @browserbeam/mcp-server via npx and configure it with your Browserbeam API key. The server handles all the MCP protocol details (tool registration, JSON-RPC, stdio transport) and maps tool calls to the Browserbeam REST API. If you want to build a custom MCP server, use the @modelcontextprotocol/sdk package for TypeScript or the mcp package for Python.
How is Browserbeam MCP different from Playwright MCP?
Browserbeam MCP returns structured markdown with element refs, page maps, and extraction capabilities. Playwright MCP returns accessibility tree snapshots. Browserbeam runs browsers in the cloud so you don't manage local Chrome instances. Playwright MCP runs a local browser, which uses your machine's memory and requires browser binary management.
Can I use Browserbeam MCP with Claude Desktop?
Yes. Add the server config to ~/Library/Application Support/Claude/claude_desktop_config.json on macOS or %APPDATA%\Claude\claude_desktop_config.json on Windows. The config format is identical to Cursor. Claude Desktop discovers the tools through MCP and makes them available in conversations.
Do I need to manage Playwright or Puppeteer locally?
No. The Browserbeam MCP server calls the Browserbeam cloud API, which runs browsers on remote infrastructure. You don't install, update, or manage any browser binaries locally. The only local dependency is Node.js 18+ for running the MCP server process itself.
How much does Browserbeam MCP cost?
The MCP server itself is free and open-source (@browserbeam/mcp-server on npm). You pay for Browserbeam API usage, which is based on session runtime. New accounts get 1 hour of free runtime with no credit card required. Visit browserbeam.com to sign up.
What to Build Next
You now have a bridge between your AI coding assistant and real browsers. The MCP server handles the protocol. Browserbeam handles the browser. Your agent handles the reasoning.
Start with something concrete. Ask your AI coding assistant to test your staging environment after a deploy. Have it extract API documentation from a vendor's site. Build a competitive research workflow that runs every Monday morning.
The patterns scale. The same observe-act-extract loop that works for a single page works for multi-step workflows across dozens of sites. Parallel sessions let your agent gather data from multiple sources simultaneously. And because everything runs through MCP, switching between Cursor, Claude Desktop, or Windsurf is a config file change, not a code rewrite.
If you're building intelligent web agents, the Python SDK gives you programmatic control for production pipelines. For security best practices, check the guide on securing AI browser agents. And if you're scaling beyond a few concurrent sessions, the infrastructure guide covers queue architecture and capacity planning.
Your AI assistant already reads files, runs commands, and writes code. Now it browses the web. What will you build with that?