
Ever wonder why your scraper works on your laptop but returns empty data in production? Or why the same time.sleep(3) call succeeds on Monday and fails on Friday? The answer is almost always timing. You're guessing when the page is ready instead of knowing.
Most scraping code for JavaScript-heavy sites is built on fixed delays. You navigate to a URL, sleep for a few seconds, and hope the content has loaded. This works until it doesn't. And when it breaks, the failure is silent: you get partial data, empty fields, or stale content from a previous page state. No error. No warning. Just bad data.
There's a better signal. Instead of asking "has enough time passed?" you can ask "has the page stopped changing?" That question has a concrete, measurable answer: zero in-flight network requests for 300ms and zero DOM mutations for 200ms. When both conditions hold, the page is stable. Content is loaded. Data is ready to extract.
This guide explains how that stability detection works, why it replaces time.sleep() in Python workflows that scrape JavaScript-rendered pages, and how to use it for React sites, infinite scroll pages, and AJAX-loaded content. You'll also learn when automatic stability isn't enough and how to add explicit wait conditions for edge cases.
In this guide, you'll learn:
- Why time.sleep() breaks at scale and the three failure modes of fixed waits
- How network idle and DOM mutation signals replace fixed delays
- How Browserbeam's two-signal stability algorithm works under the hood
- How to scrape React, Vue, and Angular sites without timing hacks
- How to handle infinite scroll with a single API call
- When to use wait_for, wait_until, and page.stable for different scenarios
- How Browserbeam compares to Selenium and ScrapingBee for web scraping dynamic content
TL;DR: Replace time.sleep() with automatic stability detection. Browserbeam watches two signals before returning page data: no in-flight fetch/XHR requests for 300ms (network idle) and no DOM mutations for 200ms (DOM quiet). This runs automatically on every create, goto, and click call. For edge cases, use wait_for (CSS selector), wait_until (JavaScript expression), or session.wait() to add explicit conditions. No sleep calls. No timing guesses.
The Sleep() Problem in Browser Scraping
What Happens When You time.sleep()
Here's what most headless-browser Python scripts look like:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://books.toscrape.com")
time.sleep(3) # Hope for the best
soup = BeautifulSoup(driver.page_source, "html.parser")
titles = [h.text for h in soup.select("h3 a")]
print(f"Found {len(titles)} books")
driver.quit()
The time.sleep(3) is a guess. You're betting that 3 seconds is long enough for the browser to navigate, render the HTML, execute JavaScript, fetch API data, and update the DOM. Sometimes it is. Sometimes it isn't. You have no way to know which case you're in until you look at the data.
The 3 Failure Modes of Fixed Waits
Fixed delays fail in three distinct ways, and each one corrupts your data differently:
| Failure Mode | What Happens | When It Occurs |
|---|---|---|
| Too short | Page hasn't finished rendering. You extract partial or empty data. | Network congestion, slow API response, heavy JavaScript execution |
| Too long | Page loaded in 500ms but you waited 5 seconds. Wasted time compounds across thousands of pages. | Over-conservative sleep values set after a previous failure |
| Variable | Sometimes 2 seconds is enough, sometimes 8 isn't. The same page loads at different speeds depending on server load, CDN cache state, and network conditions. | Any production scraping at scale |
The worst part: "too short" failures are silent. Your script runs, your data looks plausible, and you don't discover the corruption until your downstream pipeline produces bad results.
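To make the failure modes concrete, here is a minimal sketch that sorts a handful of page loads into "sleep was too short" and "sleep wasted time". The load times are hypothetical numbers chosen for illustration, not measurements, and classify_fixed_sleep is a helper invented for this example.

```python
def classify_fixed_sleep(load_times_s, sleep_s):
    """Count loads a fixed sleep misses and sum the time it wastes on the rest."""
    too_short = [t for t in load_times_s if t > sleep_s]
    wasted_s = sum(sleep_s - t for t in load_times_s if t <= sleep_s)
    return {"too_short": len(too_short), "wasted_s": round(wasted_s, 2)}


# Hypothetical load times (seconds) for the same page across six runs.
samples = [0.5, 0.8, 1.2, 2.9, 4.1, 8.0]
report = classify_fixed_sleep(samples, sleep_s=3.0)
print(report)  # {'too_short': 2, 'wasted_s': 6.6}
```

With a 3-second sleep, two of the six runs would be read before the page finished, and the other four burn 6.6 seconds between them. Raising the sleep fixes the first number only by inflating the second.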
Why Fixed Waits Break at Scale
Network Variance Kills Static Timeouts
A page that loads in 800ms on your development machine might take 4 seconds from a datacenter in a different region. The same page might load in 600ms at 2am and 3 seconds at peak traffic. The time.sleep() value that works during development becomes the wrong value in production.
When you're scraping 100 pages, this is annoying. When you're scraping 10,000 pages, it's a math problem. Set sleep too high and you waste hours. Set it too low and you get silent data corruption across thousands of records.
The Cost of Over-Sleeping
Over-sleeping is the "safe" option that teams choose after a round of failures. But the cost is real:
A 5-second sleep on 10,000 pages is about 13.9 hours of pure waiting. If the average actual load time is 1.2 seconds, roughly 10.6 of those hours are wasted. That's compute cost, proxy bandwidth, and elapsed time that scales linearly with your dataset size.
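The arithmetic is easy to rerun for your own numbers. This small sketch uses the figures from the paragraph above; sleep_cost_hours is a helper written for this example.

```python
def sleep_cost_hours(pages, sleep_s, avg_load_s):
    """Return (total hours spent sleeping, hours spent sleeping after the page was ready)."""
    total_h = pages * sleep_s / 3600
    wasted_h = pages * (sleep_s - avg_load_s) / 3600
    return total_h, wasted_h


total_h, wasted_h = sleep_cost_hours(pages=10_000, sleep_s=5.0, avg_load_s=1.2)
print(f"total wait: {total_h:.1f} h, wasted: {wasted_h:.1f} h")
# total wait: 13.9 h, wasted: 10.6 h
```

Both values grow linearly with the page count, so doubling the dataset doubles the waste.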
The fix isn't finding the "right" sleep value. The fix is replacing the guess with a signal. For a broader look at scaling patterns, see the scaling web automation guide.
Three Ways Pages Signal "I'm Ready"
Every dynamically-loaded page goes through a predictable lifecycle. The signals that indicate "content is ready" fall into three categories.
Network Idle
When a browser loads a page, it fires dozens of HTTP requests: the HTML document, CSS, JavaScript bundles, fonts, images, and API calls. A React app might fetch the HTML shell in 200ms, then fire 5 API calls to load the actual content. The page is "ready" when all those API calls have completed and no new ones are starting.
Network idle means: zero in-flight fetch and XHR requests for a sustained period. A single analytics ping or WebSocket connection shouldn't block this signal, but they often do with naive implementations (more on that below).
DOM Mutation Quiet
JavaScript frameworks don't just fetch data. They use the data to update the DOM. React reconciles a virtual DOM and patches the real one. Vue's reactivity system triggers DOM updates when observed data changes. Angular runs change detection cycles.
DOM mutation quiet means: a MutationObserver watching child-list and attribute changes across the whole subtree reports nothing for a sustained period. When React finishes rendering a product list, the DOM mutations stop. That's the signal.
Combined Stability Detection
Neither signal alone is sufficient. Network idle can fire before the framework processes the response and updates the DOM. DOM quiet can fire during a brief pause between API responses. The reliable signal is both conditions holding simultaneously: zero network activity for 300ms and zero DOM mutations for 200ms.
This combined approach is what Playwright's networkidle strategy tries to approximate, but Playwright watches transport-level events and gets confused by analytics pings, WebSocket heartbeats, and service workers. The more reliable approach watches application-level requests (actual fetch() and XMLHttpRequest calls) instead of raw TCP connections.
How Browserbeam's Stability Detection Works
The Two-Signal Algorithm
Browserbeam runs two checks in parallel after every navigation and interaction:
- Network idle check. Tracks in-flight fetch() and XMLHttpRequest calls by patching both APIs at page initialization. When the in-flight count drops to zero, a 300ms timer starts. Any new request resets the timer. The check passes when the timer completes.
- DOM quiet check. Attaches a MutationObserver to document.body watching childList, subtree, and attributes. Each mutation resets a 200ms debounce timer. The check passes when the timer completes with no new mutations.
Both checks run simultaneously via Promise.all. A 5-second timeout caps the total wait. If either check doesn't complete in 5 seconds, the system proceeds anyway and returns the current page state.
The fetch/XHR patching is the key differentiator. Instead of watching transport-level network events (which include analytics beacons, ad pixels, and WebSocket frames), Browserbeam tracks the actual API calls your target page's JavaScript makes. A React component calling fetch("/api/products") increments the counter. A Google Analytics beacon does not. This is why Browserbeam's stability detection works on sites where Playwright's networkidle hangs indefinitely.
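The timer logic described in this section can be modelled offline. The sketch below is a simplified Python simulation over recorded event times, not Browserbeam's actual code (which runs in the browser). It assumes both timers start at time zero when nothing has happened yet, that each check resolves once on its own, and that the overall wait ends at the later of the two, as Promise.all would. All function names are hypothetical.

```python
def network_idle_at(requests, idle_ms=300):
    """Earliest time (ms) the in-flight count has stayed at zero for idle_ms.

    requests is a list of (start_ms, end_ms) pairs for fetch/XHR calls.
    """
    timer_start = 0
    for start, end in sorted(requests):
        if start >= timer_start + idle_ms:
            break  # the idle timer completed before this request began
        timer_start = max(timer_start, end)  # request resets the timer
    return timer_start + idle_ms


def dom_quiet_at(mutations, quiet_ms=200):
    """Earliest time (ms) at which no mutation has occurred for quiet_ms."""
    timer_start = 0
    for t in sorted(mutations):
        if t >= timer_start + quiet_ms:
            break  # the debounce timer completed before this mutation
        timer_start = t  # mutation resets the debounce timer
    return timer_start + quiet_ms


def stable_at(requests, mutations, timeout_ms=5000):
    """Return (time_ms, stable) once both checks have passed or the cap is hit."""
    ready = max(network_idle_at(requests), dom_quiet_at(mutations))
    if ready > timeout_ms:
        return timeout_ms, False
    return ready, True


# SPA shell loads, then an API call finishes at 900ms and triggers a render.
print(stable_at([(0, 200), (250, 900)], [50, 920, 950]))  # (1200, True)
# A 10-second long-poll request keeps the network check from ever passing.
print(stable_at([(0, 10_000)], []))  # (5000, False)
```

In the first case the DOM check alone would have passed at 250ms, during the pause before the API response arrived. That is the gap the network check closes.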
Why domcontentloaded Beats networkidle
Browserbeam navigates with domcontentloaded rather than networkidle. This is a deliberate choice.
domcontentloaded fires once the HTML document has been parsed and deferred scripts have run, without waiting for images, subframes, or async scripts to finish loading. It returns quickly, typically within 1-2 seconds. The custom stability algorithm then handles the async content loading.
Playwright's networkidle waits until there have been no network connections for at least 500ms. This hangs on any page with:
- Google Analytics or tracking pixels (constant beacon requests)
- WebSocket connections (chat widgets, real-time updates)
- Long-polling endpoints (notification systems)
- Service worker pre-caching (PWAs)
By separating navigation from stability, Browserbeam avoids the networkidle trap entirely. Navigate fast, then wait on the right signals.
When Stability Runs Automatically
You don't need to call a stability method explicitly. The system runs the two-signal check automatically at specific points:
| API Call | Stability Check? | What Happens |
|---|---|---|
| create(url=...) | Yes | Navigate + stability + auto-observe |
| goto(url=...) | Yes | Navigate + stability on next observe |
| click() / fill() | Yes | Action + stability + auto-observe |
| observe() | Yes | Stability + content capture |
| extract() | Yes | Stability + content capture + extraction |
| scroll_collect() | Per-scroll wait | Height-based stability (different algorithm) |
| wait() | No | Your explicit condition only |
| screenshot() / pdf() | No | Captures current state immediately |
Every call that reads page content runs the stability check first. Every call that changes the page (click, fill, goto) triggers stability before the next content read. Your code never needs to guess when to read.
What page.stable Actually Means
After every observation, session.page.stable tells you whether the stability check passed within the 5-second timeout. If the page was still loading when the timeout hit, stable is False but data is still returned (the best-effort state at timeout).
from browserbeam import Browserbeam
client = Browserbeam()
session = client.sessions.create(url="https://quotes.toscrape.com")
print(f"Page stable: {session.page.stable}") # True
print(f"Title: {session.page.title}") # "Quotes to Scrape"
print(f"Content length: {len(session.page.markdown.content)} chars")
session.close()
In practice, stable is True on the vast majority of pages. It's False primarily on pages with animations that continuously modify the DOM, or on extremely slow-loading SPAs. When you see stable: False, the data is still usable, but you might want to add an explicit wait_for to target the specific content you need.
Tutorial: Scraping a React E-Commerce Site
Most e-commerce sites use React, Vue, or Angular. The product data loads via API calls after the initial page render. This is where time.sleep() fails most often and where stability detection matters most.
Basic: Auto-Stability on Create
The simplest case: pass a URL and get back stable, rendered content. No sleep, no wait config:
curl -X POST https://api.browserbeam.com/v1/sessions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://books.toscrape.com"}'
from browserbeam import Browserbeam
client = Browserbeam()
session = client.sessions.create(url="https://books.toscrape.com")
result = session.extract(
books=[{
"_parent": "article.product_pod",
"title": "h3 a >> text",
"price": ".price_color >> text"
}]
)
for book in result.extraction["books"][:5]:
    print(f"{book['title']}: {book['price']}")
session.close()
import Browserbeam from "@browserbeam/sdk";
const client = new Browserbeam();
const session = await client.sessions.create({ url: "https://books.toscrape.com" });
const result = await session.extract({
books: [{
_parent: "article.product_pod",
title: "h3 a >> text",
price: ".price_color >> text"
}]
});
console.log(result.extraction.books.slice(0, 5));
await session.close();
require "browserbeam"
client = Browserbeam::Client.new
session = client.sessions.create(url: "https://books.toscrape.com")
result = session.extract(
books: [{
_parent: "article.product_pod",
title: "h3 a >> text",
price: ".price_color >> text"
}]
)
result.extraction["books"].first(5).each { |b| puts "#{b['title']}: #{b['price']}" }
session.close
That create call navigates to the URL, waits for network idle + DOM quiet, and returns the fully rendered page. The extract call runs against the stable DOM. No sleep. No explicit wait. The stability algorithm handles the timing.
wait_for: Targeting a Specific Element
When you know the exact element that signals "content is ready," use wait_for with a CSS selector. The system waits for that element to appear in the DOM before running stability detection:
curl -X POST https://api.browserbeam.com/v1/sessions/SESSION_ID/act \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"steps": [{
"goto": {
"url": "https://books.toscrape.com",
"wait_for": "article.product_pod",
"wait_timeout": 15000
}
}]
}'
from browserbeam import Browserbeam
client = Browserbeam()
session = client.sessions.create(url="https://quotes.toscrape.com")
# Navigate and wait for a specific element
session.goto(
url="https://books.toscrape.com",
wait_for="article.product_pod",
wait_timeout=15000
)
result = session.extract(
books=[{
"_parent": "article.product_pod",
"title": "h3 a >> text",
"price": ".price_color >> text"
}]
)
print(f"Found {len(result.extraction['books'])} books")
session.close()
const session = await client.sessions.create({ url: "https://quotes.toscrape.com" });
await session.goto({
url: "https://books.toscrape.com",
wait_for: "article.product_pod",
wait_timeout: 15000
});
const result = await session.extract({
books: [{ _parent: "article.product_pod", title: "h3 a >> text", price: ".price_color >> text" }]
});
console.log(`Found ${result.extraction.books.length} books`);
await session.close();
session = client.sessions.create(url: "https://quotes.toscrape.com")
session.goto(
"https://books.toscrape.com",
wait_for: "article.product_pod",
wait_timeout: 15000
)
result = session.extract(
books: [{ _parent: "article.product_pod", title: "h3 a >> text", price: ".price_color >> text" }]
)
puts "Found #{result.extraction['books'].length} books"
session.close
wait_for is the Browserbeam equivalent of Selenium's WebDriverWait + expected_conditions.presence_of_element_located or Playwright's page.waitForSelector. The difference: you set it once on the navigation call instead of writing a separate wait block. Use this pattern when you know your target selector in advance. If you're migrating a Selenium wait-for-element block, this is the direct replacement.
wait_until: Custom JavaScript Conditions
For SPAs that don't render predictable selectors, use wait_until with a JavaScript expression. The system evaluates the expression repeatedly until it returns truthy:
from browserbeam import Browserbeam
client = Browserbeam()
# Wait for a custom JS condition at session creation
session = client.sessions.create(
steps=[{
"goto": {
"url": "https://books.toscrape.com",
"wait_until": "document.querySelectorAll('article.product_pod').length >= 10",
"wait_timeout": 20000
}
}]
)
print(f"Page stable: {session.page.stable}")
print(f"Content loaded: {len(session.page.markdown.content)} chars")
session.close()
This fills the role that page.waitForFunction plays in Playwright: use it when you need a more complex condition than a single CSS selector. The JavaScript expression can check any DOM state, window variable, or computed condition. Use it for SPAs that set a flag like window.__DATA_LOADED__ or for pages where you need a minimum number of elements before extracting.
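"Evaluates the expression repeatedly until it returns truthy" is a plain polling loop with a deadline. Here is a generic Python sketch of that idea, assuming a callable predicate in place of a JavaScript expression. poll_until, its parameters, and the polling interval are hypothetical; how often Browserbeam re-evaluates the expression is not stated in this guide.

```python
import time


def poll_until(predicate, timeout_s=20.0, interval_s=0.1):
    """Call predicate until it returns a truthy value or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        value = predicate()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)  # short poll interval, not a guess at load time


# Stand-in for a page that renders one more element per check.
items = []

def items_loaded():
    items.append("book")
    return len(items) >= 3

poll_until(items_loaded, timeout_s=2.0, interval_s=0.01)
print(len(items))  # 3
```

The difference from a fixed sleep is that the loop returns as soon as the condition holds and raises if it never does, so a slow page produces an error instead of silent partial data.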
Tutorial: Handling Infinite Scroll Without Timing Hacks
Infinite scroll pages load content dynamically as you scroll. Traditional approaches use a scroll-sleep-check loop: scroll down, sleep, check if new content loaded, repeat. Each sleep is another guess.
scroll_collect: One Call, Full Page
Browserbeam's scroll_collect replaces the entire scroll loop with a single API call:
curl -X POST https://api.browserbeam.com/v1/sessions/SESSION_ID/act \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"steps": [{
"scroll_collect": {
"max_scrolls": 30,
"wait_ms": 1000,
"max_text_length": 100000
}
}]
}'
from browserbeam import Browserbeam
client = Browserbeam()
session = client.sessions.create(url="https://quotes.toscrape.com/scroll")
result = session.scroll_collect(
max_scrolls=30,
wait_ms=1000,
max_text_length=100000
)
print(f"Collected {len(result.page.markdown.content)} characters")
print(f"Scroll position: {result.page.scroll.percent}%")
session.close()
const session = await client.sessions.create({ url: "https://quotes.toscrape.com/scroll" });
const result = await session.scrollCollect({
max_scrolls: 30,
wait_ms: 1000,
max_text_length: 100000
});
console.log(`Collected ${result.page.markdown.content.length} characters`);
console.log(`Scroll position: ${result.page.scroll.percent}%`);
await session.close();
session = client.sessions.create(url: "https://quotes.toscrape.com/scroll")
result = session.scroll_collect(
max_scrolls: 30,
wait_ms: 1000,
max_text_length: 100_000
)
puts "Collected #{result.page.markdown.content.length} characters"
puts "Scroll position: #{result.page.scroll.percent}%"
session.close
One call. No scroll-sleep-check loop. No timing guesses. The method scrolls one viewport at a time, waits wait_ms between scrolls for lazy content to load, and returns the complete page content when it reaches the bottom.
Bottom Detection: How It Knows When to Stop
The scroll_collect algorithm uses height-based stability rather than the network/DOM signals used by the main stability check. After each scroll, it measures document.documentElement.scrollHeight. When it reaches the bottom of the page (scrollY + viewportHeight >= scrollHeight), it checks whether the page height changed since the last scroll.
If the scroll height remains unchanged for 2 consecutive checks, the page has no more content to load. The method stops and captures the final page state. This handles the common infinite scroll pattern where scrolling to the bottom triggers a lazy load of the next batch.
The max_scrolls parameter (default 50) caps the total number of scroll increments regardless of page height. The timeout_ms parameter (default 60 seconds) caps total elapsed time. Both prevent runaway scrolling on pages with truly infinite content.
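The stopping rule can be sketched against a fake page. This is a simplified model of the behaviour described above, not Browserbeam's implementation: FakePage and scroll_to_end are hypothetical, lazy loading is assumed to happen instantly when the bottom is reached, and the timeout_ms cap is left out.

```python
class FakePage:
    """Stand-in for a page whose height grows each time the bottom is reached."""

    def __init__(self, viewport=800, height=1600, batches=2, batch_height=1600):
        self.viewport = viewport
        self.scroll_height = height
        self.scroll_y = 0
        self.batches_left = batches
        self.batch_height = batch_height

    def scroll_one_viewport(self):
        max_y = self.scroll_height - self.viewport
        self.scroll_y = min(self.scroll_y + self.viewport, max_y)
        at_bottom = self.scroll_y + self.viewport >= self.scroll_height
        if at_bottom and self.batches_left > 0:
            # Lazy load: hitting the bottom appends the next batch of content.
            self.batches_left -= 1
            self.scroll_height += self.batch_height


def scroll_to_end(page, max_scrolls=50, required_unchanged=2):
    """Scroll until the height is unchanged at the bottom for N consecutive checks."""
    unchanged = 0
    for scrolls in range(1, max_scrolls + 1):
        before = page.scroll_height
        page.scroll_one_viewport()
        at_bottom = page.scroll_y + page.viewport >= page.scroll_height
        if at_bottom and page.scroll_height == before:
            unchanged += 1
            if unchanged >= required_unchanged:
                return scrolls, True  # bottom confirmed
        else:
            unchanged = 0
    return max_scrolls, False  # hit the cap before the page stopped growing


page = FakePage(batches=2)
print(scroll_to_end(page), page.scroll_height)  # (6, True) 4800
print(scroll_to_end(FakePage(batches=1000), max_scrolls=10))  # (10, False)
```

The second call shows why the cap matters: a feed that always has another batch never satisfies the unchanged-height rule.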
Extracting Structured Data from Scrolled Content
After scroll_collect loads all the content, use extract to pull structured data from the fully-loaded page:
session = client.sessions.create(url="https://quotes.toscrape.com/scroll")
# Load all content first
session.scroll_collect(max_scrolls=30, wait_ms=1000)
# Then extract structured data from the full page
result = session.extract(
quotes=[{
"_parent": ".quote",
"text": ".text >> text",
"author": ".author >> text",
"tags": ".keywords >> content"
}]
)
print(f"Extracted {len(result.extraction['quotes'])} quotes")
for q in result.extraction["quotes"][:3]:
    print(f" {q['author']}: {q['text'][:60]}...")
session.close()
The combination of scroll_collect followed by extract replaces the traditional pattern of scrolling with Puppeteer or Selenium, waiting with sleep, checking scroll height, and then parsing with BeautifulSoup. For more extraction patterns, see the data extraction deep-dive.
Tutorial: Detecting Content Changes After User Actions
Multi-step scraping workflows (click a tab, expand a section, submit a search) need to detect when the page content has actually changed after an action. The changes object in each response tells you exactly what changed.
Click and Observe: The Changes Object
After any action that modifies the page, Browserbeam's auto-observe captures a diff against the previous state:
from browserbeam import Browserbeam
client = Browserbeam()
session = client.sessions.create(url="https://quotes.toscrape.com")
# First observation: no changes (initial state)
print(f"Initial changes: {session.page.changes}") # None
# Click "Next" to load page 2
session.click(text="Next")
# The changes object shows what changed
if session.page.changes:
    print(f"Content changed: {session.page.changes.content_changed}")
    if session.page.changes.content_delta:
        print(f"Delta length: {len(session.page.changes.content_delta)} chars")
session.close()
The changes object contains:
- content_changed: boolean, whether the markdown content differs from the previous observation
- content_delta: the diff between old and new content (useful for incremental processing)
- elements_added: interactive elements that appeared after the action
- elements_removed: interactive elements that disappeared
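To get a feel for what content_changed and content_delta represent, you can compute something similar locally from two markdown snapshots. This sketch uses Python's standard difflib and a unified diff. The delta format Browserbeam actually returns is not specified here, so treat the output shape as an assumption, and diff_observations as a hypothetical helper.

```python
import difflib


def diff_observations(old_markdown, new_markdown):
    """Compare two page snapshots and return changes-like fields."""
    if old_markdown == new_markdown:
        return {"content_changed": False, "content_delta": None}
    delta = "\n".join(
        difflib.unified_diff(
            old_markdown.splitlines(),
            new_markdown.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )
    return {"content_changed": True, "content_delta": delta}


before = "# Quotes\nEinstein quote\nRowling quote"
after = "# Quotes\nMonroe quote\nRowling quote"

changes = diff_observations(before, after)
print(changes["content_changed"])  # True
print(changes["content_delta"])
```

The delta lists removed lines with a leading minus and added lines with a leading plus, which is enough to process only the new content after a click.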
Monitoring AJAX Updates with Diff Tracking
For pages that update content via AJAX (search results, filtered product lists, tabbed interfaces), the changes object lets you confirm the update happened before extracting:
session = client.sessions.create(url="https://quotes.toscrape.com")
# Extract initial quotes
first_page = session.extract(
quotes=[{
"_parent": ".quote",
"text": ".text >> text",
"author": ".author >> text"
}]
)
print(f"Page 1: {len(first_page.extraction['quotes'])} quotes")
# Click "Next" to load page 2
session.click(text="Next")
# Verify content actually changed
if session.page.changes and session.page.changes.content_changed:
    second_page = session.extract(
        quotes=[{
            "_parent": ".quote",
            "text": ".text >> text",
            "author": ".author >> text"
        }]
    )
    print(f"Page 2: {len(second_page.extraction['quotes'])} quotes")
    print(f"New first author: {second_page.extraction['quotes'][0]['author']}")
else:
    print("Warning: content did not change after navigation")
session.close()
This pattern prevents the silent data corruption that happens when you extract data before a page update completes. The content_changed flag is a direct, reliable signal. For building full multi-step agent workflows, see the web scraping agent guide.
wait_for vs wait_until vs page.stable: When to Use Each
Three mechanisms control timing in Browserbeam. They solve different problems and can be combined.
Decision Framework
| Scenario | Method | Why |
|---|---|---|
| Standard page load (most sites) | Auto-stability (default) | Network idle + DOM quiet handles 90% of pages automatically |
| SPA with known content selector | wait_for="css selector" | Guarantees the target element exists before stability runs |
| SPA with custom readiness flag | wait_until="JS expression" | Checks arbitrary JavaScript conditions (window variables, element counts, computed state) |
| Post-action confirmation | session.page.changes | Verifies content actually updated after click/fill/goto |
| Explicit pause or poll | session.wait(ms=..., selector=..., text=..., until=...) | Fine-grained control for edge cases auto-stability doesn't cover |
| Verify stability completed | session.page.stable | Check if stability resolved within timeout (True/False) |
Side-by-Side Code Comparison
from browserbeam import Browserbeam
client = Browserbeam()
# 1. Auto-stability (default, no config needed)
session = client.sessions.create(url="https://quotes.toscrape.com")
print(f"Auto: {session.page.stable}, {len(session.page.markdown.content)} chars")
session.close()
# 2. wait_for: CSS selector
session = client.sessions.create(url="https://quotes.toscrape.com")
session.goto(url="https://books.toscrape.com", wait_for="article.product_pod")
print(f"wait_for: {session.page.stable}, {session.page.title}")
session.close()
# 3. wait_until: JavaScript expression
session = client.sessions.create(
steps=[{
"goto": {
"url": "https://books.toscrape.com",
"wait_until": "document.querySelectorAll('article').length > 0"
}
}]
)
print(f"wait_until: {session.page.stable}, {session.page.title}")
session.close()
# 4. session.wait(): explicit condition
session = client.sessions.create(url="https://quotes.toscrape.com")
session.wait(text="Albert Einstein", timeout=10000)
print(f"wait(text): found Einstein, {session.page.stable}")
session.close()
Start with auto-stability. Add wait_for when you know the element you're targeting. Use wait_until for custom conditions. Fall back to session.wait() only for edge cases that the other methods don't cover.
Common Mistakes When Scraping Dynamic Sites
Five patterns that break JavaScript web scraping pipelines. Each one is a timing problem with a signal-based fix.
1. Using time.sleep() Instead of Stability Signals
The most common mistake. You navigate to a page, sleep for N seconds, and extract data. The sleep value is always wrong for some subset of pages.
Fix: Replace time.sleep() with Browserbeam's auto-stability. The create call already includes stability detection. If you need a specific condition, add wait_for or wait_until to your goto call.
# Before: guess-and-hope
# driver.get(url)
# time.sleep(5)
# data = driver.page_source
# After: signal-based
session = client.sessions.create(url="https://books.toscrape.com")
data = session.page.markdown.content # Already stable
session.close()
2. Waiting for networkidle When domcontentloaded Is Enough
Playwright's waitUntil: "networkidle" seems like the right approach, but it hangs on any page with persistent network connections. Analytics scripts, chat widgets, WebSocket heartbeats, and service worker pre-caching all prevent networkidle from ever firing.
Fix: Browserbeam navigates with domcontentloaded and runs its own stability check. The fetch/XHR patching tracks application-level requests, not transport-level connections. Analytics beacons and WebSocket frames don't interfere.
3. Not Checking page.stable After Navigation
Extracting data immediately after a goto without confirming the page stabilized. The goto might have timed out or the page might still be loading async content.
Fix: Check session.page.stable after navigation. If it's False, add an explicit session.wait() on the CSS selector of the content you need:
session.goto(url="https://books.toscrape.com")
if not session.page.stable:
    session.wait(selector="article.product_pod", timeout=15000)
result = session.extract(
books=[{
"_parent": "article.product_pod",
"title": "h3 a >> text",
"price": ".price_color >> text"
}]
)
session.close()
4. Scraping Before Infinite Scroll Content Loads
Extracting data from an infinite scroll page after loading only the first viewport. You get 10 items when there are 100.
Fix: Use scroll_collect before extract. The scroll method loads all content, then you extract from the complete page:
session = client.sessions.create(url="https://quotes.toscrape.com/scroll")
session.scroll_collect(max_scrolls=30, wait_ms=1000)
result = session.extract(quotes=[{"_parent": ".quote", "text": ".text >> text", "author": ".author >> text"}])
print(f"Got {len(result.extraction['quotes'])} quotes (not just first 10)")
session.close()
5. Ignoring the Changes Object on Multi-Step Flows
Clicking a "Load More" button or switching tabs and immediately extracting data without confirming the content actually changed. You get the old data from the previous state.
Fix: Check session.page.changes.content_changed after any action that should update the page. If False, the action didn't produce the expected update and you should investigate (wrong element clicked, page didn't respond, or the content was already loaded).
When You Still Need Explicit Waits (and How to Use Them)
Auto-stability handles the common case. But some pages have timing requirements that signals alone can't solve. Browserbeam's wait method gives you fine-grained control for these edge cases.
Custom Wait Conditions with session.wait()
The wait method supports four condition types, plus a timeout parameter:
| Parameter | Type | What It Does |
|---|---|---|
| ms | integer | Fixed delay in milliseconds (use sparingly) |
| selector | string | Wait for CSS selector to appear in DOM |
| text | string | Wait for text to appear in body.innerText |
| until (Python: until_js) | string | Evaluate JavaScript expression until truthy |
| timeout | integer | Max wait time in ms (default: 10,000) |
# Wait for specific text to appear
curl -X POST https://api.browserbeam.com/v1/sessions/SESSION_ID/act \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"steps": [{"wait": {"text": "Albert Einstein", "timeout": 10000}}]}'
from browserbeam import Browserbeam
client = Browserbeam()
session = client.sessions.create(url="https://quotes.toscrape.com")
# Wait for specific text on the page
session.wait(text="Albert Einstein", timeout=10000)
# Wait for a CSS selector
session.wait(selector=".quote", timeout=5000)
# Wait for a JavaScript condition
session.wait(until_js="document.querySelectorAll('.quote').length >= 5", timeout=15000)
result = session.extract(
quotes=[{
"_parent": ".quote",
"text": ".text >> text",
"author": ".author >> text"
}]
)
print(f"Found {len(result.extraction['quotes'])} quotes")
session.close()
const session = await client.sessions.create({ url: "https://quotes.toscrape.com" });
await session.wait({ text: "Albert Einstein", timeout: 10000 });
await session.wait({ selector: ".quote", timeout: 5000 });
await session.wait({ until: "document.querySelectorAll('.quote').length >= 5", timeout: 15000 });
const result = await session.extract({
quotes: [{ _parent: ".quote", text: ".text >> text", author: ".author >> text" }]
});
console.log(`Found ${result.extraction.quotes.length} quotes`);
await session.close();
session = client.sessions.create(url: "https://quotes.toscrape.com")
session.wait(text: "Albert Einstein", timeout: 10000)
session.wait(selector: ".quote", timeout: 5000)
session.wait(until: "document.querySelectorAll('.quote').length >= 5", timeout: 15000)
result = session.extract(
quotes: [{ _parent: ".quote", text: ".text >> text", author: ".author >> text" }]
)
puts "Found #{result.extraction['quotes'].length} quotes"
session.close
Note: the wait method requires exactly one condition per call (ms, selector, text, or until). If you need multiple conditions, chain separate wait calls.
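If you build the raw steps payload yourself, a small guard can enforce the one-condition rule before the request leaves your machine. This is a hypothetical helper, not part of the Browserbeam SDK. It mirrors the JSON shape of the curl example above and assumes timeout may accompany any condition.

```python
import json

CONDITIONS = ("ms", "selector", "text", "until")


def build_wait_step(timeout=10_000, **conditions):
    """Build one {"wait": {...}} step, rejecting zero or multiple conditions."""
    given = {k: v for k, v in conditions.items() if v is not None}
    unknown = set(given) - set(CONDITIONS)
    if unknown:
        raise ValueError(f"unknown wait condition(s): {sorted(unknown)}")
    if len(given) != 1:
        raise ValueError("wait needs exactly one of: ms, selector, text, until")
    return {"wait": {**given, "timeout": timeout}}


# Chain separate wait steps instead of combining conditions in one call.
steps = [
    build_wait_step(text="Albert Einstein"),
    build_wait_step(selector=".quote", timeout=5000),
]
print(json.dumps({"steps": steps}))
```

Calling build_wait_step(text="x", selector=".y") raises ValueError, which surfaces the mistake locally instead of as an API error.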
Combining Explicit Waits with Auto-Stability
Explicit waits and auto-stability work together. A wait call runs your condition first, then the next observe or extract call runs the standard stability check:
session = client.sessions.create(url="https://quotes.toscrape.com")
# Explicit wait for specific content
session.wait(text="Albert Einstein", timeout=10000)
# Extract runs auto-stability before extraction
result = session.extract(
first_quote=".text >> text",
first_author=".author >> text"
)
print(result.extraction)
session.close()
The two mechanisms are complementary. Auto-stability answers "is the page done loading?" while explicit waits answer "does the page have the specific content I need?" Use both when you need certainty. For more on the SDK's full capabilities, see the Python SDK getting started guide.
Comparison: Browserbeam vs Selenium vs ScrapingBee for Dynamic Content
Three tools, three approaches to the timing problem. Here's how they handle web scraping dynamic content.
Feature Comparison
| Feature | Browserbeam | Selenium | ScrapingBee |
|---|---|---|---|
| Stability detection | Automatic (network idle + DOM quiet) | Manual (WebDriverWait + expected_conditions) | wait parameter (fixed ms) or wait_for (CSS) |
| Default wait strategy | Signal-based (300ms network + 200ms DOM, 5s cap) | None (manual waits required) | Fixed timeout |
| JavaScript rendering | Full browser, cloud-managed | Full browser, locally managed | Full browser, cloud-managed |
| Infinite scroll | scroll_collect (one call) | Manual scroll loop + sleep | Not built-in |
| Content change detection | page.changes object (auto-diff) | Manual DOM comparison | Not available |
| Browser management | Cloud (zero local deps) | Local (ChromeDriver, browser binary, version matching) | Cloud |
| Wait for element | wait_for="selector" on goto | WebDriverWait(driver, N).until(EC.presence_of_element_located((By.CSS_SELECTOR, selector))) | wait_for=".selector" |
| Wait for JS condition | wait_until="expression" on goto | WebDriverWait(driver, N).until(lambda d: d.execute_script("...")) | Not available |
| Output format | Markdown + structured JSON | Raw HTML (parse yourself) | Raw HTML |
| Anti-bot handling | Built-in stealth + CAPTCHA solving | Manual (undetected-chromedriver, patches) | Built-in |
Migration from Selenium
The Selenium wait pattern translates directly to Browserbeam:
# Selenium: explicit wait for element
# from selenium.webdriver.support.ui import WebDriverWait
# from selenium.webdriver.support import expected_conditions as EC
# from selenium.webdriver.common.by import By
#
# driver = webdriver.Chrome()
# driver.get("https://books.toscrape.com")
# WebDriverWait(driver, 10).until(
# EC.presence_of_element_located((By.CSS_SELECTOR, "article.product_pod"))
# )
# html = driver.page_source
# driver.quit()
# Browserbeam: same logic, one line
from browserbeam import Browserbeam
client = Browserbeam()
# Open a session, then navigate with an explicit element wait
session = client.sessions.create(url="https://quotes.toscrape.com")
session.goto(url="https://books.toscrape.com", wait_for="article.product_pod", wait_timeout=10000)
# Already stable, already clean markdown
print(session.page.markdown.content[:200])
session.close()
No ChromeDriver. No browser binary version matching. No expected_conditions imports. No By.CSS_SELECTOR. The wait_for parameter on goto replaces the entire WebDriverWait pattern. For a full Selenium vs Playwright vs Browserbeam breakdown, see the comparison guide.
Migration from ScrapingBee
ScrapingBee offers two parameters: wait (a delay in milliseconds) and wait_for (a CSS selector). wait is a fixed delay, the same problem as time.sleep(). wait_for is closer to Browserbeam's approach, but the response is still raw HTML:
# ScrapingBee: fixed wait + raw HTML
# response = requests.get(
# "https://app.scrapingbee.com/api/v1",
# params={
# "api_key": "...",
# "url": "https://books.toscrape.com",
# "render_js": "true",
# "wait": 5000,
# "wait_for": ".product_pod"
# }
# )
# html = response.text # Still need BeautifulSoup
# Browserbeam: signal-based stability + structured output
from browserbeam import Browserbeam
client = Browserbeam()
session = client.sessions.create(url="https://books.toscrape.com")
# Clean markdown, no parsing needed
markdown = session.page.markdown.content
# Or structured JSON, no BeautifulSoup
result = session.extract(
books=[{
"_parent": "article.product_pod",
"title": "h3 a >> text",
"price": ".price_color >> text"
}]
)
session.close()
The key difference: ScrapingBee's wait: 5000 is a fixed 5-second delay (the exact problem this post addresses). Browserbeam's stability detection replaces it with signal-based waiting that adapts to actual page load time. For more on structured extraction schemas, see the structured web scraping guide.
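To put rough numbers on that difference, here is a toy cost model. It is a sketch under stated assumptions, not a benchmark: the load times are invented, total_wait_fixed and total_wait_adaptive are hypothetical helpers, and the adaptive wait is simplified to "load time plus a 300ms quiet window, capped at 5 seconds".

```python
def total_wait_fixed(load_times_ms, fixed_ms):
    """Return (total_ms_waited, pages_read_too_early) for a fixed delay."""
    too_early = sum(1 for t in load_times_ms if t > fixed_ms)
    return fixed_ms * len(load_times_ms), too_early


def total_wait_adaptive(load_times_ms, quiet_ms=300, cap_ms=5000):
    """Return (total_ms_waited, pages_read_too_early) for a signal-based wait.

    Simplified model: each page is read quiet_ms after it finishes loading,
    or at cap_ms, whichever comes first.
    """
    waits = [min(t + quiet_ms, cap_ms) for t in load_times_ms]
    too_early = sum(1 for t in load_times_ms if t > cap_ms)
    return sum(waits), too_early


# Invented load times for five pages, in milliseconds.
sample_loads_ms = [400, 900, 1500, 2600, 4800]

print("fixed 5000ms:", total_wait_fixed(sample_loads_ms, 5000))
print("fixed 2000ms:", total_wait_fixed(sample_loads_ms, 2000))
print("adaptive:    ", total_wait_adaptive(sample_loads_ms))
```

In this model the long fixed delay is safe but slow, and the short one is fast but reads two pages before they finish. The adaptive wait tracks each page's actual load time, so it avoids both costs for any page that settles under the cap.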
Frequently Asked Questions
How do I scrape a JavaScript-rendered website with Python?
Use a headless browser library for Python or a cloud browser API. Browserbeam gives you a cloud browser through a Python SDK: pip install browserbeam, create a session with a URL, and the response includes the fully rendered page as clean markdown. JavaScript executes, SPAs render, and the SDK waits for page stability before returning data. No local browser binary needed.
What is a good Playwright alternative for web scraping?
Browserbeam is a Playwright alternative that removes local browser management. Instead of installing Playwright, downloading browser binaries, and writing wait logic, you call an API that returns rendered page content. The main advantages: automatic stability detection (replaces waitForSelector and networkidle), built-in anti-bot handling, and structured output (markdown + JSON instead of raw HTML). For headless browser scraping without infrastructure, it's a direct replacement.
How does Browserbeam handle the Selenium wait for element pattern?
The wait_for parameter on goto replaces Selenium's WebDriverWait + expected_conditions. Instead of WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product"))), use session.goto(url="...", wait_for=".product", wait_timeout=10000). For JavaScript conditions, wait_until replaces execute_script inside a lambda. Both integrate with auto-stability, so you don't need separate observation logic after the wait completes.
Can I scrape React and Angular sites without a headless browser?
Not reliably. React and Angular render content via JavaScript after the initial page load. Static HTTP requests return empty shells. You need either a local headless browser (Selenium, Playwright, Puppeteer) or a cloud browser API (Browserbeam, ScrapingBee). Browserbeam's advantage for scraping single page applications: the stability detection waits for React's DOM reconciliation and Angular's change detection to finish before returning data. No manual wait logic needed.
How do I scrape infinite scroll pages without sleep() calls?
Use session.scroll_collect(max_scrolls=30, wait_ms=1000). This scrolls one viewport at a time, waits for lazy-loaded content between scrolls, and stops when the page height stabilizes (unchanged for 2 consecutive checks). The result is a single observation with the complete page content. Then use session.extract() to pull structured data from the fully-loaded page. One call replaces the manual scroll-sleep-check loop.
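The stopping rule described above (stop once the page height is unchanged for 2 consecutive checks, or max_scrolls is reached) can be sketched in plain Python. This is an illustration of the rule only, not Browserbeam's implementation: scroll_until_stable and make_fake_page are hypothetical names, and the fake page returns heights instantly, so the wait between scrolls is left out.

```python
from typing import Callable, Iterable


def scroll_until_stable(
    scroll_once: Callable[[], int],
    max_scrolls: int = 30,
    stable_checks: int = 2,
) -> int:
    """Scroll until the height repeats stable_checks times in a row.

    scroll_once() scrolls one viewport and returns the new page height.
    Returns the number of scrolls performed.
    """
    last_height = None
    unchanged = 0
    scrolls = 0
    while scrolls < max_scrolls:
        height = scroll_once()
        scrolls += 1
        if height == last_height:
            unchanged += 1
            if unchanged >= stable_checks:
                break  # Height has stopped growing: no more lazy-loaded content.
        else:
            unchanged = 0
            last_height = height
    return scrolls


def make_fake_page(heights: Iterable[int]) -> Callable[[], int]:
    """Simulated page: yields each height once, then keeps returning the last one."""
    remaining = iter(heights)
    current = [0]

    def scroll_once() -> int:
        current[0] = next(remaining, current[0])
        return current[0]

    return scroll_once


# The page grows three times, then stops loading new content.
print(scroll_until_stable(make_fake_page([2000, 4000, 6000])))
```

The max_scrolls argument is the safety net: a page that never stops growing still ends the loop after a bounded number of scrolls.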
What is DOM mutation detection and why does it matter for scraping?
DOM mutation detection uses the browser's MutationObserver API to watch for changes to the page structure (elements added, removed, or modified). When a JavaScript framework like React processes an API response and updates the DOM, mutations fire. When those mutations stop for 200ms, the framework has finished rendering. This signal tells you the page content is ready to extract. Without it, you're guessing with time.sleep(). Browserbeam runs this check automatically on every observation.
How does Browserbeam compare to ScrapingBee for dynamic content?
ScrapingBee offers a wait parameter (fixed milliseconds) and wait_for (CSS selector). The wait parameter is a fixed delay, the same timing problem as time.sleep(). Browserbeam uses signal-based stability detection (network idle + DOM quiet) that adapts to actual page load time. Browserbeam also returns structured markdown and JSON directly, while ScrapingBee returns raw HTML that needs parsing. For AI agent web scraping and multi-step workflows, Browserbeam supports session reuse and interaction (click, fill, scroll), which ScrapingBee doesn't offer. See the LLM training data pipeline guide for an example of structured data collection at scale.
Start Scraping Without Sleep
The stability algorithm handles the timing complexity so your code doesn't have to. Two signals (network idle for 300ms, DOM quiet for 200ms), a 5-second cap, and automatic triggering on every navigation and interaction. No more time.sleep(). No more WebDriverWait. No more guessing.
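To make the two-signal rule concrete, here is a toy model that replays a recorded timeline of requests and DOM mutations and reports when the page would count as stable. It is a sketch of the rule as described in this guide, not Browserbeam's actual code: first_stable_ms is a hypothetical function, the timelines are invented, and the millisecond-by-millisecond scan is chosen for clarity rather than speed.

```python
def first_stable_ms(requests, mutations, idle_ms=300, quiet_ms=200, cap_ms=5000):
    """Return (time_ms, stable) for a simulated page timeline.

    requests:  list of (start_ms, end_ms) intervals for fetch/XHR calls
    mutations: list of timestamps (ms) at which the DOM changed
    stable is False when the cap was reached before both signals held.
    """
    for t in range(cap_ms + 1):
        # Network idle: no request was in flight during the last idle_ms.
        # Requests that start after t are in the future and cannot be seen yet.
        net_idle = t >= idle_ms and all(
            end <= t - idle_ms or start > t for start, end in requests
        )
        # DOM quiet: no mutation landed during the last quiet_ms.
        dom_quiet = t >= quiet_ms and all(
            m <= t - quiet_ms or m > t for m in mutations
        )
        if net_idle and dom_quiet:
            return t, True
    return cap_ms, False


# Two API calls, then a burst of rendering right after the second response.
print(first_stable_ms([(0, 400), (450, 900)], [120, 910, 960]))

# A page that mutates every 100ms never goes quiet, so the cap applies.
print(first_stable_ms([], list(range(0, 6001, 100))))
```

The model also shows why explicit waits still matter. A request that starts long after the page first goes quiet is invisible at the moment stability is declared, which is the case wait_for and wait_until exist to cover.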
Start with auto-stability on create. Add wait_for when you know your target selector. Use wait_until for custom JavaScript conditions. Check page.changes after multi-step workflows. Fall back to session.wait() for the edge cases that none of the above cover.
The pattern is the same regardless of the framework your target site uses. React, Vue, Angular, Next.js, or server-rendered HTML with AJAX updates: the stability algorithm watches the actual signals (fetch/XHR calls and DOM mutations) rather than guessing based on elapsed time.
Grab the SDK and try the first example:
pip install browserbeam # Python
npm install @browserbeam/sdk # TypeScript
gem install browserbeam # Ruby
Sign up for a free account and replace your first time.sleep() with a single create call. The stability detection handles the rest. For the full API reference, see the docs.