
YouTube has 2.5 billion monthly active users, and the YouTube Data API caps you at 10,000 quota units per day. A single search request costs 100 units. A single video details request costs 1 unit but returns no comments, no transcripts, and no related videos. If you're building a dataset of 50,000 videos with transcripts for NLP training, the official API would take months of careful quota management. Web scraping gets you there in hours.
The catch: YouTube runs one of the most aggressive headless browser detection systems on the web. Datacenter proxies get blocked. Default User-Agent strings trigger a "Please update your browser" wall. And the site renders entirely through JavaScript custom elements, so static HTTP requests return empty shells.
This guide walks through scraping YouTube with Browserbeam's cloud browser API, which handles JavaScript rendering, residential proxies, and cookie consent automatically. We'll cover everything from basic video metadata to full transcript extraction:
- A scraper that returns video metadata (title, views, likes, duration, upload date, channel) from any YouTube video page
- Structured JSON extraction from YouTube's ytInitialPlayerResponse object
- Channel scraping: subscriber count, handle, and a full video list with views and upload dates
- Transcript extraction using the interactive "Show transcript" panel
- Why YouTube blocks default headless browsers (and the specific fix)
- YouTube Data API vs web scraping: quota math, available data, and when to use each
- CSV and JSON export for building research datasets
TL;DR: Use residential proxies and a custom Chrome User-Agent for YouTube. Datacenter IPs and default headless User-Agents both trigger blocks. Use observe for rich markdown, execute_js for structured JSON data. Transcripts require an interactive flow: expand the description, click "Show transcript," then read the panel. For video metadata via execute_js, block images, fonts, media, stylesheets, and scripts to minimize bandwidth -- the data lives in an inline script that executes regardless. For channel pages and transcripts, keep scripts and stylesheets enabled since YouTube's SPA framework must render the DOM.
Don't have an API key yet? Create a free Browserbeam account - you get 5,000 credits, no credit card required.
Quick Start: Scrape a YouTube Video
Let's start with the simplest case. Create a session on any YouTube video, then call observe to read the page as structured markdown. Two API calls. You get the title, view count, likes, channel name, subscriber count, description, and a list of related videos.
YouTube requires three specific settings:
- Residential proxy (datacenter IPs get blocked)
- Custom User-Agent matching a real Chrome browser
- Resource blocking for images, fonts, and media (saves bandwidth, speeds up loading)
# Step 1: Create session with residential proxy + custom UA
SESSION_ID=$(curl -s -X POST https://api.browserbeam.com/v1/sessions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"proxy": { "kind": "residential", "country": "us" },
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
"block_resources": ["image", "font", "media"],
"auto_dismiss_blockers": true
}' | jq -r '.session_id')
# Step 2: Observe + close
curl -s -X POST "https://api.browserbeam.com/v1/sessions/$SESSION_ID/act" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"steps": [{"observe": {}}, {"close": {}}]}' \
| jq '.page.markdown.content'
from browserbeam import Browserbeam
client = Browserbeam(api_key="YOUR_API_KEY")
session = client.sessions.create(
url="https://www.youtube.com/watch?v=dQw4w9WgXcQ",
proxy={"kind": "residential", "country": "us"},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources=["image", "font", "media"],
auto_dismiss_blockers=True,
)
session.observe()
print(session.page.markdown.content)
session.close()
import Browserbeam from "@browserbeam/sdk";
const client = new Browserbeam({ apiKey: "YOUR_API_KEY" });
const session = await client.sessions.create({
url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
proxy: { kind: "residential", country: "us" },
user_agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources: ["image", "font", "media"],
auto_dismiss_blockers: true,
});
await session.observe();
console.log(session.page.markdown.content);
await session.close();
require "browserbeam"
client = Browserbeam::Client.new(api_key: "YOUR_API_KEY")
session = client.sessions.create(
url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
proxy: { kind: "residential", country: "us" },
user_agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources: ["image", "font", "media"],
auto_dismiss_blockers: true,
)
session.observe
puts session.page.markdown.content
session.close
The observe response includes the video title, view count, like count, full description, channel info, related videos, and comment count in clean markdown. No HTML parsing required.
What Data Can You Extract from YouTube?
YouTube pages contain more structured data than most websites. Here's what's available from each page type:
| Page Type | Available Fields | Extraction Method |
|---|---|---|
| Video page | Title, views, likes, duration, upload date, description, channel name, subscriber count, tags, comment count, related videos | observe for markdown, execute_js for ytInitialPlayerResponse |
| Channel page | Channel name, handle, subscriber count, video list (title, views, upload date, duration, URL) | observe for video list, execute_js for structured data |
| Search results | Video titles, channels, view counts, durations, URLs | observe for markdown list |
| Playlist | Playlist title, video count, video list with titles and channels | observe for markdown |
Video pages also expose a ytInitialPlayerResponse JavaScript object that contains structured metadata: videoDetails (title, view count, channel, keywords, duration) and microformat (category, publish date, description). This is the most reliable extraction source because it's populated during the initial page load and doesn't depend on YouTube's DOM structure.
Why YouTube Requires a Custom User-Agent
Most scraping guides focus on IP-based blocking. YouTube is different. The primary detection mechanism is browser fingerprinting through the User-Agent string.
During our API validation, we tested four configurations:
| Configuration | Proxy | User-Agent | Result |
|---|---|---|---|
| Default | Datacenter | Default headless | "Please update your browser" |
| Residential only | Residential | Default headless | "Please update your browser" |
| UA only | Datacenter | Chrome 131 | "Please update your browser" |
| Residential + UA | Residential | Chrome 131 | Full page render |
Both conditions are required: residential proxy AND a modern Chrome User-Agent. The detection happens before the page renders, so you can't work around it with JavaScript. The User-Agent string Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 works reliably.
There's one more gotcha with resource blocking. You should block image, font, and media resources to reduce bandwidth and speed up page loads. But do not block stylesheet on channel pages. YouTube's video grid uses CSS-dependent custom elements (ytd-rich-item-renderer), and blocking stylesheets prevents the video cards from rendering.
| Resource Type | Block on Video Pages? | Block on Channel Pages? |
|---|---|---|
| image | Yes | Yes |
| font | Yes | Yes |
| media | Yes | Yes |
| stylesheet | Safe to block | Do NOT block |
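One way to keep these rules straight in a scraper is a small helper that returns the right `block_resources` list for each page type. This is a sketch of our own (`block_list` is not part of any SDK), encoding the table above:

```python
def block_list(page_type: str) -> list[str]:
    """Return a block_resources list that is safe for the given YouTube page type."""
    base = ["image", "font", "media"]
    if page_type == "video_metadata":
        # ytInitialPlayerResponse is set by an inline script, so external
        # scripts and stylesheets can be dropped too (see Step 2 below).
        return base + ["stylesheet", "script"]
    # Channel pages and transcript flows need CSS and JS so the
    # ytd-rich-item-renderer grid and transcript panel actually render.
    return base
```

Pass the result straight into `sessions.create(block_resources=block_list("channel"), ...)`.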
Scraping YouTube Video Pages
Video pages are the richest data source on YouTube. The observe endpoint returns the full page as markdown, including the title, view count, likes, description, and related videos. For structured data, execute_js can read the ytInitialPlayerResponse object embedded in every video page.
Step 1: Create Session and Observe
The create call already returns page markdown. But for YouTube, we recommend calling observe separately to get a fresh read after any cookie consent dialogs are dismissed.
Step 2: Extract Structured Data via ytInitialPlayerResponse
Every YouTube video page populates a global ytInitialPlayerResponse object with the full video metadata. This gives you clean, structured data that's more reliable than parsing the DOM. The videoDetails object contains the title, view count, channel name, duration, and keywords. The microformat object adds the category, publish date, and full description.
Since ytInitialPlayerResponse is set by an inline <script> tag in the HTML, we can block external scripts and stylesheets too. This drops proxy bandwidth from ~15 MB to ~1.5 MB per request -- YouTube's main JavaScript bundle alone is 9.7 MB.
# Create session
SESSION_ID=$(curl -s -X POST https://api.browserbeam.com/v1/sessions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"proxy": { "kind": "residential", "country": "us" },
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
"block_resources": ["image", "font", "media", "stylesheet", "script"],
"auto_dismiss_blockers": true
}' | jq -r '.session_id')
# Extract video metadata from ytInitialPlayerResponse
curl -s -X POST "https://api.browserbeam.com/v1/sessions/$SESSION_ID/act" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"steps": [{
"execute_js": {
"code": "const vd = ytInitialPlayerResponse?.videoDetails; const mf = ytInitialPlayerResponse?.microformat?.playerMicroformatRenderer; return { title: vd?.title, viewCount: vd?.viewCount, channel: vd?.author, channelId: vd?.channelId, lengthSeconds: vd?.lengthSeconds, keywords: vd?.keywords, description: vd?.shortDescription, category: mf?.category, publishDate: mf?.publishDate, thumbnail: mf?.thumbnail?.thumbnails?.[0]?.url };",
"result_key": "video"
}
}, {"close": {}}]
}' | jq '.extraction.video'
from browserbeam import Browserbeam
import json
client = Browserbeam(api_key="YOUR_API_KEY")
session = client.sessions.create(
url="https://www.youtube.com/watch?v=dQw4w9WgXcQ",
proxy={"kind": "residential", "country": "us"},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources=["image", "font", "media", "stylesheet", "script"],
auto_dismiss_blockers=True,
)
js_code = """
const vd = ytInitialPlayerResponse?.videoDetails;
const mf = ytInitialPlayerResponse?.microformat?.playerMicroformatRenderer;
return {
title: vd?.title,
viewCount: vd?.viewCount,
channel: vd?.author,
channelId: vd?.channelId,
lengthSeconds: vd?.lengthSeconds,
keywords: vd?.keywords,
description: vd?.shortDescription,
category: mf?.category,
publishDate: mf?.publishDate,
thumbnail: mf?.thumbnail?.thumbnails?.[0]?.url
};
"""
session.execute_js(js_code, result_key="video")
video = session.extraction["video"]
print(json.dumps(video, indent=2))
session.close()
import Browserbeam from "@browserbeam/sdk";
const client = new Browserbeam({ apiKey: "YOUR_API_KEY" });
const session = await client.sessions.create({
url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
proxy: { kind: "residential", country: "us" },
user_agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources: ["image", "font", "media", "stylesheet", "script"],
auto_dismiss_blockers: true,
});
const jsCode = `
const vd = ytInitialPlayerResponse?.videoDetails;
const mf = ytInitialPlayerResponse?.microformat?.playerMicroformatRenderer;
return {
title: vd?.title,
viewCount: vd?.viewCount,
channel: vd?.author,
channelId: vd?.channelId,
lengthSeconds: vd?.lengthSeconds,
keywords: vd?.keywords,
description: vd?.shortDescription,
category: mf?.category,
publishDate: mf?.publishDate,
thumbnail: mf?.thumbnail?.thumbnails?.[0]?.url
};
`;
await session.executeJs({ code: jsCode, result_key: "video" });
console.log(JSON.stringify(session.extraction.video, null, 2));
await session.close();
require "browserbeam"
require "json"
client = Browserbeam::Client.new(api_key: "YOUR_API_KEY")
session = client.sessions.create(
url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
proxy: { kind: "residential", country: "us" },
user_agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources: ["image", "font", "media", "stylesheet", "script"],
auto_dismiss_blockers: true,
)
js_code = <<~JS
const vd = ytInitialPlayerResponse?.videoDetails;
const mf = ytInitialPlayerResponse?.microformat?.playerMicroformatRenderer;
return {
title: vd?.title,
viewCount: vd?.viewCount,
channel: vd?.author,
channelId: vd?.channelId,
lengthSeconds: vd?.lengthSeconds,
keywords: vd?.keywords,
description: vd?.shortDescription,
category: mf?.category,
publishDate: mf?.publishDate,
thumbnail: mf?.thumbnail?.thumbnails?.[0]?.url
};
JS
session.execute_js(js_code, result_key: "video")
puts JSON.pretty_generate(session.extraction["video"])
session.close
The response looks like this:
{
"title": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
"viewCount": "1767841059",
"channel": "Rick Astley",
"channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
"lengthSeconds": "213",
"keywords": ["rick astley", "Never Gonna Give You Up", "nggyu", "never gonna give you up lyrics", "rick rolled"],
"description": "The official video for \"Never Gonna Give You Up\" by Rick Astley...",
"category": "Music",
"publishDate": "2009-10-24T23:57:33-07:00",
"thumbnail": "https://i.ytimg.com/vi/dQw4w9WgXcQ/sddefault.jpg"
}
The lengthSeconds gives you the video duration in seconds (213 = 3 minutes 33 seconds). The keywords array contains the tags the creator set. The category comes from YouTube's content classification system.
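If you need human-readable durations, lengthSeconds converts with straightforward integer math. A small helper of our own (not an SDK method):

```python
def format_duration(length_seconds: str) -> str:
    """Convert YouTube's lengthSeconds string to M:SS or H:MM:SS."""
    total = int(length_seconds)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"
```

So `format_duration("213")` returns `"3:33"`, matching the sample response above.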
Scraping YouTube Channel Pages
Channel pages list all of a creator's videos with titles, view counts, upload dates, and durations. Navigate to /@handle/videos to get the video grid.
Channel pages need a slightly different configuration. Do not block stylesheets, or the video grid won't render.
# Create session on channel videos page
SESSION_ID=$(curl -s -X POST https://api.browserbeam.com/v1/sessions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.youtube.com/@RickAstleyYT/videos",
"proxy": { "kind": "residential", "country": "us" },
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
"block_resources": ["image", "font", "media"],
"auto_dismiss_blockers": true
}' | jq -r '.session_id')
# Extract structured channel data via innerText parsing
curl -s -X POST "https://api.browserbeam.com/v1/sessions/$SESSION_ID/act" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"steps": [{
"execute_js": {
"code": "const items = document.querySelectorAll(\"ytd-rich-item-renderer\"); const videos = []; for (const el of items) { const link = el.querySelector(\"a[href*=\\\"/watch\\\"]\"); const titleEl = el.querySelector(\"h3\"); const title = titleEl ? (titleEl.innerText || \"\").trim() : null; const inner = el.innerText || \"\"; const vm = inner.match(/([0-9][0-9.,]*[KMB]?) views/); const tm = inner.match(/(\\d+ (?:seconds?|minutes?|hours?|days?|weeks?|months?|years?) ago)/); const dm = inner.match(/^(\\d+:\\d+)/m); videos.push({ title, url: link ? link.href : null, views: vm ? vm[0] : null, uploaded: tm ? tm[1] : null, duration: dm ? dm[1] : null }); } const header = document.querySelector(\"#page-header\"); const headerText = header?.innerText || \"\"; const nameEl = header?.querySelector(\"yt-dynamic-text-view-model\"); const channelName = nameEl?.innerText?.trim() || \"\"; const hm = headerText.match(/@[\\\\w-]+/); const handle = hm ? hm[0] : \"\"; const sm = headerText.match(/([\\\\d.]+[KMB]?) subscribers/); const subscribers = sm ? sm[1] + \" subscribers\" : \"\"; return { channel: channelName, handle, subscribers, videoCount: videos.length, videos: videos.slice(0, 10) };",
"result_key": "channel"
}
}, {"close": {}}]
}' | jq '.extraction.channel'
from browserbeam import Browserbeam
import json
client = Browserbeam(api_key="YOUR_API_KEY")
session = client.sessions.create(
url="https://www.youtube.com/@RickAstleyYT/videos",
proxy={"kind": "residential", "country": "us"},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources=["image", "font", "media"],
auto_dismiss_blockers=True,
)
js_code = """
const items = document.querySelectorAll('ytd-rich-item-renderer');
const videos = [];
for (const el of items) {
const link = el.querySelector('a[href*="/watch"]');
const titleEl = el.querySelector('h3');
const title = titleEl ? (titleEl.innerText || '').trim() : null;
const inner = el.innerText || '';
const vm = inner.match(/([0-9][0-9.,]*[KMB]?) views/);
const tm = inner.match(/(\\d+ (?:seconds?|minutes?|hours?|days?|weeks?|months?|years?) ago)/);
const dm = inner.match(/^(\\d+:\\d+)/m);
videos.push({
title,
url: link ? link.href : null,
views: vm ? vm[0] : null,
uploaded: tm ? tm[1] : null,
duration: dm ? dm[1] : null
});
}
const header = document.querySelector('#page-header');
const headerText = header?.innerText || '';
const nameEl = header?.querySelector('yt-dynamic-text-view-model');
const channelName = nameEl?.innerText?.trim() || '';
const hm = headerText.match(/@[\\w-]+/);
const handle = hm ? hm[0] : '';
const sm = headerText.match(/([\\d.]+[KMB]?) subscribers/);
const subscribers = sm ? sm[1] + ' subscribers' : '';
return {
channel: channelName,
handle,
subscribers,
videoCount: videos.length,
videos: videos.slice(0, 10)
};
"""
session.execute_js(js_code, result_key="channel")
print(json.dumps(session.extraction["channel"], indent=2))
session.close()
import Browserbeam from "@browserbeam/sdk";
const client = new Browserbeam({ apiKey: "YOUR_API_KEY" });
const session = await client.sessions.create({
url: "https://www.youtube.com/@RickAstleyYT/videos",
proxy: { kind: "residential", country: "us" },
user_agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources: ["image", "font", "media"],
auto_dismiss_blockers: true,
});
const jsCode = `
const items = document.querySelectorAll('ytd-rich-item-renderer');
const videos = [];
for (const el of items) {
const link = el.querySelector('a[href*="/watch"]');
const titleEl = el.querySelector('h3');
const title = titleEl ? (titleEl.innerText || '').trim() : null;
const inner = el.innerText || '';
const vm = inner.match(/([0-9][0-9.,]*[KMB]?) views/);
const tm = inner.match(/(\\d+ (?:seconds?|minutes?|hours?|days?|weeks?|months?|years?) ago)/);
const dm = inner.match(/^(\\d+:\\d+)/m);
videos.push({
title,
url: link ? link.href : null,
views: vm ? vm[0] : null,
uploaded: tm ? tm[1] : null,
duration: dm ? dm[1] : null
});
}
const header = document.querySelector('#page-header');
const headerText = header?.innerText || '';
const nameEl = header?.querySelector('yt-dynamic-text-view-model');
const channelName = nameEl?.innerText?.trim() || '';
const hm = headerText.match(/@[\\w-]+/);
const handle = hm ? hm[0] : '';
const sm = headerText.match(/([\\d.]+[KMB]?) subscribers/);
const subscribers = sm ? sm[1] + ' subscribers' : '';
return {
channel: channelName,
handle,
subscribers,
videoCount: videos.length,
videos: videos.slice(0, 10)
};
`;
await session.executeJs({ code: jsCode, result_key: "channel" });
console.log(JSON.stringify(session.extraction.channel, null, 2));
await session.close();
require "browserbeam"
require "json"
client = Browserbeam::Client.new(api_key: "YOUR_API_KEY")
session = client.sessions.create(
url: "https://www.youtube.com/@RickAstleyYT/videos",
proxy: { kind: "residential", country: "us" },
user_agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources: ["image", "font", "media"],
auto_dismiss_blockers: true,
)
js_code = <<~JS
const items = document.querySelectorAll('ytd-rich-item-renderer');
const videos = [];
for (const el of items) {
const link = el.querySelector('a[href*="/watch"]');
const titleEl = el.querySelector('h3');
const title = titleEl ? (titleEl.innerText || '').trim() : null;
const inner = el.innerText || '';
const vm = inner.match(/([0-9][0-9.,]*[KMB]?) views/);
const tm = inner.match(/(\\d+ (?:seconds?|minutes?|hours?|days?|weeks?|months?|years?) ago)/);
const dm = inner.match(/^(\\d+:\\d+)/m);
videos.push({
title,
url: link ? link.href : null,
views: vm ? vm[0] : null,
uploaded: tm ? tm[1] : null,
duration: dm ? dm[1] : null
});
}
const header = document.querySelector('#page-header');
const headerText = header?.innerText || '';
const nameEl = header?.querySelector('yt-dynamic-text-view-model');
const channelName = nameEl?.innerText?.trim() || '';
const hm = headerText.match(/@[\\w-]+/);
const handle = hm ? hm[0] : '';
const sm = headerText.match(/([\\d.]+[KMB]?) subscribers/);
const subscribers = sm ? sm[1] + ' subscribers' : '';
return {
channel: channelName,
handle,
subscribers,
videoCount: videos.length,
videos: videos.slice(0, 10)
};
JS
session.execute_js(js_code, result_key: "channel")
puts JSON.pretty_generate(session.extraction["channel"])
session.close
The innerText parsing approach is necessary because YouTube's custom elements (ytd-rich-item-renderer) don't always populate standard textContent on their child elements. Parsing the raw innerText with regex and extracting views, upload dates, and durations is more reliable than querying individual span elements. For channel metadata (name, handle, subscribers), we parse the #page-header element's text content.
Loading More Videos
YouTube lazy-loads videos as you scroll. The initial page shows roughly 30 videos. To get more, use execute_js to scroll the page, wait for new content, then re-extract:
for i in range(3):
session.execute_js("window.scrollTo(0, document.body.scrollHeight)")
session.wait(ms=2000)
session.execute_js(js_code, result_key="channel")
all_videos = session.extraction["channel"]["videos"]
Each scroll loads approximately 30 more videos. Three scrolls give you around 120 videos from any channel.
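Because each re-extraction returns the full grid rendered so far, successive passes overlap. A small merge step (our own helper, `merge_videos`, not an SDK function) dedupes by URL while preserving order:

```python
def merge_videos(batches):
    """Merge per-scroll extraction results, deduping videos by URL."""
    seen, merged = set(), []
    for batch in batches:
        for video in batch:
            url = video.get("url")
            if url and url not in seen:
                seen.add(url)
                merged.append(video)
    return merged
```

Collect `session.extraction["channel"]["videos"]` after each scroll into a list of batches, then merge once at the end.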
Extracting YouTube Transcripts
Transcript extraction is the highest-value feature for data science and NLP use cases. "YouTube video transcript" gets 22,200 monthly searches, making it one of the most sought-after pieces of YouTube data.
YouTube's transcript is loaded dynamically through an engagement panel. The extraction requires an interactive flow: expand the video description, click the "Show transcript" button, wait for the panel to populate, then read the timestamped segments.
from browserbeam import Browserbeam
import json
client = Browserbeam(api_key="YOUR_API_KEY")
session = client.sessions.create(
url="https://www.youtube.com/watch?v=dQw4w9WgXcQ",
proxy={"kind": "residential", "country": "us"},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources=["image", "font", "media"],
auto_dismiss_blockers=True,
)
# Step 1: Expand the video description
session.execute_js("""
const btn = document.querySelector('ytd-text-inline-expander #expand');
if (btn) { btn.scrollIntoView({behavior: 'instant', block: 'center'}); btn.click(); }
""")
session.wait(ms=1000)
# Step 2: Click "Show transcript"
session.click(text="Show transcript")
session.wait(ms=2000)
# Step 3: Extract transcript segments
session.execute_js("""
const segments = document.querySelectorAll('macro-markers-panel-item-view-model');
const lines = [];
segments.forEach(s => {
const inner = s.innerText?.trim() || '';
const m = inner.match(/^(\\d+:\\d+)/);
if (m) {
const rest = inner.replace(/^\\d+:\\d+\\s*/, '')
.replace(/^[\\d\\s,minutesecondhor]+/, '').trim();
lines.push({ time: m[1], text: rest });
}
});
return { count: lines.length, transcript: lines };
""", result_key="transcript")
transcript = session.extraction["transcript"]
print(f"Found {transcript['count']} segments")
for line in transcript["transcript"][:5]:
print(f"[{line['time']}] {line['text']}")
session.close()
Sample output:
{
"count": 24,
"transcript": [
{ "time": "0:01", "text": "[♪♪♪]" },
{ "time": "0:18", "text": "♪ We're no strangers to love ♪ ♪ You know the rules and so do I ♪" },
{ "time": "0:27", "text": "♪ A full commitment's what I'm thinking of ♪ ♪ You wouldn't get this from any other guy ♪" },
{ "time": "0:35", "text": "♪ I just wanna tell you how I'm feeling ♪ ♪ Gotta make you understand ♪" },
{ "time": "0:43", "text": "♪ Never gonna give you up ♪ ♪ Never gonna let you down ♪" }
]
}
Not all videos have transcripts. Auto-generated captions are available on most English-language videos, but some creators disable them. You can check availability by looking for the ytInitialPlayerResponse.captions.playerCaptionsTracklistRenderer.captionTracks array in the page's JavaScript context. If it's empty or missing, that video has no transcript.
# Check transcript availability without opening the panel
session.execute_js("""
try {
const tracks = ytInitialPlayerResponse
?.captions
?.playerCaptionsTracklistRenderer
?.captionTracks;
if (!tracks || tracks.length === 0) return { available: false };
return {
available: true,
languages: tracks.map(t => ({
code: t.languageCode,
name: t.name?.simpleText,
kind: t.kind || 'manual'
}))
};
} catch(e) { return { available: false, error: e.message }; }
""", result_key="captions")
Saving and Processing Your Data
Once you've scraped video metadata and transcripts, you'll want to save the data for analysis. Here's a Python script that scrapes multiple videos and exports to both CSV and JSON:
from browserbeam import Browserbeam
import json
import csv
client = Browserbeam(api_key="YOUR_API_KEY")
video_urls = [
"https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"https://www.youtube.com/watch?v=9bZkp7q19f0",
"https://www.youtube.com/watch?v=kJQP7kiw5Fk",
]
extract_code = """
const vd = ytInitialPlayerResponse?.videoDetails;
const mf = ytInitialPlayerResponse?.microformat?.playerMicroformatRenderer;
return {
title: vd?.title,
viewCount: vd?.viewCount,
channel: vd?.author,
lengthSeconds: vd?.lengthSeconds,
category: mf?.category,
publishDate: mf?.publishDate
};
"""
results = []
for url in video_urls:
session = client.sessions.create(
url=url,
proxy={"kind": "residential", "country": "us"},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
block_resources=["image", "font", "media"],
auto_dismiss_blockers=True,
)
session.execute_js(extract_code, result_key="video")
video = session.extraction.get("video")
if video:
video["url"] = url
results.append(video)
session.close()
# Save as JSON
with open("youtube_videos.json", "w") as f:
json.dump(results, f, indent=2)
# Save as CSV
if results:
with open("youtube_videos.csv", "w", newline="") as f:
writer = csv.DictWriter(f, fieldnames=results[0].keys())
writer.writeheader()
writer.writerows(results)
print(f"Saved {len(results)} videos to youtube_videos.json and youtube_videos.csv")
For larger datasets, add a 1-2 second delay between requests to avoid rate limiting. Browserbeam handles session management and proxy rotation automatically, so you don't need to manage a pool of browser instances.
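A thin retry wrapper keeps transient failures from killing a long run. This is a sketch under our own naming (`scrape_with_retry` and `scrape_fn` are placeholders for whatever function creates a session and returns extracted data, not Browserbeam APIs):

```python
import random
import time

def scrape_with_retry(scrape_fn, url, retries=3, base_delay=1.5):
    """Call scrape_fn(url); on failure, retry with jittered exponential backoff."""
    for attempt in range(retries):
        try:
            return scrape_fn(url)
        except Exception:
            if attempt == retries - 1:
                raise
            # Back off: base, 2x base, 4x base... plus up to 100% jitter,
            # so parallel workers don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Combine this with a fixed 1-2 second pause between successful requests in the calling loop.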
YouTube Data API vs Scraping
The YouTube Data API is the official way to access YouTube data. It's well-documented, returns clean JSON, and is free within quota limits. But those limits are restrictive for research and data science use cases.
| Factor | YouTube Data API | Web Scraping (Browserbeam) |
|---|---|---|
| Daily quota | 10,000 units (1 search = 100 units) | No quota limit |
| Transcripts | Not available | Full timestamped transcripts |
| Comments | Available (costs 1 unit per page of 20) | Available via scroll + observe |
| Video metadata | title, description, views, likes, tags | All API fields + rendered page content |
| Channel videos | Paginated (50/page, costs 1 unit) | 30+ per scroll, unlimited scrolling |
| Authentication | API key required (free) | No YouTube auth needed |
| Rate limiting | Strict (quota resets daily) | Standard web rate limiting |
| Cost | Free (within quota) | Browserbeam credit cost per session |
| Reliability | Official, stable API | Depends on YouTube frontend structure |
| Languages | Any (REST API) | Python, TypeScript, Ruby, cURL |
When to use the Data API: You need fewer than 100 searches per day, don't need transcripts, and want maximum reliability. The API is the right choice for small to medium projects where quota isn't a constraint.
When to scrape: You need transcripts, you're hitting quota limits, or you need data the API doesn't expose (rendered page content, related videos sidebar, comment replies). For NLP research requiring 10,000+ transcripts, scraping is the only practical option.
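The quota math from the intro is worth making concrete. Back-of-envelope, using the unit costs above and assuming search.list returns up to 50 results per page (a Data API detail, not something this guide's API touches):

```python
import math

DAILY_QUOTA = 10_000        # units per day
SEARCH_COST = 100           # one search.list call
RESULTS_PER_SEARCH = 50     # max results per search page
DETAILS_COST = 1            # one videos.list call (up to 50 IDs per call)

target_videos = 50_000
search_calls = math.ceil(target_videos / RESULTS_PER_SEARCH)
detail_calls = math.ceil(target_videos / 50)
total_units = search_calls * SEARCH_COST + detail_calls * DETAILS_COST
min_days = math.ceil(total_units / DAILY_QUOTA)
print(total_units, min_days)  # prints: 101000 11
```

Eleven days is the theoretical floor just for discovery and metadata, before comments, with zero failed requests, and with transcripts unavailable through the API at all.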
Third option: yt-dlp. The open-source yt-dlp tool extracts video metadata and subtitles from the command line. It's excellent for one-off tasks and CLI workflows. For programmatic access at scale with proxy management, Browserbeam is more practical.
DIY Scraping vs Browserbeam API
If you've scraped YouTube before, you know the pain points. Here's how a DIY Playwright setup compares to Browserbeam:
| Concern | DIY (Playwright/Selenium) | Browserbeam |
|---|---|---|
| Browser management | Install, update, manage headless Chrome | Managed cloud browsers |
| Proxy rotation | Buy proxies, implement rotation logic | Built-in residential proxies |
| User-Agent spoofing | Manual header management | One parameter: user_agent |
| Cookie consent | Write custom dismiss logic per site | auto_dismiss_blockers: true |
| Resource blocking | Intercept requests manually | block_resources: ["image", "font", "media"] |
| Scroll handling | Write scroll loops with timing | execute_js + wait |
| Scaling | Manage browser pools, memory, crashes | API calls, no infrastructure |
| Transcript extraction | Build click + wait + extract flow | Same flow, but no browser to manage |
The same YouTube scraper in Playwright requires roughly 80 lines of Python (browser launch, context setup, proxy config, stealth plugin, navigation, waiting, extraction, cleanup). The Browserbeam version is 15 lines.
# Playwright equivalent (for comparison)
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
browser = p.chromium.launch(
headless=True,
proxy={"server": "http://your-proxy:8080", "username": "user", "password": "pass"}
)
context = browser.new_context(
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
viewport={"width": 1280, "height": 720}
)
page = context.new_page()
page.route("**/*.{png,jpg,gif,svg,woff,woff2,mp4,webm}", lambda route: route.abort())
page.goto("https://www.youtube.com/watch?v=dQw4w9WgXcQ", wait_until="networkidle")
page.wait_for_timeout(3000)
# Handle cookie consent manually
try:
page.click("button:has-text('Accept all')", timeout=3000)
    except Exception:
pass
# Extract LD+JSON
data = page.evaluate("""() => {
const scripts = document.querySelectorAll('script[type="application/ld+json"]');
for (const s of scripts) {
const d = JSON.parse(s.textContent);
if (d['@type'] === 'VideoObject') return d;
}
}""")
print(data)
browser.close()
The Playwright version needs you to manage browser installation, proxy credentials, route interception for resource blocking, manual cookie consent handling, and cleanup. Browserbeam handles all of that with configuration parameters.
Use Cases
Sentiment Analysis on Product Reviews
Scrape comments from product review videos to build a sentiment classifier. Extract the video transcript for the reviewer's opinion, then pair it with comment sentiment to gauge audience agreement.
```python
# Scrape a product review video + transcript for sentiment analysis
session = client.sessions.create(
    url="https://www.youtube.com/watch?v=PRODUCT_REVIEW_ID",
    proxy={"kind": "residential", "country": "us"},
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    block_resources=["image", "font", "media"],
    auto_dismiss_blockers=True,
)
session.execute_js(extract_code, result_key="video")
# ... extract transcript using the interactive flow above ...
# Feed transcript + metadata into your NLP pipeline
session.close()
```
Content Research Across a Niche
Extract video metadata from the top channels in your niche. Compare upload frequency, average view counts, title patterns, and topics to identify content gaps.
Transcript Corpus for NLP Training
Build a training dataset from video transcripts in a specific domain (cooking, fitness, technology). Loop through channel pages to collect video URLs, then extract transcripts from each. A single channel with 500 videos gives you hundreds of hours of transcribed speech.
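A minimal sketch of that collection loop. `fetch_transcript` is a hypothetical callable standing in for the interactive transcript flow shown earlier; the only logic here is the polite per-request delay and skipping videos whose transcripts are unavailable:

```python
import time

def collect_transcripts(video_urls, fetch_transcript, delay_seconds=2.0):
    """Fetch transcripts for a list of video URLs with a delay between
    requests. `fetch_transcript` is any callable that takes a URL and
    returns the transcript text, or None when captions are disabled."""
    corpus = {}
    for i, url in enumerate(video_urls):
        text = fetch_transcript(url)
        if text:
            corpus[url] = text
        # Sleep between requests, but not after the last one
        if i < len(video_urls) - 1:
            time.sleep(delay_seconds)
    return corpus
```

Swapping in a real fetcher (Browserbeam session, yt-dlp, or anything else) is the only change needed to turn this into a working pipeline.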
Common Mistakes When Scraping YouTube
1. Using the default headless User-Agent
YouTube checks the User-Agent string before rendering any content. The default Playwright/Puppeteer User-Agent contains HeadlessChrome, which triggers an immediate block. Always set a custom User-Agent matching a real Chrome release.
2. Blocking stylesheets on channel pages
Video pages render fine without stylesheets. Channel pages don't. The ytd-rich-item-renderer grid needs CSS to populate its video cards. Block image, font, and media on channel pages, but leave stylesheet alone.
3. Using datacenter proxies
Datacenter IP ranges are well-known and blocked by YouTube regardless of User-Agent. Residential proxies are required for consistent access.
4. Calling execute_js before the page stabilizes
The ytInitialPlayerResponse object is populated during the initial page load. If you call execute_js too early, the object might be incomplete. Either use the observe call first (which waits for page stability) or add a wait step before extracting data.
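If you'd rather poll than guess a fixed wait, a small helper like this retries extraction until the result looks complete. This is a generic sketch, not Browserbeam API: `extract` would wrap your execute_js call, and `is_complete` is whatever completeness check fits your data (e.g. "does videoDetails.viewCount exist yet?"):

```python
import time

def poll_until(extract, is_complete, attempts=5, delay_seconds=1.0):
    """Call `extract` repeatedly until `is_complete` accepts its result.
    Returns the first complete result, or the last attempt's result if
    the page never stabilizes within `attempts` tries."""
    result = None
    for _ in range(attempts):
        result = extract()
        if is_complete(result):
            return result
        time.sleep(delay_seconds)
    return result
```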
5. Not handling cookie consent
YouTube shows a cookie consent banner for EU visitors. Without auto_dismiss_blockers: true, the banner covers page content and blocks interaction. Always enable this setting.
Frequently Asked Questions
Is it legal to scrape YouTube?
YouTube's Terms of Service prohibit automated access, so scraping carries contractual risk even when no law is broken. Scraping publicly available metadata (titles, view counts, descriptions) for research occupies a legal gray area similar to web indexing: in hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping public data likely does not violate the Computer Fraud and Abuse Act, though hiQ ultimately lost on its contract claims. Do not scrape private or authenticated content, and respect rate limits.
Does YouTube block scrapers?
Yes. YouTube uses browser fingerprinting (User-Agent detection), IP reputation scoring, and behavioral analysis. Datacenter proxies and default headless User-Agents are blocked. Residential proxies with a real Chrome User-Agent bypass these checks reliably.
How to get a YouTube video transcript?
Create a Browserbeam session on the video page, expand the description, click "Show transcript," wait for the panel to load, then extract the timestamped segments. You can also check ytInitialPlayerResponse.captions to see which languages are available before attempting extraction.
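Once the panel is open, the last step is turning its text into timestamped segments. A sketch of that parsing step, under the assumption that the panel's innerText alternates timestamp lines and caption lines (verify the shape against a live page before relying on it):

```python
import re

# Matches 0:05, 12:34, and 1:23:45 style timestamps
TIMESTAMP = re.compile(r"^(?:\d+:)?\d{1,2}:\d{2}$")

def parse_transcript_panel(panel_text):
    """Turn the transcript panel's innerText (alternating timestamp and
    caption lines) into a list of (timestamp, text) segments."""
    segments = []
    current_ts = None
    for line in panel_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line):
            current_ts = line
        elif current_ts is not None:
            segments.append((current_ts, line))
            current_ts = None
    return segments
```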
Can you scrape YouTube with Python?
Yes. Use the Browserbeam Python SDK (pip install browserbeam) with residential proxies and a custom User-Agent. The SDK handles browser management, proxy rotation, and session cleanup. See the code examples throughout this guide.
YouTube API vs web scraping: which should I use?
Use the YouTube Data API for small to medium projects (under 100 daily searches) that don't need transcripts. Use web scraping when you need transcripts, are hitting API quota limits, or need data the API doesn't provide. For CLI one-off jobs, yt-dlp is another option.
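The quota math from the introduction as a back-of-the-envelope function. It assumes the default 10,000-unit daily quota, 100 units per search.list call, 50 results per search page, and 1 unit per videos.list call; it counts metadata discovery only, since transcripts for arbitrary videos aren't available through the API at any quota cost:

```python
import math

DAILY_QUOTA = 10_000       # default YouTube Data API daily quota
SEARCH_COST = 100          # units per search.list call
DETAILS_COST = 1           # units per videos.list call
RESULTS_PER_SEARCH = 50    # maxResults cap per search page

def days_needed(video_count):
    """Rough days of API quota needed to discover `video_count` videos
    via search and fetch details for each one."""
    search_calls = math.ceil(video_count / RESULTS_PER_SEARCH)
    total_units = search_calls * SEARCH_COST + video_count * DETAILS_COST
    return math.ceil(total_units / DAILY_QUOTA)
```

For the 50,000-video dataset from the introduction this lands at about two weeks of quota before a single transcript is collected, which is the core of the API-vs-scraping tradeoff.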
Can I scrape YouTube transcripts?
Yes. Transcripts are available on most YouTube videos through the "Show transcript" engagement panel. Auto-generated captions exist for most English-language content. Some creators disable captions, so check availability via ytInitialPlayerResponse.captions before attempting extraction.
How to scrape YouTube without getting blocked?
Two requirements: residential proxies and a modern Chrome User-Agent string. Set proxy: { kind: "residential" } and user_agent to a current Chrome version string. Enable auto_dismiss_blockers for cookie consent. Block image, font, and media resources to reduce detection surface.
What data does YouTube's ytInitialPlayerResponse contain?
Every video page populates a ytInitialPlayerResponse JavaScript object with videoDetails (title, view count, channel name, channel ID, duration in seconds, keywords, short description) and microformat (category, publish date, upload date, full description, available countries). This is the most stable extraction source because it's populated server-side during page load.
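Because the object is assigned in an inline script, it can even be recovered from raw HTML without executing any JavaScript. A sketch using `json.JSONDecoder.raw_decode`, which parses the object literal starting at its opening brace and ignores whatever follows it:

```python
import json
import re

def extract_player_response(html):
    """Pull the ytInitialPlayerResponse object out of a video page's
    HTML. Returns the parsed dict, or None if the assignment isn't
    present in the markup."""
    match = re.search(r"ytInitialPlayerResponse\s*=\s*", html)
    if not match:
        return None
    data, _ = json.JSONDecoder().raw_decode(html, match.end())
    return data
```

This complements the execute_js approach rather than replacing it: fetching raw HTML avoids a browser entirely, but only the in-browser route benefits from Browserbeam's proxy and User-Agent handling.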
Conclusion
We covered four extraction patterns in this guide: observe for rich markdown, execute_js with ytInitialPlayerResponse for structured video data, innerText parsing for channel video grids, and the interactive transcript flow. Each pattern handles a different YouTube data type, and they all share the same foundation: residential proxies, a custom User-Agent, and selective resource blocking.
Try swapping the video URL with any other YouTube video. The same ytInitialPlayerResponse extraction code returns structured data from any video page. For bulk scraping, start with a channel page to collect video URLs, then loop through individual videos with a short delay between requests.
For the complete API reference, check the Browserbeam documentation. The IMDb scraping guide covers similar React challenges and LD+JSON extraction patterns. If you're building multi-site scrapers, the web scraping agent tutorial shows how to chain different sites into a single workflow. The data extraction guide explains Browserbeam's structured extraction in depth.