
By the end of this guide, you'll have working code that converts any URL or HTML template into a pixel-perfect PDF. We'll cover page sizes, margins, custom fonts, page breaks, and the patterns you actually need for invoices, reports, and web archiving.
Generating PDFs from web pages sounds simple. You have HTML, you want a PDF, the browser already renders both. In practice, every team that builds this hits the same wall: pure JavaScript libraries like jsPDF and html2canvas can't handle modern CSS Grid, web fonts, or charts. wkhtmltopdf was archived in 2023. Puppeteer and Playwright work, but you're suddenly running headless Chrome in production, managing memory, and patching Chromium every two weeks.
The cleanest path in 2026 is to use a real browser, but offload the operations work. That's what this guide shows: how to call one API method to get a PDF back, when you should still reach for Puppeteer or WeasyPrint instead, and the formatting tricks that separate a usable PDF from one that prints with white backgrounds and broken page breaks.
What you'll learn in this guide:
- How to convert any URL to PDF with a single API call
- How to generate PDFs from custom HTML (invoices, reports, contracts)
- How to control page size, margins, scale, landscape mode, and print backgrounds
- How to add headers, footers, and page numbers using CSS
@pagerules - How to handle page breaks so tables and sections don't split awkwardly
- Three real-world patterns: invoice generation, analytics reports, web archiving
- How wkhtmltopdf, Puppeteer, Playwright, WeasyPrint, and Browserbeam compare
- A decision framework for picking the right HTML-to-PDF tool
TL;DR: The reliable way to convert HTML to PDF in 2026 is to use a real browser engine. Pure-JS converters break on modern CSS. wkhtmltopdf was archived. Self-hosted Puppeteer and Playwright work but require ops effort. A cloud browser API like Browserbeam gives you the same Chromium rendering with one HTTP call, no infrastructure. This guide covers the API call, formatting options, real-world patterns, and how each tool stacks up.
What Is HTML to PDF Conversion?
HTML to PDF conversion takes web content (a URL, a raw HTML string, or a server-rendered template) and turns it into a Portable Document Format file you can download, email, archive, or print. The PDF preserves layout, fonts, colors, and page breaks the way a browser would render them when you press Cmd+P and choose "Save as PDF".
Why Pure-JavaScript Converters Fall Short
Libraries like jsPDF and html2canvas run entirely in the browser and rebuild the page using a <canvas> element. That worked in 2014 when web pages were tables and inline styles. It struggles in 2026 with:
- CSS Grid and Flexbox layouts (canvas rasterization loses the layout engine)
- Web fonts loaded via
@font-face(the canvas may render before fonts arrive) - SVG charts and dynamic content (timing issues, missing animations)
- High-DPI displays (rasterized output looks blurry on retina screens)
- Multi-page pagination (canvas is one big image, not paginated text)
If you've tried html2canvas on a modern dashboard, you already know: the output looks "almost right" except for the chart, the icon font, and three lines that wrap differently. There's no way to fix it from inside the browser.
Why Browser-Rendered PDFs Win
Real browsers (Chrome, Firefox, Safari) implement the full CSS specification and a print-specific rendering pipeline. When you ask a browser to print a page, it runs the same layout engine that paints to your screen, then outputs vector PDF instead of pixels. Web fonts, SVG, Grid, charts, custom CSS pagination, all preserved.
This is why every modern HTML-to-PDF tool with serious quality (Puppeteer, Playwright, Browserbeam, WeasyPrint, Prince, DocRaptor) uses either Chromium or a print-focused rendering engine under the hood.
When You Need a Browser-Rendered PDF
Not every HTML-to-PDF job needs a full browser. Here's how to decide.
Use a Browser-Based Tool When
- The page uses CSS Grid, Flexbox, or modern layout features
- Web fonts matter for branding (custom typefaces, icon fonts)
- The HTML loads data via JavaScript before rendering
- The document contains charts (Chart.js, D3, ECharts, Recharts)
- You need crisp vector text, not rasterized canvas output
- You generate invoices, receipts, or contracts where pixel precision counts
A Lighter Tool May Be Enough When
- The HTML is plain server-rendered markup with no JavaScript
- Layout is purely block flow (no Grid, minimal Flexbox)
- You're converting Markdown or RST and don't need fancy CSS
- File size matters more than fidelity (tools like WeasyPrint produce smaller files)
For straightforward server-side HTML, WeasyPrint (Python, pure CSS rendering, no browser) is faster and lighter. For anything with JavaScript or modern layout, you want a real browser.
How Browser-Based PDF Generation Works
Under the hood, browser-based PDF generation is the same pipeline that paints pages to your screen, with one extra step at the end.
The Print Pipeline
The browser parses HTML into a DOM tree and CSS into a CSSOM. The two combine into a render tree. Layout calculates the position and size of every box. Paint usually rasterizes to your screen, but for print it follows the @media print stylesheet and outputs vector instructions. The Chrome DevTools Protocol method Page.printToPDF is what every browser-based PDF tool calls under the hood.
Why This Matters
Knowing the pipeline tells you where to fix things:
- Wrong fonts? You're hitting the parse stage. Make sure
@font-faceURLs are absolute and the file loads before printing. - Missing colors? You're losing print backgrounds. Set
print_background: true. - Awkward page breaks? You're at the paint stage. Add
page-break-inside: avoidorbreak-inside: avoid. - Layout shift between screen and PDF? Use a
@media printblock to override screen-only rules.
Quick Start: Convert Any URL to PDF
Here's the shortest possible HTML-to-PDF script. Replace YOUR_API_KEY and run it.
Don't have an API key yet? Create a free Browserbeam account. You get 5,000 credits, no credit card required.
curl -X POST https://api.browserbeam.com/v1/sessions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://news.ycombinator.com",
"steps": [
{ "pdf": { "format": "A4", "print_background": true } },
{ "close": {} }
]
}' | jq -r ".media[0].data" | base64 --decode > hn.pdf
import base64
from browserbeam import Browserbeam
client = Browserbeam(api_key="YOUR_API_KEY")
session = client.sessions.create(url="https://news.ycombinator.com")
result = session.pdf(format="A4", print_background=True)
pdf_bytes = base64.b64decode(result.media[0].data)
with open("hn.pdf", "wb") as f:
f.write(pdf_bytes)
session.close()
import Browserbeam from "@browserbeam/sdk";
import { writeFileSync } from "node:fs";
const client = new Browserbeam({ apiKey: "YOUR_API_KEY" });
const session = await client.sessions.create({ url: "https://news.ycombinator.com" });
const result = await session.pdf({ format: "A4", print_background: true });
const pdfBytes = Buffer.from(result.media[0].data, "base64");
writeFileSync("hn.pdf", pdfBytes);
await session.close();
require "browserbeam"
require "base64"
client = Browserbeam::Client.new(api_key: "YOUR_API_KEY")
session = client.sessions.create(url: "https://news.ycombinator.com")
result = session.pdf(format: "A4", print_background: true)
pdf_bytes = Base64.decode64(result.media[0]["data"])
File.binwrite("hn.pdf", pdf_bytes)
session.close
That's the whole flow: create a session at a URL, call pdf, decode the base64 result, write to disk. The PDF preserves the exact rendering Chrome would produce.
Setting Up the Project
For the rest of this guide, we'll use Python. The same patterns translate directly to TypeScript and Ruby; only the syntax changes.
Install the SDK
mkdir html-to-pdf && cd html-to-pdf
python3 -m venv venv
source venv/bin/activate
pip install browserbeam
Set the API Key
Export your API key as an environment variable so it's not hardcoded:
export BROWSERBEAM_API_KEY="bb_live_..."
The SDK picks this up automatically:
from browserbeam import Browserbeam
client = Browserbeam() # reads BROWSERBEAM_API_KEY from env
Project Layout
For the patterns later in this guide, we'll organize files like this:
html-to-pdf/
generate.py # Single URL -> PDF
custom_html.py # Custom HTML template -> PDF
invoice/
app.py # Flask invoice template server
templates/
invoice.html
out/ # Generated PDFs land here
Generating a PDF from a URL
The simplest case: you have a URL, you want a PDF. Here's the full Python module.
import base64
from pathlib import Path
from browserbeam import Browserbeam
def url_to_pdf(url: str, output_path: str, format: str = "A4") -> Path:
"""Render a URL to a PDF file. Returns the output path."""
client = Browserbeam()
session = client.sessions.create(url=url)
try:
result = session.pdf(format=format, print_background=True)
pdf_bytes = base64.b64decode(result.media[0].data)
out = Path(output_path)
out.parent.mkdir(parents=True, exist_ok=True)
out.write_bytes(pdf_bytes)
return out
finally:
session.close()
if __name__ == "__main__":
out = url_to_pdf("https://news.ycombinator.com", "out/hn.pdf")
print(f"Saved {out} ({out.stat().st_size:,} bytes)")
Run it and you'll get a PDF in out/hn.pdf. Open it and you'll see Hacker News rendered exactly as Chrome would print it, with all the orange branding intact (because we set print_background=True).
Pro tip: Always wrap the API call in try/finally and call session.close() in the finally block. Open sessions consume runtime credits until they expire. If your script crashes between create and close, you keep paying for the session until the timeout hits.
What's in result.media
The pdf() call returns a SessionEnvelope. Its media attribute is a list of media items. For PDF requests:
{
"type": "pdf",
"format": "pdf",
"data": "JVBERi0xLjQKJeLjz9MK..." # base64-encoded PDF bytes
}
The base64 string starts with JVBERi0x because that's how %PDF-1 looks after base64 encoding. If you decode it correctly, the first four bytes of the binary are %PDF.
Pro tip: If your decoded file doesn't start with %PDF-, you forgot the base64 step. Open the file with head -c 8 and you'll see the issue immediately.
Generating a PDF from Custom HTML
Most real use cases convert your own HTML, not a public URL. Invoice templates, generated reports, contracts. The Browserbeam API only accepts http:// and https:// URLs with a real domain (no data: URLs, no localhost, no IP addresses), so the pattern is: serve your HTML at a publicly reachable URL, then point the API at it.
Run a Web Server with Templates
Render your HTML with whatever templating engine you already use — Jinja2, Handlebars, ERB — and serve it from a small web server. Here's a Flask example:
from flask import Flask, render_template
app = Flask(__name__)
@app.route("/invoice/<int:invoice_id>")
def invoice(invoice_id):
invoice_data = {
"id": invoice_id,
"customer": "Acme Corp",
"items": [
{"name": "Cloud subscription", "price": 99.00},
{"name": "Premium support", "price": 49.00},
],
"total": 148.00,
}
return render_template("invoice.html", invoice=invoice_data)
if __name__ == "__main__":
app.run(port=5000)
Save your invoice template at templates/invoice.html.
Make the URL Publicly Reachable
Browserbeam runs in the cloud, so it can't reach http://localhost:5000. You have three realistic choices:
- Production: Deploy the template service to Render, Fly, Vercel, ECS, or any other public host, then call it by its real domain.
- Local development: Run a tunnel like
ngrok http 5000orcloudflared tunnel --url http://localhost:5000. You'll get a public HTTPS URL likehttps://abc123.ngrok-free.appthat proxies to your local server. - Co-located deployment: Run both your Flask app and the Browserbeam API call from the same VPC and put them on a private public-DNS subdomain.
Once your template service has a public URL, generate the PDF the same way as any other URL:
from generate import url_to_pdf
url_to_pdf(
"https://invoices.acme.com/invoice/123",
"out/invoice-123.pdf",
)
Pro tip: Server-side templating gives you clean separation between data (the database), presentation (the template), and rendering (the PDF API). Templates can use any framework's full feature set, including loops, conditionals, custom CSS, and JavaScript. The downside is operational: you now own a small web service. If that tradeoff is wrong for your team, an HTML-to-PDF API that accepts an HTML payload directly (DocRaptor, PDFShift) is a simpler path.
PDF Formatting Options
The pdf() method takes a handful of parameters that match the Chrome DevTools Protocol's printToPDF. These cover ~95% of what you need.
Parameter Reference
| Parameter | Type | Default | Use Case |
|---|---|---|---|
format |
string | "A4" |
Page size. Accepted values: "A4", "Letter", "Legal" |
landscape |
boolean | false |
Rotate pages 90 degrees for wide content |
print_background |
boolean | true |
Include CSS background colors and images. Set to false if you specifically want backgrounds dropped |
scale |
number | 1.0 |
Zoom factor, 0.1 to 2.0 (use 0.8 to fit more on a page) |
margin |
object | browser default | { top, right, bottom, left } as CSS strings: "1cm", "0.5in", "20px" |
Page Size Examples
session.pdf(format="A4") # 210 x 297 mm (default)
session.pdf(format="Letter") # 8.5 x 11 in (US standard)
session.pdf(format="Legal") # 8.5 x 14 in (long contracts)
These three are the formats currently accepted by the pdf step. If you need a different paper size, use CSS @page { size: ... } inside your HTML to control the rendered page geometry while keeping the API call to one of the three above.
Landscape for Wide Content
Reports with wide tables or dashboards usually want landscape:
session.pdf(format="A4", landscape=True, print_background=True)
Custom Margins
Default margins are about 1 cm on all sides. Override them when you need a tighter layout or a specific brand spec:
session.pdf(
format="A4",
print_background=True,
margin={"top": "20mm", "right": "15mm", "bottom": "20mm", "left": "15mm"},
)
For zero-margin printing (your CSS handles all the spacing):
session.pdf(
format="A4",
print_background=True,
margin={"top": "0", "right": "0", "bottom": "0", "left": "0"},
)
Scale to Fit More Content
If a table is one column too wide, scale slightly down instead of changing the layout:
session.pdf(format="A4", scale=0.85, print_background=True)
A scale of 0.85 fits roughly 18% more content per page. Anything below 0.7 makes text hard to read.
Pro tip: print_background defaults to true, but be aware that some other PDF tools (notably Puppeteer's raw page.pdf()) default to false. If you migrate from a setup that explicitly passed print_background=False, your branded backgrounds will start appearing again — usually what you want, but worth noticing.
Headers, Footers, and Page Breaks
Once your PDF spans more than one page, you need control over what repeats at the top and bottom of each page, and where the content breaks.
Page Breaks with CSS
Use the standard CSS print properties in your stylesheet:
/* Force a break before this section */
.chapter {
break-before: page;
}
/* Avoid breaking inside this element (table rows, cards) */
.invoice-line {
break-inside: avoid;
}
/* Allow a break after */
.summary {
break-after: auto;
}
The legacy properties page-break-before, page-break-after, page-break-inside still work and have wider browser support. Modern browsers honor both.
Headers and Footers via @page
CSS @page rules let you add running headers and footers without per-page positioning code:
@page {
size: A4;
margin: 25mm 20mm;
@top-center {
content: "Acme Corporation - Invoice";
font-size: 10pt;
color: #6b7280;
}
@bottom-right {
content: "Page " counter(page) " of " counter(pages);
font-size: 9pt;
color: #9ca3af;
}
}
Chromium has partial support for @page margin boxes. For full header/footer customization, render a <div class="page-header"> at the top of your HTML and use:
.page-header {
position: running(header);
}
@page {
@top-center {
content: element(header);
}
}
If Chromium's @page rendering misses an edge case, fall back to inline footer rows in your template. Less elegant, but always works.
Pro tip: Build your PDF first without page breaks, then add them surgically only where the default flow looks wrong. Over-aggressive break-inside: avoid can leave huge empty spaces at the bottom of pages when content doesn't fit cleanly.
Inline Page Numbers
For documents where headers via @page don't render the way you want, add inline page numbers using CSS counters in a fixed-position element. The simplest approach is to commit to whatever Chromium provides natively, accept that page-number rendering varies across PDF viewers, and use the @page pattern above.
Real-World Patterns
Three patterns cover most real HTML-to-PDF jobs. Each is a fully working Python script.
Pattern 1: Invoice Generator
The classic use case. Render an invoice template with customer data, save the PDF, optionally email or upload it.
import base64
from pathlib import Path
from browserbeam import Browserbeam
def generate_invoice(invoice_id: int, base_url: str) -> Path:
"""Render an invoice template to PDF and save to disk.
`base_url` must be a publicly reachable HTTPS URL of your invoice
service (a deployed Flask app, an ngrok tunnel during dev, etc.)."""
client = Browserbeam()
url = f"{base_url}/invoice/{invoice_id}"
session = client.sessions.create(url=url)
try:
result = session.pdf(
format="A4",
print_background=True,
margin={"top": "20mm", "bottom": "20mm", "left": "15mm", "right": "15mm"},
)
pdf_bytes = base64.b64decode(result.media[0].data)
out = Path(f"out/invoice-{invoice_id}.pdf")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_bytes(pdf_bytes)
return out
finally:
session.close()
if __name__ == "__main__":
path = generate_invoice(123, base_url="https://invoices.acme.com")
print(f"Generated {path} ({path.stat().st_size:,} bytes)")
For production, drop the file into S3 or the equivalent and email a signed URL to the customer.
Pattern 2: Analytics Report Export
Dashboards built with Chart.js, ECharts, or Recharts render to PDF cleanly because the browser executes the JavaScript that draws the charts.
import base64
from datetime import date
from pathlib import Path
from browserbeam import Browserbeam
def export_dashboard(dashboard_url: str) -> Path:
"""Export a JS-rendered analytics dashboard as a landscape A4 PDF."""
client = Browserbeam()
session = client.sessions.create(url=dashboard_url)
try:
result = session.pdf(
format="A4",
landscape=True,
print_background=True,
scale=0.9,
)
pdf_bytes = base64.b64decode(result.media[0].data)
out = Path(f"out/report-{date.today().isoformat()}.pdf")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_bytes(pdf_bytes)
return out
finally:
session.close()
A4 landscape with a scale of 0.9 fits a typical KPI dashboard cleanly. Pair this with a cron job to email a daily PDF report to stakeholders. If your dashboard is genuinely too wide for A4, render it at a larger viewport (set viewport: { width: 1600, height: 1200 } when you create the session) and let the scale parameter compensate.
Pattern 3: Web Archiving
Sometimes you just want a PDF snapshot of a public web page for compliance, research, or evidence:
import base64
import re
from datetime import datetime
from pathlib import Path
from browserbeam import Browserbeam
def archive_page(url: str) -> Path:
"""Save a public webpage as a dated PDF for archiving."""
client = Browserbeam()
session = client.sessions.create(url=url)
try:
result = session.pdf(format="A4", print_background=True)
pdf_bytes = base64.b64decode(result.media[0].data)
title = (session.page.title or "untitled").strip()[:60]
slug = re.sub(r"[^a-zA-Z0-9]+", "-", title).strip("-").lower()
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
out = Path(f"archive/{timestamp}-{slug}.pdf")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_bytes(pdf_bytes)
return out
finally:
session.close()
if __name__ == "__main__":
archive_page("https://news.ycombinator.com")
The filename includes the timestamp and a slug derived from the page title, so an archive/ directory naturally sorts by date and stays human-readable.
HTML to PDF Library Comparison
There are a lot of HTML-to-PDF tools. Here's how the major ones stack up in 2026.
The Field
| Tool | Type | Languages | JS Support | Active? | Best For |
|---|---|---|---|---|---|
| wkhtmltopdf | Self-hosted CLI | Bindings for most languages | Limited (old WebKit) | Archived 2023 | Legacy systems, simple HTML |
| Puppeteer | Self-hosted Node library | Node.js | Full Chromium | Active | Custom Node pipelines |
| Playwright | Self-hosted, multi-language | JS, Python, Java, .NET | Full Chromium | Active | Cross-language teams |
| WeasyPrint | Pure Python library | Python | None (no JS engine) | Active | Server-rendered HTML, smaller output |
| Browserbeam | Cloud API | cURL, Python, JS, Ruby | Full Chromium | Active | No-ops cloud rendering |
| DocRaptor | Commercial cloud API | All | Yes (PrinceXML) | Active | Enterprise document quality |
| PDFShift | Commercial cloud API | All | Yes (Chromium) | Active | Simple SaaS billing |
Key Tradeoffs
- wkhtmltopdf was archived on GitHub in 2023. Its Chromium fork is years out of date, so modern CSS Grid, web fonts, and JavaScript are unreliable. New projects should pick something else.
- Puppeteer is the canonical headless Chrome library. Powerful, but you manage Chromium binaries, memory, and crashes. Ops effort scales with traffic.
- Playwright is the same shape as Puppeteer with a cross-language SDK and Microsoft backing. Same self-hosting tradeoffs.
- WeasyPrint is a treat for pure server-rendered HTML where you don't need JavaScript. Pure Python, fast, deterministic. Falls over on JS-heavy or charts-heavy content.
- Browserbeam runs Chromium in the cloud, so your code is one API call. No browser binaries, no memory tuning. You pay for runtime.
- DocRaptor uses PrinceXML under the hood, which produces unusually clean print typography. Enterprise pricing.
For most teams in 2026, the choice is between Puppeteer/Playwright (self-hosted, full control, ops effort) and a cloud API like Browserbeam (no ops, predictable cost).
Common Mistakes When Generating PDFs
These five mistakes account for most of the "my PDF looks wrong" tickets we've seen.
1. Passing print_background=False Out of Habit
The Browserbeam default is true, but Puppeteer's raw page.pdf() defaults to false. Teams porting code from a Puppeteer setup sometimes carry over an explicit printBackground: false and lose every CSS background-color, background-image, and gradient — branded headers turn white, charts lose their fills. Drop the override or set it to True explicitly.
2. Treating the Base64 Data as Raw Bytes
The data field in media[0] is a base64 string. Writing it directly to a .pdf file produces a corrupt file. Always decode first:
import base64
pdf_bytes = base64.b64decode(result.media[0].data)
Path("out.pdf").write_bytes(pdf_bytes)
If you open the resulting file and it doesn't start with %PDF-, you skipped the decode step.
3. Custom Fonts Failing to Load
Web fonts loaded via @font-face need absolute URLs and enough time to load before the browser prints. Two fixes:
- Use absolute URLs:
src: url('https://your-cdn.com/font.woff2'), not relative paths - Add
font-display: blockto make the browser wait for the font instead of rendering with a fallback
If fonts still don't load, the issue is timing. Browserbeam waits for network idle by default, which usually catches @font-face requests. For self-hosted Puppeteer, add an explicit await page.evaluateHandle('document.fonts.ready') before calling pdf().
4. Long Tables Splitting Mid-Row
By default, the browser will break a row in half if it falls at a page boundary. Add CSS:
tr {
break-inside: avoid;
}
.invoice-summary {
break-before: page;
}
Test with a deliberately long table to make sure the breaks land where you expect.
5. Forgetting to Close Sessions in Batch Jobs
If you're generating 500 invoices in a loop and don't call session.close() between iterations, you stack up open sessions and burn runtime credits. Use a try/finally or a context manager pattern. For batch jobs, also consider reusing one session and navigating to each URL with session.goto(url) instead of creating a new session per PDF.
When to Use Each Tool (Decision Framework)
Here's a quick decision tree for picking the right HTML-to-PDF tool.
Quick Decision Matrix
| Your situation | Best fit |
|---|---|
| Pure Python, no JS in templates, smaller files matter | WeasyPrint |
| Already running Node.js, want full control, can manage Chromium | Puppeteer |
| Cross-language team, full Chromium, can manage browser ops | Playwright |
| Want zero ops, pay-as-you-go, any language | Browserbeam |
| Enterprise contracts, premium typography, big budget | DocRaptor (PrinceXML) |
| Need to drop legacy wkhtmltopdf, looking for a one-line replacement | Browserbeam or PDFShift |
| One-off conversion, no infrastructure | PDFShift, Browserbeam |
When You Outgrow Each Option
- Outgrow WeasyPrint when your templates start using charts or JS-heavy components
- Outgrow Puppeteer/Playwright when ops time spent on browser maintenance exceeds the cost of a cloud API
- Outgrow a cloud API when you generate millions of PDFs per month and self-hosted economics flip in your favor
For most teams under 100k PDFs per month, the math favors a cloud API. For teams above that, it depends on how predictable your traffic is and whether you have a platform team that already runs browser infrastructure for other reasons.
If you're comparing cloud browser APIs specifically, check the cloud browser APIs compared guide and the Puppeteer vs Playwright vs Browserbeam comparison for a deeper feature breakdown.
Frequently Asked Questions
How do I convert HTML to PDF?
The reliable path in 2026 is to use a real browser engine. Either run headless Chrome yourself with Puppeteer or Playwright, use a pure-Python alternative like WeasyPrint for simple HTML, or call a cloud API like Browserbeam that handles the browser for you. Pure-JavaScript libraries like jsPDF struggle with modern CSS. Pick the tool based on your language, ops capacity, and how complex your HTML is.
What is the best HTML to PDF library?
There's no single best library. For Python with simple HTML, WeasyPrint is excellent. For Node with full control, Puppeteer is the canonical choice. For cross-language teams, Playwright covers JS, Python, Java, and .NET. For zero-ops cloud rendering, Browserbeam covers cURL, Python, TypeScript, and Ruby. The "best" is whichever matches your stack and ops capacity.
Can Puppeteer generate PDF files?
Yes. Puppeteer's page.pdf() method calls Chrome's Page.printToPDF and returns a PDF buffer. The catch is operations: you're now running Chromium in production, managing memory, handling crashes, and patching the browser regularly. Puppeteer is great if you already run Node.js infrastructure. If you don't, a cloud API offloads the ops burden.
How do I generate a PDF from a webpage?
Send the URL to a tool that runs a real browser, render the page, and ask the browser to print to PDF. The shortest path is one API call to a cloud browser like Browserbeam: session = client.sessions.create(url=url); pdf = session.pdf(format="A4", print_background=True); session.close(). The result is a base64-encoded PDF you decode and save.
Is wkhtmltopdf still maintained?
No. wkhtmltopdf was archived on GitHub in 2023. Its Chromium fork is years out of date, which causes problems with modern CSS Grid, web fonts, and JavaScript-rendered content. The maintainers' archive note on GitHub recommends migrating to Chromium-based alternatives. Puppeteer, Playwright, and Browserbeam are the most direct replacements.
How do you add page breaks in HTML to PDF?
Use CSS print properties. break-before: page forces a new page before an element. break-inside: avoid prevents the browser from splitting an element (like a table row or invoice card) across pages. The legacy page-break-* properties still work in all browsers. Add these rules in a @media print block or in your main stylesheet if the page is print-only.
Can HTML to PDF handle JavaScript-rendered content?
Yes, if you use a real browser. Puppeteer, Playwright, and Browserbeam all execute JavaScript before generating the PDF, so charts, dynamic data, and SPA frameworks render correctly. WeasyPrint and the archived wkhtmltopdf do not run JavaScript. If your template is a single-page app or pulls data via fetch, you need a Chromium-based tool.
How do I convert HTML to PDF in Python?
Three solid options. WeasyPrint (pure Python, no JavaScript) handles server-rendered HTML well. Playwright's Python binding gives you full headless Chrome but requires you to install browser binaries. The Browserbeam Python SDK is one pip install and one API call: session.pdf(format="A4", print_background=True). The right pick depends on whether your HTML uses JavaScript and whether you want to run a browser locally. The web scraping with Python guide covers the wider ecosystem of Python web automation tools if you're new to the space.
Start Generating PDFs Today
You now have the patterns to handle every common HTML-to-PDF job: URL-to-PDF in one API call, custom HTML via templates, page formatting with margins and scale, headers and page breaks via CSS, and three real-world examples covering invoices, reports, and archiving.
Pick the simplest tool that handles your target. For pure server-rendered HTML, WeasyPrint is hard to beat. For JavaScript-heavy or chart-rich pages, you want a real browser. If you don't want to run that browser yourself, Browserbeam's PDF API gives you the same Chromium rendering with no infrastructure to manage. For deeper SDK examples, the Python SDK getting started guide covers screenshots, extraction, and proxies alongside PDF generation. If you also need to extract structured data from the same pages you're rendering, the structured web scraping guide shows how to do both in one session. For production scaling patterns (concurrent generation, retries, cost management), the scaling web automation guide is the next stop.
What will you generate first?