Extract HTML tables from any webpage and export as CSV or JSON. Free table extractor for developers, analysts, and researchers working with tabular web data.
You found a government dataset, a pricing comparison, or a league standings page with exactly the data you need. It is sitting right there in an HTML table. You try to copy and paste it into a spreadsheet. The columns collapse, the formatting breaks, and you spend twenty minutes cleaning up what should have been a two-second operation.
Writing a script to parse HTML tables is the "proper" solution, but it means pulling in a parser library, handling missing headers, dealing with colspan attributes, and writing CSV serialization logic. For a one-off extraction, that is too much overhead.
This table extractor does the parsing for you. Enter a URL to fetch all tables from a live page, or paste raw HTML to extract tables locally. The tool detects every <table> element, preserves headers and row structure, and lets you download or copy each table as CSV or JSON. No script to write, no dependencies to install.
The tool has two modes. Pick whichever fits your workflow.
URL mode fetches a live page and extracts every <table> element on the page. Paste mode parses raw HTML you provide and extracts the <table> elements themselves. Paste mode is useful when you already have the HTML saved locally, copied from view-source, or exported from another tool like the HTML Formatter. It also works when you want to keep the data entirely on your machine.
The CSV output follows RFC 4180 conventions. The first row contains column headers if the table has a <thead> section or a row of <th> elements. Subsequent rows match the table body. Values that contain commas, quotes, or newlines are properly escaped with double quotes. The result opens directly in Excel, Google Sheets, or any spreadsheet application.
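To see those quoting rules in action, here is a short Python sketch. It uses the standard library's csv module, which follows the same RFC 4180 conventions; it is an illustration of the escaping behavior, not the tool's own code:

```python
import csv
import io

headers = ["Name", "Notes"]
rows = [["Widget", 'Contains "quotes", commas,\nand a newline']]

buf = io.StringIO()
writer = csv.writer(buf)  # default dialect quotes fields containing , " or \n
writer.writerow(headers)
writer.writerows(rows)
print(buf.getvalue())
```

The second field is wrapped in double quotes and its embedded quotes are doubled, so the file opens cleanly in Excel or Google Sheets.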
When headers are present, each row becomes a JSON object with header names as keys. A table with columns "Name", "Price", and "Stock" produces an array of objects like [{"Name": "Widget", "Price": "$9.99", "Stock": "142"}]. When no headers are detected, the output is an array of arrays. Both formats are valid input for pandas, jq, or any data pipeline.
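A minimal Python sketch of the two JSON shapes (illustrative only, not the tool's implementation):

```python
import json

headers = ["Name", "Price", "Stock"]
rows = [["Widget", "$9.99", "142"]]

# With headers: each row becomes an object keyed by column name
with_headers = json.dumps([dict(zip(headers, row)) for row in rows])
# Without headers: the raw row/column structure is preserved
without_headers = json.dumps(rows)

print(with_headers)     # [{"Name": "Widget", "Price": "$9.99", "Stock": "142"}]
print(without_headers)  # [["Widget", "$9.99", "142"]]
```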
The parser processes HTML tables using the same logic in both URL mode (server-side with Nokogiri) and paste mode (client-side with the browser's DOMParser). Here is how it handles different table structures:
- Tables with a <thead>: If the table has a <thead> section, all <th> cells inside it become column headers.
- Tables without a <thead>: If the first <tr> contains <th> elements, those cells are treated as headers.
- Tables with only <td> cells: Rows are exported without named columns. The CSV has no header row, and the JSON is an array of arrays.
- Tables with a <caption> element: The caption text appears as the table title in the results and is used as the filename when downloading.

Tables are still the most common way structured data appears on the web, which is why developers and analysts reach for a table extractor so often.
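The header-detection rules can be sketched with Python's built-in html.parser. This is a simplified illustration of the "first row of <th> cells becomes the header" rule, not the tool's actual Nokogiri/DOMParser implementation:

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collects rows from a <table>, marking rows that contain <th> cells."""
    def __init__(self):
        super().__init__()
        self.rows = []          # list of (is_header_row, [cell, ...])
        self.row = None
        self.cell = []
        self.in_cell = False
        self.header_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row, self.header_row = [], False
        elif tag in ("td", "th"):
            self.in_cell, self.cell = True, []
            if tag == "th":
                self.header_row = True

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.row is not None:
            self.row.append("".join(self.cell).strip())
            self.in_cell = False
        elif tag == "tr" and self.row is not None:
            self.rows.append((self.header_row, self.row))
            self.row = None

    def handle_data(self, data):
        if self.in_cell:
            self.cell.append(data)

def split_headers(rows):
    """If the first row contains <th> cells, treat it as the header row."""
    if rows and rows[0][0]:
        return rows[0][1], [r for _, r in rows[1:]]
    return None, [r for _, r in rows]

p = TableRows()
p.feed("<table><tr><th>Name</th><th>Price</th></tr>"
       "<tr><td>Widget</td><td>$9.99</td></tr></table>")
headers, body = split_headers(p.rows)
```

With the sample input, `headers` is `["Name", "Price"]` and `body` is `[["Widget", "$9.99"]]`; a table with only `<td>` cells would yield `headers = None`.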
Not every table is a simple grid of headers and rows. Here is how this tool handles the tricky cases:
Multiple tables per page: the tool extracts every <table> element, and each table gets its own card in the results with independent export buttons. Pages with dozens of tables (like Wikipedia articles) work fine.

| Criteria | CSV | JSON |
|---|---|---|
| Best for | Spreadsheets, Excel, Google Sheets | APIs, scripts, data pipelines |
| File size | Smaller, no structural overhead | Larger due to key names and formatting |
| Column names | First row (if headers exist) | Object keys on every row |
| Nested data | Not supported | Supported natively |
| Programmatic use | Requires parsing with a CSV library | Ready for JSON.parse() or json.loads() |
| Human readable | Yes, in any text editor | Yes, with indentation |
If you are dropping the data into a spreadsheet or importing it into a database, CSV is the simpler choice. If you are feeding the data into a script, an API, or a tool like jq or pandas, JSON keeps the column names attached to every row and is easier to manipulate programmatically.
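The difference shows up in how much parsing machinery each format needs on the consuming side. A small Python sketch (generic example, not tied to this tool's output):

```python
import csv
import io
import json

csv_text = "Name,Price\nWidget,$9.99\n"
json_text = '[{"Name": "Widget", "Price": "$9.99"}]'

# CSV needs a parser; DictReader re-attaches the header row as keys.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON carries the column names on every row already.
json_rows = json.loads(json_text)

assert csv_rows[0]["Price"] == json_rows[0]["Price"] == "$9.99"
```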
Some sites use <table> elements for layout. If the results include layout tables alongside data tables, look for the table with meaningful headers and row counts. Layout tables usually have 1-2 rows or no headers.

If a table has no <th> elements, the JSON export produces an array of arrays, which preserves the raw row/column structure. You can then assign your own column names in your script.

Enter the URL of the page containing the table into this tool and click Extract Tables. Each table found on the page will have a "Copy CSV" and "Download CSV" button. The CSV includes headers (if the table has them) and all data rows. You can open the file directly in Excel, Google Sheets, or any spreadsheet application.
Yes. The tool finds every <table> element on the page and displays each one separately. You can export tables individually or use the "Copy All" buttons to get all tables in a single CSV or JSON file.
The URL mode fetches the server-rendered HTML, so tables that exist in the initial HTML response will be extracted. Tables that are loaded dynamically by JavaScript after page load will not appear. For those, open the page in your browser, right-click the table, select "Inspect", copy the outer HTML, and use paste mode.
No. Paste mode runs entirely in your browser using the built-in DOMParser API. Your HTML never leaves your machine. URL mode sends the URL to our server so we can fetch the page on your behalf, but the HTML is processed and discarded immediately.
Tables without <th> elements or a <thead> section are exported without a header row. The CSV starts directly with data rows. The JSON output is an array of arrays instead of an array of objects. You can add your own headers after importing.
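For example, in Python you can zip your own column names onto the array-of-arrays output. The header names here are hypothetical placeholders you would choose yourself:

```python
import json

# Array-of-arrays export from a headerless table
raw = json.loads('[["Widget", "$9.99"], ["Gadget", "$4.50"]]')

my_headers = ["name", "price"]  # hypothetical names chosen by you
records = [dict(zip(my_headers, row)) for row in raw]
# records: [{"name": "Widget", "price": "$9.99"}, {"name": "Gadget", "price": "$4.50"}]
```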
Yes. Every extracted table has both CSV and JSON export options. The JSON format uses column headers as object keys when headers are available, producing output that works directly with JSON.parse() in JavaScript, json.loads() in Python, or command-line tools like jq.
Some websites use <table> elements for page layout rather than data. These layout tables often have few cells or no meaningful content. The tool extracts all tables regardless, so you may see some entries with zero rows. Look for tables with higher row counts and column headers for the actual data tables.
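If you are post-processing the export in a script, a simple filter along these lines (a heuristic sketch, not the tool's actual logic) can separate data tables from layout tables:

```python
def looks_like_data_table(headers, rows, min_rows=3):
    """Heuristic: layout tables tend to have no headers and only 1-2 rows."""
    return bool(headers) or len(rows) >= min_rows

assert looks_like_data_table(["Name", "Price"], [["Widget", "$9.99"]])
assert not looks_like_data_table(None, [["nav", "footer"]])
```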
The URL mode fetches the page as an anonymous visitor. If the table is behind authentication, log into the site in your browser, navigate to the page, view the page source (Ctrl+U or Cmd+U), copy the HTML, and paste it into the tool using paste mode. This extracts the table without sending your credentials anywhere.