Extract HTML tables from any webpage and export as CSV or JSON. Free table extractor for developers, analysts, and researchers working with tabular web data.
You found a government dataset, a pricing comparison, or a league standings page with exactly the data you need. It is sitting right there in an HTML table. You try to copy and paste it into a spreadsheet. The columns collapse, the formatting breaks, and you spend twenty minutes cleaning up what should have been a two-second operation.
Writing a script to parse HTML tables is the "proper" solution, but it means pulling in a parser library, handling missing headers, dealing with colspan attributes, and writing CSV serialization logic. For a one-off extraction, that is too much overhead.
This table extractor does the parsing for you. Enter a URL to fetch all tables from a live page, or paste raw HTML to extract tables locally. The tool detects every <table> element, preserves headers and row structure, and lets you download or copy each table as CSV or JSON. No script to write, no dependencies to install.
The tool has two modes. Pick whichever fits your workflow.
URL mode fetches a live page and extracts every <table> element on the page. Paste mode parses raw HTML you provide and extracts the <table> elements themselves. Paste mode is useful when you already have the HTML saved locally, copied from view-source, or exported from another tool like the HTML Formatter. It also works when you want to keep the data entirely on your machine.
The CSV output follows RFC 4180 conventions. The first row contains column headers if the table has a <thead> section or a row of <th> elements. Subsequent rows match the table body. Values that contain commas, quotes, or newlines are properly escaped with double quotes. The result opens directly in Excel, Google Sheets, or any spreadsheet application.
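To see those quoting rules in action, here is a short Python sketch. It uses the standard library's csv module, which follows the same RFC 4180 conventions; it is an illustration of the escaping behavior, not the tool's own code:

```python
import csv
import io

headers = ["Name", "Notes"]
rows = [["Widget", 'Contains "quotes", commas,\nand a newline']]

buf = io.StringIO()
writer = csv.writer(buf)  # default dialect quotes fields containing , " or \n
writer.writerow(headers)
writer.writerows(rows)
print(buf.getvalue())
```

The second field is wrapped in double quotes and its embedded quotes are doubled, so the file opens cleanly in Excel or Google Sheets.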
When headers are present, each row becomes a JSON object with header names as keys. A table with columns "Name", "Price", and "Stock" produces an array of objects like [{"Name": "Widget", "Price": "$9.99", "Stock": "142"}]. When no headers are detected, the output is an array of arrays. Both formats are valid input for pandas, jq, or any data pipeline.
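A minimal Python sketch of the two JSON shapes (illustrative only, not the tool's implementation):

```python
import json

headers = ["Name", "Price", "Stock"]
rows = [["Widget", "$9.99", "142"]]

# With headers: each row becomes an object keyed by column name
with_headers = json.dumps([dict(zip(headers, row)) for row in rows])
# Without headers: the raw row/column structure is preserved
without_headers = json.dumps(rows)

print(with_headers)     # [{"Name": "Widget", "Price": "$9.99", "Stock": "142"}]
print(without_headers)  # [["Widget", "$9.99", "142"]]
```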
The parser processes HTML tables using the same logic in both URL mode (server-side with Nokogiri) and paste mode (client-side with the browser's DOMParser). Here is how it handles different table structures:
- Tables with a <thead>: If the table has a <thead> section, all <th> cells inside it become column headers.
- Tables without a <thead>: If the first <tr> contains <th> elements, those cells are treated as headers.
- Tables with only <td> cells: Rows are exported without named columns. The CSV has no header row, and the JSON is an array of arrays.
- Tables with a <caption> element: The caption text appears as the table title in the results and is used as the filename when downloading.

Tables are still the most common way structured data appears on the web, which is why developers and analysts reach for a table extractor so often.
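The header-detection rules can be sketched with Python's built-in html.parser. This is a simplified illustration of the "first row of <th> cells becomes the header" rule, not the tool's actual Nokogiri/DOMParser implementation:

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collects rows from a <table>, marking rows that contain <th> cells."""
    def __init__(self):
        super().__init__()
        self.rows = []          # list of (is_header_row, [cell, ...])
        self.row = None
        self.cell = []
        self.in_cell = False
        self.header_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row, self.header_row = [], False
        elif tag in ("td", "th"):
            self.in_cell, self.cell = True, []
            if tag == "th":
                self.header_row = True

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.row is not None:
            self.row.append("".join(self.cell).strip())
            self.in_cell = False
        elif tag == "tr" and self.row is not None:
            self.rows.append((self.header_row, self.row))
            self.row = None

    def handle_data(self, data):
        if self.in_cell:
            self.cell.append(data)

def split_headers(rows):
    """If the first row contains <th> cells, treat it as the header row."""
    if rows and rows[0][0]:
        return rows[0][1], [r for _, r in rows[1:]]
    return None, [r for _, r in rows]

p = TableRows()
p.feed("<table><tr><th>Name</th><th>Price</th></tr>"
       "<tr><td>Widget</td><td>$9.99</td></tr></table>")
headers, body = split_headers(p.rows)
```

With the sample input, `headers` is `["Name", "Price"]` and `body` is `[["Widget", "$9.99"]]`; a table with only `<td>` cells would yield `headers = None`.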
Not every table is a simple grid of headers and rows. Here is how this tool handles the tricky cases:
Multiple tables per page: the tool extracts every <table> element, and each table gets its own card in the results with independent export buttons. Pages with dozens of tables (like Wikipedia articles) work fine.

| Criteria | CSV | JSON |
|---|---|---|
| Best for | Spreadsheets, Excel, Google Sheets | APIs, scripts, data pipelines |
| File size | Smaller, no structural overhead | Larger due to key names and formatting |
| Column names | First row (if headers exist) | Object keys on every row |
| Nested data | Not supported | Supported natively |
| Programmatic use | Requires parsing with a CSV library | Ready for JSON.parse() or json.loads() |
| Human readable | Yes, in any text editor | Yes, with indentation |
If you are dropping the data into a spreadsheet or importing it into a database, CSV is the simpler choice. If you are feeding the data into a script, an API, or a tool like jq or pandas, JSON keeps the column names attached to every row and is easier to manipulate programmatically.
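The difference shows up in how much parsing machinery each format needs on the consuming side. A small Python sketch (generic example, not tied to this tool's output):

```python
import csv
import io
import json

csv_text = "Name,Price\nWidget,$9.99\n"
json_text = '[{"Name": "Widget", "Price": "$9.99"}]'

# CSV needs a parser; DictReader re-attaches the header row as keys.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON carries the column names on every row already.
json_rows = json.loads(json_text)

assert csv_rows[0]["Price"] == json_rows[0]["Price"] == "$9.99"
```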
Some sites use <table> elements for layout. If the results include layout tables alongside data tables, look for the table with meaningful headers and row counts. Layout tables usually have 1-2 rows or no headers.

If a table has no <th> elements, the JSON export produces an array of arrays, which preserves the raw row/column structure. You can then assign your own column names in your script.

Enter the URL of the page containing the table into this tool and click Extract Tables. Each table found on the page will have a "Copy CSV" and "Download CSV" button. The CSV includes headers (if the table has them) and all data rows. You can open the file directly in Excel, Google Sheets, or any spreadsheet application.
Yes. The tool finds every <table> element on the page and displays each one separately. You can export tables individually or use the "Copy All" buttons to get all tables in a single CSV or JSON file.
The URL mode fetches the server-rendered HTML, so tables that exist in the initial HTML response will be extracted. Tables that are loaded dynamically by JavaScript after page load will not appear. For those, open the page in your browser, right-click the table, select "Inspect", copy the outer HTML, and use paste mode.
No. Paste mode runs entirely in your browser using the built-in DOMParser API. Your HTML never leaves your machine. URL mode sends the URL to our server so we can fetch the page on your behalf, but the HTML is processed and discarded immediately.
Tables without <th> elements or a <thead> section are exported without a header row. The CSV starts directly with data rows. The JSON output is an array of arrays instead of an array of objects. You can add your own headers after importing.
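For example, in Python you can zip your own column names onto the array-of-arrays output. The header names here are hypothetical placeholders you would choose yourself:

```python
import json

# Array-of-arrays export from a headerless table
raw = json.loads('[["Widget", "$9.99"], ["Gadget", "$4.50"]]')

my_headers = ["name", "price"]  # hypothetical names chosen by you
records = [dict(zip(my_headers, row)) for row in raw]
# records: [{"name": "Widget", "price": "$9.99"}, {"name": "Gadget", "price": "$4.50"}]
```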
Yes. Every extracted table has both CSV and JSON export options. The JSON format uses column headers as object keys when headers are available, producing output that works directly with JSON.parse() in JavaScript, json.loads() in Python, or command-line tools like jq.
Some websites use <table> elements for page layout rather than data. These layout tables often have few cells or no meaningful content. The tool extracts all tables regardless, so you may see some entries with zero rows. Look for tables with higher row counts and column headers for the actual data tables.
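If you are post-processing the export in a script, a simple filter along these lines (a heuristic sketch, not the tool's actual logic) can separate data tables from layout tables:

```python
def looks_like_data_table(headers, rows, min_rows=3):
    """Heuristic: layout tables tend to have no headers and only 1-2 rows."""
    return bool(headers) or len(rows) >= min_rows

assert looks_like_data_table(["Name", "Price"], [["Widget", "$9.99"]])
assert not looks_like_data_table(None, [["nav", "footer"]])
```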
The URL mode fetches the page as an anonymous visitor. If the table is behind authentication, log into the site in your browser, navigate to the page, view the page source (Ctrl+U or Cmd+U), copy the HTML, and paste it into the tool using paste mode. This extracts the table without sending your credentials anywhere.