PDF to Excel
Extract PDF data into spreadsheet-ready output.
Convert PDF to Excel
Upload one PDF and export its readable text into a `.xlsx` workbook with one row per PDF page.
Usage notes
- One source PDF per request
- Exports readable text into XLSX rows
- Locked PDFs must be unlocked first
- Best for digital text PDFs, not scans
Pull tables out of a PDF as real, editable Excel data
Getting a table out of a PDF and into Excel is one of the most consistently frustrating document tasks. Copy-paste usually works for plain text but collapses each table row into a single smashed-together string. Screenshot-and-type is reliable but slow. Real conversion, the kind that puts each value in its own spreadsheet cell, requires a tool that actually understands table structure.
The OkFarsi PDF to Excel tool analyzes each PDF page for table layouts: it detects rows by looking for horizontal alignment, detects columns by looking for vertical alignment, and extracts each cell as a separate value. Numeric cells are typed as numbers (so Excel can sum them), date cells are typed as dates (so Excel can sort them), and plain-text cells stay as text. The result is an XLSX where each detected table lands on its own sheet, with headers preserved and data ready to filter, pivot, or formula-edit.
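To make the alignment idea concrete, here is a minimal sketch of grouping positioned text fragments into a grid. It is an illustration only, not the tool's actual implementation: the `Fragment` type, the `words_to_grid` function, and the 3-point tolerance are all assumptions, and the sketch assumes each cell arrives as a single fragment with a known left edge and baseline.

```python
from typing import NamedTuple

class Fragment(NamedTuple):
    """Hypothetical positioned text fragment (one per cell in this sketch)."""
    text: str
    x: float  # left edge on the page
    y: float  # baseline on the page

def cluster(values, tolerance):
    """Collapse nearby coordinates: a new group starts when the gap exceeds tolerance."""
    centers = []
    for v in sorted(values):
        if not centers or v - centers[-1] > tolerance:
            centers.append(v)
    return centers

def nearest(centers, v):
    """Index of the group whose anchor coordinate is closest to v."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - v))

def words_to_grid(fragments, tolerance=3.0):
    """Rows come from shared baselines (y), columns from shared left edges (x)."""
    row_centers = cluster([f.y for f in fragments], tolerance)
    col_centers = cluster([f.x for f in fragments], tolerance)
    grid = [["" for _ in col_centers] for _ in row_centers]
    for f in fragments:
        r = nearest(row_centers, f.y)
        c = nearest(col_centers, f.x)
        grid[r][c] = (grid[r][c] + " " + f.text).strip()
    return grid

fragments = [
    Fragment("Item", 10, 100), Fragment("Price", 80, 100),
    Fragment("Pen", 10.5, 120), Fragment("1.20", 80, 120.4),
]
print(words_to_grid(fragments))
# [['Item', 'Price'], ['Pen', '1.20']]
```

Small coordinate jitter (10 vs 10.5, 120 vs 120.4) is absorbed by the tolerance. Right-aligned numbers or multi-word cells would need a more careful column model than left edges alone.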
Extraction gets harder with non-gridded tables and scanned PDFs. A born-digital PDF with visible table borders is the easy case: the converter extracts a near-perfect Excel representation. A PDF where the "table" is just text aligned with whitespace, with no visible borders, is harder; the converter uses column-detection heuristics that work well most of the time but may misread merged cells or uneven spacing. A scanned PDF has no text structure at all, so run OCR first, and even then expect the OCR'd "table" to need manual cleanup in Excel.
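One common way to handle whitespace-aligned tables is to look for character positions that are blank on every line and treat them as column gaps. The sketch below shows that heuristic on fixed-width text; it is an assumption about how such detection can work, not a description of the converter's own heuristics, and `column_spans` and `split_rows` are hypothetical names.

```python
def column_spans(lines):
    """Return (start, end) character spans that hold text on at least one line."""
    width = max((len(line) for line in lines), default=0)
    padded = [line.ljust(width) for line in lines]
    occupied = [any(line[i] != " " for line in padded) for i in range(width)]
    spans, start = [], None
    for i, used in enumerate(occupied):
        if used and start is None:
            start = i                    # a column begins
        elif not used and start is not None:
            spans.append((start, i))     # a blank gutter ends the column
            start = None
    if start is not None:
        spans.append((start, width))
    return spans

def split_rows(lines):
    """Cut every line at the shared column spans and trim each cell."""
    spans = column_spans(lines)
    return [[line[a:b].strip() for a, b in spans] for line in lines]

# Simulated whitespace-aligned text, as it might come out of a borderless PDF table.
rows = [("Item", "Qty", "Price"), ("Blue pen", "2", "1.20"), ("Notebook", "10", "3.50")]
lines = [f"{a:<10}{b:>3}  {c:>5}" for a, b, c in rows]

print(split_rows(lines))
# [['Item', 'Qty', 'Price'], ['Blue pen', '2', '1.20'], ['Notebook', '10', '3.50']]
```

"Blue pen" stays in one cell only because another row has a character where its space falls. With uneven spacing or a value that spills into the gutter, the spans merge or split, which is the kind of misread described above.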
How to extract PDF tables into Excel
- Check the table structure in the source PDF
Open the PDF and look at the table you want to extract. Does it have visible gridlines? Is every row the same height? Are columns clearly aligned? The cleaner the source structure, the cleaner the Excel output. Complex merged cells and irregular rows translate poorly regardless of tool.
- Run OCR first if the PDF is a scan
Scanned PDFs have no text data for the converter to read. Use the OCR PDF tool first to add a text layer. Even with OCR, scan-derived tables usually need more cleanup than born-digital tables.
- Upload the PDF
Drop the file into the upload area. The converter scans every page for table layouts and reports how many tables it found before doing the full extraction.
- Choose single-sheet or multi-sheet output
By default, each detected table lands on its own worksheet, named by page number. Switch to 'concatenate similar tables' if your PDF is a multi-page table that should be one long worksheet instead of ten short ones.
- Open the XLSX and clean up
Open the output in Excel or Google Sheets. Expect 5–15% of the cells to need manual review — merged-cell heuristics, currency parsing, and header detection all have edge cases. Once verified, the data is ready to pivot, filter, or formula-edit.
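Currency parsing is one of the edge cases named in the cleanup step. If you post-process extracted cells yourself, a conservative approach is to convert only the formats you are sure about and flag everything else for review. The sketch below assumes US-style amounts (comma thousands separator, dot decimal); `parse_amount` is a hypothetical helper, not part of the tool.

```python
import re
from decimal import Decimal

def parse_amount(text):
    """Return a Decimal for US-style amounts, or None to flag the cell for manual review."""
    s = text.strip()
    negative = s.startswith("(") and s.endswith(")")  # accounting-style negative
    if negative:
        s = s[1:-1]
    s = re.sub(r"[$€£\s]", "", s)                     # drop currency symbols and spaces
    if s.startswith("-"):
        negative, s = True, s[1:]
    # Accept 1,234.56 or 1234.56; reject anything else (for example 1.234,56).
    if not re.fullmatch(r"\d{1,3}(,\d{3})*(\.\d+)?|\d+(\.\d+)?", s):
        return None
    value = Decimal(s.replace(",", ""))
    return -value if negative else value

for cell in ["$1,234.50", "(250.00)", "-12", "1.234,50", "n/a"]:
    print(repr(cell), "->", parse_amount(cell))
```

Returning `None` instead of guessing keeps European-style amounts such as `1.234,50` from being silently read as the wrong number.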
Common use cases
- Financial statements and reports
Pull balance-sheet or income-statement data out of a published PDF report into Excel for your own analysis or model.
- Bank and credit-card statements
Convert monthly statement tables to XLSX for bookkeeping, expense tracking, or importing into accounting software.
- Price lists and catalogs
Extract supplier price lists delivered as PDFs into a spreadsheet you can compare, sort, or cross-reference with other data.
- Research data tables
Convert tables from academic papers or government reports into XLSX so you can run your own analysis instead of re-typing the numbers.
Privacy & security
Extraction runs on isolated workers. The converter reads text positions and cell values from the PDF only to build the output XLSX — no financial data, personal information, or table contents are retained beyond the job. Both the uploaded PDF and the generated Excel file are removed from our servers shortly after your download completes. If the source contains personal or financial records (bank statements, tax documents), download promptly and remove any local temporary copies.
Frequently asked questions
How accurate is the table extraction?
Accuracy is very good for PDFs with visible gridlines and consistent row heights: expect 90–99% cell-level accuracy. It is lower for whitespace-aligned tables without borders, merged-cell layouts, and multi-line rows. Always spot-check the XLSX before trusting it in downstream analysis.
Can I extract tables from a scanned PDF?
Only after running OCR. Scans have no structural text data for the converter to work with. Even then, table geometry from OCR is approximate, so budget more cleanup time in Excel.
What if the PDF has multi-page tables?
Enable 'concatenate similar tables' in the conversion options. The tool detects column headers that repeat across pages and merges those tables into one continuous worksheet instead of one sheet per page.
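As a rough illustration of header-based merging, the sketch below joins consecutive tables whose header rows match after trimming and lowercasing. The function name `concatenate_similar` and the matching rule are assumptions for illustration; the tool's own matching may be more tolerant.

```python
def normalize(header):
    """Compare headers case-insensitively and ignore stray whitespace."""
    return tuple(cell.strip().lower() for cell in header)

def concatenate_similar(tables):
    """Merge each table into the previous one when their header rows match."""
    merged = []
    for table in tables:
        header, body = table[0], table[1:]
        if merged and normalize(merged[-1][0]) == normalize(header):
            merged[-1].extend(body)  # skip the repeated header, keep the data rows
        else:
            merged.append([list(header)] + [list(row) for row in body])
    return merged

pages = [
    [["Date", "Amount"], ["01", "5"]],
    [["date ", "Amount"], ["02", "7"]],   # same table continued on the next page
    [["Item", "Qty"], ["Pen", "2"]],      # a different table
]
print(concatenate_similar(pages))
# [[['Date', 'Amount'], ['01', '5'], ['02', '7']], [['Item', 'Qty'], ['Pen', '2']]]
```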
Are numbers and dates typed correctly in the output?
The tool detects numeric and date-formatted cells and writes them as real typed values, so you can sum, average, or sort without re-formatting. Locale-specific date formats (DD/MM/YYYY vs MM/DD/YYYY) may need manual correction.
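The locale problem is easy to see in code: a string like `03/04/2024` is a valid date in both orders, while `25/04/2024` can only be day-first. A minimal sketch for checking a date column yourself, with hypothetical helper names `classify_date` and `column_order`:

```python
from datetime import date

def classify_date(text):
    """Classify an NN/NN/YYYY string as 'DMY', 'MDY', 'either', 'ambiguous', or 'invalid'."""
    try:
        a, b, year = (int(part) for part in text.split("/"))
    except ValueError:
        return "invalid"

    def valid(day, month):
        try:
            date(year, month, day)
            return True
        except ValueError:
            return False

    dmy, mdy = valid(a, b), valid(b, a)
    if dmy and mdy:
        return "either" if a == b else "ambiguous"  # 05/05 is the same date both ways
    if dmy:
        return "DMY"
    if mdy:
        return "MDY"
    return "invalid"

def column_order(cells):
    """Use the unambiguous cells in a column to decide how to read the rest."""
    kinds = {classify_date(c) for c in cells}
    if "invalid" in kinds or {"DMY", "MDY"} <= kinds:
        return "review"      # broken or conflicting values: check by hand
    if "DMY" in kinds:
        return "DMY"
    if "MDY" in kinds:
        return "MDY"
    return "ambiguous"       # nothing in the column settles the order

print(column_order(["03/04/2024", "25/04/2024"]))  # DMY
print(column_order(["03/04/2024", "05/06/2024"]))  # ambiguous
```

A single cell with a day above 12 is enough to settle the whole column; when no such cell exists, the order has to come from what you know about the source document.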
What about tables with merged cells or nested subtotals?
Merged cells are unmerged by default (the value lands in the top-left cell). Subtotal rows come through as regular data rows — you may want to replace them with formulas after extraction to keep the spreadsheet live.
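If you want to turn extracted subtotal rows back into live formulas, one simple heuristic is to look for a value that equals the sum of the rows above it since the previous subtotal. The sketch below is a post-processing idea under that assumption, not a feature of the tool; `subtotal_formulas`, the column letter, and the starting row are all hypothetical.

```python
from decimal import Decimal

def subtotal_formulas(values, column="B", first_row=2):
    """Replace values that equal the running sum of at least two rows with a SUM formula."""
    out, start, running = [], first_row, Decimal(0)
    for offset, value in enumerate(values):
        row = first_row + offset
        if row - start >= 2 and value == running:
            out.append(f"=SUM({column}{start}:{column}{row - 1})")
            start, running = row + 1, Decimal(0)  # the next block starts after the subtotal
        else:
            out.append(value)
            running += value
    return out

amounts = [Decimal("10"), Decimal("5.5"), Decimal("15.5"),
           Decimal("3"), Decimal("4"), Decimal("7")]
print(subtotal_formulas(amounts))
# [Decimal('10'), Decimal('5.5'), '=SUM(B2:B3)', Decimal('3'), Decimal('4'), '=SUM(B5:B6)']
```

The heuristic can misfire when a regular data row happens to equal the sum of the rows before it, so review the generated formulas before relying on them.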
Is the file deleted after extraction?
Yes. Both the uploaded PDF and the generated XLSX are removed from our servers shortly after your download completes.