How to Convert a Scanned PDF Invoice to Excel (With or Without OCR)
If you have a stack of scanned invoices and need them in Excel, you already know the real problem isn't "can it be done" — it's "how do I get clean rows and columns without spending an afternoon fixing them."
This guide walks through the three methods that actually work in 2026, explains why scanned PDFs are harder than digital ones, and gives you a checklist for getting accuracy close to 99% on real-world invoices.
Why scanned PDFs are different from regular PDFs
A digital PDF (say, one you exported from Word) contains real, selectable text. You can highlight it with your cursor, copy it, and paste it into Excel.
A scanned PDF is just a picture of a page saved in PDF format. There is no text inside it — only pixels. To turn those pixels into data you can put in Excel, something has to read the image. That "something" is OCR (optical character recognition).
Here's the quick test: open the PDF, try to highlight a word. If you can't, you have a scanned PDF and plain copy-paste won't work.
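The same test can be automated when you have a folder of files to sort. The sketch below assumes the third-party pypdf package for reading the text layer; the helper names and the 20-character threshold are illustrative choices, not part of any tool mentioned in this guide.

```python
from typing import List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the text layer of each page (empty string if a page has none).

    Assumes the third-party pypdf package is installed. It is imported
    lazily so the heuristic below can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [(page.extract_text() or "") for page in reader.pages]


def is_probably_scanned(page_texts: List[str], min_chars_per_page: int = 20) -> bool:
    """Treat the file as scanned if no page carries a meaningful text layer.

    min_chars_per_page is an arbitrary threshold chosen for illustration.
    """
    return all(len(text.strip()) < min_chars_per_page for text in page_texts)
```

One caveat: a scan that has already been run through OCR carries a hidden text layer, so it will be reported as digital even though the page is still an image.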
Method 1 — Use an AI invoice tool (fastest, most accurate)
This is the right method for almost everyone. A modern invoice-to-Excel tool does three things at once:
- Runs OCR to read the text.
- Detects the invoice layout (vendor, date, totals, line items).
- Outputs a structured Excel file with one row per line item.
Steps:
- Upload the scanned PDF to a tool like Albills.
- Wait 5–20 seconds while OCR and layout detection run.
- Download the `.xlsx`. You'll get a header sheet (vendor, invoice #, totals) and a line-items sheet.
Cons: Free tiers usually have daily limits. For large volumes, you'll want a paid plan or the API.
Accuracy tip: The cleaner your scan, the better. Scan at 300 DPI in grayscale mode. If you're photographing a paper invoice with your phone, hold the camera flat and use natural light — no flash.
Method 2 — Adobe Acrobat Pro (if you already have it)
Adobe Acrobat Pro has a built-in "Export to Excel" feature that includes basic OCR.
Steps:
- Open the scanned PDF in Acrobat Pro.
- Go to `Tools → Scan & OCR → Recognize Text → In This File`.
- After OCR completes, go to `Export PDF → Spreadsheet → Microsoft Excel Workbook`.
- Save the `.xlsx`.
Cons: Acrobat's table detection is okay for clean layouts and mediocre on complex invoices. Line items often get merged into fewer columns than they should. Expect 10–30 minutes of manual cleanup per invoice on anything other than a simple layout.
When to use: Occasional scans of simple invoices, or when you absolutely can't use a cloud tool for compliance reasons.
Method 3 — The free-but-manual route (Tesseract + a spreadsheet)
If you're technical and want a $0 solution, the open-source path works:
- Install Tesseract — the open-source OCR engine.
- Convert the scanned PDF to images with `pdftoppm` at 300 DPI.
- Run Tesseract on each page: `tesseract page.png page -l eng`.
- Paste the output into Excel and manually clean it into columns.
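If you have more than a couple of pages, the render-then-OCR steps above can be scripted. This is a minimal sketch that assumes `pdftoppm` (from Poppler) and `tesseract` are installed and on your PATH; the function names and the `ocr_out` working directory are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def pdftoppm_command(pdf_path: str, out_prefix: str, dpi: int = 300) -> List[str]:
    """Build the pdftoppm call that renders each page to PNG at the given DPI."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build the Tesseract call; text is written to <image name without .png>.txt."""
    out_base = str(Path(image_path).with_suffix(""))
    return ["tesseract", image_path, out_base, "-l", lang]


def ocr_pdf(pdf_path: str, work_dir: str = "ocr_out") -> List[Path]:
    """Render the PDF to page images, OCR each one, and return the .txt paths."""
    out_dir = Path(work_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = str(out_dir / "page")
    subprocess.run(pdftoppm_command(pdf_path, prefix), check=True)
    text_files = []
    # pdftoppm names its output page-1.png, page-2.png, ... (zero-padded
    # when the document has ten or more pages), so a sorted glob keeps order.
    for image in sorted(out_dir.glob("page-*.png")):
        subprocess.run(tesseract_command(str(image)), check=True)
        text_files.append(image.with_suffix(".txt"))
    return text_files
```

Calling `ocr_pdf("invoice.pdf")` leaves one text file per page in `ocr_out/`. The result is still plain text, so the manual column cleanup still applies.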
Cons: Tesseract outputs plain text. There is no table structure — every invoice needs manual column reconstruction. On a 3-line-item invoice this takes 5 minutes. On a 50-line-item invoice it takes an hour.
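Part of that column reconstruction can be scripted when the layout is predictable. The sketch below is a heuristic, not a general table parser: it assumes every line item is printed on one line as a description followed by quantity, unit price, and amount, with US-style numbers. The pattern and function names are illustrative, and real invoices will need the pattern adjusted.

```python
import csv
import io
import re
from typing import Optional, Tuple

# Assumed layout: "<description> <qty> <unit price> <amount>", e.g.
# "Widget A 2 10.00 20.00". Lines that do not fit are skipped.
LINE_ITEM = re.compile(
    r"^(?P<desc>.+?)\s+(?P<qty>\d+(?:\.\d+)?)"
    r"\s+(?P<unit>[\d,]+\.\d{2})\s+(?P<amount>[\d,]+\.\d{2})\s*$"
)


def parse_line_item(line: str) -> Optional[Tuple[str, str, str, str]]:
    """Split one OCR text line into four columns, or return None."""
    match = LINE_ITEM.match(line.strip())
    if match is None:
        return None
    return (match["desc"], match["qty"], match["unit"], match["amount"])


def ocr_text_to_csv(ocr_text: str) -> str:
    """Keep only the lines that look like line items and emit them as CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["description", "quantity", "unit_price", "amount"])
    for line in ocr_text.splitlines():
        row = parse_line_item(line)
        if row is not None:
            writer.writerow(row)
    return buffer.getvalue()
```

Excel opens the resulting CSV directly. Lines the pattern rejects are dropped silently, so compare the row count against the paper invoice before trusting the output.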
When to use: One-off scans where privacy is paramount and you have time to spare.
The OCR pitfalls no one warns you about
After running OCR on thousands of invoices, these are the failure modes that bite most often:
- Commas vs. periods. In European invoices, `1.234,56` means one thousand two hundred thirty-four and 56 cents. In US invoices, `1,234.56` means the same. OCR doesn't know which you meant, so set the locale explicitly.
- O vs. 0, I vs. 1, S vs. 5. Low-resolution scans confuse these constantly. Post-OCR validation (is this field numeric?) catches most errors.
- Rotated pages. A page scanned sideways produces garbage unless the tool auto-rotates. Always use a tool that deskews.
- Faint carbon copies and thermal receipts. Contrast enhancement before OCR helps. Some tools do this automatically; Tesseract doesn't.
- Line items that wrap across two visual rows. A human reads these as one item, but OCR often treats each row as a separate item. Post-process to stitch the rows back together by matching amounts.
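The first two pitfalls lend themselves to a small post-OCR check. The sketch below parses an amount under an explicitly chosen number format, swaps the three letter/digit confusions listed above, and rejects anything that still isn't numeric. The `"us"` and `"eu"` keys are hypothetical labels for this example, not real locale codes.

```python
import re
from decimal import Decimal
from typing import Optional

# Letters that OCR commonly returns in place of digits (from the list above).
# Apply this only to fields that are supposed to be numeric.
CONFUSABLES = str.maketrans({"O": "0", "I": "1", "S": "5"})

# (thousands separator, decimal mark) per number format.
SEPARATORS = {
    "us": (",", "."),  # 1,234.56
    "eu": (".", ","),  # 1.234,56
}


def parse_amount(raw: str, number_format: str) -> Optional[Decimal]:
    """Parse an OCR'd amount, or return None if it is not numeric."""
    thousands, decimal_mark = SEPARATORS[number_format]
    cleaned = raw.strip().translate(CONFUSABLES)
    cleaned = cleaned.replace(thousands, "").replace(decimal_mark, ".")
    if not re.fullmatch(r"\d+(\.\d+)?", cleaned):
        return None  # fails the "is this field numeric?" check
    return Decimal(cleaned)
```

With the right format, both `parse_amount("1.234,56", "eu")` and `parse_amount("1,234.56", "us")` give 1234.56. With the wrong one, `parse_amount("1.234,56", "us")` quietly returns 1.23456, which is why the format has to be set rather than guessed.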
Accuracy checklist (before you upload anything)
Use this checklist to get OCR accuracy as high as possible:
- [ ] Scan is 300 DPI or better.
- [ ] Scan mode is grayscale or black-and-white, not color.
- [ ] Page is flat (no book-spine curl).
- [ ] Document is straight (or will be auto-deskewed).
- [ ] Lighting is even (for phone photos).
- [ ] File is a standard PDF (not an encrypted or password-protected one).
- [ ] Language is known and set in the tool.
Which method should you actually use?
| Your situation | Best method |
|---|---|
| One-off invoice, want it done now | AI invoice tool (free tier) |
| A folder of 20+ invoices per month | AI invoice tool, Pro plan |
| Building this into another app | AI tool's API |
| Already paying for Adobe, 1–2 scans/month | Adobe Acrobat Pro |
| Privacy requires offline processing | Tesseract + manual cleanup |
Try Albills on your scanned invoice
If you want to see how accurate modern invoice OCR has become, upload a scanned invoice to Albills' scanned-PDF-to-Excel tool — it's free for up to 5 files per day, no signup. You'll get a clean Excel with vendor, date, totals, and every line item already populated.
*Want to automate this across a team? See our [bulk PDF invoice converter](https://albills.com/bulk-pdf-to-excel) or the [Albills API](https://albills.com/api) for developers.*
Some links above may be affiliate links. When you sign up through them, Albills may receive a referral fee at no cost to you. We only recommend tools we would actually use ourselves.