OCR: turn scans into a searchable PDF
You’ve got a scanned PDF, you want to search a keyword or copy a line, and nothing will highlight — because to the computer it’s just an image. OCR (optical character recognition) reads the text inside that image and lays an invisible text layer over the original scan: it looks exactly the same, but now it’s searchable, selectable and copyable. compress cat recognizes Chinese and English with a local WASM engine — nothing is uploaded, and recognition runs entirely on your device. Accuracy tracks scan quality: clean print is best; complex layouts, tables, handwriting and blurry low-quality scans are where it struggles.
The language data downloads on first use (~20MB for Chinese) and is then cached by your browser. Recognition time grows with page count, so long documents take a while.
How to OCR (searchable PDF)
1. Select a scanned PDF file.
2. Choose the recognition language (Chinese + English, or English only for speed).
3. Click “Make searchable PDF” and wait for page-by-page recognition (the language data downloads on first use).
4. Download the result: the new PDF lets you search and copy its text.
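For the curious, the steps above can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions, not compress cat's actual code: the names `LanguageChoice`, `tesseractLangs`, `RecognizePage` and `recognizePages` are hypothetical, and the OCR call itself is passed in as a stand-in function. The only real-world detail used is Tesseract's standard language codes (`chi_sim` for Simplified Chinese, `eng` for English, joined with `+`).

```typescript
// Hypothetical names throughout: an illustration, not compress cat's source code.

type LanguageChoice = "chinese+english" | "english-only";

// Tesseract's standard language codes: "chi_sim" is Simplified Chinese,
// "eng" is English. Several languages are joined with "+".
function tesseractLangs(choice: LanguageChoice): string {
  return choice === "english-only" ? "eng" : "chi_sim+eng";
}

interface PageText {
  page: number; // 1-based page number
  text: string; // recognized text for that page
}

// Stand-in for the real OCR call (assumption): takes one rendered page image
// and a language string, and resolves to the recognized text.
type RecognizePage = (pageImage: Uint8Array, langs: string) => Promise<string>;

async function recognizePages(
  pages: Uint8Array[],
  choice: LanguageChoice,
  recognize: RecognizePage,
  onProgress?: (done: number, total: number) => void
): Promise<PageText[]> {
  const langs = tesseractLangs(choice);
  const results: PageText[] = [];
  // Pages are handled one at a time, which is why longer documents take longer.
  for (let i = 0; i < pages.length; i++) {
    const text = await recognize(pages[i], langs);
    results.push({ page: i + 1, text });
    if (onProgress) onProgress(i + 1, pages.length);
  }
  return results;
}
```

Passing the recognizer in as a parameter keeps the sketch independent of any particular OCR library; in a real page it would wrap whichever in-browser engine is in use, and `onProgress` would drive the page counter shown while you wait.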
Why use compress cat's OCR (searchable PDF)?
- It adds an invisible text layer and leaves the original image untouched: the page looks identical to the scan, but Ctrl+F finds words and you can copy whole passages — it doesn’t reflow your page into a new document.
- Recognition runs entirely in your browser via the Tesseract engine, so medical records, IDs and contracts never pass through a server.
- It reads Simplified Chinese and English, covering most office scans; switch to “English only” for English-only documents and it runs faster.
Frequently asked questions
Clean printed text (scanned or photographed) gives the best accuracy. Complex multi-column layouts, tables, handwriting, and blurry, skewed or too-dark scans noticeably lower it — that’s the limit of in-browser OCR, so proofread the result.
The first run downloads the language data (~20MB for Chinese), which your browser then caches, so later runs are quick. Overall speed also depends on page count and your device.
Both Chinese and English are recognized. The default is Simplified Chinese + English; if the document is English-only, switch to “English only” for more speed.
To get an editable Word file, run OCR first: it produces a searchable PDF with a text layer, which you can then run through compress cat’s PDF to Word to get an editable .docx.
Turn it into an automated flow
Need to batch-process, or chain several steps? Use the workflow builder to combine compress, merge, rotate and watermark into a reusable pipeline.
Updated · compress cat team