Processed in your browser · no upload

OCR: turn scans into a searchable PDF

You have a scanned PDF and want to search for a keyword or copy a line, but nothing will highlight, because to the computer each page is just an image. OCR (optical character recognition) reads the text inside that image and lays an invisible text layer over the original scan: the page looks exactly the same, but its text is now searchable, selectable and copyable. compress cat recognizes Chinese and English with a local WASM engine; nothing is uploaded, and recognition runs entirely on your device. Accuracy tracks scan quality: clean print works best, while complex layouts, tables, handwriting and blurry, low-quality scans are where it struggles.

The language data downloads on first use (~20MB for Chinese, cached by your browser). Recognition runs entirely on your device — nothing is uploaded. More pages take longer; please be patient.

How to OCR a scan into a searchable PDF

1. Select a scanned PDF file.
2. Choose the recognition language (Chinese + English, or English only for speed).
3. Click “Make searchable PDF” and wait for page-by-page recognition (the language data downloads on first use).
4. Download the result. The new PDF lets you search and copy its text.
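The steps above can be pictured as a simple loop over pages. The following is a minimal sketch in Python, not compress cat's actual code: the function names, the choice keys and the stand-in recognizer are all assumptions made for illustration. The only borrowed facts are Tesseract's language data names, "chi_sim" for Simplified Chinese and "eng" for English.

```python
from typing import Callable, Iterable, List, Optional

# Tesseract's language data names: "chi_sim" = Simplified Chinese, "eng" = English.
# The keys on the left are hypothetical labels for this sketch only.
LANGUAGE_CHOICES = {
    "chinese+english": ["chi_sim", "eng"],
    "english-only": ["eng"],
}


def ocr_pages(
    pages: Iterable[bytes],
    recognize: Callable[[bytes, List[str]], str],
    choice: str = "chinese+english",
    on_progress: Optional[Callable[[int], None]] = None,
) -> List[str]:
    """Recognize pages one at a time and return the text found on each."""
    langs = LANGUAGE_CHOICES[choice]
    texts = []
    for number, image in enumerate(pages, start=1):
        texts.append(recognize(image, langs))
        if on_progress is not None:
            on_progress(number)  # e.g. refresh a "page N done" indicator
    return texts


# Stand-in recognizer so the sketch runs without an OCR engine installed.
def fake_recognize(image: bytes, langs: List[str]) -> str:
    return f"{len(image)} bytes read with {'+'.join(langs)}"
```

Calling ocr_pages([b"page-one", b"p2"], fake_recognize, "english-only") returns one string per page. Passing the recognizer in as a parameter keeps the loop independent of any particular engine, and loading one language instead of two is the likely reason the English-only option runs faster.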

Why use compress cat's OCR (searchable PDF)?

It adds an invisible text layer and leaves the original image untouched: the page looks identical to the scan, but Ctrl+F finds words and you can copy whole passages — it doesn’t reflow your page into a new document.
Recognition runs entirely in your browser via the Tesseract engine, so medical records, IDs and contracts never pass through a server.
It reads Simplified Chinese and English, covering most office scans; switch to “English only” for English-only documents and it runs faster.
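To make the "invisible text layer" concrete: in a PDF, text drawn with text rendering mode 3 is neither filled nor stroked, so it stays selectable and searchable without showing on the page. The sketch below, again in Python, shows how one recognized word could be placed over its spot on the scan. It is an assumption about how such a layer is commonly written, not a description of compress cat's internals. The Word fields, the /F1 font resource name and the 300 dpi default are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: int    # pixels from the left edge of the scanned image
    top: int     # pixels from the top edge of the scanned image
    width: int   # unused here; a fuller version would stretch text to fit it
    height: int


def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(word: Word, page_height_pt: float, dpi: int = 300) -> str:
    """Build content-stream operators that place one invisible word.

    Handles ASCII text only; Chinese text would need a suitable font and
    encoding, which this sketch leaves out.
    """
    scale = 72.0 / dpi  # PDF units are points, 72 per inch
    x = word.left * scale
    # PDF's origin is the bottom-left corner; image rows count from the top.
    y = page_height_pt - (word.top + word.height) * scale
    size = word.height * scale  # approximate the font size by the box height
    return (
        "BT 3 Tr "  # text rendering mode 3: neither fill nor stroke
        f"/F1 {size:.2f} Tf {x:.2f} {y:.2f} Td "
        f"({escape_pdf_string(word.text)}) Tj ET"
    )
```

For a word 50 pixels tall, 300 pixels from the left and 150 pixels from the top of a 300 dpi scan on a 792-point-high page, this places 12-point text at (72, 744). Because the scanned image itself is never redrawn, the page looks identical to the original.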

Frequently asked questions

Clean printed text (scanned or photographed) gives the best accuracy. Complex multi-column layouts, tables, handwriting, and blurry, skewed or too-dark scans noticeably lower it — that’s the limit of in-browser OCR, so proofread the result.

The first run downloads the language data (~20MB for Chinese), which your browser then caches, so later runs are quick. Overall speed also depends on page count and your device.

Recognition defaults to Simplified Chinese + English; if the document is English-only, switch to “English only” for more speed.

OCR produces a searchable PDF with a text layer, which you can then run through compress cat’s PDF to Word to get an editable .docx.

Turn it into an automated flow

Need to batch-process, or chain several steps? Use the workflow builder to combine compress, merge, rotate and watermark into a reusable pipeline.

Updated 2026-06-04 · compress cat team
