PDF Compression Methodology
Most 'how to compress a PDF' guides just tell you which button to press and skip the one thing that matters most: how small a file can get is decided by its content, not by the tool. This methodology starts from where a PDF's size actually comes from, breaks down how much each technique (down-sampling, JPEG quantization, grayscale, de-duplication) saves, and then gives you a decision tree for choosing a path for the file in front of you. By the end you'll stop guessing at settings and judge the file first, knowing which files compress in one pass and which will never hit your target. Everything runs locally in your browser; files are never uploaded.

Where does a PDF's size actually come from?
To compress something, you first have to know what you're compressing. A PDF's size breaks down into roughly four parts: page images (the bitmap data of scans, photos and illustrations), vectors and text (glyphs, paths, table rules), embedded fonts (full or subsetted font files), and structural overhead (metadata, bookmarks, annotations, thumbnail caches, duplicate objects). For nearly every 'too big to upload' file, the first part — page images — is over 80% of the size.
That's why the same 10 pages can be 300KB as a Word export and 30MB as a phone-photographed contract: the former is vector text, the latter is ten multi-megapixel color photos. The first step in judging how small a file can get is always the same question — is its size mostly images, or mostly text?
| Size source | Typical share | Headroom | Technique |
|---|---|---|---|
| Page images (scans / photos) | ~80–98% of an image PDF | Very large | Down-sample + JPEG + grayscale |
| Vectors & text | Bulk of a text PDF | Tiny | Font subsetting, de-dup |
| Embedded fonts | Tens of KB to a few MB | Medium (subset) | Subset / drop unused glyphs |
| Structural overhead | Usually < 5% | Small (lossless) | Strip redundant objects |
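
To see why page images dominate, it helps to work out how much raw bitmap data one scanned page holds before any compression. The sketch below is only illustrative: the function name, the US Letter page size and the 8-bits-per-channel assumption are ours, not something the tool exposes.

```python
def raw_bitmap_bytes(width_in: float, height_in: float, dpi: int, channels: int = 3) -> int:
    """Uncompressed size of one page image, assuming 8 bits per channel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels


# Assumed page size: US Letter, 8.5 x 11 inches.
LETTER = (8.5, 11.0)

color_300 = raw_bitmap_bytes(*LETTER, dpi=300, channels=3)  # 2550 x 3300 px, RGB
gray_150 = raw_bitmap_bytes(*LETTER, dpi=150, channels=1)   # 1275 x 1650 px, grayscale

print(f"300 DPI color: {color_300 / 1e6:.1f} MB raw")
print(f"150 DPI gray:  {gray_150 / 1e6:.1f} MB raw")
```

A single 300 DPI color page is about 8.4 megapixels of raw data before JPEG encoding, while the text of the same page is a few kilobytes of glyph references. That gap is the whole story behind the table above.
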
The four techniques — how much does each save?
'Compressing a PDF' is really a mix of independent techniques. Understanding each one's cost and payoff is what explains why some files shrink instantly and others won't budge.
The first three are lossy — they trade quality for size, with huge payoff; the last is lossless — it only strips redundancy, so quality is untouched but the savings are modest. Hitting a hard target like 'exactly 200KB' almost always means using some combination of the first three.
- Image down-sampling (lower DPI): dropping a 600DPI scan to 150DPI cuts pixel count to ~1/16 — the single biggest saving. The cost is softness when zoomed in, but 150–200DPI is plenty for screen reading and most reviews.
- JPEG quantization (higher compression ratio): re-encoding page images at a lower JPEG quality setting shrinks size fast. The cost is faint 'mosquito' noise around text edges, which becomes visible below roughly quality 50.
- Grayscale / black-and-white: a black-and-white text scan doesn't need color; dropping the color channels usually saves another 30–60%. The cost is losing color (no loss for pure text).
- Lossless de-duplication: remove duplicate objects, unused fonts, thumbnail caches and metadata. Quality is untouched, but it typically saves only 5–25% — not enough on its own to hit a hard target.
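
The arithmetic behind these four techniques can be sketched in a few lines. This is a rough model under stated assumptions: image size is taken to scale with pixel count, and the savings are treated as independent multipliers, which real files only approximate. The function names and parameters are hypothetical.

```python
def downsample_factor(src_dpi: int, dst_dpi: int) -> float:
    """Fraction of pixels left after down-sampling; never up-samples."""
    if dst_dpi >= src_dpi:
        return 1.0
    return (dst_dpi / src_dpi) ** 2


def estimate_size(size_bytes: int, src_dpi: int, dst_dpi: int,
                  grayscale_saving: float = 0.0,
                  lossless_saving: float = 0.0) -> int:
    """Rough size estimate, assuming size is proportional to pixel count
    and that each saving applies independently (a simplification)."""
    size = size_bytes * downsample_factor(src_dpi, dst_dpi)
    size *= 1.0 - grayscale_saving   # e.g. 0.3 to 0.6 for a text scan
    size *= 1.0 - lossless_saving    # e.g. 0.05 to 0.25
    return round(size)


# 600 DPI -> 150 DPI keeps (150/600)^2 = 1/16 of the pixels.
print(downsample_factor(600, 150))
print(estimate_size(16_000_000, 600, 150, grayscale_saving=0.5))
```

Down-sampling is the only term that is squared, which is why it tends to move the number most. JPEG quality is left out of this model because its effect depends heavily on the image content.
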
How small can different content get? A reference table
This is the one table worth bookmarking. It's not a promise — it's an experience-based range by content type. The same '200KB target' is trivial for a text scan and nearly impossible for a 50-page color brochure. Match your file to a row first, and you can predict the outcome.
| Content type | Typical size before | Compresses to | ≤200KB? |
|---|---|---|---|
| B&W text scan (200–300DPI) | 0.5–2 MB / page | 20–80 KB / page | Usually yes |
| Color document scan (300DPI) | 1–4 MB / page | 80–300 KB / page | Single page often; multi-page hard |
| Phone photo → PDF | 2–6 MB / page | 100–400 KB / page | Needs lower DPI + grayscale |
| Text PDF (Word export) | Already small | Almost no headroom | Already fits — no need |
| Image-heavy brochure / portfolio | 5–20 MB+ | 1–5 MB | Practically impossible |
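
The per-page rows of the table can be turned into a quick feasibility check. The sketch below copies the per-page ranges from the table; the dictionary keys and the three verdict labels are our own shorthand, and the result is an experience-based range, not a guarantee.

```python
# Compressed KB per page, taken from the per-page rows of the reference table.
REFERENCE_KB_PER_PAGE = {
    "bw_text_scan": (20, 80),
    "color_scan": (80, 300),
    "phone_photo": (100, 400),
}


def estimate_range_kb(content_type: str, pages: int) -> tuple[int, int]:
    """Expected (best case, worst case) compressed size for the whole file."""
    low, high = REFERENCE_KB_PER_PAGE[content_type]
    return low * pages, high * pages


def verdict(content_type: str, pages: int, target_kb: int) -> str:
    """Hypothetical labels: 'likely', 'possible' or 'unrealistic'."""
    low, high = estimate_range_kb(content_type, pages)
    if target_kb >= high:
        return "likely"
    if target_kb >= low:
        return "possible"
    return "unrealistic"


print(verdict("bw_text_scan", 2, 200))
print(verdict("color_scan", 1, 200))
print(verdict("phone_photo", 5, 200))
```

Text PDFs and brochures are left out on purpose: the table gives them no per-page range, only "already fits" and "practically impossible".
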
The decision tree: which path for the file in your hands?
Here's the methodology distilled into an executable judgment. Start from 'what kind of file is this', follow the branch, and you'll land on the least-effort, least-blurry strategy.
- Pure text (Word/Docs export, no scanned images): Usually already < 1MB, so there is no need to compress. If you must, keep the original PDF rather than rasterizing, so text stays selectable.
- Black-and-white text scan: Grayscale + down-sample to 150–200DPI, then use the 200KB preset; one pass usually does it.
- Color scan / photo, one or a few pages: Set the target (by upload limit) and try keeping color first; if it misses, switch to grayscale or drop a DPI step.
- Dozens of pages, needs to be a few hundred KB: Trim pages first with Delete Pages / Split PDF, then compress. Page count sets a hard floor (pages × per-page minimum) that no setting can compress past.
- Image-heavy brochure / portfolio: Compress the source images then rebuild the PDF, or accept a larger target (e.g. ≤10MB); over-compressing destroys the artwork.
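
The branches above can be written down as a small function. This is one possible encoding, not the tool's actual logic: the `Kind` enum, the `PdfProfile` type, the order in which branches are checked and the cutoff of 24 pages for "dozens" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    TEXT = "text"
    BW_SCAN = "bw_scan"
    COLOR_SCAN = "color_scan"
    BROCHURE = "brochure"


@dataclass
class PdfProfile:
    kind: Kind
    pages: int


MANY_PAGES = 24  # assumed cutoff for "dozens of pages"


def choose_path(profile: PdfProfile) -> str:
    """Map a file profile to the strategy named in the decision tree."""
    if profile.kind is Kind.TEXT:
        return "keep the original; no compression needed"
    if profile.kind is Kind.BROCHURE:
        return "compress source images and rebuild, or accept a larger target"
    if profile.pages >= MANY_PAGES:
        return "trim pages first, then compress"
    if profile.kind is Kind.BW_SCAN:
        return "grayscale + down-sample to 150-200 DPI"
    return "keep color first; fall back to grayscale or a lower DPI step"


print(choose_path(PdfProfile(Kind.BW_SCAN, pages=3)))
print(choose_path(PdfProfile(Kind.COLOR_SCAN, pages=40)))
```

Checking the content type before the page count mirrors the advice in the text: a long text PDF still needs no compression, while a long scan needs trimming before any setting will help.
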
Why a file sometimes 'won't compress'
'It won't get smaller' is almost never a broken tool — it's a physical floor: the file is near its information-density limit, and squeezing further just trades clarity for size at a terrible rate. The value of methodology is recognizing this early instead of clicking the button over and over.
- The images are already heavily compressed: a low-quality source JPEG has no redundancy left, so you only lose quality. Loosen the target.
- Pages × per-page floor > target: 100 scanned pages can't fit in 200KB if each needs at least 20KB. Reduce pages first.
- The target is too aggressive: crushing a crisp color scan to 30KB makes it unusable. Try 200KB, then tighten in steps.
- It's actually a text PDF: already small with almost no headroom — 'won't shrink' is normal, because it never needed compressing.
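
The 'pages × per-page floor' case is plain arithmetic, and checking it before compressing saves repeated attempts. A minimal sketch, assuming you already have a rough per-page floor for your content type (for example the 20KB figure used above); the function names are hypothetical.

```python
def max_pages_for_target(target_kb: int, per_page_floor_kb: int) -> int:
    """Largest page count that can fit the target at the per-page floor."""
    if per_page_floor_kb <= 0:
        raise ValueError("per_page_floor_kb must be positive")
    return target_kb // per_page_floor_kb


def pages_to_trim(pages: int, target_kb: int, per_page_floor_kb: int) -> int:
    """How many pages must go before the target is even reachable."""
    return max(0, pages - max_pages_for_target(target_kb, per_page_floor_kb))


# 100 scanned pages, 200KB target, at least 20KB per page:
print(max_pages_for_target(200, 20))   # 10
print(pages_to_trim(100, 200, 20))     # 90
```

If `pages_to_trim` returns anything above zero, no quality setting will reach the target; either remove pages or loosen the target.
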
Frequently asked questions
Why is a scan dozens of times bigger than a Word export at the same page count?
Because a scan is a high-resolution bitmap per page (image data), while a Word export is vector text that barely adds size. The whole gap comes from 'images vs. text', which is also why the scan compresses dramatically and the text PDF has almost no headroom.
Which matters more for size — lower DPI or higher JPEG ratio?
Usually lower DPI: halving DPI cuts the pixel count to a quarter, so size falls roughly with the square of the DPI ratio, while JPEG quality is a more gradual trade-off on a fixed pixel grid. In practice you use both, and compress cat's 'compress to a target size' balances them for you.
Can lossless compression get me to 200KB?
Almost never. Lossless only strips redundancy (metadata, duplicate objects) and typically saves 5–25% — nowhere near enough to take a multi-megabyte scan to 200KB. Hitting a hard target needs lossy (down-sampling / JPEG / grayscale).
How do I tell whether my file can reach the target?
Use the reference table: a B&W text scan to 200KB is usually fine, a multi-page color scan is harder, an image-heavy portfolio is practically impossible. Check the content type first to decide if your target is realistic — it saves a lot of wasted attempts.
Updated · compress cat team