Working with PDFs in 2026: the practical, no-nonsense guide

Maya Sundaram · 9 min read

The short version

  • Most PDF work is just six operations: organize, convert, optimize, and three flavours of security.
  • Conversion quality depends on how the PDF was made — a scan needs OCR first or you get garbage.
  • A black box is not redaction, and a 'protected' PDF can mean two very different things.
  • Most online tools send your document to a server; the only real question is whose, and what it keeps.

Most people don't have a "PDF problem." They have a Tuesday where a lease needs three pages pulled out, a scanned receipt won't paste into a spreadsheet, and a 40 MB brochure bounces off a mailbox that caps attachments at 25 MB. Each task is small. The annoying part is that the internet wants you to download a different app, or upload your file to a different stranger, for every single one.

This is the map we wish we'd had. It walks through the handful of operations that cover roughly 90% of real PDF work, what's actually happening under the hood when you run them, and the mistakes that quietly cost people money or privacy. Where a task deserves its own deep dive, we link to one. Read top to bottom or jump to the part that's ruining your afternoon.

Why the PDF refuses to die

The PDF, first released in 1993, turned 33 this year and is still the only document format that looks identical on a 2008 ThinkPad and a brand-new phone. That's the whole point of it: a PDF freezes layout, fonts, and vector art into one file so the thing you sent is the thing they see. Word documents reflow. Web pages break. PDFs don't. That reliability is exactly why courts, banks, and HR departments never let go of it, and why you keep having to wrangle them.

The trade-off is that a frozen format is a pain to edit. You're not changing a living document; you're performing surgery on a snapshot. Knowing which "surgery" each tool actually does is most of the battle.

Organizing: merge, split, reorder

The most common request we see is also the most mundane: take these files and make them one, or take this file and make it several. Merging stitches multiple PDFs (or images) into a single document in the order you choose. Splitting does the reverse — by page ranges, by file size, or every N pages.

  • Merging PDFs is how you turn a signed contract, an addendum, and a scanned ID into one clean packet instead of three attachments nobody opens in order.
  • Splitting and extracting pages is for when someone needs page 4, not your entire 90-page report — and for chopping a fat scan into per-invoice files.
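
Both splitting modes come down to turning a human request into a list of page numbers. The sketch below is one way that step might look; the function names and the "1-3,7" spec syntax are our own illustration, not any particular tool's interface, and the actual page copying is left to whatever PDF library you use.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Turn a 1-based spec like "1-3,7" into 0-based page indices."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"range {part!r} is outside 1..{page_count}")
        pages.extend(range(start - 1, end))
    return pages


def chunk_every(page_count: int, n: int) -> list[list[int]]:
    """Group 0-based page indices into consecutive chunks of at most n pages."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [list(range(i, min(i + n, page_count))) for i in range(0, page_count, n)]


if __name__ == "__main__":
    print(parse_page_ranges("4", 90))       # page 4 of a 90-page report
    print(parse_page_ranges("1-3, 7", 10))  # a range plus a single page
    print(chunk_every(7, 3))                # "every 3 pages" on a 7-page scan
```

Validating the range before touching the file is the useful habit here: a request for page 95 of a 90-page document should fail loudly rather than produce an empty PDF.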

Converting: in and out of PDF

Conversion is where expectations and reality collide hardest. A PDF born from a Word file usually converts back to Word cleanly. A PDF that's really a photo of a printout converts to a Word file full of garbage unless you OCR it first. The format hasn't changed; the contents have.

  • PDF to Word is the request that breaks the most hearts, because "keeping the formatting" depends entirely on how the PDF was made.
  • OCR is the step that turns a scan — pixels that merely look like words — into text you can select, search, and copy.
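
A rough way to tell the two kinds of PDF apart is to try extracting text and see whether anything meaningful comes back. The sketch below assumes you already have one extracted string per page from some PDF library (that part is not shown), and the 20-character threshold is an arbitrary guess you would tune on your own files.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 0-based indices of pages with too little extractable text.

    page_texts is assumed to hold the text a PDF library extracted from
    each page. A scanned page usually yields an empty or near-empty string.
    """
    return [
        i
        for i, text in enumerate(page_texts)
        # Ignore whitespace so a page of blank lines does not count as text.
        if len("".join(text.split())) < min_chars
    ]


if __name__ == "__main__":
    extracted = [
        "This agreement is made between the parties named below.",
        "",        # a scanned page: nothing to extract
        "  \n ",   # whitespace only, also effectively a scan
    ]
    print(pages_needing_ocr(extracted))
```

Checking per page matters because mixed documents are common: a born-digital contract with one scanned signature page only needs OCR on that page.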

Optimizing: compress without wrecking it

A 40 MB PDF is almost always a few high-resolution images in a trench coat. Good compression down-samples those images and re-encodes the file structure; bad compression rasterizes your crisp text into a blurry JPEG. The difference is whether you can still select the words afterward.
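
The arithmetic behind downsampling is simple enough to sketch. The example below assumes you know an image's pixel size and how wide it is placed on the page; the function names and the 150 DPI default are illustrative choices, not a recommendation from any specific tool. Note that it reports the reduction in pixel count, which is only a proxy for file size once re-encoding is involved.

```python
def effective_dpi(pixel_width: int, placed_width_in: float) -> float:
    """Resolution of an image as it is actually placed on the page."""
    return pixel_width / placed_width_in


def downsample_plan(pixel_width: int, pixel_height: int,
                    placed_width_in: float, target_dpi: float = 150.0):
    """Return (new_width, new_height, pixel_reduction_factor)."""
    current = effective_dpi(pixel_width, placed_width_in)
    if current <= target_dpi:
        # Already at or below the target: leave it alone, never upsample.
        return pixel_width, pixel_height, 1.0
    scale = target_dpi / current
    new_w = max(1, round(pixel_width * scale))
    new_h = max(1, round(pixel_height * scale))
    factor = (pixel_width * pixel_height) / (new_w * new_h)
    return new_w, new_h, factor


if __name__ == "__main__":
    # A full-page scan at 600 DPI on US Letter paper (8.5 in wide).
    print(effective_dpi(5100, 8.5))
    print(downsample_plan(5100, 6600, 8.5, target_dpi=150))
```

Going from 600 to 150 DPI cuts each dimension by four, so the image carries sixteen times fewer pixels. Text stored as text is untouched by this step, which is why it stays selectable.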

We wrote a whole piece on compressing a PDF without turning text into mush, because "make it smaller" and "keep it readable" are in constant tension and the defaults rarely get the balance right.

Security: lock it, redact it, sign it

This is the category where mistakes don't just look bad — they leak. Three operations get confused constantly, and they do completely different jobs:

Passwords and encryption

A password on a PDF is real cryptography (AES, in any modern tool), not a "please don't peek" sticker. But there are two kinds, and people pick the wrong one all the time: an open password stops people opening the file, while a permissions password only asks the viewer to block editing or printing, which is a much weaker guarantee. Here's how to password-protect and encrypt a PDF properly.
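
The "editing or printing" kind of protection is stored as a set of permission flags inside the file. The sketch below decodes four of them from the integer a PDF keeps in its encryption settings. The bit positions reflect our reading of the PDF specification and should be checked against it before you rely on them; the class and function names are our own.

```python
from enum import IntFlag


class Permission(IntFlag):
    # Bit positions as we understand the PDF spec's permission entry
    # (bits are numbered from 1 there, so "bit 3" is 1 << 2).
    PRINT = 1 << 2      # bit 3
    MODIFY = 1 << 3     # bit 4
    COPY = 1 << 4       # bit 5
    ANNOTATE = 1 << 5   # bit 6


def allowed(p_value: int) -> set[str]:
    """Names of the permissions granted by a (usually negative) flag value."""
    # Python ints behave like two's complement under &, so negative
    # values can be tested bit by bit without masking first.
    return {flag.name for flag in Permission if p_value & flag.value}


if __name__ == "__main__":
    print(sorted(allowed(-4)))      # all four bits set
    print(sorted(allowed(-3904)))   # none of the four bits set
    print(sorted(allowed(-3900)))   # only the print bit set
```

These flags are a request to the viewing software rather than a lock, which is why a permissions-only password is a poor fit for anything truly confidential.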

Redaction

Drawing a black box over a Social Security number does nothing. The text is still under the rectangle, fully selectable and copyable. Real redaction removes the underlying content — and it's the single most expensive PDF mistake organizations keep making.
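
To see why the black box fails, it helps to picture a page as two separate piles: text runs and drawings. The toy model below is not a real PDF library, and every name in it is invented for illustration, but it shows the difference between painting over text and removing it.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class TextRun:
    text: str
    box: Box


@dataclass
class Page:
    runs: list[TextRun] = field(default_factory=list)
    drawings: list[Box] = field(default_factory=list)

    def extract_text(self) -> str:
        # Extraction reads the text runs only; drawings never hide them.
        return " ".join(run.text for run in self.runs)


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def cover(page: Page, area: Box) -> None:
    """The black-box approach: paint a rectangle, leave the text alone."""
    page.drawings.append(area)


def redact(page: Page, area: Box) -> None:
    """Real redaction: delete the text under the area, then paint the box."""
    page.runs = [run for run in page.runs if not overlaps(run.box, area)]
    page.drawings.append(area)


if __name__ == "__main__":
    page = Page([
        TextRun("SSN:", (10, 10, 40, 20)),
        TextRun("000-00-0000", (45, 10, 120, 20)),
    ])
    area = (44, 8, 122, 22)
    cover(page, area)
    print(page.extract_text())   # the number is still extractable
    redact(page, area)
    print(page.extract_text())   # only the label remains
```

A real redaction tool also has to deal with metadata, annotations, and image content, none of which this toy model covers.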

Signatures

You can sign a PDF without printing, scanning, or owning a fax machine from the Clinton administration. There's a difference between a drawn signature image and a cryptographic digital signature, and we explain when each one matters.

Putting it together: a real workflow

Say a client sends you a 30-page scanned agreement and asks for "just the signature pages, cleaned up, small enough to email." Here's the actual order of operations:

Extract the pages you need (split). Run OCR so the text is searchable, not just a picture. Redact the bits that shouldn't travel. Compress so it clears the mailbox. Optionally lock it with a password before it leaves your hands. Five tools, one file, ten minutes — and at no point should your client's agreement have been parked on a stranger's server with a vague retention policy.

That's the whole philosophy behind Arthize: every tool above lives in one private workspace, so you stop bouncing a confidential document between tabs. Start with the deep dive that matches today's headache, and bookmark this page for the next one.

Maya Sundaram

Co-founder & document-tooling engineer, Arthize

Maya has spent the last decade building document-processing systems — first for a legal-tech startup that ingested millions of scanned filings, now at Arthize where she owns the conversion, OCR and compression pipelines. She has opinions about Ghostscript flags.

Try the tool, not just the theory.

Every technique in this guide runs inside one private Arthize workspace — no upload sites, no trackers.
