security

How to redact a PDF so the hidden text actually stays hidden

Jonas Albrecht · 7 min read

The short version

  • A black box over text does not remove it — the words stay selectable and copyable underneath.
  • Real redaction deletes the underlying content and flattens (rasterizes) the affected pages.
  • Always verify: try to select and search for the removed text; zero results means it worked.
  • Scrub metadata and hidden objects too — a revealing filename or author tag leaks just as badly.

If you take one thing from this entire blog, make it this: drawing a black rectangle over text does not delete the text. The words are still in the file, sitting under the box, fully selectable and copyable by anyone who knows to try. This isn't a hypothetical — it's how court filings, government reports, and corporate documents have leaked names, salaries, and trade secrets for two decades. Redaction done right is permanent removal, and it's worth understanding exactly why.

Why the black box fails

A PDF page keeps its text and its graphics as separate drawing instructions, painted one after another. When you "redact" by adding a black shape in most editors, you've added a graphic on top of the text without touching the text itself. It's like laying a sticky note on a printout: the printout is still under it. Three trivial moves recover the hidden words:

  • Select the page text with the cursor and copy it — the covered words come along.
  • Open the file in another viewer or editor that hides annotations or lets you move the box aside.
  • Run text extraction, which reads the underlying text stream and ignores the box entirely.
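To make the last point concrete, here is a minimal Python sketch of why extraction ignores the box. It assumes a simplified, uncompressed page content stream that shows text with the `Tj` operator only; real streams are usually compressed and use more text operators and font encodings. The stream contents, the made-up salary figure, and the helper name `extract_text` are illustrative assumptions, not output from any particular tool.

```python
import re

# Hypothetical, simplified page content stream.
# The text is drawn first, then a black rectangle is painted over it.
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 700 Td (Salary: 185,000 EUR) Tj ET
0 0 0 rg
70 696 160 16 re
f
"""

# Matches a literal string operand followed by the Tj (show text) operator.
# Assumption: no escaped or nested parentheses inside the string.
TJ_PATTERN = re.compile(rb"\(([^()]*)\)\s*Tj")


def extract_text(stream: bytes) -> list[str]:
    """Return every string shown with Tj, ignoring all drawing operators."""
    return [m.group(1).decode("latin-1") for m in TJ_PATTERN.finditer(stream)]


if __name__ == "__main__":
    # The rectangle (re / f) has no effect on what is extracted.
    print(extract_text(CONTENT_STREAM))  # ['Salary: 185,000 EUR']
```

The rectangle only changes what is painted on screen. The string operand is still in the stream, so anything that reads the stream gets it back.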

What real redaction does

Proper redaction doesn't cover content — it removes it. A correct tool deletes the actual text and image data in the redacted region from the file's content streams, and typically rasterizes the affected pages (flattening them to images) as a belt-and-suspenders step so nothing recoverable is left behind. After real redaction, copying the page returns nothing where the sensitive text used to be, because it genuinely isn't there anymore.
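The difference between covering and removing can be sketched with a toy page model in Python. This is an assumption-heavy illustration, not how any specific tool works: `Rect`, `TextRun`, `cover`, and `redact` are hypothetical names, and a real redaction tool also has to split partially covered runs, handle images, and rewrite the actual content streams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        """True if the two rectangles overlap by more than a shared edge."""
        return not (
            self.x1 <= other.x0 or other.x1 <= self.x0
            or self.y1 <= other.y0 or other.y1 <= self.y0
        )


@dataclass(frozen=True)
class TextRun:
    text: str
    box: Rect  # where the run is drawn on the page


def cover(runs: list[TextRun], area: Rect) -> list[TextRun]:
    """Black-box 'redaction': the area is painted over, every run stays."""
    return list(runs)


def redact(runs: list[TextRun], area: Rect) -> list[TextRun]:
    """Real redaction: drop every run that overlaps the marked area."""
    return [run for run in runs if not run.box.intersects(area)]


# Hypothetical page with one sensitive run.
page = [
    TextRun("Employee:", Rect(72, 700, 130, 712)),
    TextRun("J. Smith", Rect(135, 700, 190, 712)),
]
marked = Rect(133, 698, 192, 714)

print([r.text for r in cover(page, marked)])   # ['Employee:', 'J. Smith']
print([r.text for r in redact(page, marked)])  # ['Employee:']
```

After `cover`, the name is still in the page data and comes back on copy. After `redact`, there is nothing left to copy.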

How to redact a PDF properly

  1. Open the redaction tool and load the document.
  2. Mark every region to remove — names, account numbers, signatures, that one email address in a footer you almost missed.
  3. Apply the redaction so the content is removed and the pages flattened, not just covered.
  4. Verify. Open the output, try to select text where the redactions are, and run a text search for a name you removed. Zero results is the goal.
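The verification in step 4 can be scripted once you have the output's text per page. The sketch below assumes you already extracted that text with a tool of your choice; `find_leaks` and `normalize` are hypothetical helper names, and the sample pages are made up for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks can't hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_leaks(page_texts: list[str], removed_terms: list[str]) -> dict[str, list[int]]:
    """Map each removed term to the 1-based page numbers where it still appears."""
    leaks: dict[str, list[int]] = {}
    for page_number, text in enumerate(page_texts, start=1):
        haystack = normalize(text)
        for term in removed_terms:
            if normalize(term) in haystack:
                leaks.setdefault(term, []).append(page_number)
    return leaks


# Hypothetical extracted text from a "redacted" output file.
pages = ["Agreement between Arthize and\nJ.  Smith", "Signature page"]

print(find_leaks(pages, ["j. smith", "Account 4417"]))  # {'j. smith': [1]}
```

An empty dictionary is the "zero results" you are looking for. Anything else names the term and the page where the redaction did not take.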

Don't forget the parts you can't see

Visible text is only half the leak. PDFs carry metadata (author name, software, original filename), and sometimes hidden layers, comments, or attached files. A document titled "Smith_termination_FINAL.pdf" in its metadata gives the game away even if the body is spotless. Real redaction includes scrubbing metadata and removing hidden objects, not just blacking out the page.
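One rough way to catch leftovers outside the page body is to scan the file's raw bytes for the terms you removed. The Python sketch below is a crude check under stated assumptions: it looks for each term as single-byte text and as UTF-16BE (both are used for PDF strings), and it cannot see inside compressed streams, so a clean result is a hint rather than proof. `scan_raw_bytes` and the sample bytes are hypothetical.

```python
def _patterns(term: str) -> list[bytes]:
    """Lowercased byte forms a term may take in an uncompressed PDF string."""
    patterns = [term.encode("utf-16-be")]
    try:
        patterns.append(term.encode("latin-1"))
    except UnicodeEncodeError:
        pass  # characters outside Latin-1: only the UTF-16 form applies
    return [p.lower() for p in patterns if p]


def scan_raw_bytes(data: bytes, terms: list[str]) -> list[str]:
    """Return the terms that still appear somewhere in the raw file bytes."""
    haystack = data.lower()
    return [t for t in terms if any(p in haystack for p in _patterns(t))]


# Hypothetical fragment of a document information dictionary.
SAMPLE = (
    b"<< /Title (Smith_termination_FINAL.pdf) "
    b"/Author (HR Dept) /Producer (ExampleWriter 1.0) >>"
)

print(scan_raw_bytes(SAMPLE, ["Smith", "termination", "Albrecht"]))
# ['Smith', 'termination']
```

To run it on a real file, read it with `open(path, "rb").read()` and pass the same list of terms you searched for on the pages.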

Redaction vs. encryption — different jobs

People reach for a password when they mean redaction, and vice versa. Encryption controls who can open the file: everyone with the password still sees all of it. Redaction controls what's inside: the sensitive content is gone for everyone, password or not. A court exhibit that has to be public with certain names removed is a redaction job; a password can't do it.

The cruel irony of free online redaction

Think about what a "redact PDF online" site needs to function: you upload the document with all the sensitive content still in it, their server processes it, and hands back a redacted copy. You've just sent the unredacted secrets to a third party to ask them to hide your secrets. For genuinely sensitive material, that's the worst possible workflow. We run redaction inside the same private workspace as the rest of Arthize so the unredacted file never leaves your account. The broader case for that is in what happens when you upload a PDF to a free tool.

Redaction is the highest-stakes item in the PDF workflow guide. Get it right and the rest is housekeeping.

Frequently asked

Why isn't drawing a black box over text real redaction?
The box is a graphic layered on top of the text; the text itself stays in the file. Anyone can select and copy it, open the file in another viewer, or extract the text stream to recover the "hidden" words.

How do I redact a PDF so the text can't be recovered?
Use a tool that removes the underlying content from the page's content streams and flattens (rasterizes) the affected pages, then verify by trying to select and search for the removed text. Also scrub the document's metadata.

Is redacting a PDF the same as password-protecting it?
No. A password controls who can open the file; everyone with it sees all the content. Redaction permanently removes specific content so no one can read it, regardless of access.


Jonas Albrecht

Co-founder & security lead, Arthize

Jonas started Arthize with Maya after one too many contracts got uploaded to free PDF sites at his old job. He focuses on the parts of a document people assume are safe and usually aren't — encryption, true redaction, and what servers quietly keep.
