Permanently redact a PDF — without leaving the text underneath

The kind of redaction that survives a copy-paste test. Open the file on your Mac or Windows desktop, draw a rectangle, and FileHop destroys the text glyphs, image pixels, and vector paths inside it — then re-walks the output to confirm nothing redactable survived. If verification fails, the file is deleted before it can be saved. No upload. No black boxes drawn on top of live text.

How redactions fail (and why famous ones still happen)

Almost every public redaction disaster is the same mistake: a black rectangle drawn on top of a PDF — using a markup tool, a highlight tool, or a draw-shape tool — saved, and filed. The black box is a graphical annotation. The text underneath is untouched. Anyone who opens the PDF and selects the redacted region with the cursor, then copies and pastes into a text editor, gets the original text back.

Cautionary case · 2019

Manafort court filing — black boxes drawn on top of live text

A court filing in the Paul Manafort matter contained passages 'redacted' as black rectangles drawn over text. The text layer was untouched. Within hours, readers were copy-pasting the polling-data passages out of the PDF. ABA Journal coverage, January 2019.

Cautionary case · 2014

New York Times — NSA agent name leaked through a 'redacted' PDF

As part of its Snowden-documents reporting, the NYT published a PDF with a name and program target 'redacted' by black overlay. The text was extractable from the file the same day. Techdirt coverage, January 2014.

The pattern is identical every time. The lawyer or editor uses the wrong tool — a draw-shape or highlight in Acrobat, a black rectangle in Preview, a graphic overlay in Word — instead of the dedicated Redact tool. The visible page looks fine. The underlying file is unchanged.

There is a second, more subtle failure mode that academic researchers documented in 2023: even some redactors that DO remove the underlying glyphs leak the redacted text's WIDTH as a side channel. The standard PDF text operators (Tj for a single string, TJ for a string-with-glyph-positioning) carry per-glyph advance widths; if a redactor replaces the redacted run with a single shift of the same width, that residual width can be cross-referenced against font metrics to narrow down or reconstruct the redacted text. Most consumer redactors do not address this. The reference is Bland, Iyer, and Levchenko, "Story Beyond the Eye: Glyph Positions Break PDF Text Redaction," Proceedings on Privacy Enhancing Technologies (PETS) 2023.
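To see why a leftover width is enough to leak text, here is a minimal sketch of the matching step. The glyph widths and candidate names are made-up illustrative values, not real font metrics, and candidates_matching_width is a hypothetical helper, not code from the cited paper.

```python
# Hypothetical per-glyph advance widths in 1/1000 em units (illustrative only).
GLYPH_WIDTHS = {
    "a": 444, "e": 444, "i": 278, "l": 278, "m": 778,
    "n": 500, "o": 500, "r": 333, "s": 389, "t": 278,
}

def text_width(word: str) -> int:
    """Total advance width of a word under the toy metrics above."""
    return sum(GLYPH_WIDTHS[ch] for ch in word)

def candidates_matching_width(residual_width: int, candidates: list) -> list:
    """Return the candidate words whose width equals the leftover gap."""
    return [w for w in candidates if text_width(w) == residual_width]

names = ["ernst", "lane", "stone", "martin", "morales"]
leaked_gap = text_width("martin")  # the gap a width-preserving redactor leaves behind
print(candidates_matching_width(leaked_gap, names))  # prints ['martin']
```

With real font metrics and a realistic dictionary the match is rarely this clean, but the principle is the same: an exact width shrinks the candidate list sharply.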

What 'real' redaction means (in PDF terms, in plain English)

A PDF page is not a picture. It is a small program that tells a PDF reader how to draw the page: set font to Times, move to position X/Y, show this string of glyphs (Tj), apply this clip path, draw this image at this scale. The visible page is the OUTPUT of running that program. The text underneath every black box you see on screen lives in that program as a Tj or TJ operator carrying the actual character codes.

An overlay redaction adds a new instruction to the program: AFTER you draw the text, draw this black rectangle on top of it. The text-showing operator is still there. Anyone who runs the program — including any tool that extracts text without rendering — gets the original characters back.
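A minimal sketch of that point: the toy content stream below shows text and then paints a black rectangle over it, and a text extractor that never renders the page still reads the string. Tj, rg, re, and f are real PDF operators, but the stream is invented for illustration, and the regex is a deliberate simplification (it ignores TJ arrays, hex strings, and escaped parentheses).

```python
import re

# Toy page content stream: show a string, set fill to black, paint a rectangle on top.
CONTENT_STREAM = """
BT /F1 12 Tf 72 700 Td (Account 12345678) Tj ET
0 0 0 rg
72 695 120 16 re f
"""

def extract_shown_strings(stream: str) -> list:
    """Collect literal strings passed to Tj without rendering anything."""
    return re.findall(r"\(([^()]*)\)\s*Tj", stream)

print(extract_shown_strings(CONTENT_STREAM))  # prints ['Account 12345678']
```

The rectangle changes what a viewer paints, not what the stream contains.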

Real redaction rewrites the program. The text-showing operator is found, the characters inside the redaction rectangle are deleted, and the operator is rewritten with the surviving (non-redacted) characters. If an entire run is inside a redaction, the operator is removed. If half the run is inside, the operator is split. Image pixels inside the redaction are painted to solid black in the underlying image samples — not just covered. Vector paths fully inside the redaction are dropped from the content stream.
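The split-or-remove behaviour for a text run can be sketched in one dimension. This is a simplified illustration, not FileHop's code: Glyph, layout, and split_run are hypothetical names, glyphs are laid out at a fixed width, and any glyph that overlaps the redaction range at all is treated as redacted (an assumption).

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph on the baseline
    width: float  # advance width

def layout(text: str, width: float = 10.0) -> list:
    """Toy layout: fixed-width glyphs starting at x = 0."""
    return [Glyph(ch, i * width, width) for i, ch in enumerate(text)]

def split_run(glyphs: list, red_x0: float, red_x1: float) -> list:
    """Drop glyphs overlapping [red_x0, red_x1) and return the surviving runs."""
    runs, current = [], []
    for g in glyphs:
        overlaps = g.x < red_x1 and g.x + g.width > red_x0
        if overlaps:
            if current:  # close the run that ended at the redaction edge
                runs.append("".join(current))
                current = []
        else:
            current.append(g.char)
    if current:
        runs.append("".join(current))
    return runs

print(split_run(layout("SSN 123-45-6789"), 40.0, 110.0))  # prints ['SSN ', '6789']
print(split_run(layout("secret"), 0.0, 60.0))             # prints []
```

A run cut in the middle comes back as two surviving pieces; a run entirely inside the rectangle comes back empty, which corresponds to removing the operator.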

FileHop's redactor goes one step further. After the destructive pass, it re-opens the output file and walks it from scratch — same parser, same code path — looking for any glyph, inline image, or non-black image pixel that survived under any redaction rectangle. If anything survives, the function fails closed: it returns an error AND deletes the output file before the user can save it. The on-disk source file is never modified; it stays where it was while a new file is written.

“Destructive PDF redaction: permanently removes text glyphs, image pixels, contained vector paths, and inline images inside redaction regions, then re-walks the output to confirm nothing redactable survives. Fails closed when content cannot be redacted faithfully.”
Source comment — services/pdf/redactor.rs
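The fail-closed pattern itself is small enough to sketch. The following is an illustration under stated assumptions, not the contents of redactor.rs: save_fail_closed and toy_verify are hypothetical names, and the verifier is a stand-in that only looks for a marker string.

```python
import os
import tempfile
from typing import Callable, Optional

def save_fail_closed(out_path: str, data: bytes,
                     verify: Callable[[bytes], Optional[str]]) -> None:
    """Write data to out_path, re-read it, and delete it if verification fails."""
    with open(out_path, "wb") as f:
        f.write(data)
    try:
        with open(out_path, "rb") as f:   # check the bytes on disk, not in memory
            problem = verify(f.read())
    except Exception:
        os.remove(out_path)               # a verifier crash also fails closed
        raise
    if problem is not None:
        os.remove(out_path)
        raise ValueError(problem)

def toy_verify(data: bytes) -> Optional[str]:
    """Stand-in check: flags a marker string that should have been removed."""
    if b"SECRET" in data:
        return "text is still present under a redaction"
    return None

out = os.path.join(tempfile.mkdtemp(), "redacted.pdf")
save_fail_closed(out, b"clean bytes", toy_verify)
print(os.path.exists(out))  # the verified file is kept
```

The design choice worth noting is that the output is written to a new path and removed on any failure, so a caller can never pick up an unverified file by accident.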

The glyph-position side channel — and FileHop's structural answer

There is one additional protection against the glyph-positioning side-channel attack documented in Bland 2023. When a redacted text run is rewritten, its total advance width is collapsed to one number AND quantized to a 500-unit grid (the REDACTION_TJ_QUANTUM constant in the source). The comment in the code reads: "Collapsing a run to one number already defeats per-glyph recovery; quantizing the total coarsens the residual length side channel so the exact width of the removed text cannot be measured from the stream." This addresses the Bland attack directly.

In plain English: most consumer redactors that DO remove glyphs still leave behind a number that says, in effect, "insert this much horizontal space here." That number is the width of the redacted text in font units. With the font and the width, an attacker can often guess the original word. FileHop collapses the whole redacted run to a single number and rounds that number to a coarse grid, so the residual width is no longer a faithful measurement of what used to be there.
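As a rough sketch of collapse-then-quantize: the 500-unit grid comes from the text above, but the rounding direction (up, here) and the function name collapse_and_quantize are assumptions made for illustration; the actual rounding rule in the source is not stated.

```python
QUANTUM = 500  # grid size from the text; the rounding rule below is an assumption

def collapse_and_quantize(advances: list, quantum: int = QUANTUM) -> int:
    """Collapse per-glyph advances to one total, rounded up to the grid."""
    total = sum(advances)
    return -(-total // quantum) * quantum  # integer ceiling, then back to units

print(collapse_and_quantize([778, 444, 333, 278, 278, 500]))  # total 2611, prints 3000
print(collapse_and_quantize([778, 500, 333, 444, 278, 444]))  # total 2777, prints 3000
```

Two runs of different true widths land on the same stored value, so the stored number no longer distinguishes them.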

What FRCP Rule 5.2 says must be redacted from a federal court filing

Federal Rule of Civil Procedure 5.2 (Privacy Protection For Filings Made with the Court) sets the floor for what must be redacted from a federal court filing. The same categories show up across CM/ECF e-filing rules in nearly every federal district. The Rule's text is plain: Rule 5.2(a) lists four categories with specific truncation rules, and the fifth item below, home addresses, comes from the parallel criminal rule. The clerk does NOT review filings for compliance; the responsibility is on the filer.

  1. Social Security numbers and Taxpayer ID numbers

    Only the last four digits may appear in the filing.

  2. Birth dates

    Only the year of birth may appear.

  3. Names of minor children

    Only the initials may appear.

  4. Financial account numbers

    Only the last four digits may appear.

  5. Home addresses (in criminal cases)

    Only the city and state may appear, by court rule in many criminal-case contexts; check local court rules.

State courts vary. Many follow the federal Rule's structure; some add categories (medical record numbers in some jurisdictions; immigration status in others). Always check the specific court's local rules. Source: Federal Rules of Civil Procedure Rule 5.2 (Cornell Legal Information Institute). For criminal cases the parallel rule is Federal Rule of Criminal Procedure 49.1, with its own variations.
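The truncation rules above are mechanical enough to express as small helpers. This is a sketch of what may remain visible for each category; the function names and the input formats (a YYYY-MM-DD date, a space-separated name) are assumptions for illustration.

```python
def last_four(number: str) -> str:
    """Keep only the last four digits of an SSN, taxpayer ID, or account number."""
    digits = [ch for ch in number if ch.isdigit()]
    return "".join(digits[-4:])

def year_only(birth_date_iso: str) -> str:
    """Keep only the year from a YYYY-MM-DD birth date."""
    return birth_date_iso.split("-")[0]

def initials(full_name: str) -> str:
    """Reduce a minor's name to initials."""
    return "".join(part[0].upper() + "." for part in full_name.split())

print(last_four("123-45-6789"))   # prints 6789
print(year_only("2011-03-09"))    # prints 2011
print(initials("Jane Q. Doe"))    # prints J.Q.D.
```

Everything outside these remainders is what the redaction rectangle or search string needs to cover.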

The FileHop redaction workflow (locally, in 4 minutes)

All four steps run inside the FileHop desktop app on your computer. Your PDF does not transit our servers at any point during this workflow. Mac and Windows.

  1. Step 1: Open the PDF in FileHop

    Drag the PDF into the FileHop app or use File → Open. If the PDF is password-protected, FileHop will prompt for the password — the file is unlocked in-memory only, not modified on disk. The original file stays exactly where it is on your drive throughout this workflow; FileHop writes a new file at save time, never overwriting the source.

  2. Step 2: Mark what to redact — by drawing, or by text search

    You have two ways to mark redaction regions. (a) Draw: select the Redact tool and drag a rectangle over each region you want destroyed. This is the manual mode; use it for arbitrary content (a paragraph, a name, a section of a deposition exhibit). (b) Search-and-mark: type a string (a specific name, account number, address) and FileHop finds every occurrence in the PDF and queues a redaction mark at each match. Searches are case-insensitive. This is faster for repeated identifiers, but it requires the PDF to have a text layer — for a scanned PDF with no text layer, you have to draw the rectangles manually (or OCR the file first, which currently requires opting into FileHop's cloud OCR; if OCR's cloud posture is not acceptable for this document, OCR it upstream and bring the searchable PDF back). What FileHop does NOT do at this step is pattern-based AI auto-detect — FileHop will not 'find all SSNs' or 'find all DOBs' for you. If you need pattern-detection auto-redaction, the right tools are CaseGuard or redactor.ai (both SaaS — accept the upload posture). FileHop's redaction is search-or-draw, not detect.

  3. Step 3: Apply the redactions

    When every region is marked, click Apply Redactions. FileHop rewrites the page content streams: each Tj/TJ text operator that falls under a redaction rectangle is rewritten with the redacted glyphs removed; the residual run's total advance width is collapsed to a single quantized number (500-unit grid) so the side-channel attack from Bland 2023 does not leak the original width; image XObjects under any redaction rectangle have their underlying RGB samples painted to solid black; inline images entirely inside a redaction are dropped from the operator stream; vector paths fully inside a redaction are dropped. Form XObjects (used by multi-page templates and stamped content) are recursed into and their content streams are rewritten the same way. If the page uses a Type3 font (rare, but it happens in some old or specialized PDFs), FileHop aborts the redaction and returns an error — better to fail loudly than to falsely report a partial redaction.

  4. Step 4: Save (and let FileHop verify automatically)

    Choose Save As and pick an output path. FileHop writes the new PDF, then automatically re-opens that output file from scratch and re-walks every page that had a redaction. If any text glyph survives under any redaction rectangle, OR any inline image survives, OR any pixel in a painted image is not solid black inside the painted region, the function returns an error AND deletes the output file before you can use it. You will see an error in the app and no redacted file will be saved. The verify_redaction function in services/pdf/redactor.rs returns one of three error strings depending on what survived: "text is still present under a redaction", "an inline image is still present under a redaction", or "image content is still present under a redaction." The metadata Info dictionary is also sanitized in the same pass — the redacted output does not need a separate metadata-strip step.
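The three verification outcomes in Step 4 can be sketched as a single check over what a fresh parse finds under each rectangle. The error strings are the ones quoted above; everything else (the RedactedRegion summary type, the verify_regions name, and the order of the checks) is a hypothetical simplification, not the verify_redaction function itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BLACK = (0, 0, 0)

@dataclass
class RedactedRegion:
    """Hypothetical summary of what a re-parse found under one rectangle."""
    glyphs: List[str] = field(default_factory=list)
    inline_images: int = 0
    painted_pixels: List[Tuple[int, int, int]] = field(default_factory=list)

def verify_regions(regions: List[RedactedRegion]) -> Optional[str]:
    """Return the first applicable error string, or None if every region is clean."""
    for r in regions:
        if r.glyphs:
            return "text is still present under a redaction"
        if r.inline_images > 0:
            return "an inline image is still present under a redaction"
        if any(px != BLACK for px in r.painted_pixels):
            return "image content is still present under a redaction"
    return None

clean = RedactedRegion(painted_pixels=[BLACK, BLACK])
leaky = RedactedRegion(painted_pixels=[BLACK, (12, 12, 12)])
print(verify_regions([clean]))         # prints None
print(verify_regions([clean, leaky]))  # prints image content is still present under a redaction
```

Note that a near-black pixel counts as a failure: the check is for solid black, not for "dark enough".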

Verify it actually worked (three-minute checklist; works for any redactor)

FileHop's automatic re-walk verification gives you fail-closed assurance for the FileHop path. The habits below are tool-agnostic and work on a redacted PDF from any source — Acrobat, FileHop, an online redactor, anything. Run them on every redacted file you are about to file or send, regardless of which tool you used. Three minutes; do it cold.

  1. Copy-paste test. Open the redacted PDF in your default reader. Use Cmd+A (Mac) or Ctrl+A (Windows) to select all, copy with Cmd+C or Ctrl+C, then paste into a plain-text editor (TextEdit on Mac, Notepad on Windows; NOT Word, which can hide structure). If any redacted text appears in the text editor, the redaction is overlay-only and the file is broken. Throw it away.
  2. Save-as-text test. From your PDF reader, use File → Export As → Text (in Preview, Acrobat, or any modern reader that supports text export). Open the resulting .txt file. Search (Cmd+F / Ctrl+F) for the specific strings you redacted: the name, the SSN's middle digits, the account number's middle digits, the phrase. If any redacted string is in the .txt, the redaction is broken.
  3. Second-reader test. Open the PDF in a DIFFERENT reader than the one you used to redact it (if you redacted in Acrobat, open in Preview or Edge; if you redacted in FileHop, open in Acrobat or Preview). Repeat the copy-paste test in the second reader. Different readers extract text via different paths; if one passes and another fails, you have an overlay-only redaction.
  4. Search-the-file test (advanced). Open the PDF in a hex/binary viewer or use a command-line tool to dump the raw object stream. Page content is usually stored compressed, so decompress the streams before searching. Search for the redacted strings literally. If they appear, the redaction is broken. This step is overkill for most filings, but it is the most direct check when you suspect text survived somewhere a reader does not show it. It does not detect the glyph-width side channel described earlier.
  5. Visual cold-open test. Close the file. Reopen it. Scroll every page. Confirm no leftover annotation comments, no draw-shape outlines, no whiteout, no highlighter residue on top of the page content. A real redaction will show as opaque black where the redactor intended; an overlay redaction sometimes shows the annotation handles when the redaction layer is selected.
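For the search-the-file test, a short script can do the decompress-and-search step. This is a naive sketch using only the Python standard library; find_in_streams is a hypothetical helper. It handles only Flate-compressed streams found by a simple pattern match, and it does not handle encrypted files, object streams, hex-encoded strings, or fonts with custom character codes, so a miss here is not proof that the text is gone.

```python
import re
import zlib

STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

def find_in_streams(pdf_bytes: bytes, needle: bytes) -> bool:
    """True if needle appears in the raw file or in any inflatable stream."""
    if needle in pdf_bytes:
        return True
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates a slightly truncated tail, unlike zlib.decompress
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (or another filter); skip this stream
        if needle in inflated:
            return True
    return False

# Tiny invented fixture: one Flate-compressed content stream holding a name.
fixture = (b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
           + zlib.compress(b"BT (Jane Doe) Tj ET")
           + b"\nendstream\nendobj\n")
print(find_in_streams(fixture, b"Jane Doe"))  # prints True
```

To run it on a real file, read the bytes with open(path, "rb").read() and pass each redacted string as the needle.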

What this workflow does NOT do (and where to go for those needs)

  • Not bar-ethics or court-certified compliance. FileHop does not certify that any specific redaction meets any specific jurisdiction's ethics or court-rule standard. The redaction is destructive at the PDF content-stream level, output is automatically re-verified before save, and the verification checklist above is the user-side belt-and-braces. The combination is engineering-defensible; whether it satisfies your court, your bar, your client, or your malpractice carrier is a question for you and your firm, not a question FileHop answers.
  • Not pattern-based AI auto-detection. FileHop redacts what you mark — by drawing rectangles or by text search for known strings. If you need 'find all SSNs / DOBs / account numbers / names in this thousand-page production' AI auto-detection, the right tools are CaseGuard, redactor.ai, iDox, or similar. Those tools are SaaS (upload-based) — accept the privacy posture trade if you need the AI workflow. FileHop's lane is the careful, surgical, local redaction of a specific document a lawyer is about to file or send.
  • Not for redacting signed PDFs and keeping the signature. A redacted PDF has a modified content stream by definition. An existing digital signature on the source will be invalidated by the redaction, which is the structurally correct behaviour (a signed document that has been modified should no longer verify against the original signature). Sign AFTER redaction, not before. If you receive a signed PDF and need to redact it, the workflow is: verify the signature first and record the verification (screenshot, timestamp), then redact, then re-sign if your workflow requires a signed-and-redacted output.
  • Not for redaction with automatic exemption-code labelling. FOIA productions and some agency workflows require each redaction to carry a label naming the legal basis (b(6) personal privacy, b(7)(C) law enforcement personal privacy, etc.). FileHop does not auto-stamp these. You can draw an annotation/text overlay over the redaction with the exemption code, but it is a manual step. If you do high-volume FOIA production with exemption-code requirements, dedicated FOIA tooling (FOIAXpress, CaseGuard's exemption-log feature) is a better fit.
  • Not on iPad / Linux / web. FileHop runs on macOS and Windows desktop only. Many lawyers redact on iPad with Acrobat or PDF Expert; if that is your primary workflow, this article's tool recommendation doesn't apply but the verification checklist still does. Run the copy-paste / save-as-text / second-reader test on the iPad app's output anyway.
  • Encrypted PDFs need to be unlocked first. If the source PDF is password-protected, FileHop will prompt for the password on open and unlock the file in memory; if you don't have the password, redaction cannot proceed.

Why this workflow runs locally

The 'no upload' line in the headline is the wedge for this guide. Here is what it means in practice, with the limits stated honestly.

  • All four redaction steps run inside the FileHop desktop app on your computer. The PDF, the marked redaction rectangles, the search strings (if you used text-search-and-redact), and the output file all stay on your machine. Nothing transits our servers during the redaction itself.
  • No telemetry on file contents. We do not log which document you redacted, what strings you searched, or what regions you marked.
  • No AI training on your files.
  • Open output format. FileHop writes standard PDF that opens in any reader the receiving party uses — Acrobat, Preview, Edge, browsers, court e-filing systems.
  • The on-disk source PDF is never modified. FileHop writes a new file at save time, and the verification re-walk operates on that new file; if verification fails, the new file is deleted and the original is untouched.
  • Honest scope on cloud features: cloud OCR is opt-in and clearly labelled in the app. If you don't turn it on, no part of the file leaves your computer. OCR is only relevant to this workflow if you're redacting a SCANNED PDF with no text layer (and even then, only if you want search-and-redact rather than draw-the-rectangles).

FAQs

Does saving a PDF as a flattened image remove the text underneath my redactions?
Sometimes — depending on how the flatten was implemented. Some 'flatten' operations rasterize each page into a single image, which does remove the text layer (replaced by pixels) but also removes selectable text from the entire document, blows up file size, and harms accessibility. Other 'flatten' operations merge annotation layers into the page content but leave the text layer untouched — in which case the redaction still leaks. The safer answer is: don't rely on flattening as your redaction step. Use a destructive-redaction tool that rewrites the content stream, then verify with the copy-paste test.
Is drawing a black box on a PDF actually redaction?
No. A drawn black box is a graphical annotation: it sits on top of the page's content stream, which still contains the original text. Anyone can copy-paste through the black box to extract the text underneath. The famous redaction failures described above (Manafort 2019, NYT 2014) and the recurring FOIA leaks are versions of this mistake. Use a dedicated Redact tool that destroys the underlying content, not a draw-shape tool.
How do I verify a PDF was properly redacted?
Run the five-step verification checklist above. The minimum-viable test is the copy-paste test: select-all in the redacted PDF, paste into TextEdit or Notepad, search for the redacted strings. If any redacted text appears, the redaction is broken. Don't use Word as the paste target — it can hide structure. The save-as-text test (File → Export As Text) is a strong second check, and the second-reader test (open the file in a different reader) catches reader-specific extraction differences.
What does FRCP 5.2 require to be redacted?
Rule 5.2(a) covers Social Security numbers and Taxpayer ID numbers (only last 4 may appear), birth dates (only year), names of minor children (only initials), and financial account numbers (only last 4). In criminal cases, the parallel Federal Rule of Criminal Procedure 49.1 also limits home addresses to city and state. The clerk does NOT review filings for compliance; responsibility is on the filer. Source: Cornell Legal Information Institute, Rule 5.2.
Can a redacted PDF be unredacted?
A properly redacted PDF — one where the underlying text operators were rewritten and the glyphs removed, then the output was verified — cannot be unredacted. The data does not exist in the file. A cosmetically 'redacted' PDF (black boxes drawn on top of live text) can be unredacted by anyone in under 60 seconds. Which kind you have depends entirely on which tool produced the redaction.
What is the glyph-positioning side channel?
An academic finding from Bland 2023 (PETS): even some redactors that DO destroy the underlying glyphs leak the redacted text's WIDTH as a side channel, because the PDF text-showing operators carry per-glyph advance widths. If a redactor replaces a redacted run with a single advance of the original width, the residual width can be cross-referenced against font metrics to reconstruct the original characters. FileHop addresses this by collapsing the redacted run to a single advance AND quantizing it to a 500-unit grid (the REDACTION_TJ_QUANTUM constant in the source) — so the exact width is not recoverable from the stream. Most consumer redactors do not address this attack.
Does FileHop find SSNs, DOBs, or account numbers automatically?
No. FileHop's redaction is search-or-draw: you draw a rectangle, or you type a specific string and FileHop finds every occurrence. It does NOT pattern-match SSNs, DOBs, or account-number formats with AI auto-detection. If you need pattern-based auto-redaction across a thousand-page production, look at CaseGuard, redactor.ai, or iDox — those tools have the AI workflow but accept the SaaS upload posture. FileHop's lane is the careful, surgical, local redaction of a specific document.
Will redaction invalidate my PDF's digital signature?
Yes — and that is the structurally correct behaviour. A signed PDF that has been modified should NOT still verify against the original signature. If you need to redact a signed PDF: verify the signature first and record the verification (screenshot, timestamp), then redact, then re-sign if your workflow requires a signed-and-redacted output. Sign AFTER redaction, not before.
Can I do this on an iPad or Linux machine?
FileHop runs on macOS and Windows desktop only — no iPad, no Linux, no web version. On iPad, the closest equivalent workflow is Adobe Acrobat or PDF Expert; both have real redaction tools. Whichever tool you use, run the verification checklist above on the output — the checklist is tool-agnostic and works on any redactor's PDF.
What about scanned PDFs without a text layer?
If the PDF has no text layer, search-and-redact won't work — there are no characters to search. You can still draw rectangles manually to redact image regions; FileHop will paint the underlying image pixels to black inside those rectangles and verify the result. If you want search-and-redact on a scanned PDF, you need OCR first. FileHop's OCR is currently cloud-based (opt-in); if that's not acceptable for your document, run OCR upstream in a tool you trust and bring the searchable PDF back.
Is the redaction destructive — can I get the original back?
Yes, it's destructive. FileHop writes a new PDF without the redacted content; the original file (the one you opened) is left untouched on disk. The metadata Info dictionary is also sanitized in the same pass, so the redacted output does not need a separate metadata-strip step. Best practice: keep the original in your matter folder and use the redacted file for filing/sending. Document the chain if your firm requires that.
Will free online redactors work for this?
Functionally, some of them do remove the underlying text. The privacy posture is the problem. Uploading a draft motion, a deposition exhibit, or a privileged client document to a third-party online redactor recreates the disclosure problem the redaction was meant to prevent. State bars increasingly treat tool-choice as part of the 'reasonable security measures' standard (ABA Formal Opinion 477R, 2017) for protected information. Process locally.

Before you file or send

Mark the regions to destroy. Apply. Save. Run the copy-paste test before you file or send, every time, regardless of which tool produced the redaction. The verification checklist takes three minutes and catches the overlay mistake behind the famous failures. If you do this kind of file work regularly (redact, combine, compress, annotate, sign, scrub metadata), the persona page at /for/lawyers/ walks the broader workflow set, and the related guides below cover the adjacent steps.