Permanently redact a PDF — without leaving the text underneath
The kind of redaction that survives a copy-paste test. Open the file on your Mac or Windows desktop, draw a rectangle, and FileHop destroys the text glyphs, image pixels, and vector paths inside it — then re-walks the output to confirm nothing redactable survived. If verification fails, the file is deleted before it can be saved. No upload. No black boxes drawn on top of live text.
How redactions fail (and why famous ones still happen)
Almost every public redaction disaster comes from the same mistake: someone draws a black rectangle on top of a PDF with a markup, highlight, or draw-shape tool, saves the file, and files it. The black box is a graphical overlay. The text underneath is untouched. Anyone who opens the PDF, selects the redacted region with the cursor, and pastes into a text editor gets the original text back.
Cautionary case · 2019
Manafort court filing — black boxes drawn on top of live text
A court filing in the Paul Manafort matter contained passages 'redacted' as black rectangles drawn over text. The text layer was untouched. Within hours, readers were copy-pasting the polling-data passages out of the PDF. ABA Journal coverage, January 2019.
Cautionary case · 2014
New York Times — NSA agent name leaked through a 'redacted' PDF
As part of its Snowden-documents reporting, the NYT published a PDF in which the name of an NSA agent and a targeted network were 'redacted' by black overlay. The text was extractable from the file the same day. Techdirt coverage, January 2014.
The pattern is identical every time. The lawyer or editor uses the wrong tool (a draw-shape or highlight in Acrobat, a black rectangle in Preview, a graphic overlay in Word) instead of a dedicated Redact tool. The visible page looks fine. The underlying text is unchanged.
There is a second, subtler failure mode, which academic researchers documented in 2023: even some redactors that DO remove the underlying glyphs leak the redacted text's WIDTH as a side channel. The standard PDF text operators (Tj shows a single string; TJ shows an array of strings interleaved with per-glyph position adjustments) place every glyph at a precise horizontal offset derived from the font's advance widths. If a redactor replaces the redacted run with a single shift of exactly the same width, that residual width can be cross-referenced against font metrics to reconstruct the redacted text. Most consumer redactors do not address this. The reference is Bland, "Story Beyond the Eye: Glyph Positions Break PDF Text Redaction," Proceedings on Privacy Enhancing Technologies (PETS) 2023.
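To make the width leak concrete, here is a toy Python sketch of the attacker's side. It assumes a made-up table of glyph advance widths in thousandths of an em (real fonts carry their own metrics, and real files may add character and word spacing, which this ignores) and a short list of candidate names; the table, the names, and the helper functions are illustrative, not taken from the paper.

```python
# Hypothetical glyph advance widths in 1/1000 em. Illustrative values only,
# not the metrics of any real font.
WIDTHS = {
    "A": 722, "B": 667, "J": 389, "M": 889, "S": 556,
    "a": 444, "e": 444, "h": 500, "i": 278, "k": 500, "l": 278,
    "m": 778, "n": 500, "o": 500, "r": 333, "s": 389,
    "t": 278, "y": 500, " ": 250,
}

def run_width(text: str) -> int:
    """Total advance width of a string under the hypothetical metrics."""
    return sum(WIDTHS[ch] for ch in text)

def candidates_matching(residual: int, names: list[str], tolerance: int = 0) -> list[str]:
    """Return the candidates whose width is within `tolerance` of the residual gap."""
    return [n for n in names if abs(run_width(n) - residual) <= tolerance]

names = ["Jane Smith", "John Smith", "Mary Jones", "Sam Baker"]
# The redactor left a gap of exactly 4417 units where a name used to be.
print(candidates_matching(4417, names))
```

An exact residual of 4417 units singles out one name. Widen the tolerance by a single unit and "Sam Baker" (4416) also fits, which shows why the precision of that leftover number is what the attack depends on.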
What 'real' redaction means (in PDF terms, in plain English)
A PDF page is not a picture. It is a small program that tells a PDF reader how to draw the page: set font to Times, move to position X/Y, show this string of glyphs (Tj), apply this clip path, draw this image at this scale. The visible page is the OUTPUT of running that program. When a black box on screen is only an overlay, the text underneath it still lives in that program as a Tj or TJ operator carrying the actual character codes.
An overlay redaction adds a new instruction to the program: AFTER you draw the text, draw this black rectangle on top of it. The text-showing operator is still there. Anyone who runs the program — including any tool that extracts text without rendering — gets the original characters back.
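Here is the overlay failure in miniature. The Python sketch below uses a hand-written, uncompressed content stream (real streams are usually Flate-compressed) and a deliberately naive extractor that never renders anything; both are illustrative assumptions, not how any particular reader works. The black rectangle is painted after the text, yet the Tj string is still sitting in the stream.

```python
import re

# A tiny hand-written page content stream. The text is shown first; the
# "redaction" is a black filled rectangle (0 0 0 rg ... re f) painted afterwards.
CONTENT = b"""
BT /F1 12 Tf 72 700 Td (Client SSN: 123-45-6789) Tj ET
0 0 0 rg 150 696 110 16 re f
"""

def naive_extract(stream: bytes) -> list[str]:
    """Pull the literal strings shown by Tj, ignoring anything painted later.

    Handles only simple (...) literals without escapes or nested parentheses;
    a real extractor must also handle TJ arrays, hex strings and font encodings.
    """
    return [m.decode("latin-1") for m in re.findall(rb"\(([^()\\]*)\)\s*Tj", stream)]

print(naive_extract(CONTENT))
```

The extractor returns ['Client SSN: 123-45-6789']: the rectangle changed what the page looks like, not what the page contains.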
Real redaction rewrites the program. The text-showing operator is found, the characters inside the redaction rectangle are deleted, and the operator is rewritten with the surviving (non-redacted) characters. If an entire run is inside a redaction, the operator is removed. If half the run is inside, the operator is split. Image pixels inside the redaction are painted to solid black in the underlying image samples — not just covered. Vector paths fully inside the redaction are dropped from the content stream.
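The split-or-remove logic for text can be sketched in Python. This is a minimal model, assuming glyph positions are already resolved to page coordinates on one horizontal baseline and that the rectangle covers that baseline vertically; the Glyph type, split_run, and layout are hypothetical names, not FileHop's code, and a real rewriter must also re-emit each surviving piece with its own positioning operator.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Glyph:
    char: str
    x: float      # left edge on the page, in points
    width: float  # advance width, in points

def split_run(glyphs: list[Glyph], rx0: float, rx1: float) -> list[tuple[str, float]]:
    """Drop every glyph that overlaps [rx0, rx1] and return the surviving pieces.

    Each piece is (text, x_start) so it can be re-emitted as its own text
    operator. Any overlap, not just full containment, removes a glyph, erring
    on the side of deleting too much rather than too little.
    """
    pieces: list[tuple[str, float]] = []
    current, start = "", 0.0
    for g in glyphs:
        covered = g.x < rx1 and g.x + g.width > rx0
        if covered:
            if current:  # close the piece that ended at the redaction
                pieces.append((current, start))
                current = ""
            continue
        if not current:
            start = g.x
        current += g.char
    if current:
        pieces.append((current, start))
    return pieces

def layout(text: str, x: float, advance: float) -> list[Glyph]:
    """Monospaced layout helper so the example is easy to follow."""
    return [Glyph(c, x + i * advance, advance) for i, c in enumerate(text)]

run = layout("Acct 55512345 closed", x=100.0, advance=6.0)
print(split_run(run, rx0=130.0, rx1=178.0))
```

Here the account digits fall under the rectangle, so one run becomes two pieces, ('Acct ', 100.0) and (' closed', 178.0); a run entirely inside the rectangle yields no pieces at all, which corresponds to removing the operator.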
FileHop's redactor goes one step further. After the destructive pass, it re-opens the output file and walks it from scratch — same parser, same code path — looking for any glyph, inline image, or non-black image pixel that survived under any redaction rectangle. If anything survives, the function fails closed: it returns an error AND deletes the output file before the user can save it. The on-disk source file is never modified; it stays where it was while a new file is written.
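The fail-closed save can be sketched in a few lines of Python. save_redacted and the verify callback are hypothetical names (FileHop's actual implementation is in Rust and is not reproduced here); the point of the sketch is the ordering: write the new file, re-check what is actually on disk, and delete it if the check fails or crashes. The source path never appears in the function at all.

```python
import os
import tempfile
from typing import Callable, Optional

def save_redacted(output_path: str, redacted_bytes: bytes,
                  verify: Callable[[str], Optional[str]]) -> None:
    """Write the redacted PDF, re-check the file on disk, and delete it on failure.

    `verify` stands in for a fresh parse of the written file: it returns None
    when nothing redactable survived, or an error message otherwise. Only
    `output_path` is written; the source document is never opened for writing.
    """
    with open(output_path, "wb") as f:
        f.write(redacted_bytes)
    try:
        problem = verify(output_path)  # re-open what is on disk, not the in-memory copy
    except Exception:
        os.remove(output_path)  # a verifier that crashes counts as a failure
        raise
    if problem is not None:
        os.remove(output_path)  # fail closed: no half-redacted file is left behind
        raise RuntimeError(problem)

# Usage with a toy verifier that flags a marker string in the saved bytes.
def toy_verify(path: str) -> Optional[str]:
    with open(path, "rb") as f:
        leaked = b"SECRET" in f.read()
    return "text is still present under a redaction" if leaked else None

out_dir = tempfile.mkdtemp()
good = os.path.join(out_dir, "clean.pdf")
save_redacted(good, b"%PDF-1.7 clean", toy_verify)
```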
“Destructive PDF redaction: permanently removes text glyphs, image pixels, contained vector paths, and inline images inside redaction regions, then re-walks the output to confirm nothing redactable survives. Fails closed when content cannot be redacted faithfully.”
The glyph-position side channel — and FileHop's structural answer
FileHop adds one more protection, aimed at the glyph-positioning side channel documented in Bland 2023. When a redacted text run is rewritten, its total advance width is collapsed to one number AND quantized to a 500-unit grid (the REDACTION_TJ_QUANTUM constant in the source). The comment in the code reads: "Collapsing a run to one number already defeats per-glyph recovery; quantizing the total coarsens the residual length side channel so the exact width of the removed text cannot be measured from the stream." This targets the width leak that Bland 2023 describes: a coarse length bucket remains, but the exact measurement the attack relies on does not.
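A sketch of the collapse-and-quantize step in Python. The constant name and the 500-unit grid come from the text above; rounding to the nearest multiple, the helper names, and the simplified TJ rendering are assumptions for illustration, not FileHop's code.

```python
REDACTION_TJ_QUANTUM = 500  # grid size from the article; the rounding mode below is assumed

def collapsed_gap(removed_widths: list[int], quantum: int = REDACTION_TJ_QUANTUM) -> int:
    """Replace a removed run with ONE TJ adjustment snapped to a coarse grid.

    Widths are in the 1/1000 units that font width tables and TJ adjustments
    share. A positive TJ number moves the next glyph LEFT, so a gap is negative.
    Rounding to the nearest multiple can shift the following text by up to half
    a quantum (Python's round() sends exact ties to the even multiple).
    """
    total = sum(removed_widths)                 # one number: no per-glyph detail survives
    snapped = round(total / quantum) * quantum  # coarse bucket instead of an exact width
    return -snapped

def tj_fragment(before: str, gap: int, after: str) -> str:
    """Render a minimal TJ array (no string escaping; plain ASCII only)."""
    return f"[({before}) {gap} ({after})] TJ"

run_a = [389, 444, 500, 444, 250, 556, 778, 278, 278, 500]  # sums to 4417
run_b = [556, 444, 778, 250, 667, 444, 500, 444, 333]       # sums to 4416
print(tj_fragment("Name: ", collapsed_gap(run_a), "."))
print(tj_fragment("Name: ", collapsed_gap(run_b), "."))
```

Two runs whose exact widths differ by one unit, 4417 and 4416, both become -4500. The stream still says that roughly 4.5 em of something was removed, but no longer says exactly how much.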
In plain English: most consumer redactors that DO remove glyphs still leave behind a number that says, in effect, "insert this much horizontal space here." That number is the width of the redacted text in font units. With the font and the width, an attacker can often guess the original word. FileHop collapses the whole redacted run to a single number and rounds that number to a coarse grid, so the residual width is no longer a faithful measurement of what used to be there.
What FRCP 5.2 says must be redacted from a federal court filing
Federal Rule of Civil Procedure 5.2 (Privacy Protection For Filings Made with the Court) sets the floor for what must be redacted from a federal court filing. The same categories show up across CM/ECF e-filing rules in nearly every federal district. The Rule's text is plain: it lists four categories, each with a specific truncation rule, and Federal Rule of Criminal Procedure 49.1 adds a fifth, home addresses, for criminal cases. The clerk does NOT review filings for compliance; the responsibility is on the filer.
- 1
Social Security numbers and Taxpayer ID numbers
Only the last four digits may appear in the filing.
- 2
Birth dates
Only the year of birth may appear.
- 3
Names of minor children
Only the initials may appear.
- 4
Financial account numbers
Only the last four digits may appear.
- 5
Home addresses (in criminal cases)
Only the city and state may appear. This category comes from Federal Rule of Criminal Procedure 49.1(a)(5), not Rule 5.2; check local court rules as well.
State courts vary. Many follow the federal Rule's structure; some add categories (medical record numbers in some jurisdictions; immigration status in others). Always check the specific court's local rules. Source: Federal Rules of Civil Procedure Rule 5.2 (Cornell Legal Information Institute). For criminal cases the parallel rule is Federal Rule of Criminal Procedure 49.1, which covers the same four categories plus home addresses.
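The truncation rules are mechanical enough to express as a function. A minimal Python sketch, assuming the filer already knows each sensitive value (FileHop does not detect them) and that addresses arrive as "street, city, STATE ZIP"; the category keys and output formats such as XXX-XX-6789 are illustrative conventions, not formats the Rules prescribe. The output is the text that may appear in the filing; in an existing PDF, the original string still has to be destroyed with a real redaction.

```python
import re
from datetime import date

def truncate_for_filing(category: str, value: str) -> str:
    """Reduce a known identifier to what FRCP 5.2(a) / Crim. P. 49.1(a) allow."""
    if category in ("ssn", "tin", "account"):
        digits = re.sub(r"\D", "", value)  # drop dashes and spaces
        if len(digits) < 4:
            raise ValueError(f"{category} value has fewer than four digits")
        prefix = "XXX-XX-" if category == "ssn" else "xxxx"  # fixed width: hides length
        return prefix + digits[-4:]              # last four digits only
    if category == "birth_date":
        return str(date.fromisoformat(value).year)  # year only; expects YYYY-MM-DD
    if category == "minor_name":
        return ".".join(part[0].upper() for part in value.split()) + "."  # initials only
    if category == "home_address":
        parts = [p.strip() for p in value.split(",")]
        if len(parts) < 3:
            raise ValueError("address not in 'street, city, STATE ZIP' form")
        city, state = parts[-2], parts[-1].split()[0]
        return f"{city}, {state}"                # city and state only (criminal cases)
    raise ValueError(f"unknown category: {category}")

print(truncate_for_filing("ssn", "123-45-6789"))
print(truncate_for_filing("home_address", "12 Elm St, Springfield, IL 62701"))
```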
The FileHop redaction workflow (locally, in 4 minutes)
All four steps run inside the FileHop desktop app on your computer. Your PDF does not transit our servers at any point during this workflow. Mac and Windows.
- 1
Step 1: Open the PDF in FileHop
Drag the PDF into the FileHop app or use File → Open. If the PDF is password-protected, FileHop will prompt for the password — the file is unlocked in-memory only, not modified on disk. The original file stays exactly where it is on your drive throughout this workflow; FileHop writes a new file at save time, never overwriting the source.
- 2
Step 2: Mark what to redact — by drawing, or by text search
You have two ways to mark redaction regions.
(a) Draw: select the Redact tool and drag a rectangle over each region you want destroyed. This is the manual mode; use it for arbitrary content (a paragraph, a name, a section of a deposition exhibit).
(b) Search-and-mark: type a string (a specific name, account number, address), and FileHop finds every occurrence in the PDF and queues a redaction mark at each match. Searches are case-insensitive. This is faster for repeated identifiers, but it requires the PDF to have a text layer. For a scanned PDF with no text layer, draw the rectangles manually, or OCR the file first. OCR inside FileHop currently requires opting into its cloud OCR; if sending this document to the cloud is not acceptable, OCR it upstream and bring the searchable PDF back.
What FileHop does NOT do at this step is pattern-based AI auto-detection: it will not 'find all SSNs' or 'find all DOBs' for you. If you need pattern-detection auto-redaction, the right tools are CaseGuard or redactor.ai; both are SaaS, so using them means uploading the file. FileHop's redaction is search-or-draw, not detect.
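Search-and-mark boils down to turning text matches into page rectangles. A minimal Python sketch, assuming each page's text is available as one string with a bounding box per character (a simplification; FileHop's internal text model is not described here); Box and mark_matches are hypothetical names. Matching with re.IGNORECASE on the original string, rather than lowercasing it first, keeps match offsets aligned with the boxes, since lowercasing can change a string's length for some Unicode characters.

```python
import re
from typing import NamedTuple

class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float

def mark_matches(text: str, char_boxes: list[Box], query: str) -> list[Box]:
    """Queue one redaction rectangle per case-insensitive occurrence of `query`.

    `char_boxes[i]` is the bounding box of `text[i]` on the page. The query is
    escaped, so it is matched literally, never as a pattern.
    """
    if len(text) != len(char_boxes):
        raise ValueError("need exactly one box per character")
    if not query:
        return []
    marks = []
    for m in re.finditer(re.escape(query), text, flags=re.IGNORECASE):
        hit = char_boxes[m.start():m.end()]
        # Union of the matched characters' boxes (single-line match assumed).
        marks.append(Box(min(b.x0 for b in hit), min(b.y0 for b in hit),
                         max(b.x1 for b in hit), max(b.y1 for b in hit)))
    return marks

text = "Doe v. DOE; doe"
boxes = [Box(10.0 * i, 700.0, 10.0 * i + 10.0, 712.0) for i in range(len(text))]
print(mark_matches(text, boxes, "doe"))
```

All three spellings of the name get a mark. A match that wraps onto a second line would need one rectangle per line, which this sketch does not attempt.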
- 3
Step 3: Apply the redactions
When every region is marked, click Apply Redactions. FileHop rewrites the page content streams: each Tj/TJ text operator that falls under a redaction rectangle is rewritten with the redacted glyphs removed; the residual run's total advance width is collapsed to a single quantized number (500-unit grid) so the side-channel attack from Bland 2023 cannot recover the exact original width; image XObjects under any redaction rectangle have the RGB samples inside that rectangle painted to solid black; inline images entirely inside a redaction are dropped from the operator stream; vector paths fully inside a redaction are dropped. Form XObjects (reusable content such as page templates and stamps) are recursed into, and their content streams are rewritten the same way. If the page uses a Type3 font (rare, but it happens in some old or specialized PDFs), FileHop aborts the redaction and returns an error: better to fail loudly than to falsely report a partial redaction.
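Painting samples means mapping the redaction rectangle from page coordinates into pixel rows and columns. A Python sketch, assuming an uncompressed 8-bit RGB image drawn with an unrotated, unskewed placement; paint_black is a hypothetical name, and real files also bring CMYK, indexed, and compressed images and rotated matrices, which this does not handle. The easy mistake is the vertical axis: PDF image data starts with the top row, while page y grows upward.

```python
import math

def paint_black(samples: bytearray, width: int, height: int,
                placement: tuple[float, float, float, float],
                rect: tuple[float, float, float, float]) -> int:
    """Set every RGB pixel that overlaps `rect` to 0,0,0 and return how many were painted.

    placement = (x, y, w, h): where the image is drawn on the page.
    rect = (x0, y0, x1, y1): the redaction rectangle in the same page units.
    """
    px, py, pw, ph = placement
    rx0, ry0, rx1, ry1 = rect
    cell_w, cell_h = pw / width, ph / height
    # Columns grow left to right, like page x.
    c0 = max(0, math.floor((rx0 - px) / cell_w))
    c1 = min(width, math.ceil((rx1 - px) / cell_w))
    # Row 0 is the TOP of the image (page y = py + ph), so measure down from there.
    r0 = max(0, math.floor((py + ph - ry1) / cell_h))
    r1 = min(height, math.ceil((py + ph - ry0) / cell_h))
    painted = 0
    for r in range(r0, r1):
        for c in range(c0, c1):
            i = (r * width + c) * 3
            samples[i:i + 3] = b"\x00\x00\x00"
            painted += 1
    return painted

# Usage: a 4x4 white image drawn at (100, 100), 40x40 points; redact its top-left quarter.
img = bytearray(b"\xff" * (4 * 4 * 3))
count = paint_black(img, 4, 4, (100.0, 100.0, 40.0, 40.0), (100.0, 120.0, 120.0, 140.0))
```

Rounding the edges outward (floor on the start, ceil on the end) paints any pixel the rectangle touches, even partially. After painting, the modified samples would still need to be re-encoded into the image's stream.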
- 4
Step 4: Save (and let FileHop verify automatically)
Choose Save As and pick an output path. FileHop writes the new PDF, then automatically re-opens that output file from scratch and re-walks every page that had a redaction. If any text glyph survives under any redaction rectangle, OR any inline image survives, OR any pixel in a painted image is not solid black inside the painted region, the function returns an error AND deletes the output file before you can use it. You will see an error in the app and no redacted file will be saved. The verify_redaction function in services/pdf/redactor.rs returns one of three error strings depending on what survived: "text is still present under a redaction", "an inline image is still present under a redaction", or "image content is still present under a redaction." The metadata Info dictionary is also sanitized in the same pass — the redacted output does not need a separate metadata-strip step.
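The re-walk check can be sketched as a function over what a fresh parse of the saved file found. A minimal Python sketch, assuming that parse has already produced glyph boxes, inline-image boxes, and the sample bytes of each painted region; verify_page and its inputs are hypothetical, not the signature of verify_redaction in services/pdf/redactor.rs. The three messages mirror the error strings quoted above.

```python
from typing import Iterable, Optional

Rect = tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units

def overlaps(a: Rect, b: Rect) -> bool:
    """True when two rectangles share some area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def verify_page(redactions: list[Rect], glyph_boxes: Iterable[Rect],
                inline_image_boxes: Iterable[Rect],
                painted_samples: Iterable[bytes]) -> Optional[str]:
    """Return None if the re-parsed page is clean, else the first problem found."""
    for g in glyph_boxes:
        if any(overlaps(g, r) for r in redactions):
            return "text is still present under a redaction"
    for im in inline_image_boxes:
        if any(overlaps(im, r) for r in redactions):
            return "an inline image is still present under a redaction"
    for region in painted_samples:
        if any(region):  # any non-zero byte is a pixel that is not solid black
            return "image content is still present under a redaction"
    return None

R = [(100.0, 100.0, 200.0, 120.0)]
print(verify_page(R, [(150.0, 102.0, 156.0, 114.0)], [], []))
```

What matters is where the inputs come from: a parse of the file as written to disk, so a bug in the rewrite step cannot also hide itself from the check.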
Verify it actually worked (3-minute checklist, works for any redactor)
FileHop's automatic re-walk verification gives you fail-closed assurance for the FileHop path. The habits below are tool-agnostic and work on a redacted PDF from any source — Acrobat, FileHop, an online redactor, anything. Run them on every redacted file you are about to file or send, regardless of which tool you used. Three minutes; do it cold.
- 1 Copy-paste test. Open the redacted PDF in your default reader. Use Cmd+A (Mac) or Ctrl+A (Windows) to select all. Then paste into a plain-text editor (TextEdit on Mac, Notepad on Windows — NOT Word, which can hide structure). If any redacted text appears in the text editor, the redaction is overlay-only and the file is broken. Throw it away.
- 2 Save-as-text test. Export the PDF to plain text from a reader that supports it (Acrobat has a text export option; macOS Preview does not, so use another reader there). Open the resulting .txt file. Search (Cmd+F / Ctrl+F) for the specific strings you redacted: the name, the digits you removed from the SSN or account number, the phrase. If any redacted string is in the .txt, the redaction is broken.
- 3 Second-reader test. Open the PDF in a DIFFERENT reader than the one you used to redact it (if you redacted in Acrobat, open in Preview or Edge; if you redacted in FileHop, open in Acrobat or Preview). Repeat the copy-paste test in the second reader. Different readers extract text via different paths; if one passes and another fails, you have an overlay-only redaction.
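- 4 Search-the-file test (advanced). PDF content streams are usually compressed, so a raw hex view will rarely show text even when it is there. Decompress the file first with a command-line tool (for example, qpdf --qdf --object-streams=disable redacted.pdf inspect.pdf), then search the result for the redacted strings literally. If they appear, the redaction is broken. A clean result is not proof on its own: many fonts store glyph codes rather than readable characters, and no string search can detect the glyph-width side channel. This step is overkill for most filings, but it is the most thorough check for surviving text.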
- 5 Visual cold-open test. Close the file. Reopen it. Scroll every page. Confirm no leftover annotation comments, no draw-shape outlines, no whiteout, no highlighter residue on top of the page content. A real redaction will show as opaque black where the redactor intended; an overlay redaction sometimes shows the annotation handles when the redaction layer is selected.
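For the search-the-file test, a few lines of Python can stand in for the command-line route: inflate every Flate-compressed stream and look for the redacted strings. A rough sketch, assuming ASCII search strings and simple literal text; find_leaks is a hypothetical helper, and it does not decode hex strings, escape sequences, or font encodings, so a miss is not proof the text is gone.

```python
import re
import zlib

# A stream body sits between "stream<EOL>" and "<EOL>endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

def stream_bodies(pdf: bytes):
    """Yield each stream body, inflated when it is Flate-compressed, raw otherwise."""
    for m in STREAM_RE.finditer(pdf):
        body = m.group(1)
        try:
            yield zlib.decompress(body)
        except zlib.error:
            yield body  # not Flate (or not compressed at all): search it as-is

def find_leaks(pdf: bytes, secrets: list[str]) -> set[str]:
    """Return the secrets that appear literally in the file or in any inflated stream."""
    haystacks = [pdf, *stream_bodies(pdf)]
    return {s for s in secrets
            if any(s.encode("latin-1") in h for h in haystacks)}

# Usage: with open("redacted.pdf", "rb") as f: print(find_leaks(f.read(), ["Jane Doe"]))
```

Any name in the returned set means the redaction is broken; an empty set only means this particular search found nothing.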
What this workflow does NOT do (and where to go for those needs)
- Not bar-ethics or court-certified compliance. FileHop does not certify that any specific redaction meets any specific jurisdiction's ethics or court-rule standard. The redaction is destructive at the PDF content-stream level, output is automatically re-verified before save, and the verification checklist above is the user-side belt-and-braces. The combination is engineering-defensible; whether it satisfies your court, your bar, your client, or your malpractice carrier is a question for you and your firm, not a question FileHop answers.
- Not pattern-based AI auto-detection. FileHop redacts what you mark, by drawing rectangles or by text search for known strings. If you need 'find all SSNs / DOBs / account numbers / names in this thousand-page production' AI auto-detection, the right tools are CaseGuard, redactor.ai, iDox, or similar. Those tools are SaaS (upload-based), so using them means accepting the upload trade-off in exchange for the AI workflow. FileHop's lane is the careful, surgical, local redaction of a specific document a lawyer is about to file or send.
- Not for redacting signed PDFs and keeping the signature. A redacted PDF has a modified content stream by definition. An existing digital signature on the source will be invalidated by the redaction — which is the structurally correct behaviour (a signed document that has been modified should not still verify as the original signature). Sign AFTER redaction, not before. If you receive a signed PDF and need to redact it, the workflow is: verify the signature first and record the verification (screenshot, timestamp), then redact, then re-sign if your workflow requires a signed-and-redacted output.
- Not for redaction with automatic exemption-code labelling. FOIA productions and some agency workflows require each redaction to carry a label naming the legal basis, such as (b)(6) for personal privacy or (b)(7)(C) for law-enforcement personal privacy. FileHop does not auto-stamp these. You can draw an annotation or text overlay over the redaction with the exemption code, but it is a manual step. If you do high-volume FOIA production with exemption-code requirements, dedicated FOIA tooling (FOIAXpress, CaseGuard's exemption-log feature) is a better fit.
- Not on iPad / Linux / web. FileHop runs on macOS and Windows desktop only. Many lawyers redact on iPad with Acrobat or PDF Expert; if that is your primary workflow, this article's tool recommendation doesn't apply but the verification checklist still does. Run the copy-paste / save-as-text / second-reader test on the iPad app's output anyway.
- Encrypted PDFs need to be unlocked first. If the source PDF is password-protected, FileHop will prompt for the password on open and unlock the file in memory; if you don't have the password, redaction cannot proceed.
Why this workflow runs locally
The 'no upload' line in the headline is the core promise of this guide. Here is what it means in practice, with the limits stated honestly.
- All four redaction steps run inside the FileHop desktop app on your computer. The PDF, the marked redaction rectangles, the search strings (if you used text-search-and-redact), and the output file all stay on your machine. Nothing transits our servers during the redaction itself.
- No telemetry on file contents. We do not log which document you redacted, what strings you searched, or what regions you marked.
- No AI training on your files.
- Open output format. FileHop writes standard PDF that opens in any reader the receiving party uses: Acrobat, Preview, Edge, browsers, court e-filing systems.
- The on-disk source PDF is never modified. FileHop writes a new file at save time, and the verification re-walk operates on that new file; if verification fails, the new file is deleted and the original is untouched.
- Honest scope on cloud features: cloud OCR is opt-in and clearly labelled in the app. If you don't turn it on, no part of the file leaves your computer. OCR is only relevant to this workflow if you're redacting a SCANNED PDF with no text layer (and even then, only if you want search-and-redact rather than draw-the-rectangles).
Sources
Authoritative pages used to verify the workflow and the failure-mode claims above. No endorsement implied.
- Federal Rule of Civil Procedure 5.2 — Privacy Protection For Filings Made with the Court (Cornell Legal Information Institute)
- United States District Court, Northern District of Alabama — Proper Redaction Techniques
- United States District Court, Eastern District of California (CAED) — Redaction Requirements / How to Redact
- United States Court of Federal Claims — PDF File Redaction Best Practices
- ABA Journal — How to redact a PDF and protect your clients (Manafort 2019 case write-up)
- ABA Judges' Journal — Embarrassing Redaction Failures (Spring 2019)
- Maxwell Bland — Story Beyond the Eye: Glyph Positions Break PDF Text Redaction (Proceedings on Privacy Enhancing Technologies, PETS 2023)
- Techdirt — New York Times Suffers Redaction Failure, Exposes Name Of NSA Agent And Targeted Network In Uploaded PDF (2014)
- Adobe Acrobat — How to redact a PDF (vendor reference for the Acrobat path)
- ABA Formal Opinion 477R (2017) — Securing Communication of Protected Client Information
FAQs
Does saving a PDF as a flattened image remove the text underneath my redactions?
Is drawing a black box on a PDF actually redaction?
How do I verify a PDF was properly redacted?
What does FRCP 5.2 require to be redacted?
Can a redacted PDF be unredacted?
What is the glyph-positioning side channel?
Does FileHop find SSNs, DOBs, or account numbers automatically?
Will redaction invalidate my PDF's digital signature?
Can I do this on an iPad or Linux machine?
What about scanned PDFs without a text layer?
Is the redaction destructive, or can I get the original back?
Will free online redactors work for this?
Before you file or send
Mark the regions to destroy. Apply. Save. Run the copy-paste test before you file or send, every time, regardless of which tool produced the redaction. The verification checklist takes three minutes and catches the overlay failure behind every famous case above. If you do this kind of file work regularly (redact, combine, compress, annotate, sign, scrub metadata), the persona page at /for/lawyers/ walks the broader workflow set, and the related guides below cover the adjacent steps.