This week is going to be a super short one. I was on call at work, so a lot of my time went into random work tasks, and when I was done with work I didn’t really want to do anything else. I mostly played video games, watched Cobra Kai, and wrote my BSides talk.
Redacting PDFs
An annoying problem came up at work over the week: a team that only has Chromebooks needed to be able to redact PDFs before sending them to people. The PDFs might contain sensitive PII that we didn’t want to be relaying to another party.
The Adobe Acrobat extension didn’t really work well on Chromebooks. Many other PDF editing extensions didn’t support any means of redacting contents on the page. The ones that did would upload your PDF to their cloud silently in the background, which was a non-starter for documents that may contain Name, DOB, and SSN combinations.
With the help of ChatGPT, I was able to combine pdf-lib, pdf.js, and fabric.js into a single-page local PDF redaction tool. It’s dirt simple and roughly goes like this:
- read the selected PDF file
- draw it on the canvas
- allow the fabric canvas to draw black rectangles over the PDF
- save the rectangle coordinates and scale
- when you press save, take the black rectangles from the fabric canvas and the PDF from the PDF canvas and apply them to a new canvas, which puts the redactions in the correct locations, then save that page as a PNG into a new PDF
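As a rough sketch of the coordinate and flattening steps above (not the actual code from the tool): rectangles drawn at the on-screen scale have to be mapped to the scale the page is rendered at for export, and then painted into the page's pixels so nothing underneath survives. The names `Rect`, `scaleRect`, and `blackOut` are made up for illustration, and a plain RGBA pixel buffer stands in for the merged canvas so the example can run outside a browser.

```typescript
// Hypothetical shape for a redaction box; coordinates are in pixels.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Map a rectangle drawn at displayScale onto a page rendered at outputScale.
// Edges are rounded outward so the box never shrinks and exposes text.
function scaleRect(rect: Rect, displayScale: number, outputScale: number): Rect {
  if (displayScale <= 0 || outputScale <= 0) {
    throw new RangeError("scales must be positive");
  }
  const factor = outputScale / displayScale;
  const left = Math.floor(rect.left * factor);
  const top = Math.floor(rect.top * factor);
  const right = Math.ceil((rect.left + rect.width) * factor);
  const bottom = Math.ceil((rect.top + rect.height) * factor);
  return { left, top, width: right - left, height: bottom - top };
}

// Overwrite the pixels inside rect with opaque black. The pixels array is
// RGBA, 4 bytes per pixel, row-major, like ImageData.data in the browser.
// rect is assumed to hold integer coordinates (e.g. the output of scaleRect).
function blackOut(
  pixels: Uint8ClampedArray,
  imgWidth: number,
  imgHeight: number,
  rect: Rect
): void {
  // Clamp to the image bounds so boxes that overhang the page are safe.
  const x0 = Math.max(0, rect.left);
  const y0 = Math.max(0, rect.top);
  const x1 = Math.min(imgWidth, rect.left + rect.width);
  const y1 = Math.min(imgHeight, rect.top + rect.height);
  for (let y = y0; y < y1; y++) {
    for (let x = x0; x < x1; x++) {
      const i = (y * imgWidth + x) * 4;
      pixels[i] = 0;       // red
      pixels[i + 1] = 0;   // green
      pixels[i + 2] = 0;   // blue
      pixels[i + 3] = 255; // fully opaque
    }
  }
}
```

In a browser, the second function would most likely just be a `fillRect` call on the merged canvas's 2D context; the point is only that the original pixels are replaced rather than covered by a separate layer.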
The one caveat with this approach is that you lose all interactivity with the resulting PDF (e.g. you can’t search for text or select text anymore). Everything gets flattened down into an image. But this is perfect for our use case, as the PDFs tend to be fairly short documents where the security of the original data is paramount. By choosing to flatten the PDF down into a series of PNGs, I don’t have to worry about the redaction boxes messing up the formatting of the page, and I don’t have to worry about poor redaction options that would leave the text retrievable. We also get these PDFs from all kinds of sources, so they don’t always have selectable text anyway. Sometimes the PDFs we get are already just PNG pages, so this option is the most flexible.
Overall, between me and ChatGPT, we finished this in a little under a day, adding keyboard shortcuts for all actions, adding drag-and-drop, and packaging it up into a Chrome extension that our Chromebook users can run locally on their machines. But ultimately the whole thing can be served out of a single HTML file, which we could host on our CDN or as part of our administrative applications. All of the processing is done locally in the browser, so it works perfectly for our Chromebook users. One area for future improvement would be to automatically load the PDFs from our object storage, so the user never even needs a copy of the unredacted PDF on their laptop.
I’m a little surprised and mostly disappointed that the Chrome Web Store was full of sketchy PDF tools, either straight up harvesting your data, or offering to redact for free, only to charge you once you try to save your files. Slimy.
What I’m Reading
Tor: From the Dark Web to the Future of Privacy
By Ben Collier
ISBN: 9780262548182
No real progress reading this week.
Interesting Links
- Google Now Defaults to Not Indexing Your Content - I don’t know how much I believe the title here, but it does highlight interesting challenges that Google has to contend with now that decades of sketchy SEO work have been combined with the power of LLMs to flood the internet with garbage.
Upcoming Projects
- BSides Las Vegas Talk - Accepted! - I will be presenting “Free Your Mind: Battling Our Biases” at BSides Las Vegas 2024 at 15:30 on August 6th in the Common Ground track. This will be my first return to a public stage in like 6 years, and my first time speaking in Vegas. Stay tuned.
- Defcon 32 Call For Soundtrack - “Oh Dade”, produced by Mikal kHill, has been accepted. It will premiere during Defcon. I will follow up with the organizer to find out if I can also release it on streaming services. (Due: N/A - Done)
- PyBay 2024 Talk - Submitted - I have wanted to get out of the security conference space and talk about security-related things at other types of conferences for a while, and I had an idea for a talk that I think fits perfectly with PyBay. Bonus, PyBay is local, so I don’t have to travel. (Due: N/A - Done)