On an only slightly related note: is there any good way to check PDFs for malware/executables?
If I have to attempt it myself, the best I can think of is opening it in a fresh QEMU VM or Docker container with no Internet access, but that's 1) a fair bit of work just to check something, and 2) probably not even that secure. Using a CLI tool like xxx, bat, or ranger, which does some processing to extract the text, and looking only at that output feels more secure, but I know it really isn't.
What is a simple tool to "clean" PDFs?
An ML tool that extracts the content inside a QEMU/Docker sandbox with no network access, converts it, and saves it into a Typst/LaTeX template would probably be the best possible outcome, but that's a sizable (if potentially very lucrative) task.
For analysis, I've used Didier Stevens's PDF tools. If you just want a safe way to open it, upload it to a cloud storage provider that renders the PDF destructively. Box or Google Drive should work.
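For a quick first pass, pdfid.py from Didier's suite counts suspicious PDF keywords such as /JavaScript, /OpenAction, and /Launch, and it undoes #xx hex obfuscation in names. Here's a minimal sketch of that keyword-triage idea. It is not pdfid itself: the keyword list and the names `scan_pdf_bytes` and `normalize_name` are my own assumptions. It only sees names that appear uncompressed in the raw bytes, so anything inside compressed object streams slips through, and a zero count does not mean the file is safe.

```python
import re
import sys

# Hand-picked (assumed) list of PDF name objects that often signal active
# or embedded content. Similar in spirit to pdfid's keywords, not identical.
SUSPICIOUS_NAMES = [
    "/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch",
    "/EmbeddedFile", "/RichMedia", "/XFA", "/AcroForm", "/URI",
]

# A PDF name token: "/" followed by characters that are not whitespace
# or PDF delimiters. "#" is allowed because it introduces hex escapes.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> str:
    """Decode #xx escapes so /J#61vaScript is counted as /JavaScript."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def scan_pdf_bytes(data: bytes) -> dict:
    """Count exact matches of each suspicious name in the raw PDF bytes."""
    counts = {name: 0 for name in SUSPICIOUS_NAMES}
    for match in NAME_RE.finditer(data):
        name = normalize_name(match.group(0))
        if name in counts:
            counts[name] += 1
    return counts


def main(argv: list) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} FILE.pdf", file=sys.stderr)
        return 2
    with open(argv[1], "rb") as fh:
        counts = scan_pdf_bytes(fh.read())
    for name, n in counts.items():
        print(f"{name:15} {n}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run it as `python pdf_triage.py suspicious.pdf` (the filename is hypothetical). Non-zero counts for /JavaScript, /OpenAction, or /Launch are a reason to dig deeper with the real tools, not a verdict on their own.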
What do you mean by "PDFs with malware/executables"?
If you're talking about embedded active content within them, then a reader application can just ignore/not run it.
If you're talking about a crafted PDF that exploits, say, font-rendering bugs in the reader, then detecting it is nearly impossible. Keep your applications updated.
chaxor|2 years ago
peddling-brink|2 years ago
https://blog.didierstevens.com/programs/pdf-tools/
worewood|2 years ago
flexagoon|2 years ago
On Android, for example, there is the GrapheneOS PDF Viewer [1]. Its README has a pretty good explanation of how it works.
1: https://github.com/GrapheneOS/PdfViewer
qwertox|2 years ago
maxerickson|2 years ago