Looking closely, this only works if you know the font name, size, and weight used, or can at least guess them manually, before feeding the pixelated version into the tool? Still quite fun, but not as scary as the headline made it sound...

if_by_whisky|3 years ago

Guessing is actually easy. For the kinds of files that end up as redacted PDFs (legal, government, etc.), there are probably 5-8 font options that make up 98% of documents. Sizes and weights are immediately recognizable to the slightly trained eye. I'm pretty sure I could guess all 3 attributes at a glance.

happyopossum|3 years ago

Or just look at the unredacted text around it and use that. Nobody is changing fonts on text before pixelation.

mmoskal|3 years ago

Often only parts of text are pixelated.

taneq|3 years ago

It's also a proof of concept. Slap a couple more for() loops in there to iterate through different font options and try a range of alignments and you could have it fully automatic.
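
Roughly what I mean, as a toy sketch and not the actual tool's code. Everything here is an assumption for illustration: `render` is a hypothetical stand-in for a real rasterizer (a real version would draw the text with the actual font files), the "fonts" are just dicts of made-up numbers, and the image is a single row of pixels. The point is only the shape of the search: loop over font, size, and offset, pixelate each guess, and keep whichever is closest to the redacted pixels.

```python
from itertools import product

# Hypothetical "fonts": made-up numbers that change glyph width and pattern.
FONTS = {
    "serif-like": {"seed": 3, "extra_width": 0},
    "sans-like": {"seed": 5, "extra_width": 1},
}

def render(text, font, size, offset, width=96):
    """Toy rasterizer: returns one row of pixel intensities in [0, 1]."""
    row = [0.0] * width
    x = offset
    for ch in text:
        glyph_width = size + font["extra_width"]
        for i in range(glyph_width):
            if 0 <= x + i < width:
                row[x + i] = ((ord(ch) * font["seed"] + i * 7) % 256) / 255.0
        x += glyph_width + 1  # one blank pixel between glyphs
    return row

def pixelate(row, block):
    """Replace each run of `block` pixels with its average (mosaic filter)."""
    return [
        sum(row[start:start + block]) / len(row[start:start + block])
        for start in range(0, len(row), block)
    ]

def distance(a, b):
    """Sum of squared differences between two pixelated rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force(target, guesses, fonts, sizes, offsets, block):
    """Try every font/size/offset/text combination and keep the closest one."""
    best = None
    for (name, font), size, offset, text in product(
        fonts.items(), sizes, offsets, guesses
    ):
        score = distance(pixelate(render(text, font, size, offset), block), target)
        if best is None or score < best[0]:
            best = (score, name, size, offset, text)
    return best

# Pretend this is the redacted region: we only see the pixelated row.
target = pixelate(render("secret", FONTS["sans-like"], 6, 2), block=4)

best = brute_force(
    target,
    guesses=["public", "secret", "hidden"],
    fonts=FONTS,
    sizes=range(4, 9),
    offsets=range(0, 4),
    block=4,
)
print(best)  # (score, font name, size, offset, text) of the closest match
```

The search space is just the product of the option lists, so a handful of fonts, sizes, and alignments only multiplies the work by a small constant. A real version would swap the fixed guess list for a character-by-character search.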

kevingadd|3 years ago

There are lots of existing tools that can guess a font accurately if you feed them an image of enough text, so that's not a big obstacle.

lelandfe|3 years ago

Or just use the rest of the document to build the corpus?