mlissner | 2 months ago

No, we worked with researchers who developed that kind of system, but we didn't broadcast our work because the research was too sensitive. Seems the cat is out of the bag now, though.

I think the combination of AI and font-metrics is going to be wild though. You ought to be able to make a system that can figure out likely words based on the unredacted ones and the redaction's size. I haven't seen any redaction system yet that protects against this.
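To make the font-metrics idea concrete, here is a minimal sketch of the width-matching step only. The advance widths in `ADVANCE`, the vocabulary, and the function names are all made up for illustration; a real attempt would read the metrics from the document's actual font and would also have to handle kerning and rendering differences.

```python
# Hypothetical per-character advance widths in arbitrary units.
# Illustrative values only, not taken from any real font.
ADVANCE = {"i": 2, "l": 2, "t": 3, "r": 3, "m": 8, "w": 8}
DEFAULT_ADVANCE = 5  # assumed width for every character not listed above


def text_width(text):
    """Sum the advance widths of the characters in `text` (no kerning)."""
    return sum(ADVANCE.get(ch, DEFAULT_ADVANCE) for ch in text)


def fitting_candidates(vocabulary, redaction_width, tolerance=0):
    """Return the words whose rendered width matches the redaction box."""
    return [
        word
        for word in vocabulary
        if abs(text_width(word) - redaction_width) <= tolerance
    ]


names = ["alice", "bob", "mallory", "trent"]
print(fitting_candidates(names, redaction_width=19))
```

In this toy setup two names share the same width, so width alone narrows the list without settling it. That remaining ambiguity is where the unredacted context (and a language model ranking the survivors) would come in.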

vlovich123|2 months ago

I thought glyph-spacing attacks were an old idea; I recall reading about them 10-20 years ago, unless I’m misremembering. Can you clarify why it was considered “too sensitive” if the whole point of this effort is to showcase these attacks?

mlissner|2 months ago

It’s a fine line. Most redactions are for the good, to protect someone or something. For example even in the Epstein files, where some redactions are being abused, most redactions are protecting victims.

If there’s a way to undo huge amounts of redactions, that’d certainly be a net negative. Sort of like if encryption were suddenly broken, you wouldn’t publish a paper saying so.

Our goal has always been to educate about the problem so that it can be addressed. We didn’t have resources to push on the font metrics approach, so we stayed mostly quiet about it.

thangalin|2 months ago

> I haven't seen any redaction system yet that protects against this.

The linked article suggests widening redacted areas more than needed with some randomization applied to the width. Strikes me that that wouldn't do much except add a few more possible solutions.

vlovich123|2 months ago

Yeah, the more robust protection is to widen every redaction to a constant width, but in the general case that could require reflowing the PDF. Honestly, though, single-word redactions are probably useless anyway given cheap AI that can fill in the gaps with high accuracy.
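A rough way to compare the two padding ideas is to count how many candidate words an attacker can no longer rule out under each policy. This is only a sketch: the widths and vocabulary are invented, and it assumes the attacker knows the maximum padding the redaction tool may add.

```python
# Hypothetical per-character advance widths in arbitrary units.
# Illustrative values only, not taken from any real font.
ADVANCE = {"i": 2, "l": 2, "t": 3, "r": 3, "m": 8, "w": 8}
DEFAULT_ADVANCE = 5  # assumed width for every character not listed above


def text_width(text):
    """Sum the advance widths of the characters in `text` (no kerning)."""
    return sum(ADVANCE.get(ch, DEFAULT_ADVANCE) for ch in text)


def survivors(vocabulary, box_width, max_pad):
    """Words that could sit under a box of `box_width` if the tool added
    anywhere from 0 to `max_pad` units of padding to the true width."""
    return [
        word
        for word in vocabulary
        if box_width - max_pad <= text_width(word) <= box_width
    ]


VOCAB = ["bob", "alice", "trent", "carol", "mallory"]

# Tight box: the box width equals the text width exactly.
print(survivors(VOCAB, box_width=19, max_pad=0))
# Random padding of up to 4 units; here the box happened to come out at 21.
print(survivors(VOCAB, box_width=21, max_pad=4))
# Constant box as wide as the widest word: width only gives an upper bound.
print(survivors(VOCAB, box_width=30, max_pad=30))
```

With these toy numbers, random padding only adds one extra candidate, while the constant-width box leaves the whole vocabulary in play. Neither policy does anything about guesses made from the surrounding context.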

NoboruWataya|2 months ago

This is going to be a disaster IMO because AI will just hallucinate what it thinks is the most probable redacted word and people will take that as gospel.

PunchyHamster|2 months ago

"don't redact or we will hallucinate something worse and make people believe it as gospel" is a nice deterrent

hahn-kev|2 months ago

Maybe we should all just use monospace fonts for everything.
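For what it's worth, a monospace font removes the per-glyph width differences but still gives away the character count, since every character occupies one cell. A small sketch, with a made-up cell width and vocabulary:

```python
def char_count(box_width, cell_width):
    """Number of monospace cells covered by a tight redaction box."""
    count, remainder = divmod(box_width, cell_width)
    if remainder:
        raise ValueError("box width is not a whole number of cells")
    return count


def same_length_words(vocabulary, box_width, cell_width):
    """Words whose length matches the number of cells under the box."""
    n = char_count(box_width, cell_width)
    return [word for word in vocabulary if len(word) == n]


VOCAB = ["bob", "alice", "trent", "carol", "mallory"]
CELL_WIDTH = 6  # hypothetical width of one monospace cell, arbitrary units

print(same_length_words(VOCAB, box_width=30, cell_width=CELL_WIDTH))
```

So the attacker is left with every word of the right length rather than every word of the right rendered width, which is a larger set but still far from nothing.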