Isn't it only names, addresses, email addresses, account/phone numbers, and illustrations that need to be redacted? Couldn't an LLM do at least most of that fast and cheap?
LLMs are suitable for tasks where it’s cheap to validate the result or the cost of being wrong is low. This is basically the opposite of that: if you fail to redact something, the costs can be high. And since these are documents written by humans, you need intelligence to handle indirect or obscure language. Otherwise you risk missing a redaction, because clues which don’t identify someone on their own can be combined to do so. A million dollars is likely to be considerably cheaper than the legal costs of getting it wrong.
acdha|2 months ago
wmf|2 months ago
sema4hacker|2 months ago