top | item 25275899

(no title)

> Data anonymized with Amnesia are statistically guaranteed that they cannot be linked to the original data.

It looks like (from other text on their site) they use variants on k-anonymity. This can prevent re-linking attacks back to the original data, but we've also known for a decade that this isn't especially strong. For example, two independent k-anonymous releases can unique identify everyone in the dataset[0].

[0]: https://dl.acm.org/doi/10.1145/1401890.1401926

discuss

La1n|5 years ago

However that statistical guarantee also requires your pseudoidentifiers to be picked correctly, i.e. it only holds true if you select all variables the attacker could possibly know about a subject. I think that is the hard part here, it's not something I would recommend someone doing without a lot of research and experience for highly dimensional data.

antisyzygy|5 years ago

Right. Even if you assume the worst-case-scenario there isn't some standard risk metric nor threshold to meet.

I feel like differential privacy is the strongest definition we have, but it is also lacking from a practical standpoint. What does it mean to have N nats/bits of information gain from seeing the result of a query? How does this translate to my risk of a PII leak?