kd5bjo | 9 months ago

A quick read-through of their anonymization process suggests that they didn’t scan the message contents for PII (other than usernames).

If true, that seems like a huge oversight. I also wonder what would happen if someone found their information in the dataset and requested its removal under GDPR or other privacy legislation.

bawolff|9 months ago

I can't help but think that if you say something in a public forum you should implicitly give up the right to privacy.

E.g. if someone scraped Hacker News and made a dataset containing this comment, I don't think I should have any right to complain.

jowea|9 months ago

I understand wanting to be careful, but didn't they only grab messages from servers that are already very public? Are Twitter message datasets anonymized?

Cynddl|9 months ago

That's not how GDPR works, and in this case the data is clearly not anonymised despite the authors' claims. Among other things, there need to be mechanisms for users to delete their data, whether or not it was public at some point.