Because, unlike the authors of this set, who stripped usernames and permalinks out of the posts to anonymize it, the set you mention just grabbed data from the API as-is (at least based on what's left of its Hugging Face description). That's the difference.
spiffytech|1 year ago
Every time I hear "anonymous data", I think of that time AOL published anonymized search logs (for academic research). The anonymization was negligent, and an NYT reporter de-anonymized and tracked down one of the users using the local & personal info present in the search queries.
https://en.wikipedia.org/wiki/AOL_search_log_release
https://web.archive.org/web/20130404175032/http://www.nytime...