jkp56|6 years ago
These hashes also reveal domain names. Most users visit many URLs on a small set of domains. If a user requests 1000 hashes that all can map to Reddit, it's very likely that the user is indeed reading Reddit. Another way to look at it: if the same person appears in a crowd on hundreds of photos, it's trivial to notice that there is something special about this person, even though in all cases the person was k-anonymous.
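A rough sketch of this intersection attack, assuming 4-byte SHA-256 prefixes (the truncation the Safe Browsing API uses) and a hypothetical pre-crawled `prefix -> domains` index; the URLs and domains here are made up for illustration:

```python
import hashlib
from collections import Counter

def prefix(url: str, nbytes: int = 4) -> bytes:
    """Truncated SHA-256 of the URL (4 bytes, as in Safe Browsing)."""
    return hashlib.sha256(url.encode()).digest()[:nbytes]

# Hypothetical crawl: map each observed prefix back to the domains
# whose URLs produce it.
crawl = {
    "reddit.com": ["https://reddit.com/r/gifs", "https://reddit.com/r/funny"],
    "example.org": ["https://example.org/a", "https://example.org/b"],
}
prefix_to_domains: dict[bytes, set[str]] = {}
for domain, urls in crawl.items():
    for u in urls:
        prefix_to_domains.setdefault(prefix(u), set()).add(domain)

def guess_domains(requested_prefixes):
    """Vote: a domain that can explain many of the user's requested
    prefixes is probably the one being browsed."""
    votes = Counter()
    for p in requested_prefixes:
        for d in prefix_to_domains.get(p, ()):
            votes[d] += 1
    return votes.most_common()

# A user who looked up two Reddit pages stands out immediately.
reqs = [prefix(u) for u in crawl["reddit.com"]]
print(guess_domains(reqs))
```

Each individual prefix is ambiguous, but the *set* of prefixes a user requests is not, which is the "same person in hundreds of photos" point.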
dpkonofa|6 years ago
For example, the hash for reddit.com/r/gifs would be different from the hash for reddit.com/r/funny, so the prefixes would be different for both of them. Unless the requested hashes are saved for every single user, it would be way too computationally expensive to get anything useful out of that. Not to mention that thousands of different URLs can share any given prefix. Narrowing down which domains those URLs are rooted on would be incredibly hard.
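For concreteness, a quick check of the first claim (the URL strings and the 4-byte prefix length are illustrative assumptions, not the exact canonicalization a real client uses): two pages on the same domain hash to unrelated prefixes, since the hash covers the full URL, not just the host.

```python
import hashlib

def prefix4(url: str) -> str:
    # First 4 bytes (8 hex chars) of the SHA-256 of the full URL.
    return hashlib.sha256(url.encode()).hexdigest()[:8]

a = prefix4("reddit.com/r/gifs")
b = prefix4("reddit.com/r/funny")
print(a, b)  # distinct prefixes despite the shared domain
```

And since a 4-byte prefix only has about 4 billion possible values, enormous numbers of the web's URLs necessarily collide into each one, which is where the k-anonymity comes from.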
kjeetgill|6 years ago
I don't know which part of this you think is that hard. Do you think a map from User -> set [Requested URL Hashes] is hard to build? Or that building the URL Hash -> set [possible domains] map is hard?
Maybe I'm missing a piece of this.
Building something simple to start guessing visited domains seems pretty easy. If a user has 10 URL hashes and the same domain shows up in each hash's set of possible domains, they're probably requesting pages on that domain. If you're lucky and all the pages from a domain fall into a single hash, all it takes is two or three hashes from known outbound links showing up to confirm this.
It's not foolproof but hardly infeasible? Or maybe I don't fully understand the algorithm.
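The outbound-link confirmation step described above could be sketched like this, where `outbound` is a hypothetical list of links known to appear on the suspected domain's pages (names and threshold are assumptions for illustration):

```python
import hashlib

def prefix(url: str) -> bytes:
    # 4-byte SHA-256 prefix, as in Safe Browsing lookups.
    return hashlib.sha256(url.encode()).digest()[:4]

# Hypothetical: links commonly embedded on the suspected domain's pages.
outbound = [
    "https://imgur.com/abc",
    "https://youtube.com/watch?v=x",
    "https://github.com/y",
]

def confirmed(requested: set, suspect_outbound, threshold: int = 2) -> bool:
    """If a couple of prefixes of known outbound links also show up in
    the user's requests, the domain guess is corroborated."""
    hits = sum(1 for u in suspect_outbound if prefix(u) in requested)
    return hits >= threshold

# Simulated user: looked up two of those outbound links plus something else.
requested = {prefix(u) for u in outbound[:2]} | {prefix("https://other.example/z")}
print(confirmed(requested, outbound))
```

As the parent says, this is probabilistic rather than foolproof, but nothing about it looks computationally hard.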