
Identifying People by Their Browsing Histories

64 points| signa11 | 5 years ago |schneier.com

21 comments

[+] AdmiralAsshat|5 years ago|reply
So researchers at Mozilla have replicated the findings of a 2012 study showing that users can be identified by collating data from third-party trackers as they browse popular websites.

Question: Was Mozilla replicating using a "vanilla" browser (e.g. no adblocking/tracker protection)? Or is this even after tracking mitigations are put into place? Mozilla's own Firefox now has built-in "tracker" protection in all versions of its browser, so it seems like they would be well-positioned to test whether the de-anonymization is thwarted with tracking protection toggled on or off.

[+] pflock|5 years ago|reply
Not sure about Firefox's built-in tracker protection. But I recall reading that ad/tracker blocking can actually make a user easier to uniquely identify, because the browser now behaves differently from most: https://panopticlick.eff.org/about. My hope would be that built-in tracker protection will make ad-blocking browser traffic more ubiquitous.
[+] rvrabec|5 years ago|reply
"High uniqueness holds even when histories are truncated to just 100 top sites." This is similar to the app fingerprinting they do with mobile phones, identifying you by the unique assortment of apps on your device.

Should the top concern be about identification or deep collection of browsing history?
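To make the quoted claim concrete, here is a minimal sketch of how one might measure "uniqueness" of browsing histories, with and without truncation to a list of top sites. The users, domains, and the `unique_fraction` helper are all made up for illustration; this is not the paper's dataset or code.

```python
from collections import Counter


def unique_fraction(histories, allowed_sites=None):
    """Fraction of users whose set of visited sites is shared with nobody else.

    If allowed_sites is given, each history is first truncated to that list
    (a stand-in for "only the top N sites").
    """
    profiles = []
    for visited in histories.values():
        sites = set(visited)
        if allowed_sites is not None:
            sites &= set(allowed_sites)
        profiles.append(frozenset(sites))
    counts = Counter(profiles)
    return sum(1 for p in profiles if counts[p] == 1) / len(profiles)


# Hypothetical toy data: four users and the sites they visited.
histories = {
    "u1": {"google.com", "youtube.com", "reddit.com"},
    "u2": {"google.com", "youtube.com", "smallblog.example"},
    "u3": {"google.com", "wikipedia.org"},
    "u4": {"google.com", "youtube.com", "nicheforum.example"},
}
# Hypothetical "top sites" list used for truncation.
top_sites = {"google.com", "youtube.com", "reddit.com", "wikipedia.org"}

full = unique_fraction(histories)
truncated = unique_fraction(histories, top_sites)
print(full, truncated)
```

In this toy data truncation halves the unique fraction, because u2 and u4 become indistinguishable. The quoted result is that on real data uniqueness stays high even after truncation.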

[+] hombre_fatal|5 years ago|reply
The point is that browser history collection is the same as cross-site tracking. Any 3rd party analytics operation like Google Analytics is able to access your browser history. To such a point that whether they do or not shouldn't matter and couldn't be proven anyways.
[+] elliekelly|5 years ago|reply
Does iOS allow access to that information?
[+] sct202|5 years ago|reply
Page 11 of the original doc has "Theoretical third-party reidentifiability rates" by company: https://www.usenix.org/system/files/soups2020-bird.pdf

I'm surprised how many companies (Facebook, Verizon, Adobe, Oracle, Twitter) are almost matching Google's tracking networks. Google's reach makes sense given the number of AdSense / Analytics trackers out there, but I hadn't realized these other companies are just as pervasive.

Edit: typo.

[+] natcombs|5 years ago|reply
ELI5?

I get that a browsing history C is unique, but if I clear it, how can you identify that my new history D is tied to C and not unique?

[+] manicdee|5 years ago|reply
This isn’t about the history stored on your computer. This is about the browsing habits observed in real time by, e.g., ad agencies, who get lots of information about where you have been because their ads run everywhere.
[+] hombre_fatal|5 years ago|reply
Because you visit the same domains across sessions. Throw in some less common domains like HN (i.e. top 1000 instead of top 100 websites), and you're trivial to reidentify. (This is what "reidentification" is referring to in the quoted part of the paper in TFA.)
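A rough sketch of the idea, for anyone who wants to see it spelled out: a tracker that kept the set of domains it saw for history C can compare a fresh session D against its stored profiles and pick the closest one. Jaccard similarity is used here as a simple stand-in for whatever matching a real tracker would do; the profile names, domains, and the `reidentify` helper are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reidentify(new_history, known_profiles):
    """Return (profile_id, score) for the stored profile closest to new_history."""
    return max(
        ((pid, jaccard(new_history, domains)) for pid, domains in known_profiles.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical profiles a tracker recorded before the user cleared local state.
known_profiles = {
    "profile_C": {"google.com", "youtube.com", "news.ycombinator.com", "arstechnica.com"},
    "profile_X": {"google.com", "youtube.com", "facebook.com", "espn.com"},
}

# New session D: same habits, slightly different set of domains.
history_D = {"google.com", "news.ycombinator.com", "arstechnica.com", "wikipedia.org"}

best_id, best_score = reidentify(history_D, known_profiles)
print(best_id, round(best_score, 2))  # profile_C 0.6
```

The common domains (google.com) barely separate the two profiles. The less common ones (HN, Ars Technica) do nearly all the work, which is the point made above.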
[+] ravenstine|5 years ago|reply
Now I feel even better about disabling my browser history in Firefox.
[+] reificator|5 years ago|reply
The only thing that does is prevent you from getting anything useful out of your history. It does not prevent the topic at hand.
[+] _jal|5 years ago|reply
Indeed! The day I started ignoring my bank statement was the day I became truly secure.
[+] ScannerSparkly|5 years ago|reply
I doubt you could accurately identify a specific person
[+] ben_w|5 years ago|reply
33 yes-no questions, each of which splits the audience in half, uniquely identify slightly more than the world population.

But any given website is much tighter than that: a regular visitor to Cambridge Evening News is unlikely to be based in राजनांदगांव, and vice versa.

Someone is regularly accessing the website of one local take-away restaurant in Larnaca, a gay men-only dating website, and Ars Technica? That’s probably already got you down to 2-15 people out of the ~3e9 on the internet, with just three specific websites in their history.
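The arithmetic behind the "33 questions" figure can be checked in a few lines. The world population value below is an assumed round number for 2020, and the internet figure is the ~3e9 quoted above. The `remaining_after` helper is just for illustration and assumes every question is a perfect, independent 50/50 split.

```python
import math

WORLD_POPULATION = 7.8e9  # assumption: rough 2020 estimate
INTERNET_USERS = 3e9      # figure quoted in the comment above

# Each perfect yes/no split is one bit, so 33 questions distinguish 2**33 outcomes.
distinct_outcomes = 2 ** 33

# Bits needed to single out one person from each population.
bits_for_world = math.ceil(math.log2(WORLD_POPULATION))
bits_for_internet = math.ceil(math.log2(INTERNET_USERS))


def remaining_after(population, questions):
    """Expected candidates left after `questions` ideal 50/50 splits."""
    return population / 2 ** questions


print(distinct_outcomes)   # 8589934592
print(bits_for_world)      # 33
print(bits_for_internet)   # 32
```

Real signals are rarely clean 50/50 splits. A niche site visited by one person in a thousand carries roughly 10 bits on its own (log2 of 1000), which is why three specific websites can narrow things down so quickly.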

[+] derivagral|5 years ago|reply
Why do you doubt this? Even if you can't always trust the 'unique' part, it is still information which can be combined to produce a more accurate profile.