Online tracking: A 1-million-site measurement and analysis

592 points| itg | 10 years ago |webtransparency.cs.princeton.edu | reply

268 comments

[+] randomwalker|10 years ago|reply
Coauthor here. I lead the research team at Princeton working to uncover online tracking. Happy to answer questions.

The tool we built to do this research is open-source https://github.com/citp/OpenWPM/ We'd love to work with outside developers to improve it and do new things with it. We've also released the raw data from our study.

[+] Freak_NL|10 years ago|reply
What can be done by the browser vendors such as Mozilla, Google, and Microsoft?

To prevent fingerprinting, your browser has to disable all sorts of useful modern JavaScript APIs (e.g., WebRTC) by default, prevent spurious HTTP requests (e.g., to prevent abusing @font-face to find out which fonts are installed), and pretend you are an American using the most popular web browser of the moment (i.e., hide the user's preferred language and claim en-US as your preference, and change the user agent string to blend into the crowd).

This is all assuming people don't run any third party plugins like Flash.

Are browser vendors on track to figure out a solution to this problem that combines user friendliness with privacy? Or will anonymous browsing remain a privilege for those with the right amount of technical know-how?

The problem it seems is that simply disabling JavaScript is not an option for normal web browsing, and even a requirement for interacting with the web services used by organisations you have a relation with (e.g., the government, insurance companies, banks, etcetera).

[+] rdancer|10 years ago|reply
When everybody was running Windows on a smorgasbord of hardware / patchlevel / plugins / fonts, it was easy to fingerprint. Are we moving towards a more monolithic landscape where fingerprinting is less able to track individual users?

* If I have a fleet of Chromebooks running the same version of Chrome OS, will they all have the same fingerprint?

* Will, say, all iPhones 6 with the same hardware parts, running the same Mobile Safari and iOS version, have the same fingerprint?

Thank you!

[+] dccoolgai|10 years ago|reply
This is much-needed research. Thank you for your work. Regarding the WebRTC tracking: would it be possible for WebRTC to work without exposing the local IP? I.e., is there any real reason that fingerprint needs to be there?
[+] projectramo|10 years ago|reply
I am going to ask about a really basic question: what is fingerprinting?

I had to dig around; from the paper it sounds like a stateless form of tracking.

The audio example made sense:

1. the mic comes on, and it identifies a particular background noise.

2. I browse to another site, or a different page without a cookie.

3. The mic comes on again, matches the ambient noise and realizes I am the same person.

Is that what you mean? If this is the case, how can the "canvas fingerprinting" work since I had to browse to a new page and all the old pixels from the previous page are no longer there.

Anyway, if it is what I understand it to be, then it sounds very interesting. I bet some science fiction author wishes they had thought to use it.

[+] ultramancool|10 years ago|reply
As soon as I saw these APIs being added I immediately dropped into about:config and disabled them. How the hell do these people think this is a good idea to do without asking any permissions?

Put these in your user prefs.js file on Firefox:

user_pref("dom.battery.enabled", false);

user_pref("device.sensors.enabled", false);

user_pref("dom.vibrator.enabled", false);

user_pref("dom.enable_performance", false);

user_pref("dom.network.enabled", false);

user_pref("toolkit.metrics.ping.enabled", false);

user_pref("dom.gamepad.enabled", false);

Here's my full firefox config currently:

https://up1.ca/#nUSA1WtY13ECfmYC5c825w

Privacy on the web keeps getting harder and harder. Of course this should only be used in conjunction with maxed out ad blockers, anti-anti-adblockers, privacy badger and disconnect.

We need browsers to start asking permission. When you install an app on Android or iOS it says "here's what it's going to use, do you want this?". The mere presence of the popup would annoy people and prevent them from using these APIs.

[+] Zooper|10 years ago|reply
Thank you, user, for making your fingerprint hash more unique by disabling certain default features, given your user-agent string, thus opting into cat-facts.
[+] mccr8|10 years ago|reply
Of course, now a site could potentially fingerprint you by the set of APIs you have disabled!
[+] imtringued|10 years ago|reply
It's great that mozilla decided to remove about:permissions. I do enjoy the fact that I now have to visit every website whose permissions I want to change instead of managing all permissions from a single location.
[+] brudgers|10 years ago|reply
Google has a vested interest in information leakage. I have a suspicion that the Chromium project expresses a strategic desire to shape the direction of browser development away from stopping those leaks. The idea of signing into the browser with an identity is a core feature and in Google's branded version, Chrome, the big idea is that the user is signed into Google's services.

Google only pitches the idea of multiple identities in the context of sharing devices among several people: https://support.google.com/chrome/answer/2364824?hl=en and even then doesn't do much to surface the idea. https://www.google.com/search?hl=en&as_q=multiple+identities...

[+] exelius|10 years ago|reply
This is why Firefox is gaining momentum; they seem to be the only browser interested in user privacy. Users are definitely interested.
[+] evan_|10 years ago|reply
Google is already really good at tracking people, why would it introduce vectors that would help other vendors catch up? You would have to demonstrate that Google itself was using these vectors for tracking.
[+] rdancer|10 years ago|reply
This is the kind of nonconsensual, surreptitious user tracking that the EU privacy directive 2002/58/EC concerns itself with, not those redundant, stupid cookie consent overlays.
[+] sleepychu|10 years ago|reply
If you consent to the way things have always been, do nothing!
[+] nailer|10 years ago|reply
So a regular site using, say, mixpanel doesn't need to show a warning?
[+] f-|10 years ago|reply
Although the emphasis on the actual abuse of newly-introduced APIs is much needed, it is probably important to note that they are not uniquely suited for fingerprinting, and that the existence of these properties is not necessarily a product of the ignorance of browser developers or standards bodies. For the most part, these design decisions were made simply because the underlying features were badly needed to provide an attractive development platform - and introducing them did not make the existing browser fingerprinting potential substantially worse.

Conversely, going after that small set of APIs and ripping them out or slapping permission prompts in front of them is unlikely to meaningfully improve your privacy when visiting adversarial websites.

A few years back, we put together a less publicized paper that explored the fingerprintable "attack surface" of modern browsers:

https://www.chromium.org/Home/chromium-security/client-ident...

Overall, the picture is incredibly nuanced, and purely technical solutions to fingerprinting probably require breaking quite a few core properties of the web.

[+] pmlnr|10 years ago|reply
So... what we need is a browser, which says it supports these things but blocks or provides false data on request and looks as ordinary as possible for "regular" browser fingerprinting.

Is anyone aware of the existence of one?

[+] madeofpalk|10 years ago|reply
The problem here is Canvas fingerprinting - that's what I found the most surprising and interesting.

How do you prevent that, apart from working on 'fixing' browsers to create pixel-perfect renders across different browsers/platforms/configurations? Would that even be possible?

Edit:

> Tor Browser notifies the user for canvas read attempts and provides the option to return blank image data to prevent fingerprinting.

Huh. I guess that's one attempt, but being able to read pixel data out of a canvas is completely reasonable.

[+] TazeTSchnitzel|10 years ago|reply
I think Tor Browser tries to do this for some types of fingerprinting.
[+] anexprogrammer|10 years ago|reply
Colour me unsurprised. Disappointed though.

I'm glad I disabled WebRTC when I first discovered it could be used to expose local IP on a VPN.

These "extension" technologies should all be optional plugins. Preferably install on demand, but a simple, obvious way to disable them would be acceptable (i.e., more obvious than about:config).

Not a great deal can be done about font metrics other than my belief that websites shouldn't be able to ferret around my fonts to see what I have. Not like it's a critical need for any site.

[+] moron4hire|10 years ago|reply
What would anyone do with your internal network IP?

Having these features as optional plugins means they are basically impossible to count on having in the basic web platform, meaning you're going to fight a losing battle to gain adoption for any applications that need them.

And the open web platform is the only platform right now that is enabling developers to create cross-platform applications outside of the restrictions of walled-garden app stores.

[+] beardog|10 years ago|reply
If your VPN is configured correctly, your IP will not be exposed.
[+] amelius|10 years ago|reply
> These "extension" technologies should all be optional plugins.

But then still whether you installed an extension would contribute a bit of information to your fingerprint.

[+] jimktrains2|10 years ago|reply
NoScript is an all-or-nothing approach. Are there any JS-blockers that allow API-level blocks?
[+] phaer|10 years ago|reply
If you use Firefox or Iceweasel, you can disable most of those apis in about:config or user.js. For example, media.peerconnection.enabled = false, to disable WebRTC. dom.battery.enabled = false for battery, etc.
[+] MatekCopatek|10 years ago|reply
This would make absolute sense. Certain requests (like location) already trigger popups that ask you for permission. If it turns out other APIs can be equally revealing as far as privacy goes, it would make sense to present the same popup.

I mean, using a web app for the first time would be no different than installing a mobile app - I wouldn't be surprised if I had to give it a few permissions.

[+] hsivonen|10 years ago|reply
If you disable a unique combination of APIs that combination becomes your fingerprint.
[+] kapep|10 years ago|reply
By disabling specific APIs, you would make your browser even more identifiable.

It would only work if many users have disabled exactly the same APIs as you and all other non-disabled APIs don't provide any information useful for fingerprinting.
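
A rough way to put numbers on this, as a sketch only: the population below is made up (not measured data), and `surprisal_bits` is a hypothetical helper. It counts how many users share your exact set of disabled APIs; the smaller that group, the more identifying bits the configuration leaks.

```python
import math
from collections import Counter


def surprisal_bits(population, config):
    """Bits of identifying information revealed by `config`,
    given the configurations observed across `population`."""
    counts = Counter(population)
    if counts[config] == 0:
        raise ValueError("config was never observed in this population")
    share = counts[config] / len(population)
    return -math.log2(share)


# Hypothetical population: each tuple is the set of APIs one user has disabled.
default = ()
tweaked = ("battery", "gamepad", "webrtc")
population = [default] * 1016 + [tweaked] * 8

print(surprisal_bits(population, default))  # stock config: close to zero bits
print(surprisal_bits(population, tweaked))  # rare config: far more bits
```

With these invented numbers the tweaked configuration is shared by 8 users out of 1024, which works out to about 7 bits, while the stock configuration reveals almost nothing.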

[+] idbehold|10 years ago|reply
It's kind of surprising that there isn't an extension to provide this functionality (at least in desktop browsers). All you'd have to do is monkey patch the methods that get called and throw up a confirm("are you sure you want to allow [X]")
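
A minimal sketch of that monkey-patching pattern, written in Python only so it can run outside a browser (a real extension would patch the page's JavaScript objects instead). `FakeNavigator`, `guard`, and the `ask` callback are hypothetical names invented for illustration; `ask` stands in for the confirm() prompt.

```python
import functools


class PermissionDenied(Exception):
    """Raised when the user declines access to a guarded method."""


def guard(obj, method_name, ask):
    """Replace obj.method_name with a wrapper that calls ask(method_name) first.

    Returns the original method so the patch can be undone.
    """
    original = getattr(obj, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        if not ask(method_name):
            raise PermissionDenied(method_name)
        return original(*args, **kwargs)

    setattr(obj, method_name, wrapper)
    return original


class FakeNavigator:
    """Hypothetical stand-in for a browser object exposing a sensitive API."""

    def get_battery(self):
        return {"level": 0.42, "charging": False}


navigator = FakeNavigator()
# Here `ask` always refuses; an extension would show a prompt instead.
guard(navigator, "get_battery", ask=lambda name: False)
```

Note that a page could still detect the patched method, so this gates access without hiding the fact that something was changed.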
[+] BinaryBullet|10 years ago|reply
I don't know of any. I would think it would be fairly easy to create a userscript or extension to stub built-in APIs (maybe using something like testdouble.js or sinon.js to override the default global objects that you are trying to "disable"). I'm not sure what issues you'd run into on various pages if you did that though (so it'd probably need a lot of iteration and bug fixing).

It might be a fun project to start though. I've been really enjoying testdouble's API (and have started using that for my unit tests).

[+] cptskippy|10 years ago|reply
All of this makes me wonder whether some of these interfaces should be more closely guarded by the user agent.

Perhaps instead of a site probing for capabilities, they should instead publish a list of what the site/page can leverage and what it absolutely needs to work. Maybe meta tags in the head or something like the robots.txt. Browsers can then pull the list and present it to the end user for white-listing.

You could have a series of tags similar to noscript to decorate broken portions of sites if you wanted to advertise missing features to users and, based on what features they chose to enable/disable for the site, the browser would selectively render them.
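
As a sketch of how a browser might resolve such a list: the manifest layout (`required` / `optional` keys) and the capability names are assumptions made up for illustration, not an existing standard.

```python
def resolve_capabilities(manifest, user_allowed):
    """Compare what a site declares against what the user has whitelisted.

    `manifest` is a dict with hypothetical "required" and "optional" lists.
    """
    required = set(manifest.get("required", []))
    optional = set(manifest.get("optional", []))
    allowed = set(user_allowed)

    granted = (required | optional) & allowed
    missing_required = required - allowed
    return {
        "granted": sorted(granted),
        "missing_required": sorted(missing_required),
        # The page can only work if nothing it strictly needs was refused.
        "usable": not missing_required,
    }


# Hypothetical declaration published by a site.
manifest = {"required": ["canvas"], "optional": ["audio", "battery"]}
result = resolve_capabilities(manifest, user_allowed=["canvas", "audio"])
print(result)
```

Anything the site never declared is simply not granted, and the `missing_required` list is what the browser could use to decide which noscript-style fallback sections to render.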

[+] maxerickson|10 years ago|reply
Users don't want to do this though.

I mean, how many people are dealing with the hassle of noscript? That's probably most of the users that are going to do anything other than tell the browser to stop asking questions.

[+] kardos|10 years ago|reply
So given this information, how can we poison the results that the trackers get?
[+] Freak_NL|10 years ago|reply
Just altering your own browser's fingerprint for each domain won't poison their data (it just makes you anonymous to them). Any data is good data as far as these trackers are concerned. You can devalue their data by collectively sending the same fingerprints, but there is no way to actively poison their databases.
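
To illustrate the per-domain idea, here is a sketch only; I'm not claiming any browser works this way, and `per_domain_fingerprint` and its inputs are hypothetical. A local secret is mixed with the domain, so each site sees a stable value that can't be linked to the value any other site sees.

```python
import hashlib
import hmac


def per_domain_fingerprint(secret: bytes, domain: str, real_fingerprint: str) -> str:
    """Derive the fingerprint a given domain gets to observe.

    Stable for repeat visits to the same domain, different across domains.
    """
    # Per-domain key derived from a secret that never leaves the browser.
    domain_key = hmac.new(secret, domain.encode("utf-8"), hashlib.sha256).digest()
    return hmac.new(
        domain_key, real_fingerprint.encode("utf-8"), hashlib.sha256
    ).hexdigest()


secret = b"hypothetical-local-secret"
real = "canvas:ab12|fonts:42|audio:9f"  # made-up underlying fingerprint

a1 = per_domain_fingerprint(secret, "tracker-a.example", real)
a2 = per_domain_fingerprint(secret, "tracker-a.example", real)
b1 = per_domain_fingerprint(secret, "tracker-b.example", real)
print(a1 == a2, a1 == b1)
```

Each tracker still gets a consistent identifier within its own domain, which is why this anonymizes you across sites but doesn't poison anything.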
[+] hellweaver666|10 years ago|reply
You would need a way to change your fingerprint on every page load I guess?
[+] codedokode|10 years ago|reply
Some methods of fingerprinting are probably used to distinguish between real users and bots. Bots can use patched headless browsers that masquerade as desktop browsers (for example, as the latest Firefox or Chrome running on Windows). Subtle differences in font rendering or missing audio support can be useful for detecting the underlying libraries and platform. Hashing is used to hide the exact matching algorithm from scammers.

There are a lot of people trying to make money by clicking ads with bots.

Edit: and by the way disabling JS is an effective method against most of the fingerprinting techniques.

[+] dsl|10 years ago|reply
As someone who has written code to detect bots, exactly this. We don't care about fingerprinting the user, we care about fingerprinting to verify the user agent you claim to be.
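
A toy version of that kind of check, under loud assumptions: the feature table and the feature names are invented for illustration, and real detectors rely on far richer signals than this.

```python
from typing import Dict, List, Optional

# Hypothetical expectations per claimed browser family.
EXPECTED_FEATURES: Dict[str, Dict[str, bool]] = {
    "firefox": {"audio_context": True, "webgl": True},
    "chrome": {"audio_context": True, "webgl": True},
}


def claimed_family(user_agent: str) -> Optional[str]:
    """Very rough guess at the browser family a User-Agent string claims."""
    ua = user_agent.lower()
    for family in ("firefox", "chrome"):
        if family in ua:
            return family
    return None


def mismatches(user_agent: str, observed: Dict[str, bool]) -> List[str]:
    """Features whose observed state contradicts the claimed browser family."""
    family = claimed_family(user_agent)
    if family is None:
        return []  # nothing to compare against
    expected = EXPECTED_FEATURES[family]
    return sorted(
        name for name, value in expected.items() if observed.get(name) != value
    )


ua = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
# A headless client claiming to be desktop Firefox but lacking audio support.
print(mismatches(ua, {"audio_context": False, "webgl": True}))
```

The point is that the check compares the client against its own claim, not against a database of individual users.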
[+] wodenokoto|10 years ago|reply
What annoys me the most is how many useless cycles these trackers use to track me.
[+] MichaelGG|10 years ago|reply
WebRTC guys get around this by stating fingerprinting is game over, so don't even bother. They ignore that they are going against the explicitly defined networking (proxy) settings. Browsers are complicit in this. If the application asks "should I use a proxy", then ignores it, silently, wherever it wants, that's deceptive and broken.

There's still zero (0) use cases to have WebRTC data channels enabled in the background with no indicator.

If all these APIs are added, the web will turn into a bigger mess than it is. They can't prompt for permissions too much. So they'll skip that, like WebRTC does.

[+] ape4|10 years ago|reply
Seems like browsers should ask the user's permission to use these html5 features. Then whitelist. For example, a site that does nothing with audio should be denied access to the audio stack.
[+] pjc50|10 years ago|reply
I think it's time for HTML--, which would contain no active content at all and simply be a reflowable document display format.
[+] aub3bhat|10 years ago|reply
There is an acceptable tradeoff between pseudo-anonymous access through browsers and non-anonymous access through native apps.

To interpret this research as a reason for crippling the web or browsers would be a giant mistake. Crippling browsers will only work against users, who will then be forced into installing apps by companies.

Two popular shopping companies in India did exactly this: they completely abandoned their websites and went native-app only. This, combined with the large set of permissions requested by apps, led to a worse experience in terms of privacy for consumers. As the announcement of Instant Apps at Google I/O demonstrates, the web as an open platform is in peril, and its demise will only be hastened by blindly adopting these types of recommendations.

Essentially, the web as an open platform will be destroyed in the name of perfect privacy, only to be replaced by inescapable walled gardens. Consider instead that the web allows a motivated user to employ evasion tactics while still offering usability to those who are not interested in privacy. Native apps, where Apple needs a credit card on file to install, offer no such opportunity.

I am happy that Arvind (author of the paper) in another comment recommends a similar approach:

""" Personally I think there are so many of these APIs that for the browser to try to prevent the ability to fingerprint is putting the genie back in the bottle. But there is one powerful step browsers can take: put stronger privacy protections into private browsing mode, even at the expense of some functionality. Firefox has taken steps in this direction https://blog.mozilla.org/blog/2015/11/03/firefox-now-offers-.... Traditionally all browsers viewed private browsing mode as protecting against local adversaries and not trackers / network adversaries, and in my opinion this was a mistake. """

https://news.ycombinator.com/item?id=11730373

[+] crdb|10 years ago|reply
> Two popular shopping companies in India exactly did this, they completely abandoned their websites and went native app only. This combined with large set of permission requested by apps lead to worse experience in terms of privacy for consumers.

I'm surprised nobody has commented on your comment yet. I was in a meeting just this morning where my interlocutor assured me that over 70% of advertising in 10 years will be native apps since everything else is getting blocked or abandoned (and presenting it as an opportunity to do all the stuff you "can't do anymore" on browser).

[+] makecheck|10 years ago|reply
Over 3,000 top sites using the font technique, and from the description this sounds really wasteful (choosing and drawing in a variety of fonts for no reason other than to sniff out the user).

Each font is probably associated with a non-trivial caching scheme and other OS resources, not to mention the use of anti-aliasing in rendering, etc. So a web page, doing something you don’t even want, is able to cause the OS to devote maybe 100x more resources to fonts than it otherwise would?

A simple solution would be to set a hard limit, such as “4 fonts maximum”, for any web site; and, to completely disallow linked domains from using more.
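
A sketch of what such a limit could look like; `FontBudget` is a hypothetical name, and I'm assuming every request (including those from linked third-party domains) is charged to the top-level site the user is visiting.

```python
from collections import defaultdict


class FontBudget:
    """Caps the number of distinct font families a top-level site may use."""

    def __init__(self, max_fonts=4):
        self.max_fonts = max_fonts
        self._seen = defaultdict(set)  # top-level site -> font families used

    def allow(self, top_level_site, font_family):
        """Return True if the page may use this font, False once over budget."""
        seen = self._seen[top_level_site]
        name = font_family.strip().lower()
        if name in seen:
            return True  # already counted, reuse is free
        if len(seen) >= self.max_fonts:
            return False  # budget exhausted, refuse new families
        seen.add(name)
        return True


budget = FontBudget(max_fonts=4)
probes = ["Arial", "Georgia", "Verdana", "Courier New", "Comic Sans MS", "Arial"]
decisions = [budget.allow("shop.example", font) for font in probes]
print(decisions)
```

A font-probing script that walks through hundreds of families would be cut off after the fourth, while an ordinary page that reuses a handful of fonts would never notice.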

[+] cdnsteve|10 years ago|reply
After reading this it makes me want to disable JavaScript entirely, along with cookies, and go back to text browsing. I've been using Ghostery on my phone, it's been pretty good.
[+] wyldfire|10 years ago|reply
Whoa, what's the use case for exposing battery information?