item 24480239

Tracking users via CSS

203 points | lawik | 5 years ago | underjord.io | reply

116 comments

[+] TomGullen|5 years ago|reply
What extra data can you track using this method over just normal HTTP logging?

The only things I can think of are how a user interacts with a page - I don't particularly think this is too concerning - although as with all these things there are possibly much more creative uses of it that I haven't considered.

There's a new image attribute, loading="lazy", which generally loads an image as it approaches the viewport. This could also be "abused" in similar ways.

If this does turn out to be a privacy concern, browser settings/privacy addons could simply load all lazy images or images referred to in CSS/JS files on load, which would nullify this technique.

[+] lawik|5 years ago|reply
You can mostly track the interactions. Just the fact that CSS was loaded distinguishes the user from a lot of automated traffic. You should be able to track time spent on the page with CSS animations as well, up to a point; that's mentioned in the post.

I don't think you can do anything particularly nasty even with CSS variable programming, which can apparently be used for interactive games (https://github.com/propjockey/css-sweeper). While researching this post I couldn't come up with a non-JS way to get much meaningful data into CSS.
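The time-on-page idea can be sketched in a few lines. A hedged example, assuming the server logs image requests (the paths are illustrative, and whether each keyframe's image is fetched lazily at the moment it applies varies by browser):

```css
/* One request per elapsed interval: each background-image is
   requested only when the animation reaches its keyframe. */
body::after {
  content: "";
  animation: dwell 60s step-end forwards;
}
@keyframes dwell {
  25%  { background-image: url("/t?dwell=15s"); }
  50%  { background-image: url("/t?dwell=30s"); }
  100% { background-image: url("/t?dwell=60s"); }
}
```

The server-side log then shows how far into the minute each visitor got before leaving.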

[+] jedimastert|5 years ago|reply
There are probably quite a few fingerprinting surfaces you could pull out of CSS. For example (these are educated guesses):

* Track what fonts a user has installed by asking for preinstalled fonts and using loaded fonts as a fallback

* Track screen width and height by conditionally loading an image

* Track screen height vs window height (linked to OS and various user settings)
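Two of these guesses can be sketched in plain CSS (the endpoint URLs and probed font are invented for illustration):

```css
/* Viewport-width bucketing: only the matching query fires a request */
@media (min-width: 1920px) {
  body { background-image: url("/fp?w=gte1920"); }
}

/* Font probing: the remote fallback is requested only if the
   local() font is missing and some rendered text uses the family */
@font-face {
  font-family: "probe-comic";
  src: local("Comic Sans MS"), url("/fp?font=no-comic-sans");
}
.probe { font-family: "probe-comic", sans-serif; }
```

With one such rule per width bucket and per probed font, the combination of requests that arrive forms a coarse fingerprint.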

[+] earthboundkid|5 years ago|reply
Firefox won't do lazy loading if JS is disabled, because it doesn't want lazy loading to work as a fallback page depth tracker. If JS is enabled, all bets are off, so there's not much point in using lazy loading instead of just polling/intersection observer.
[+] myfonj|5 years ago|reply
Generally yes: if you stick to just CSS, you can log HTTP requests initiated by the user's interaction or the user's environment. To name some environmental signals: locally installed fonts, viewport dimensions, colour depth, resolution, etc. Interaction covers anything pointer-position related, focus management, and to some degree a simple kind of "keylogging" or detection of specific characters in text.
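The "keylogging" part is usually sketched with attribute selectors. A hedged example: this only works when the input's current value is mirrored back into its value attribute, which plain HTML inputs don't do but some frameworks do.

```css
/* One rule per character of interest; only the matching rule's
   background-image triggers a logged request */
input[value$="a"] { background-image: url("/k?last=a"); }
input[value$="b"] { background-image: url("/k?last=b"); }
```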

https://github.com/jbtronics/CrookedStyleSheets

https://news.ycombinator.com/item?id=16157773

[+] eloff|5 years ago|reply
Almost nobody turns off JavaScript because it breaks the whole web. You can't even view YouTube.

This is less interesting than existing JavaScript techniques to identify or filter out crawlers. It works when the image is not cached, but it's fundamentally inferior to anything you can do with JavaScript, and if a visitor doesn't have JavaScript at all, I don't see why you would care. Just lump that little bit of traffic in with the bots for analytics purposes.

[+] dangerface|5 years ago|reply
You can use CSS to identify fonts, and feature detection (as in Modernizr) to uniquely identify browsers and users. You can track clicks, mouse-overs, language, and screen size; time spent can be calculated with animations. Pretty much anything you would do with JavaScript except canvas fingerprinting.

The big advantage of this is that you can still track and uniquely identify people even if they turn off cookies and javascript.
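The feature-detection part could look like this in pure CSS, in the spirit of Modernizr (the endpoints are illustrative; separate pseudo-elements are used so each matching query can trigger its own request):

```css
@supports (display: grid) {
  body::before { content: ""; background-image: url("/fp?grid=1"); }
}
@supports (backdrop-filter: blur(1px)) {
  body::after { content: ""; background-image: url("/fp?backdrop=1"); }
}
```

Which combination of requests arrives at the server narrows down the browser and version, no JS required.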

[+] dwild|5 years ago|reply
> The only things I can think of are how a user interacts with a page - I don't particularly think this is too concerning

Except that many people don't want others to know how much of something they read... I don't care, but many believe that is private.

> If this does turn out to be a privacy concern, browser settings/privacy addons could simply load all lazy images or images referred to in CSS/JS files on load, which would nullify this technique.

So you load more invisible pixels and make this even more effective, as now you'll be able to get much more granular data, like the scroll position!

[+] thehappypm|5 years ago|reply
By “normal HTTP logging”, what do you mean exactly? Most logging is done with JavaScript, and this is JS-free; it would even work in an environment with JS disabled.
[+] sloshnmosh|5 years ago|reply
You can disable lazy loading images in about:config in Firefox
[+] achairapart|5 years ago|reply
Ad-blockers have been around for years and I have always wondered why the ad industry has not moved to server-side yet. Then I realized that maybe they weren't hit by this at all; perhaps it even helped with that 50% of the budget they often claim goes to waste in advertising.

Much like Nigerian scams self-select the most naive users with their silly stories, some advertisers may well get better impression/click ratios once the savvy users are out of their game.

[+] thehappypm|5 years ago|reply
The ad industry has overwhelmingly moved to server side. Google AdWords, Facebook ads, Instagram ads, sponsored search results on travel sites. Server side means you need more trust that it’s an actual user and not a server farm, so it’s less usable outside the walled gardens.
[+] bweitzman|5 years ago|reply
Server side tracking would mean that ad companies don't have access to cookies + user fingerprints, so they would be less effective at serving targeted ads.
[+] onion2k|5 years ago|reply
It depends what "this" refers to.

If it's CSS, then no.

If it's loading an external image, then also no. Certainly no more evil than any other method of getting a user's browser to make an HTTP request anyway.

If it's tracking users then maybe. Gathering data is evil unless you have a very good reason. If you're gathering it and not actually using it then that is definitely evil. If you're gathering everything in case you need it then that is also evil. If you're gathering data that's unique to individuals that's even worse. If you're gathering data that's unique to individuals, and keeping it, and using it to build up profiles by blending it with other sources, and then selling the information that's really evil.

Just gathering browser agent strings or screen resolutions though, it's not terrible. Although I do wonder why you need CSS analytics rather than just using the server log from the request for the HTML file.

[+] bnegreve|5 years ago|reply
> I do wonder why you need CSS analytics rather than just using the server log from the request for the HTML file.

From the article:

Lots of automated traffic on the web: bots, crawlers and scrapers. So if there is a way to remove most of the automated traffic without loading any JS, is that a win?

    body:hover {
        background-image: url("https://underjord.io/you-was-tracked.png");
    }
[...] This has a certain elegance because it actually requires mouse interaction.
[+] Viliam1234|5 years ago|reply
The evil is in how you use the data.

And of course, if you share the data with a third party (that includes if you use their services to collect or process the data), you should logically assume the worst.

I would probably be okay with sharing all kinds of data with the websites I use, if I could reasonably trust that it would never be used to identify me personally. If you have a website, and you are curious whether your audience is mostly male or female, young or old, using computers or smartphones, I can understand the need, and would not object to my presence increasing some number in the database by one.

Well, there is the problem that if you collect too many attributes about me, and if they are all connected together (as opposed to merely incrementing a few independent counters), a sufficiently large collection can be used in the future to identify me uniquely, which is the part I object to. And I have no control over how you store the information, so it is reasonable for me to assume the worst.

To make an analogy with the offline world, if you e.g. give a public lecture, you can see what kind of people are in the audience. And I would feel no desire to wear a mask for the purpose of hiding my age or gender or race or whatever from the lecturer.

But I would object to someone taking my photo, using it to identify me, and recording the fact that I attended a given lecture on a given day in some huge dossier about me, which he would later share with other shady people, so that they can all have my incredibly detailed biography (in the case of Google, including also most of my private correspondence). That is definitely evil.

[+] mcv|5 years ago|reply
If you gather data but you anonymise it, then you're totally fine. If you gather it in order to later use it to improve the UX on your site, that's fine.

It's making it personally identifiable and linking it to other things where you get in the really evil territory.

[+] korijn|5 years ago|reply
I am inclined to compare CSS to guns now. At what point is the tool evil, and when should it be banned? When all its users are evil? More than half? Is the tool never evil? Ethics...
[+] sriku|5 years ago|reply
The biggest loss of privacy (my personal view) is that we've lost the ability to read without being observed, and that's important to maintain a healthy and diverse mindset in society. It enables people to read without fear of "persecution". That pretty much puts the "analytical web" in the "evil" basket for me.

edit: autocorrect wrote "prosecution" instead of "persecution". Fixed it.

[+] bostonvaulter2|5 years ago|reply
This is a good point. One potential workaround is to implement something Usenet-style, where a whole corpus would be downloaded to your device and then you'd just load all the data locally. Of course, only a small fraction of content would be available this way, which has its own set of biases.
[+] feralimal|5 years ago|reply
I like this sort of question, albeit it is akin to re-tuning your violin in order to do some fiddling while Rome is burning, given just how far we have already lost all privacy.

In an ideal world, it should be up to the user what they want to disclose. So perhaps there should be no logging at all. And having loaded a page, the page should work 'offline' by default, with no further interaction with the site. I mean, that's how simple sites appear to work. That they don't work the way they appear to illustrates how technologists are selling illusions for profit.

[+] ComodoHacker|5 years ago|reply
In an ideal world, users would also disclose some behavioral data to help webmasters get some meaningful feedback about their work.

In our world, however, almost all users won't care, and the remaining few won't disclose anything out of suspicion that the data will be abused.

[+] wegs|5 years ago|reply
I don't find this evil in itself. If it is used covertly, that'd be evil.

I agree with the argument "If you do it to extract information from your user to which they would not consent, it’s evil."

However, we tend to get caught up in the right-now and not think through consequences. If this were widely used, browsers would implement the same sorts of privacy controls they do around 3rd party cookies, JS, etc.

This seems like a more semantic way to do tracking than many other techniques. It seems like it'd be easier for browsers to manage.

[+] jiofih|5 years ago|reply
The evil part in tracking is tracking user behaviour and personally identifying data. If you’re tracking overall metrics anonymously, it doesn’t really matter if it’s done via HTML/JS/CSS, it is probably not evil.
[+] surround|5 years ago|reply
The EFF’s own website has analytics, somewhat ironically. But the information they collect is limited, and the analytics are loaded from a separate domain (anon-stats.eff.org) so it’s easy to block. EFF’s privacy policy:

https://www.eff.org/policy

I think that first-party analytics are kind of a gray area. Third-party analytics are always evil.

[+] lawik|5 years ago|reply
Is it that clear-cut? Is a nice friendly org that considers the options and picks something minimal and ostensibly ethical like Fathom or Plausible and then just uses that to keep track of how they are doing evil? Or is that not a third party, just first-party outsourced? Not sure what definition we're working with here :)
[+] _qulr|5 years ago|reply
You can use the same technique with a:active to track link clicks, by the way.

Technically, this would all be relatively easy to block with your own user style sheet. Practically, though, a lot of non-tracking sites rely on background-image for essential functionality, so you'll see a lot of breakage. It's a dilemma.
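A minimal sketch of the a:active idea (selectors and endpoint invented for illustration; note each image is typically requested only once per page load, since it is cached afterwards):

```css
a[href="/pricing"]:active { background-image: url("/log?click=pricing"); }
a[href="/about"]:active   { background-image: url("/log?click=about"); }
```

And indeed, a user style sheet that forcibly sets `background-image: none !important` on links would break any site that uses background images for icons or sprites.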

[+] huhtenberg|5 years ago|reply
Very clever, borderline ingenious.

The task of filtering out bots from server logs can get really tedious even if there's JS involved. Being able to spot humans using this technique is really quite helpful.

Edit - body:hover doesn't seem to work in Firefox, but it's trivial to work around that.
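One possible workaround, under the untested assumption that the problem is the body element not filling the viewport in Firefox: make the hover target cover the whole viewport and reuse the article's rule.

```css
/* Ensure the hover target spans the viewport */
html { min-height: 100%; }
html:hover {
    background-image: url("https://underjord.io/you-was-tracked.png");
}
```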

[+] rdevsrex|5 years ago|reply
I toyed around with pixel tracking like this before, with PHP and the GD library. You create a 1x1 white pixel in a script whose file extension is set to .png (or whatever), and as long as you've configured the server to execute it as PHP, it runs and returns a pixel. But then you can do all the other tracking you want, and the user doesn't know any different.

That said, I won't use that in the future, but it's scary how easy it is.

[+] weego|5 years ago|reply
This is how certain large newspaper sites do their user tracking, so editors can improve their articles' engagement 'live'.
[+] sildur|5 years ago|reply
If I were a browser engine I would download all the assets, all the images, whether the user hovered or focused or interacted in any way or not.
[+] jfk13|5 years ago|reply
Users on metered connections might not thank you.
[+] hunter2_|5 years ago|reply
But power management (especially mobile devices)... Nobody likes excess heat and wasting electricity. And metered data connections. Lots of competing forces here.
[+] Angeo34|5 years ago|reply
Font and client-dimension fingerprinting are the reasons why people should stop thinking Brave actually protects them from anything. Brendan, we both know it's impossible to solve using a Chromium base; you being bitter against Mozilla is a different story nobody cares about.

Don't take your personal grudge out on your users by fooling them into a false sense of security Brendan.

[+] can16358p|5 years ago|reply
It really depends on the purpose, IMO. If you are cross-matching that data with other sources to track people, it is "evil" (in the sense people mean when they describe tracking; I personally don't care). If you are using it for your own statistics (how many people visited where, screen sizes, duration, where they scrolled, etc.), I don't see any issue there.
[+] neallindsay|5 years ago|reply
This technique could be used for good (detect an automated harassment campaign) or evil (unmask a protester agitating for societal change against a powerful state).

As technologists we want to be able to look at a technology and discern if it is good or evil. Unfortunately we don't always have enough information.

[+] TomGullen|5 years ago|reply
> or evil (unmask a protester agitating for societal change against a powerful state)

Can you give an example of how this technique could be exploited in such a way?

[+] xlii|5 years ago|reply
I find it a lesser evil. On one side it's hidden analytics, but on the other hand I find it far superior to cookies that I carry around and that a lot of different entities can track.

I do find the way of voting on this matter very interesting. Parsing the logs to get the results - how amusingly nerdy!

[+] athenot|5 years ago|reply
> Parsing the logs to get the results - how amusingly nerdy!

This was a big use-case for the Practical Extraction and Reporting Language 2 decades ago... :)

Today with decent JSON logs, it's also quite fun.

[+] extremeMath|5 years ago|reply
I'm a believer that these kind of issues should be solved at the consumers end.

I haven't built a web browser, but I built a bot and it's somewhat doable to avoid getting tracked.

A browser could feed a fake user agent and resize the window to a standard size. After that, I believe it's only the IP address and cookies, which are easy enough to block.

It even defeats the CSS tracking mentioned: "Oh, someone downloaded image 6374tracker.png, but they were from the UAE and using Firefox", and then they are never seen again.

My only weakness on this subject is the low-level headers; anyone familiar?

[+] bdcravens|5 years ago|reply
It has been best practice for some time now to detect not based on user agent, but by features. (plenty still use the UA approach of course)
[+] social_quotient|5 years ago|reply
What low level headers are you thinking of here?
[+] ChrisMarshallNY|5 years ago|reply
That's not a bad idea. I suspect that the author is not the first to come up with it. It's better than a heatmap.

Like most tools, it is up to the user, as to whether or not it's "evil."

I'm reminded of that rather silly little speech at the beginning of Dark Phoenix, where Xavier lectures Jean Grey about the uses of a pen.

If I were trying to understand users in something like A/B testing, I might use the technique, but I'd probably only do so temporarily. I'd need to make sure that the practice was outlined in the privacy policy.

[+] raxxorrax|5 years ago|reply
I clicked "This is evil" although I think that is just partially correct. The problem with tracking is that it exploits functionality that wasn't intended to identify users.

In the context of today's web, it is very nice that the author even ponders this. It is already less evil than what we can expect on the "modern" web, however nefarious and tricky the mechanism might be. But loading a resource to track IPs isn't really intrinsically evil.