So this is the old ETag trick? It's not new at all, but perhaps it hasn't been as widely publicized. The same thing can be done with the Last-Modified / If-Modified-Since headers, by manufacturing a cookie "date".
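For readers who haven't seen the trick, here is a minimal Python sketch of the server side. It assumes a single tracking resource (say, a 1x1 image) served with `Cache-Control: private, no-cache`, so the browser keeps it but revalidates it before every reuse. The function `handle_pixel_request` and the in-memory `visits` dict are hypothetical, and real-world `If-None-Match` values (lists of tags, `W/` weak tags) are not handled.

```python
import secrets

# Hypothetical in-memory store: tracking ID -> number of times we have seen it.
visits: dict[str, int] = {}


def handle_pixel_request(headers: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Serve a tracking resource whose ETag doubles as a visitor ID.

    Returns (status, response_headers); the image body is omitted.
    """
    sent = headers.get("If-None-Match")
    if sent is not None:
        tag = sent.strip('"')
        if tag in visits:
            # The browser revalidated its cached copy, echoing our ID back.
            visits[tag] += 1
            # 304 Not Modified keeps the cached copy, and so the ID, alive.
            return 304, {"ETag": f'"{tag}"'}
    # No known ETag: first visit or a cleared cache, so mint a new ID.
    tag = secrets.token_hex(16)
    visits[tag] = 1
    # no-cache: the browser may store the response but must revalidate it
    # (i.e. send If-None-Match) before every reuse.
    return 200, {"ETag": f'"{tag}"', "Cache-Control": "private, no-cache"}


first_status, first_headers = handle_pixel_request({})
again_status, _ = handle_pixel_request({"If-None-Match": first_headers["ETag"]})
print(first_status, again_status)  # 200 304
```

The 304 path is what makes this work like a cookie: the browser never replaces its cached copy, so the same ID keeps coming back until the cache entry is evicted or cleared.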
I never claimed it was new :). But I hadn't heard about it until recently, even though it works frighteningly similarly to cookies and I generally know most of the common things related to website security.
In Europe there are laws that restrict the use of cookies (opt-out in some countries, opt-in in others, like the Netherlands). This website proves how stupid that is. Letting politicians mess with technology is dangerous; at the very least they should be better advised. I understand everyone is worried about their privacy, but just outlawing cookies is NOT going to fix anything.
Edit: I just looked up the Belgian "Telecommunication law", and article 129 talks about "The storage of information or accessing information that's already stored in the device of a user" [1] (loose translation). So I guess it's very broad.
I found a UK government page describing this: "Regulation 6 covers the use of electronic communications networks to store information, eg using cookies, or gain access to information stored in the terminal equipment of a subscriber or user."
That is more general than just cookies and would cover caching too.
Personally, I think the world would have been better served if they had put the burden on browser makers instead and made them provide a clear and consistent interface showing whether a website is tracking the user.
Incognito mode is designed to be incognito from the moment you open the window. When you close the window, the cache is thrown away (along with any cookies, localStorage, etc.). During your incognito session, the cache, cookies and everything else are stored; if they weren't, websites would not function properly.
I believe incognito mode has more potential than it is currently used for. For example, multiple parallel sessions would be not only nice and handy; if people learned to use them, the isolation would also improve browser security. As I mentioned on the page, it single-handedly eliminates a number of HTTPS attacks and tracking methods.
Edit: Oh, you mean that bug in this demonstration? Yes, that's a local issue here. In practice the two sessions cannot be linked, as you can see when you press F5 in the incognito window. Sorry about that; it's mostly meant as a tech demo whose code people can read and learn from, not a finished product!
I already knew about this but had completely forgotten; thank you for refreshing my memory :)
I have a problem with people registering on a forum under multiple identities. I have been fighting it by tracking them with cookies, but they got smarter and now delete the cookies, so it's useful to know there is something new I can try.
I understand the implications and the underhanded nature of this sort of tracking. However, could it prove more efficient than traditional, legitimate analytics (e.g. Google Analytics via cookies)?
For example, would this method consume less bandwidth than cookies?
Cookies are absurdly long nowadays; if they were shortened, lots of bandwidth could be saved. Google's analytics cookies are especially long and sometimes even contain referring domains or similar data. I don't really understand why they do this, but it probably lessens the burden on their database infrastructure at the cost of user bandwidth.
Also, this is more of a sneaky hack than a legitimate way of identifying users: whenever someone's cache fills up or gets cleared, the "cookie" (the ETag) is lost.
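A back-of-the-envelope Python sketch of the bandwidth question. The cookie length (200 characters) and request count (30 same-domain requests per page view) are made-up illustrative numbers, not measurements. The point is structural: a cookie rides along on every matching request to its domain, while a tracking ETag is only sent when that one resource is revalidated. It counts request-header bytes only and ignores header compression.

```python
def header_line_bytes(name: str, value: str) -> int:
    """Bytes one HTTP/1.1 header line takes on the wire: 'Name: value' + CRLF."""
    return len(f"{name}: {value}\r\n".encode("ascii"))


def cookie_overhead(cookie_value: str, requests_to_domain: int) -> int:
    # The Cookie header is attached to every request matching the cookie's domain/path.
    return header_line_bytes("Cookie", cookie_value) * requests_to_domain


def etag_overhead(etag_value: str) -> int:
    # If-None-Match is sent only when the one tracked resource is revalidated.
    return header_line_bytes("If-None-Match", etag_value)


# Illustrative (hypothetical) numbers, not measurements.
cookie_bytes = cookie_overhead("x" * 200, requests_to_domain=30)
etag_bytes = etag_overhead('"' + "0" * 32 + '"')
print(cookie_bytes, etag_bytes)  # 6300 51
```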
Can you disable ETag-based caching but allow If-Modified-Since? A quick check of a few popular sites shows that they use If-Modified-Since instead of ETags. Could that be for this reason?
While we're on the topic of browser caches, do any browsers let you easily store your cache in RAM instead of disk? Without resorting to other 'hacks' like setting the cache path to a ramdisk, that is.
It's very unlikely that popular sites are choosing not to use ETags in order to protect privacy. If they want to track you, they just use cookies. There is no privacy risk in a site using ETags as intended; the problem only arises when the server abuses them as unique identifiers. And you can actually do the same thing with If-Modified-Since and the date; you just get fewer bits of data to work with.
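To make the "fewer bits" point concrete, here is a Python sketch that smuggles an integer ID through `Last-Modified`, assuming the browser echoes that value back unchanged in `If-Modified-Since` (the usual way caches build that header). HTTP dates have one-second resolution, so an ID is simply a number of seconds after a fixed base date; the base date and function names are hypothetical. Keeping dates plausible (say, between 2000 and now) leaves roughly 30 bits, whereas an ETag is an opaque string of whatever length the server chooses.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical base date; ID n is encoded as BASE + n seconds.
BASE = datetime(2000, 1, 1, tzinfo=timezone.utc)


def id_to_last_modified(visitor_id: int) -> str:
    """Encode an integer ID as an HTTP-date for the Last-Modified header."""
    when = datetime.fromtimestamp(BASE.timestamp() + visitor_id, tz=timezone.utc)
    return format_datetime(when, usegmt=True)


def if_modified_since_to_id(header_value: str) -> int:
    """Recover the ID from the date the browser sends back in If-Modified-Since."""
    when = parsedate_to_datetime(header_value)
    if when.tzinfo is None:  # parsedate_to_datetime returns naive for "-0000"
        when = when.replace(tzinfo=timezone.utc)
    return round(when.timestamp() - BASE.timestamp())


print(id_to_last_modified(0))  # Sat, 01 Jan 2000 00:00:00 GMT
```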
Instead of clearing the cache, one could stop sending If-None-Match and instead send a HEAD request and check whether the returned ETag matches the cached one. This adds one round trip (RTT) of latency if the resource has in fact changed. It could also be implemented outside the browser, in a proxy (although such a proxy wouldn't be standards-compliant and obviously wouldn't work for HTTPS sites).
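The HEAD-based alternative could look roughly like this Python sketch of the decision a proxy or client would make. `head_etag` and `cached_copy_is_fresh` are hypothetical names; the key property is that the stored ETag is compared locally and never sent to the server. Weak ETags and servers that omit ETags on HEAD responses are not handled; a mismatch simply means "fetch the full resource", which is where the extra round trip comes from.

```python
import urllib.request
from typing import Callable, Optional


def head_etag(url: str) -> Optional[str]:
    """Ask the server for headers only and return the resource's current ETag."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.headers.get("ETag")


def cached_copy_is_fresh(
    url: str,
    cached_etag: Optional[str],
    head: Callable[[str], Optional[str]] = head_etag,
) -> bool:
    """Reuse the cached body only if a plain HEAD reports the same ETag.

    Unlike a conditional GET, nothing we stored is sent back, so a server
    handing out a unique ETag per visitor cannot read that ID from us.
    """
    if cached_etag is None:
        return False
    current = head(url)
    return current is not None and current == cached_etag
```

The `head` parameter exists so the network call can be swapped out: for example, `cached_copy_is_fresh(url, '"abc"', head=lambda u: '"abc"')` returns `True` without touching the network.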
>One thing I would strongly recommend you to do anytime you visit a page where you want a little more security, is opening a private navigation window and using https exclusively. Doing this single-handedly eliminates attacks like BREACH (the latest https hack).
As far as I'm aware, it would not mitigate BREACH. Can anyone shed any light on why it would?
The attacker can't make your browser send arbitrary requests, which is required for that attack (and CRIME) to work. This is normally done by injecting traffic into HTTP pages, but if you use incognito mode exclusively for HTTPS, this can't be done. In fact, the same goes for normal browsing mode, but it's very inconvenient to close all other tabs just to do a wire transfer or something. Incognito mode also has the advantage of disabling tracking cookies and the like. My bank actually uses Google Analytics on its website...
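For context on why attacker-triggered requests matter: BREACH, like CRIME, uses compression as a length oracle. If attacker-chosen text is reflected in the same compressed response as a secret, a guess that matches the secret compresses better. This Python sketch shows only that signal, using `zlib` (DEFLATE, the algorithm behind gzip) with a hypothetical page and token. A real attack guesses one character at a time and needs many requests, each of which has to be triggered from the victim's browser.

```python
import zlib

# Hypothetical secret embedded in every response from the site.
SECRET = "csrf_token=4f3c2a9b7e1d8c6f5a0b"


def response_size(reflected: str) -> int:
    """Compressed size of a page containing both the secret and attacker input."""
    page = (
        f"<html><form><input name='t' value='{SECRET}'></form>"
        f"<p>You searched for: {reflected}</p></html>"
    )
    return len(zlib.compress(page.encode("ascii")))


right_guess = response_size("csrf_token=4f3c2a9b7e1d8c6f5a0b")
wrong_guess = response_size("csrf_token=zqxwvuykjhgrtmnplsio")
# The matching guess becomes one back-reference to the secret, so it is smaller.
print(right_guess < wrong_guess)  # True
```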
Look carefully at the source code. It's bogus. Not the ETag trick, but the demo itself.
The demo is actually just identifying users by hashing REMOTE_ADDR and USER_AGENT (the client IP address and the User-Agent HTTP header).
So it appears to work when it doesn't really. It will often fail for users with dynamic IPs or behind proxies.
This is why it appears to work across incognito windows: Chrome sends the same user agent whether or not it is in incognito mode.
----
The ETag trick is real. But you DO need JavaScript in the browser to extract the ETag from the headers of the cached image. It doesn't really have to be an image, just any request that can be made via XMLHttpRequest.
... or you could set the ETag on the page itself and use the fact that the browser will send an If-None-Match header on the next request. But that only works for that one URI, not for all pages on the domain. The code looks like it COULD be used to do that, but the page never sets an ETag HTTP header on itself.
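Why does an ETag on the page itself only identify you on that one URI? Because the browser cache is keyed by URL: `If-None-Match` is only sent when revalidating a URL that is already cached. The toy Python model below (class and method names hypothetical) ignores everything else a real browser cache does, such as freshness lifetimes, `Vary` and eviction.

```python
class ToyBrowserCache:
    """Hypothetical, heavily simplified browser HTTP cache keyed by URL."""

    def __init__(self) -> None:
        self._etags: dict[str, str] = {}  # URL -> ETag from its last response

    def store(self, url: str, response_headers: dict[str, str]) -> None:
        # Remember the validator the server sent for this exact URL.
        if "ETag" in response_headers:
            self._etags[url] = response_headers["ETag"]

    def request_headers(self, url: str) -> dict[str, str]:
        # A conditional request is only possible for a URL we already cached.
        etag = self._etags.get(url)
        return {"If-None-Match": etag} if etag is not None else {}


cache = ToyBrowserCache()
cache.store("https://example.test/index.php", {"ETag": '"visitor-42"'})
print(cache.request_headers("https://example.test/index.php"))  # {'If-None-Match': '"visitor-42"'}
print(cache.request_headers("https://example.test/about.php"))  # {}
```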
Interestingly, popping open a new Firefox private window showed my tracked data, but after closing the window it was all reset (even though I had the tab open in a normal window the whole time). I'm guessing closing the private window erases any 'dirtied' cached files?
I figured this out yesterday while working on a client's site. Apparently Firefox's private mode still uses the existing cache: if you had files from the site cached before opening it in private mode, it will still use them.
This seems like a pretty big issue to me. It defeats private mode.
If he can only associate my previous page view with my current requests, then how does he know it's me again? I mean, the image request is separate from the page request, and in any case only comes later.
So he would need to store a mapping from something he already knows (from the headers of my request for his html page, or my IP) to my ETag "cookie" to know what my previous ETag was.
Wouldn't that require using some of the features he wasn't going to use (like user agent) to work?
No ETag tracking is taking place, since the ETag is never sent to the server for the index.php request (only for the image request). In theory he could update the session after your IP changes, but he does not seem to do that (the image requests hold on to the old ETag).
So, basically, to me it seems like the whole point/post is invalid. Please correct me if I'm wrong.
As usual, advertisers rely on assumptions about what they think users will or will not do. When users deviate from the assumed patterns, tracking fails.
Three ways to easily defeat this "cookieless tracking" come to mind:
1. Turn off automatic image loading.
2. Use your HOSTS file to block/redirect the domain name to which the tracking info is sent.
3. On devices that hide the HOSTS file, use your own localhost DNS server to block/redirect the domain name to which tracking info is sent.
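Options 2 and 3 boil down to a blocklist. As a small illustration, this Python helper (hypothetical, as are the example hostnames) turns a list of tracker hostnames into HOSTS-file lines pointing at 0.0.0.0; 127.0.0.1 is also commonly used. HOSTS entries match exact hostnames with no wildcards, so every subdomain needs its own line, whereas many local DNS servers can block a whole domain.

```python
def hosts_block_lines(hostnames: list[str], sink: str = "0.0.0.0") -> list[str]:
    """Return HOSTS-file lines that map each hostname to a dead-end address."""
    lines: list[str] = []
    seen: set[str] = set()
    for name in hostnames:
        host = name.strip().lower().rstrip(".")  # normalise case and trailing dot
        if host and host not in seen:            # skip blanks and duplicates
            seen.add(host)
            lines.append(f"{sink} {host}")
    return lines


# Hypothetical tracker hostnames; append the output to /etc/hosts
# (C:\Windows\System32\drivers\etc\hosts on Windows).
print("\n".join(hosts_block_lines(["track.example.com", "cdn.track.example.com"])))
```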
The common theme here is the user takes more control over what connections her computer may initiate.
Under current usage patterns a user types a domain name in an address bar of a browser (usually a browser written by some entity that pays its developers through revenues from the sale of advertising) or she types something into a search bar/box and then selects a search result. The user thereby initiates a connection to some other computer addressed by a. the domain name she types (assuming she types the name correctly; otherwise she may end up at a page of sponsored search results) or b. the result she selects.
This level of navigation is within the user's control. She intends to connect to a computer addressed by a domain name that she can type or select. Does she also intend to connect to other unspecified computers at the same time?
Due to the way these browsers are configured, many more connections to other unspecified computers may be initiated without any input from the user. Increasingly, these are computers that serve the user no useful content. They are devoted to tracking. Go figure.
Does the user want her computer to connect to other unspecified computers whose sole purpose is to track her? Under current assumptions, this is to be decided outside of the user's control (and awareness).
By exercising more control over what browsers do and over domain name lookups, the user can retain more power to specifically choose the other computers to which her computer connects.
0x0 | 12 years ago
nikcub | 12 years ago
http://www.nikcub.com/posts/persistant-and-unblockable-cooki...
As an update to that post from 2011, I never did get the browsers to update their parsers.
This E-Tag bug has been known for over a decade. As part of my post I tried to find the earliest written record of it and found a post in 2003:
http://www.arctic.org/~dean/tracking-without-cookies.html
It seems to get 'rediscovered' at least a couple of times a year.
lucb1e | 12 years ago
bostik | 12 years ago
Zalewski's "I know which sites you visit" asteroids clone is probably from the funkier end: http://lcamtuf.blogspot.fi/2013/05/some-harmless-old-fashion...
chancancode | 12 years ago
[1] http://en.wikipedia.org/wiki/HTTP_ETag#Tracking_using_ETags
TheHydroImpulse | 12 years ago
tzury | 12 years ago
http://samy.pl/evercookie/
AlexanderDhoore | 12 years ago
[1] http://bit.ly/18DLvov
andrewcooke | 12 years ago
http://www.ico.org.uk/for_organisations/privacy_and_electron...
martin-adams | 12 years ago
"When considering alternatives to cookies it is important to look at the broader privacy context. Focusing solely on cookies is missing the point."
Source (page 25): http://www.ico.org.uk/for_organisations/privacy_and_electron...
sker | 12 years ago
Edit: not really; the article explains it, but I hadn't finished reading it.
lucb1e | 12 years ago
dalek_cannes | 12 years ago
k3n | 12 years ago
dewiz | 12 years ago
hayksaakian | 12 years ago
lucb1e | 12 years ago
pktgen | 12 years ago
eli | 12 years ago
lingben | 12 years ago
lucb1e | 12 years ago
robryk | 12 years ago
djm_ | 12 years ago
lucb1e | 12 years ago
barryhunter | 12 years ago
barryhunter | 12 years ago
/me wanders off to wipe egg off my face.
tmister | 12 years ago
Groxx | 12 years ago
driverdan | 12 years ago
sspiff | 12 years ago
What am I missing?
miken123 | 12 years ago
sidcool | 12 years ago
D9u | 12 years ago
With private browsing, as well as when connecting through Tor, the tracking didn't work.
My text is lost on refresh.
Thanks for reminding me.
amenod | 12 years ago
edit: shortened and clarified.
gwu78 | 12 years ago
zongitsrinzler | 12 years ago