In addition, having Expires set to a date in the past is not the same as "Cache-Control: no-cache, private". The latter instructs CDNs not to cache the file, whereas the former doesn't (a CDN is still allowed to cache the file, as long as it revalidates with the origin).
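To make the distinction concrete, compare the two variants (illustrative responses, not from the article). The first arrives already stale, so a CDN may store it but must revalidate before reusing it; in the second, "private" forbids shared caches such as CDNs from storing the response at all:

```http
HTTP/1.1 200 OK
Expires: Thu, 01 Jan 1970 00:00:00 GMT

HTTP/1.1 200 OK
Cache-Control: no-cache, private
```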
Your use of the term "attack" seems to imply that a malicious client can trigger circular request loops by using a cleverly forged request. But I cannot understand how it could happen, unless the proxy servers are misconfigured. Am I missing something?
What a terrible stance for a company like Fastly to take:
More debatable perhaps is Via, which is required (by RFC7230) to be added to the response by any proxy through which it passes to identify the proxy. This can be something useful like the proxy’s hostname, but is more likely to be a generic identifier like “vegur”, “varnish”, or “squid”. Removing (or not setting) this header is technically a spec violation, but no browsers do anything with it, so it’s reasonably safe to get rid of it if you want to.
Actually, it isn’t “debatable,” since the debate occurred, and a decision was made, and published. That’s what RFCs are for.
To ignore them with such wanton disregard speaks volumes.
Edit: to clarify, I didn't mean that RFCs should not be debated at all, only that disregarding this because "no browsers do anything with it" didn't seem like a good justification or stance.
Very interesting link, thanks. I'm not too familiar with this area, but from my understanding of the article, Cloudflare are suggesting that all players in the game need to be compliant, otherwise nobody wins.
So is this Fastly article suggesting a different point of view?
The article mentions that Via is useful while the request is bouncing around among proxies, but isn't useful in responses, which is what the article is about.
They're talking about responses in which Via is technically 'required' but pretty useless. The blog post you linked seems to be about the use of the header in requests.
Saying that a header is useless because it has been deprecated and displaced by a newer header is... misleading at best.
If all you ever code for is the latest version of Firefox and Chrome, you might not understand this, but there's a whole world out there with an astonishing diversity of browsers. (Also, your site is bad and you should feel bad.) Removing X-Frame-Options without first checking if 99.99% of your users' browsers support Content-Security-Policy is just asking for increased risk.
Most of the suggestions in this post are great, but as always, especially when security is involved, you need to assess your business needs yourself.
The suggestion to use Content-Security-Policy over X-Frame-Options is great -- if you don't expect many of your users to be using IE-based browsers. If you're primarily serving large enterprises or government customers though, it's likely that most of your users will still be coming from a browser that doesn't support Content-Security-Policy.
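For what it's worth, nothing stops you from sending both headers during a transition; browsers that implement CSP Level 2 are supposed to prefer the frame-ancestors directive over X-Frame-Options, while older browsers fall back to the latter. A sketch (the values here are just examples):

```http
X-Frame-Options: SAMEORIGIN
Content-Security-Policy: frame-ancestors 'self'
```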
P3P is unnecessary until you have clients complaining that Internet Explorer users cannot use the site and it's hurting their business. I speak from experience.
Curiously enough, P3P enforcement depends on the operating system and not on the browser. Internet Explorer 11 may or may not care about P3P depending on whether you're on Windows 7 or Windows 10.
Came here to say the exact same thing. P3P may be "officially" obsolete, but if your business wants older browsers to be able to handle your code, you're going to have to deal with it.
If you have the misfortune of encountering it, you can get really hard to detect bugs with ajax calls or script files not getting loaded in IE when you don't have P3P set up correctly. (for instance: https://www.techrepublic.com/blog/software-engineer/craft-a-...)
cache-control doesn't completely replace Expires for some use cases.
If you have a scheduled task that generates data every hour, you can set Expires accordingly so all clients will refresh the data as soon as the hour rolls over.
You can do this using max-age, but then you have to dynamically calculate the header per request, which means you can't do things like upload your data to S3 and set the Cache-Control header on it.
With Expires, I can upload a file to S3 and set
Expires: ... 17:00
and then not have to touch it again for an hour.
You can work around this client-side with per-hour filenames or the other usual cache-busting tricks, but that's annoying.
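A minimal sketch of that pattern, assuming the data regenerates at the top of each hour (the helper name and structure are mine, not the commenter's):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def next_hour_expires(now=None):
    """HTTP-date for the top of the next hour, usable as an Expires value."""
    now = now or datetime.now(timezone.utc)
    top_of_next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    # usegmt=True emits the "... GMT" form HTTP requires
    return format_datetime(top_of_next_hour, usegmt=True)

# The returned string can be set once at upload time (e.g. as S3 object
# metadata) and then left alone until the next regeneration.
```

If you're uploading with an SDK such as boto3, the value maps onto the object's Expires attribute at upload time (check your SDK's exact parameter name).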
I get your point but it's such a niche use case that I can't see it coming up in real world situations. I mean, "never say never", but it's a solution that creates as many problems as it solves.
I used to build online games that fed off real-world events, e.g. football managers based on real football matches, and games based on horse racing, F1, the Tour de France, and many others. We needed to change feeds when the match started and ended, but sometimes events are delayed or run into extra time, so we needed a way to change that quickly. We also needed to present different screens at the start and end of the event than the live scoring shown during the event. All of this meant it was easier to handle time-based cut-offs in JavaScript, with the live-scoring JSON files (which were being served from S3) using the Cache-Control header, because it was easier to set a timeout X seconds into the future than to rewrite the S3 tags every few seconds with a new Expires header.
On paper our use case should be precisely what you described but even we found expires to be unnecessary.
It seems like kind of an unlikely scenario that you'd want to expire content at a specific time. I mean, if someone chooses to do that, they better know what the impact could be.
With the Expires header, all clients that retrieved that content would expire at the exact same time, which could cause some disproportionately high load in the few seconds after that (the "thundering herd" problem). The Cache-Control solution will stagger the expirations (relative to when the client last retrieved it) so the server doesn't get trampled.
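One common mitigation (my sketch, not from the comment) is to add a little random jitter to each response's max-age, so the cached copies don't all expire in the same instant:

```python
import random

def jittered_max_age(base_seconds, jitter_fraction=0.1):
    """Return base_seconds +/- up to jitter_fraction of it, spreading
    cache expirations out instead of aligning them on one instant."""
    delta = int(base_seconds * jitter_fraction)
    return base_seconds + random.randint(-delta, delta)

# e.g. an hour-long TTL, staggered by up to +/- 10% per response
header = f"Cache-Control: max-age={jittered_max_age(3600)}"
```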
I really wish the browser vendors would come together to establish a plan to clean up User-Agent. It's one of the worst offenders in header legacy[1] and fingerprinting. Exposing what browser I am using and its major version is fine, but I don't think every website I visit deserves to know what OS I am using, nor the details of my CPU.
These days the referrer header rarely makes it through for 2 main classes of reasons [0].
1. Requests transiting across HTTP <-> HTTPS boundaries do not include the referrer header.
2. The referrer header is frequently disabled by sites (especially search engines and high-traffic sites) through the use of a special meta tag in the HTML head [1]:
<meta name="referrer" content="no-referrer" />
Worry not, though. When client-side Javascript is enabled, ga.js still sends enough information that Google can reconstruct most of everyone's browsing sessions on their backend. Now Google (and only Google) really has all your / our data (generally speaking). :-\
I used to spoof my user-agent and don't remember much of a difference... As a dev, everyone tells me I should just throw literally every possible version of newer attributes into the CSS anyhow, so on most websites you're bound to get at least some of the right ones.
Perhaps your complaint is of a higher order though? Recently I've been spending most of my time wrestling with CSS so my perspective is a bit skewed...
For instance, I just found today that GitHub code reviews require the Referer header to allow PR comments. Without the Referer header, GH returns `422 Unprocessable Entity`.
Gosh, no. Server is not vanity; Server is needed to know WHO THE HELL responded to you (we are in a very messy world of CDN selectors + CDNs + application layers that depend on non-obvious rules involving (sub)domains and cookies).
Speaking of HTTP headers: one I wish more people would use is Accept-Language, instead of region/geoip-based localization. Practically every site I've come across ignores this header in favour of geoip, with the weird and notable exceptions of Microsoft Exchange webmail and Grafana.
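Honouring the header takes very little code. A rough sketch of parsing its quality values (simplified: no wildcard handling or RFC 4647 tag matching):

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (language-tag, q) pairs,
    highest preference first. Simplified; skips validation/wildcards."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, param = part.partition(";")
        q = 1.0  # per the spec, a missing q-value defaults to 1
        param = param.strip()
        if param.startswith("q="):
            try:
                q = float(param[2:])
            except ValueError:
                q = 0.0
        prefs.append((tag.strip(), q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)

# e.g. parse_accept_language("sv-SE,sv;q=0.9,en;q=0.8")
```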
I get that this is data that Fastly has to send but doesn’t get to bill directly to customers, but don’t expect ME to care about this until the average news article stops sending me 10 MB.
I wouldn't trust this entry at all. The author did not do proper research to understand the whys behind the headers that he didn't understand or didn't know well enough.
They list "date" as being required by protocol. This is not true. The term used in the RFC is "should". It is a nice to have, for additional validation by proxies.
The term the RFC (RFC 2616, Section 14.18) uses is "MUST" with 3 exceptions (HTTP 100/101 responses, which are message-less; HTTP 500-class errors which are indications that the server is malfunctioning and during this malfunction it's inconvenient to generate a date; and finally HTTP servers without clocks), which are all referencing exceptional cases -- in general HTTP/1.1 responses MUST include a Date header from the Origin server, and proxies MUST add the Date header if the Origin server failed to do so (due to 1 or more of the 3 exceptions).
Oh God. No. Expires and Pragma are absolutely essential if you're writing a web app to be used by folks stuck behind a walled garden proxy implemented in the dumbest way possible.
It would be helpful to have a guide to this for people running a 'low audience website' where there is no CDN or Varnish, just some Apache or Nginx server on a slow-ish but cheap VPS.
For a local business or community, e.g. an arts group with a WordPress-style site, there are many common problems. They might not need a full CDN; just serving media files from a cookieless subdomain gets their site up to acceptable speed and cuts the header overhead considerably.
Purging the useless headers might also include getting rid of pointless 'meta keywords' and whatnot.
The tips given here could be really suited to this type of simple work to get a site vaguely performant. A guide on how to do it with the common little-guy server setups could really help.
The details are interesting but "adds overhead at a critical time in the loading of your page" ... this seems pretty unlikely to have any noticeable processing overhead. Doing things better is generally good, but this all seems very low impact.
Depends on where you measure it. A client on a decent connection will never notice. If you're serving billions of hits, 20 bytes in a header is something you will definitely notice on your bandwidth bill.
I got stuck with a website once that was using one of the compression headers - maybe Content-Encoding - to indicate that its .gz files were gzipped, even if the client didn't indicate it supported it. Some browsers would ignore it and just download the file, but others would unzip it. So you got a different file depending on what browser you used! I think wget and Chrome behaved differently from each other. I wrote to the site operator, who corrected it. So beware of unexpected side-effects!
buro9 | 7 years ago:
A very cheap attack is to chain CDNs into a nice circle. This is what Via protects against: https://blog.cloudflare.com/preventing-malicious-request-loo...
Just because a browser doesn't use a header does not make the header superfluous.
khc | 7 years ago:
Disclosure: I work at cloudflare.
[1] https://www.nczonline.net/blog/2010/01/12/history-of-the-use... (2010, though little has changed since then).
gcp | 7 years ago:
I'm sure there's a Bugzilla bug about the "X11; Linux x86_64" in the headers, and I'd be terrified to open it.
gtirloni | 7 years ago:
This is an amusing (scary?) article about the history of the user-agent:
https://webaim.org/blog/user-agent-string-history/
jaytaylor | 7 years ago:
https://jaytaylor.com/writeups/2018/why-referrer-header-empt...

[0] https://stackoverflow.com/questions/6817595/remove-http-refe...

[1] https://stackoverflow.com/questions/6880659/in-what-cases-wi...
LinuxBender | 7 years ago:
In haproxy, you can discard it with:
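(The actual directive didn't survive the page extraction. For reference, a typical modern haproxy line for stripping a response header looks like the following; the header name here is just an example, not necessarily the one the commenter meant:)

```haproxy
http-response del-header Server
```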
sqldba | 7 years ago:
Step 2: Advise, "This is part of the standard but ignore it because it's pointless."
brobinson | 7 years ago:
This was a requirement to have IE6 accept third party cookies from your site.