
HTTP headers we don't want

451 points| kawera | 7 years ago |fastly.com

139 comments

[+] buro9|7 years ago|reply
Via is not safe to remove, and Fastly knows this, as do Akamai, Cloudflare and others.

A very cheap attack is to chain CDNs into a nice circle. This is what Via protects against: https://blog.cloudflare.com/preventing-malicious-request-loo...

Just because a browser doesn't use a header does not make the header superfluous.

[+] khc|7 years ago|reply
In addition, having Expires set to a date in the past is not the same as "Cache-Control: no-cache, private". The latter instructs CDNs not to cache the file, whereas the former doesn't (the CDN is allowed to cache the file and revalidate it with the origin).

Disclosure: I work at cloudflare.
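A rough sketch of the distinction described above, as a simplification of the RFC 7234 storage rules (the function name is illustrative, not a real API):

```python
def shared_cache_may_store(headers: dict) -> bool:
    """True if a shared cache (e.g. a CDN) may store the response at all."""
    cc = {d.strip() for d in headers.get("Cache-Control", "").lower().split(",") if d.strip()}
    # "private" and "no-store" forbid a shared cache from storing the response.
    if "private" in cc or "no-store" in cc:
        return False
    # An Expires date in the past only makes the stored copy immediately
    # stale: the cache may still store it and revalidate with the origin.
    return True

# Expires in the past: storable, but revalidated on every request.
print(shared_cache_may_store({"Expires": "Thu, 01 Jan 1970 00:00:00 GMT"}))  # → True
# "no-cache, private": a shared cache must not store it at all.
print(shared_cache_may_store({"Cache-Control": "no-cache, private"}))        # → False
```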

[+] yongjik|7 years ago|reply
Your use of the term "attack" seems to imply that a malicious client can trigger circular request loops by using a cleverly forged request. But I cannot understand how it could happen, unless the proxy servers are misconfigured. Am I missing something?
[+] randomdrake|7 years ago|reply
What a terrible stance for a company like Fastly to take:

More debatable perhaps is Via, which is required (by RFC7230) to be added to the response by any proxy through which it passes to identify the proxy. This can be something useful like the proxy’s hostname, but is more likely to be a generic identifier like “vegur”, “varnish”, or “squid”. Removing (or not setting) this header is technically a spec violation, but no browsers do anything with it, so it’s reasonably safe to get rid of it if you want to.

Actually, it isn’t “debatable,” since the debate occurred, and a decision was made, and published. That’s what RFCs are for.

To ignore them with such wanton disregard speaks volumes.

Edit: to clarify, I didn't mean that RFCs should not be debated at all, only that disregarding this because "no browsers do anything with it" didn't seem like a good justification or stance.

[+] buro9|7 years ago|reply
lol... and then I come into work and find out from my colleague that actually Cloudflare now disables the Via header by default.
[+] djhworld|7 years ago|reply
Very interesting link, thanks. I'm not too familiar with this area, but from my understanding of the article, Cloudflare are suggesting that all players in the game need to be compliant, otherwise nobody wins.

So is this Fastly article suggesting a different point of view?

[+] kelnos|7 years ago|reply
The article mentions that Via is useful while the request is bouncing around among proxies, but isn't useful in responses, which is what the article is about.
[+] pvg|7 years ago|reply
They're talking about responses in which Via is technically 'required' but pretty useless. The blog post you linked seems to be about the use of the header in requests.
[+] voidlogic|7 years ago|reply
Wouldn't it be better to use the "Forwarded" header for this?
[+] lxe|7 years ago|reply
How is Via different from “X-Forwarded-For”?
[+] phyzome|7 years ago|reply
Saying that a header is useless because it has been deprecated and displaced by a newer header is... misleading at best.

If all you ever code for is the latest version of Firefox and Chrome, you might not understand this, but there's a whole world out there with an astonishing diversity of browsers. (Also, your site is bad and you should feel bad.) Removing X-Frame-Options without first checking if 99.99% of your users' browsers support Content-Security-Policy is just asking for increased risk.

[+] ShaneWilton|7 years ago|reply
Most of the suggestions in this post are great, but as always, especially when security is involved, you need to assess your business needs yourself.

The suggestion to use Content-Security-Policy over X-Frame-Options is great -- if you don't expect many of your users to be using IE-based browsers. If you're primarily serving large enterprises or government customers though, it's likely that most of your users will still be coming from a browser that doesn't support Content-Security-Policy.
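A minimal sketch of the belt-and-braces approach: serve both headers, so legacy IE-based browsers honour X-Frame-Options while modern ones use CSP. The values here are illustrative, not a recommendation for every site:

```python
def clickjacking_headers() -> dict:
    """Response headers that block framing in both old and new browsers."""
    return {
        # Honoured by older, IE-based browsers that predate CSP.
        "X-Frame-Options": "DENY",
        # Honoured by modern browsers; takes precedence where supported.
        "Content-Security-Policy": "frame-ancestors 'none'",
    }

for name, value in clickjacking_headers().items():
    print(f"{name}: {value}")
```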

[+] Hamuko|7 years ago|reply
P3P is unnecessary until you have clients complaining that Internet Explorer users cannot use the site and it's hurting their business. I speak from experience.

Curiously enough, P3P enforcement depends on the operating system and not on the browser. Internet Explorer 11 may or may not care about P3P depending on whether you're on Windows 7 or Windows 10.

[+] pfarrell|7 years ago|reply
Came here to say the exact same thing. P3P may be "officially" obsolete, but if your business wants older browsers to be able to handle your code, you're going to have to deal with it.

If you have the misfortune of encountering it, you can get really hard to detect bugs with ajax calls or script files not getting loaded in IE when you don't have P3P set up correctly. (for instance: https://www.techrepublic.com/blog/software-engineer/craft-a-...)

[+] justinsaccount|7 years ago|reply
cache-control doesn't completely replace Expires for some use cases.

If you have a scheduled task that generates data every hour, you can set Expires accordingly so all clients will refresh the data as soon as the hour rolls over.

You can do this using max-age, but then you have to dynamically calculate the header per request, which means you can't do things like upload your data to S3 with a fixed Cache-Control header set on it.

With expires, I can upload a file to s3 and set

  Expires: ... 17:00
and then not have to touch it again for an hour.

You can work around this client-side with per-hour filenames or the other usual cache-busting tricks, but that's annoying.
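The hourly pattern above can be sketched like this, using the stdlib's HTTP-date formatter (the helper name is made up for illustration):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def expires_next_hour(now: datetime) -> str:
    """HTTP-date for the top of the next hour, suitable for an Expires header."""
    top = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return format_datetime(top, usegmt=True)

# A file uploaded at 16:23 UTC goes stale for every client at 17:00 UTC:
print(expires_next_hour(datetime(2018, 5, 15, 16, 23, tzinfo=timezone.utc)))
# → Tue, 15 May 2018 17:00:00 GMT
```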

[+] laumars|7 years ago|reply
I get your point but it's such a niche use case that I can't see it coming up in real world situations. I mean, "never say never", but it's a solution that creates as many problems as it solves.

I used to build online games that fed off real-world events, e.g. football managers based on real football matches, and games based on horse racing, F1, the Tour de France, and many others. We needed to change feeds when the match started and ended, but sometimes events are delayed or run into extra time, so we needed a way to change that quickly. We also needed to present different screens at the start and end of the event than the live scoring during the event. All of this meant it was easier to handle time-based cut-offs in JavaScript, with the live scoring JSON files (which were fed from S3) using the Cache-Control header, because setting a timeout X seconds into the future was easier than rewriting the S3 tags every few seconds with a new Expires header.

On paper our use case should be precisely what you described but even we found expires to be unnecessary.

[+] AbacusAvenger|7 years ago|reply
It seems like kind of an unlikely scenario that you'd want to expire content at a specific time. I mean, if someone chooses to do that, they better know what the impact could be.

With the Expires header, all clients that retrieved that content would expire at the exact same time, which could cause some disproportionately high load in the few seconds after that (the "thundering herd" problem). The Cache-Control solution will stagger the expirations (relative to when the client last retrieved it) so the server doesn't get trampled.
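A toy illustration of that difference, with times as plain numbers rather than real cache code:

```python
def expires_expiry(fetch_time: float, expires_at: float) -> float:
    """With Expires, every client's copy goes stale at the same instant."""
    return expires_at

def max_age_expiry(fetch_time: float, max_age: float) -> float:
    """With max-age, staleness is relative to each client's fetch time."""
    return fetch_time + max_age

fetches = [100.0, 130.0, 160.0]  # three clients fetch at different moments
print({expires_expiry(t, 200.0) for t in fetches})        # one shared instant: the herd
print(sorted(max_age_expiry(t, 100.0) for t in fetches))  # staggered expirations
```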

[+] daxterspeed|7 years ago|reply
I really wish the browser vendors would come together to establish a plan to clean up User-Agent. It's one of the worst offenders in header legacy[1] and fingerprinting. Exposing what browser I am using and its major version is fine, but I don't think every website I visit deserves to know what OS I am using, nor the details of my CPU.

[1] https://www.nczonline.net/blog/2010/01/12/history-of-the-use... (2010, though little has changed since then).

[+] gcp|7 years ago|reply
Browser vendors can't clean up User-Agent because the websites sniff it and break if it's "wrong" (for any random value of wrong).

I'm sure there's a Bugzilla bug about the "X11; Linux x86_64" in the headers, and I'd be terrified to open it.

[+] dewiz|7 years ago|reply
Client HTTP header I don't want:

  * referer
  * user-agent
Happy to be wrong, but these shouldn't be mandatory to browse the web, which they kind of are.
[+] jaytaylor|7 years ago|reply
Note: I've also written about this on my site with more notes and context:

https://jaytaylor.com/writeups/2018/why-referrer-header-empt...

--

Short version:

These days the referrer header rarely makes it through for 2 main classes of reasons [0].

1. Requests transiting across HTTP <-> HTTPS boundaries do not include the referrer header.

2. The referrer header is frequently disabled by sites (especially search engines and high-traffic sites) through the use of a special meta tag in the HTML head [1]:

    <meta name="referrer" content="no-referrer" />
Worry not, though. When client-side Javascript is enabled, ga.js still sends enough information that Google can reconstruct most of everyone's browsing sessions on their backend. Now Google (and only Google) really has all your / our data (generally speaking). :-\

[1] https://stackoverflow.com/questions/6880659/in-what-cases-wi...

[0] https://stackoverflow.com/questions/6817595/remove-http-refe...

[+] gboudrias|7 years ago|reply
I used to spoof my user-agent and don't remember much of a difference... As a dev, everyone tells me I should just throw literally every possible version of newer attributes into the CSS anyhow, so on most websites you're bound to get at least some of the right ones.

Perhaps your complaint is of a higher order though? Recently I've been spending most of my time wrestling with CSS so my perspective is a bit skewed...

[+] realusername|7 years ago|reply
I would be happy to do the same but there's just some browser bugs I have to fix by reading the User-Agent...
[+] dewiz|7 years ago|reply
For instance, I just found today that GitHub code reviews require the Referer header to allow PR comments. Without the Referer header, GH returns `422 Unprocessable Entity`.
[+] _ZeD_|7 years ago|reply
>>> Vanity (server, x-powered-by, via)

gosh, no.

Server is no vanity; Server is needed to know WHO THE HELL responded to you (we live in a very messy world of CDN selectors + CDNs + application layers that depend on non-obvious rules on (sub)domains and cookies).

[+] randomstring|7 years ago|reply
While working on a HTTP server in 2007 I discovered removing the "Server" header significantly delayed the render time in Firefox.

So beware of unexpected side-effects!

[+] AstralStorm|7 years ago|reply
That is supposed to be handled by the Host header. Server etc. provide at most redundant debugging info.
[+] dijit|7 years ago|reply
Speaking of HTTP headers, one I wish more people would use is Accept-Language instead of region/geoip-based localization. Practically every site I've come across ignores this header in favour of geoip, with the weird and notable exceptions of Microsoft Exchange webmail and Grafana.
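Honouring the header is not much code. A minimal sketch of q-value matching, simplifying RFC 7231's Accept-Language rules (wildcards and edge cases ignored):

```python
def best_language(accept_language: str, supported: list) -> str:
    """Pick the best supported language from an Accept-Language header."""
    prefs = []
    for part in accept_language.split(","):
        bits = part.strip().split(";q=")
        lang = bits[0].strip().lower()
        q = float(bits[1]) if len(bits) > 1 else 1.0  # default quality is 1.0
        prefs.append((q, lang))
    for _, lang in sorted(prefs, reverse=True):  # highest q-value first
        for s in supported:
            # Match exactly, or let "fr-CH" fall back to a supported "fr".
            if s.lower() == lang or lang.startswith(s.lower() + "-"):
                return s
    return supported[0]  # site default when nothing matches

print(best_language("fr-CH, fr;q=0.9, en;q=0.8", ["en", "fr"]))  # → fr
```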
[+] ggg9990|7 years ago|reply
I get that this is data that Fastly has to send but doesn’t get to bill directly to customers, but don’t expect ME to care about this until the average news article stops sending me 10 MB.
[+] manigandham|7 years ago|reply
Fastly is a CDN that charges by requests + bandwidth, so it absolutely makes money from extra headers on responses no matter how small.
[+] Steeeve|7 years ago|reply
I wouldn't trust this entry at all. The author did not do the research to understand the whys behind the headers he didn't know well enough.
[+] LinuxBender|7 years ago|reply
They list "date" as being required by protocol. This is not true. The term used in the RFC is "should". It is a nice to have, for additional validation by proxies.

In haproxy, you can discard it with:

    http-response del-header Date
[+] rkeene2|7 years ago|reply
The term the RFC uses (RFC 2616, Section 14.18) is "MUST", with three exceptions: HTTP 100/101 responses, which are message-less; HTTP 500-class errors, which indicate the server is malfunctioning and during that malfunction it's inconvenient to generate a date; and HTTP servers without clocks. These are all exceptional cases. In general, HTTP/1.1 responses MUST include a Date header from the origin server, and proxies MUST add the Date header if the origin server failed to do so (due to one or more of the three exceptions).
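That last clause about proxies can be sketched in a few lines, using the stdlib's HTTP-date formatter; this is an illustration, not real proxy code:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def ensure_date(headers: dict, now: datetime = None) -> dict:
    """Add a Date header if the origin server omitted one (RFC 2616 s14.18)."""
    if "Date" not in headers:
        now = now or datetime.now(timezone.utc)
        headers = {**headers, "Date": format_datetime(now, usegmt=True)}
    return headers

print(ensure_date({}, datetime(2018, 5, 15, 12, 0, tzinfo=timezone.utc))["Date"])
# → Tue, 15 May 2018 12:00:00 GMT
```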
[+] Rjevski|7 years ago|reply
Just curious, what would a proxy do with such a header?
[+] torstenvl|7 years ago|reply
Oh God. No. Expires and Pragma are absolutely essential if you're writing a web app to be used by folks stuck behind a walled garden proxy implemented in the dumbest way possible.
[+] sqldba|7 years ago|reply
Step 1: Complain, "Nobody follows the standard."

Step 2: Advise, "This is part of the standard but ignore it because it's pointless."

[+] prashnts|7 years ago|reply
Interesting that their blog itself has the headers they deem unnecessary...

    Server: Artisanal bits
    Via: 1.1 varnish,1.1 varnish
    X-Served-By: cache-sjc3150-SJC, cache-cdg8748-CDG
[+] yeukhon|7 years ago|reply
First, we should fix user agent. Time to dump that historical baggage.
[+] brobinson|7 years ago|reply
>P3P is a curious animal.

This was a requirement to have IE6 accept third party cookies from your site.

[+] Theodores|7 years ago|reply
It would be helpful to have a guide to this for people running a 'low audience website' where there is no CDN or Varnish, just some Apache or Nginx server on a slow-ish but cheap VPS.

For a local business or community, e.g. an arts group with a WordPress-style site, there are many common problems. They might not need a full CDN; just serving media files from a cookieless subdomain gets their site up to acceptable speed while cutting the header overhead considerably.

Purging the useless headers might also include getting rid of pointless 'meta keywords' and what not.

The tips given here could be really suited to this type of simple work to get a site vaguely performant. Guidance on how to do it with common little-guy server setups could really help.

[+] nebulous1|7 years ago|reply
The details are interesting but "adds overhead at a critical time in the loading of your page" ... this seems pretty unlikely to have any noticeable processing overhead. Doing things better is generally good, but this all seems very low impact.
[+] __jal|7 years ago|reply
Depends on where you measure it. A client on a decent connection will never notice. If you're serving billions of hits, 20 bytes in a header is something you will definitely notice on your bandwidth bill.
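Back-of-the-envelope arithmetic for that claim; the traffic figure below is an assumption for illustration, not Fastly's numbers:

```python
def monthly_overhead_gb(bytes_per_response: int, responses_per_month: int) -> float:
    """Extra egress, in GB, from a few redundant header bytes per response."""
    return bytes_per_response * responses_per_month / 1e9

# 20 bytes of header on 5 billion responses a month is 100 GB of egress:
print(monthly_overhead_gb(20, 5_000_000_000))  # → 100.0
```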
[+] lopmotr|7 years ago|reply
I got stuck with a website once that was using one of the compression headers (maybe Content-Encoding) to indicate that its .gz files were gzipped, even if the client didn't indicate it supported compression. Some browsers would ignore it and just download the file, but others would unzip it, so you got a different file depending on what browser you used! I think wget and Chrome behaved differently from each other. I wrote to the site operator, who corrected it.