
301 redirects: a dangerous one way street (2012)

158 points | _Codemonkeyism | 10 years ago | jacquesmattheij.com | reply

86 comments

[+] _Codemonkeyism|10 years ago|reply
The problem with a 301 without cache headers is that some browsers cache it forever, due to their interpretation of what 'permanent' means.

You often can't use a 302 instead, because then all your external links stop working their SEO magic for you: Google only transfers link juice with a 301 [1].

If you make a mistake and misconfigure your server, you're toast.

If a disgruntled employee 301 redirects your domain, you're toast.

If a service provider misconfigures your domain, you're toast.

If a hacker (from a competitor) 301 redirects your domain, you're toast.

If you buy a domain that had a 301 on it, it's worthless.

If you buy a domain that had 301s on it that point to phishing sites, you're in trouble.

I always add cache headers to the 301 redirects I use, to at least keep myself from taking an arrow in the knee.

UPDATE: [1] Google seems to have changed this recently. It also no longer considers the http/https versions of the same content to be different pages, as it did in the past: https://www.searchenginejournal.com/google-confirms-no-loss-...
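For illustration, a minimal sketch of the kind of bounded 301 I mean, using Python's stdlib http.server (the 24-hour max-age and the example.com target are just placeholder choices):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 301, but with an explicit, bounded lifetime so a mistake
        # isn't cached by some browsers forever.
        self.send_response(301)
        self.send_header("Location", "https://example.com" + self.path)
        self.send_header("Cache-Control", "max-age=86400")  # 24 hours
        self.end_headers()

    def log_message(self, *args):  # silence request logging for the demo
        pass

server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection(*server.server_address)
conn.request("GET", "/old-page")
resp = conn.getresponse()
print(resp.status, resp.getheader("Cache-Control"))  # 301 max-age=86400
server.shutdown()
```

With the max-age present, even a botched redirect ages out of the browser cache within a day instead of sticking around indefinitely.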

[+] zrm|10 years ago|reply
Not only that, anybody with a WiFi Pineapple or any other MITM could 301 redirect anything you visit without HTTPS to their spam site.

It seems like browsers interpreting permanent as forever is some kind of a bug. Even if that's literally what it says, that's not what anybody wants. What great evil is being prevented by not having it expire and be refreshed after 45 days?

[+] maaaats|10 years ago|reply
> If a hacker (from a competitor) 301 redirects your domain, you're toast.

Reminds me of the guy that "got hold" of Google's domain for a few minutes. What if something similar happened and someone were able to make this redirect? Millions would be affected.

[+] ghayes|10 years ago|reply
It would be great if there were a service that let you search the non-cache-controlled 301s a crawler had encountered.
[+] revorad|10 years ago|reply
Thanks for this. Can you please share an example of the cache headers you add to 301 redirects?
[+] rpgmaker|10 years ago|reply
I think that when you clear the cache on recent versions of Chrome it also removes 301 redirects. I'm not sure though.
[+] Piskvorrr|10 years ago|reply
That's what permanent means: "permanent (adjective): without end, eternal; lasting for an indefinitely long time."

Also, "This response is cacheable unless indicated otherwise," says RFC 2616.

Working as designed, IMNSHO. Perhaps not working as intended, but alas, that's a case of ¬RTFM.

[+] Typhon|10 years ago|reply
This seems yet another example of web professionals not understanding HTTP.

Much like webpages that say "404 not found" with a "200 OK" header.

[+] kzrdude|10 years ago|reply
It's unreasonable to cache it for longer than some very long but finite time (3 months or so). Nothing in "cache" implies permanent storage.
[+] _yy|10 years ago|reply
Maybe it's bad design, then?
[+] franze|10 years ago|reply
2012 - somebody should write 2012 into the title of this post (which, by the way, doesn't contain any concrete data).

I did some testing in 2009 and, I think, around 2012 and 2014, in addition to logfile grepping after some big-site URL rewrites.

It's a non-issue. Without caching headers, the redirect gets cached only for the current browser session. Close it, reopen it: gone, done.

Let's discuss this one based on data. (Which I can't provide right now, as I'm on a beach in Sri Lanka with a FirefoxOS device and I don't know how to inspect HTTP requests on it, but) Please prove me wrong! Based on tests and data, not blog posts.

[+] rebelde|10 years ago|reply
It still seems to be a problem.

I just tested it in Firefox on a Mac. I restarted Firefox. I even rebooted. Developer Tools > Network tab says "cached". I can't confirm that it is cached forever, but it is not only "for the current browser session".

[+] zephod|10 years ago|reply
What data do you need? This is empirically observable. Half the people in this thread are giving you examples.
[+] asutherland|10 years ago|reply
Here's the relevant Gecko (Firefox) code which tries to use the max-age and expires headers first and then will set it to forever if the response code was 300, 410, 301, or 308. Note that I'm going by a somewhat shallow code reading after a recent investigation. There's a lot of stuff going on in gecko/necko that could potentially apply some failsafe time limit, maybe in the cache implementation, so I wouldn't take this as 100% for sure. Breaking on this function with gdb and tracing the flow is probably a better idea if you really want to know.

https://dxr.mozilla.org/mozilla-central/source/netwerk/proto...

The 301, 308 stuff comes from IsPermanentRedirect which is here: https://dxr.mozilla.org/mozilla-central/source/netwerk/proto...

[+] AgentME|10 years ago|reply
Are you sure you tested with a 301 redirect (with no caching headers) and not 302 or one of the others? I've been bit by 301s personally.
[+] pluma|10 years ago|reply
"There are only two hard things in Computer Science..."

I guess some people think the purpose of 301 is more like that of 410: update references so you don't try to go there again. The difference is that with 301 you additionally instruct the client to not even attempt to go there again in the future.

But the article does raise an interesting point: if I own somedomain.example and set it up with a 301 redirect to myotherdomain.example and enough people visit it that most people will have cached the redirect, doesn't that basically mean I now own it for perpetuity (or until enough people have cleared their cache) even if I don't renew the domain and new requests to the domain are no longer served (by the same IP)?

Or do browsers have some kind of protections against this, at least based on DNS? It's a bit too convoluted for a proper DOS attack (because you need to own the domain long enough and make it popular enough to poison everyone's caches) but a naive implementation seems like it would effectively render domains unusable if someone set up a 301 on them at some point in the past.

[+] mnw21cam|10 years ago|reply
For an important busy web site, even serving up a redirect for a short amount of time could be enough to cause some serious problems. It's a way of turning "I hacked and defaced this website but they fixed it 24 hours later" into "I hacked and defaced this website and they fixed it 24 hours later, but loads of people still see the defaced version".
[+] zephod|10 years ago|reply
I took over a domain which had previously 301-redirected HTTP:// to HTTPS://. It caused us no end of trouble getting the alpha site online -- obviously we set up SSL but we didn't realise it was the _first thing we'd have to do_.

It also caused half a day of confusion to understand why some of our web browsers were still failing to connect and others could see the alpha site (because they'd never visited the previous 301 site at that address).

[+] nly|10 years ago|reply
This isn't just a problem with things like HTTP. The industry as a whole lacks a standard, uniform way of dealing with domain transfers or expiration. CAs, for example, will happily issue certificates that expire after your domain registration does.
[+] sparewalking|10 years ago|reply
In such cases I always compare the output of 'curl -I site' with what the affected browser's console shows.
[+] kijin|10 years ago|reply
I was recently burned by this. The client had left his server misconfigured for a few hours, and a lot of his static content ended up in a redirect loop. I was called in to fix the mess.

Although modern browsers are clever enough to detect a redirect loop and throw an error, they're not clever enough to detect when the redirect loop is caused by a cached 301 response. So they cache the redirect loop as well. Throw in another layer of caching (CloudFlare), and now you've got a bunch of URLs that will be stuck in a redirect loop for a very long time.

The only solution was to append some garbage to every URL, like "?cache=no". Fortunately, the problem only occurred with static content, so nginx happily discarded the querystring and returned fresh content.
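For reference, the cache-busting trick sketched in Python's stdlib (the parameter name and value are arbitrary; anything that makes the URL distinct works):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def bust_cache(url, param="cache", value="no"):
    """Append a throwaway query parameter so browsers treat the URL
    as distinct from the one whose 301 they have cached."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query)
    query.append((param, value))
    return urlunsplit(parts._replace(query=urlencode(query)))

print(bust_cache("https://example.com/static/app.js"))
# https://example.com/static/app.js?cache=no
```

It preserves any existing query string, which matters if the origin server does pay attention to other parameters.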

[+] teddyh|10 years ago|reply
> You can improve a bit on this by sending along a bunch of cache control headers to at least limit the damage.

It would have been useful to include those headers in the blog post.

[+] Piskvorrr|10 years ago|reply
Something like this says "keep this cached for 100 days":

Last-Modified: Fri, 19 Feb 2016 12:54:49 +0100

Expires: Sun, 29 May 2016 12:54:49 +0200

Cache-Control: max-age=8640000, must-revalidate

See also: https://www.mnot.net/cache_docs/#CONTROL
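For what it's worth, the arithmetic checks out (ignoring the one-hour offset change between the two timestamps, which is just DST); a quick check in Python:

```python
from datetime import datetime, timedelta

last_modified = datetime(2016, 2, 19, 12, 54, 49)  # the Last-Modified above
max_age = 8640000  # seconds, from the Cache-Control header

print(max_age / 86400)  # 100.0 -> exactly 100 days
expires = last_modified + timedelta(seconds=max_age)
print(expires.strftime("%a, %d %b %Y %H:%M:%S"))  # Sun, 29 May 2016 12:54:49
```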

[+] spoiler|10 years ago|reply
This can be fixed with a 302 from the destination back to the start/source of the 301, assuming that the old domain isn't redirecting any more (which would cause a redirect loop).
[+] tobltobs|10 years ago|reply
Uh, I just realized that there are a lot of bullets in my foot.
[+] borkabrak|10 years ago|reply
I'll take two of these on a t-shirt, please.
[+] honksillet|10 years ago|reply
So if you were to momentarily hack a big site like twitter.com and serve out a 301 to, say, pornhub, you would permanently brick twitter?
[+] dyladan|10 years ago|reply
A site with the influence of Twitter would likely be able to get this taken care of (directions on another domain explaining how to clear your cache, or possibly even a direct update from browser vendors).
[+] amichal|10 years ago|reply
This is a browser design issue, not a protocol/server issue. At the time HTTP 1.1 was written, browser caches would hold at most a few weeks of browsing history before things started falling out, so 'permanent' functioned the way our intuition expects.

What's more surprising is "13.2.2 Heuristic Expiration" [1]:

If you specify a Last-Modified time (or your framework does) WITHOUT Cache-Control, the browser is free to make up its OWN cache expiration rules (the item is implicitly cacheable).

[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
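A sketch of the commonly cited heuristic (RFC 2616 suggests, but does not mandate, a fraction such as 10% of the time since Last-Modified; the dates below are made up for illustration):

```python
from datetime import datetime, timedelta

def heuristic_freshness(served_at, last_modified, fraction=0.1):
    """Heuristic freshness lifetime for a response carrying Last-Modified
    but no explicit Cache-Control/Expires: a fraction (typically 10%) of
    the document's age at the time it was served."""
    return (served_at - last_modified) * fraction

served = datetime(2016, 2, 19, 12, 0, 0)
modified = datetime(2015, 2, 19, 12, 0, 0)  # last changed a year earlier
print(heuristic_freshness(served, modified))  # 36 days, 12:00:00
```

So a page untouched for a year can be silently treated as fresh for over a month, with no expiry header ever sent.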

[+] bcoates|10 years ago|reply
Heuristic Expiration is a case of documenting existing weirdness. Older browsers would have cache expiration policy be a setting the user could control, so you can't really leave off explicit headers with any expectation a default will be respected.
[+] miseg|10 years ago|reply
It's easy to spend your life keeping Google happy; 301 redirects have long been an important part of running a website.

But then, I do accept this perspective (if it's within your call to take the risk): just don't 301-redirect at all, and let the search engines figure it out for themselves.

If a user has a bookmark to an old resource, it's a liability for you to try to keep your web of 301s working.

KISS!

[+] rileymat2|10 years ago|reply
I would say links are a bigger problem than bookmarks. For instance, Stack Overflow answers that link to Apple or MSDN references are really annoying if those links die.
[+] perlgeek|10 years ago|reply
When working with non-trivial redirects, test them with something like "wget -S $url" instead of your browser. The redirect caching makes it very painful to repeatedly test the same redirect.

Also, start with a 302, and only change the status to 301 once you're confident they are correct.
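If you'd rather script it, Python's http.client neither follows nor caches redirects, so it reports the true status on every request; a sketch, with a throwaway local server standing in for the site under test (serving a 302 first, per the advice above):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the site under test: a 302, to be flipped to 301
# only once the target is confirmed correct.
class TestRedirect(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", "https://example.com/new")
        self.end_headers()

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), TestRedirect)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Unlike a browser, this shows the raw status on every run.
conn = http.client.HTTPConnection(*server.server_address)
conn.request("GET", "/old")
resp = conn.getresponse()
print(resp.status, "->", resp.getheader("Location"))
server.shutdown()
```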

[+] dk8996|10 years ago|reply
Is there any company that provides 301s or 302s as a service? -- something cheap. I know it's not hard to set up a small box on AWS and install Node.js (or whatever), but I would pay a few dollars a month for some service to run that for me.
[+] newscracker|10 years ago|reply
If the owner of the domain had done it, would it still really be an "eternal issue"? Consider:

a. Most popular websites nowadays are so bloated that browser caches will evict many things (including your tiny site that someone checks once a week or less often) a lot sooner than they did about 10 years ago (I presume browsers' default disk cache sizes have not grown by multiples in that period).

b. More people are browsing on mobile devices that get dumped after a few years and replaced with a new one: new browser, empty cache, etc.

[+] TazeTSchnitzel|10 years ago|reply
HSTS is similarly one-way, but it's not indefinite, I think.
[+] pluma|10 years ago|reply
It's not indefinite because you need to specify a duration. However, nothing stops you from setting an extremely long one, and in fact most tutorials seem to advise doing so for safety reasons.
[+] ghostek|10 years ago|reply
What part of "permanent" did the author miss? Seriously, specs should be read literally; if interpretation is required, it's not a perfect spec.
[+] mwcampbell|10 years ago|reply
Now I don't feel so bad for being lazy and just using a 302 everywhere. I never even bothered to learn how to configure nginx to send a 301.
[+] colanderman|10 years ago|reply
I thought best practice these days, at least for REST, was a 308 (https://tools.ietf.org/html/rfc7238), since 301 has the bizarre behavior of being converted to a GET by some UAs (which conflate it with 303), enshrined for hysterical raisins. Is this not the case?