> the Varnish HTTP cache has been used very successfully to speed up WordPress. But Varnish doesn’t help a lot with logged-in traffic
Varnish supports the ESI (Edge-Side Includes) standard, which allows it to cache fragments of a page and reassemble them on the cache server. It also allows you to completely bypass the cache for certain fragments. This is also supported by a number of CDNs (Fastly, Akamai). I've used the ESI technique several times and have achieved a >98% cache hit rate on Fastly for a site with dynamic per-user content. Even the cache misses are only responsible for rendering a small component of the page.
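For anyone who hasn't used ESI: a minimal sketch of the setup described above, using Varnish 4 syntax and a hypothetical `/esi/user-nav` fragment URL. The page's HTML embeds a tag like `<esi:include src="/esi/user-nav"/>`, and Varnish is told to process it:

```vcl
# Sketch only; URLs and TTLs are illustrative.
sub vcl_backend_response {
    if (bereq.url !~ "^/esi/") {
        set beresp.do_esi = true;   # scan this response for <esi:include> tags
        set beresp.ttl = 1h;        # cache the page shell
    } else {
        set beresp.ttl = 0s;        # per-user fragments are never cached
        set beresp.uncacheable = true;
    }
}
```

On a hit, Varnish serves the cached shell and only the small fragment request reaches the app, which is how a very high hit rate stays achievable even with per-user content.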
Good to know. Using Edge-Side Includes may be easier than trying to turn the app into a semi-single-page app. But that only solves half of the problem. The other half is varying the response based on the value of a specific cookie.
I've updated the blog post with information regarding Edge-Side Includes.
Varnish also supports plugins for extreme flexibility. For example, I wrote a plugin for our Varnish install which performs HMAC validation of a specific signed cookie and then sets a header which is used downstream in the caching rules.
Varnish is mature, powerful, and fast as hell. It would take a lot of work to reach a point where I'd swap it out for something else.
I've left some comments on the Disqus thread on the blog, but I'll reiterate my concern about the security of the cookie being set.
The cookie being set is unsigned according to the Rails documentation[0], so a user could modify it and send it back to get a different cached response. Say they saw that the user level was stored in there (like in the blog post) and changed the value to 'staff' to get the staff cached response. Probably not a good idea!
With that said, I don't think this technique is adequate right now when you have user access-level concerns, because you're relying on that piece of unsigned information not being tampered with.
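To make the concern concrete: the fix is to sign the cookie value so tampering is detectable. Below is a minimal sketch of the idea in plain Ruby; the secret and the `value--hmac` layout are illustrative, not Rails' actual wire format.

```ruby
require "openssl"

SECRET = "per-app-secret"  # hypothetical; Rails derives its key from secret_key_base

def sign_cookie(value)
  digest = OpenSSL::HMAC.hexdigest("SHA256", SECRET, value)
  "#{[value].pack("m0")}--#{digest}"  # base64(value)--hmac
end

def verify_cookie(signed)
  encoded, digest = signed.split("--", 2)
  return nil unless encoded && digest
  value = encoded.unpack1("m0")
  expected = OpenSSL::HMAC.hexdigest("SHA256", SECRET, value)
  # Constant-time comparison, to avoid leaking the digest byte by byte.
  return nil unless digest.bytesize == expected.bytesize
  diff = digest.bytes.zip(expected.bytes).map { |a, b| a ^ b }.reduce(0, :|)
  diff.zero? ? value : nil
end
```

A user who swaps the embedded value ('user' for 'staff') invalidates the HMAC, so a cache layer can safely key on the verified value.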
I think it's been said enough here, but Varnish can certainly do almost everything they say it can't, including some other storage optimizations like storing the gzipped response and serving a non-gzipped version when requested (as of Varnish 3.0).
You can use Vary for tons of caching optimizations via Varnish, such as caching mobile vs. non-mobile web pages, or varying on just a particular header. It's all about flexing a little bit of VCL (which, I'll admit, can sometimes throw people off).
They had me until this part:
> This is an HTTP cache built directly in Passenger so that it can achieve much higher performance than external HTTP caches like Varnish.
And since they have no benchmarks to really back up these claims, I'm skeptical they did much research against Varnish to tune or set it up. I'd love to see the numbers on Varnish vs. their turbocache. Without numbers, I have to take a lot of it with a grain of salt.
Either way, seems like it could be an extra handy thing to have in your toolbox, as long as it fits your stack.
You're reading too much hostility into it. The article does not claim to be better than Varnish. The performance is achieved by not implementing many features. For example, Varnish is configurable through a full-blown programming language (VCL) and supports an effectively unbounded cache size. The Passenger turbocache has almost no configuration options, does not support any sort of custom programming, and holds at most 8 entries. The fact that the max size is so small allows it to be implemented with very compact data structures, but its usefulness is also severely limited. It's merely designed with a different set of tradeoffs than Varnish.
It isn't that difficult to make a web server that's faster than all the production servers out there. For example, H2O is faster than Nginx... because it has far fewer features. Ditto for Passenger's turbocache.
The rest of the article describes some ideas, some of which cannot be implemented using pure Varnish and require cooperation from the app. In my opinion you're missing out on a lot of interesting ideas if you stop reading only because you think Varnish is being slammed.
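The capacity/compactness tradeoff mentioned above can be illustrated with a toy (this is a sketch of the general idea, not Passenger's actual implementation): when a cache holds at most 8 entries, a flat array with a linear scan is simpler, and often faster in practice, than a general-purpose hash-plus-LRU structure.

```ruby
# Toy fixed-capacity cache: at most 8 entries, linear scan, LRU eviction.
class TinyCache
  MAX = 8

  def initialize
    @entries = []  # [key, value] pairs, most recently used last
  end

  def fetch(key)
    idx = @entries.index { |k, _| k == key }
    return nil unless idx
    entry = @entries.delete_at(idx)
    @entries << entry  # bump to most-recently-used position
    entry[1]
  end

  def store(key, value)
    @entries.reject! { |k, _| k == key }      # replace any existing entry
    @entries.shift if @entries.size >= MAX    # evict least recently used
    @entries << [key, value]
    value
  end
end
```

With a bound this small there is nothing to tune, which is exactly the point: almost no configuration, very little code, and severely limited usefulness outside its niche.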
The part about the inability to speed up authenticated page loads fails to take things like ESI into consideration.
If the majority of your page is still the same for logged-in users, but they see some pieces of content personalised for them, breaking those pieces out into individually requestable components means you can let software like Varnish or a CDN rely on its cache of the main page content, and make a very small (and ideally cheap to process on the backend) request for the per-user content.
It took me some hunting to find it (no docs for v5 yet apparently) but if your stack already includes a caching layer (e.g. Varnish or a CDN) you may want to disable this extra cache using the config described here: http://blog.phusion.nl/2014/11/25/introducing-phusion-passen...
Everything in this article has been well-known for at least half a decade, yet is being presented as major technical breakthroughs. Too much marketing, IMO.
The first example looks nothing like parsing a specific cookie. It merely sanitizes the cookie headers a little, but it doesn't extract a specific cookie to use as a cache key.
And both examples you link to remove cookies. That's not what we're after. We're after extracting a specific cookie without removing anything.
It's also not marketing for a commercial product. The research is for an open source project, and the code is public and open source. The entire point of the blog post is to call for research participants who could not only test our ideas, but who could also point out anything we might have missed.
I had a contract at a company about 13 years ago where I was working on a web-based CMS that had been built entirely in-house. Because each rendered page's content was built up in a hierarchical manner, I added a caching layer that allowed arbitrary portions of the rendered page "tree" to be cached all the way up to the entire page and HTTP response if possible. Each cached portion could be located quickly based on its dependencies (template ID, content ID, CSS etc). If any edits were made to the site, only the relevant portions needed to be flushed and re-rendered. User-specific portions could be cached in the user's session rather than site-wide. (Thinking about it now, each portion could have been rendered in parallel too, though this was back in the days when multicore machines weren't very common and it wasn't something that occurred to me.)
I built this without giving much thought to whether anyone else had attempted something similar. Presumably they had though I do remember being disappointed when researching the various open source caching libraries, they didn't offer much help at the time. Overall I was pretty happy with the way it all worked and the performance boost it provided was like night and day.
Sadly the company is no longer operating, presumably the CMS codebase is long lost.
I have seen this referred to as the "Russian doll" caching strategy. You can find several examples using Rails partials or Django's cache template tag.
Most major frameworks include something like this, though it's usually harder to use. 'Partials' caching, as it's called, is a pretty effective way to speed up the server side of things when you need it.
Varnish has a similar facility (which allows you to abstract it out of the framework code entirely).
A 'make'-style automatic dependency approach to this would be quite nice to have; maybe your approach could be retrofitted onto one of the existing CMSs?
Hmmm. I might be missing something here, but I routinely clean out cookies in "public"/unauthed URLs in Varnish, as well as hashing the cache based on _part_ of a specific cookie (the bit that defines, say, the site's theme, or a generic user role).
They do mention that they didn't investigate how to do this in Varnish, but I recall having picked up the basic technique from one of the author's posts.
This was my first thought too. You can vary by any subset of cookies in Varnish if you simply normalize the Cookie header by stripping out irrelevant cookies. Another trick I've used is to parse the cookie to look for the user ID and set a User-Id header on the backend response and then set Vary: User-Id.
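The normalization described above is only a few lines of VCL. Here's a sketch based on the classic cookie-whitelist recipe from the Varnish wiki, assuming (for illustration) that the only cache-relevant cookie is named `_session_id`:

```vcl
sub vcl_recv {
    if (req.http.Cookie) {
        # Whitelist: keep only _session_id and strip everything else
        # (analytics cookies etc.) so they can't fragment the cache.
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(_session_id)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
        if (req.http.Cookie == "") {
            unset req.http.Cookie;  # fully anonymous request: safe to cache
        }
    }
}
```

The marker trick (prefixing kept cookies with "; ") lets one regex sweep away every unlisted cookie in a single pass.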
> Normally, non-cacheable page fragments would make every page uncacheable.
AFAICT, ASP.NET has supported partial output caching since 2003. And it can store multiple versions of a cached item via a user-defined parameter (a property on the cached control class). It seems it might be possible to simply create a property that returns the user ID and let the cache vary on that.
Of course, this uses the Web Forms controls-based system, which isn't very popular. But calling this type of caching new seems a bit of a stretch.
Edit: And here's a link talking about VaryByCustom, complete with example code that uses the version of the browser to generate unique per-browser-version caches.
Even ignoring the fact that edge side includes are the way to go when confronted with this problem, it sounds like they're basically saying they'd like to include the userid in the Vary header, but cannot because the Cookie header includes a bunch of other stuff.
Instead of parsing the Cookie header and doing a bunch of additional work in the cache layer, why not just add a separate custom header (X-User-Id, say) and Vary on that?
You're talking about response headers. But cache variation is keyed on the request: you only get what the browser sends, and a browser's request headers won't contain a custom header that you control (unless your JS is making the requests, of course). Hence you have to transform the Cookie header into whatever you need.
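Concretely, that transformation can happen in the cache itself: pull the one cookie you care about into a synthetic request header, then have the app reply with `Vary: X-User-Id`. A VCL sketch, assuming a hypothetical cookie named `user_id`:

```vcl
sub vcl_recv {
    # Derive a header we control from the cookie the browser sends,
    # so the backend can respond with "Vary: X-User-Id".
    unset req.http.X-User-Id;  # never trust a client-supplied value
    if (req.http.Cookie ~ "(^|; )user_id=") {
        # Naive extraction; a cookie named e.g. "other_user_id" would
        # also match this regex, so tighten it for real use.
        set req.http.X-User-Id = regsub(req.http.Cookie, ".*user_id=([^;]+).*", "\1");
    }
}
```

Note the unsigned-cookie caveat raised elsewhere in this thread applies here too: only key the cache on a value the backend can verify.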
I almost always configure Varnish not to cache on Vary: Cookie, and pair that with some simple changes on the back-end to keep most of my pages cacheable.
One, don't blindly set Vary: Cookie on every single page. Two, when a page only needs minor variation, like a username, store the username in a cookie and use JavaScript to display it on the page.
Yeah, the biggest advantage ESI offers is not requiring JS. If that's important to you, configure it and do the extra rendering on the server side. If not... :)
When you mention Discourse serving 19k req/sec: does it hit the Ruby stack at all? If not, serving 19k req/sec of cached HTML doesn't seem that impressive. What am I missing?
I used (and modified) one based on Dreamhost's https://github.com/dreamhost/varnish-vcl-collection/blob/mas... but there are a bunch of others on GitHub if you search for them. Note that this isn't a shortcut so you don't have to learn VCL.
The way I did it was to put Varnish on port 80 and the web server on another port. I've seen other configs where a load balancer asks Varnish when it knows Varnish should have the answer, and otherwise sends traffic directly to the web server. And sometimes your config will vary based on asset type or domain name -- e.g. static.example.org assets might be cacheable forever while your site's pages might only require a 2-minute cache. You can invalidate cached-forever pages with commands to Varnish, or by varying the URL using a "cache buster" hash or string in the asset filename.
Oh, and don't forget to set the storage type Varnish uses to malloc instead of file.
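On that last point, the storage backend is picked at startup rather than in VCL. A typical invocation along the lines described above (ports and size are illustrative):

```shell
# Varnish on port 80, app server on 8080, 256 MB in-memory (malloc) storage
varnishd -a :80 -b localhost:8080 -s malloc,256m
```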
stephenr|11 years ago
I'm not really sure in what situations this built-in cache would be more effective than the likes of a well-tuned Varnish.
markcampbell|11 years ago
[0] http://api.rubyonrails.org/classes/ActionDispatch/Cookies.ht...
WimLeers|11 years ago
> This is caching that Varnish and other “normal” HTTP caches (including CloudFlare) could not have done.
This is false. I'm not at all familiar with Varnish, but I know this is easily possible, and has been used for many, many years.
E.g. for Drupal + Varnish, i.e. to keep only Drupal's session cookie, I found these examples in less than a minute of googling:
- https://www.varnish-cache.org/trac/wiki/VarnishAndDrupal
- https://www.lullabot.com/blog/article/configuring-varnish-hi... (grep for "inclusion")
MichaelGG|11 years ago
http://msdn.microsoft.com/en-us/library/k4he1ds5%28v=vs.71%2...
http://msdn.microsoft.com/en-us/library/5ecf4420%28v=vs.71%2...
Argorak|11 years ago
I've successfully built SPAs using that strategy. It's basically Rails Russian-doll caching on a dedicated process.
sandGorgon|11 years ago
It seems strangely hard to configure something like this.
ddorian43|11 years ago
http://cramer.io/2013/06/27/serving-python-web-applications/