First and foremost, everyone needs caching. It's what makes computers fast. That RAM you have? Cache. The memory in your CPU? Cache. The memory in your hard drive? Cache.
Your filesystem has a cache. Your browser has a cache. Your DNS resolver has a cache. Your web server's reverse proxy [should] have a cache. Your database [should] have a cache. Every place that you can conceivably shove in another cache, it can't hurt. Say it with me now: Cache Rules Everything Around Me.
First you should learn how web servers work, why we use them, and how to configure them. The reason your Apache instance was running slow is probably because you never tuned it. Granted, five years ago its asynchronous capabilities were probably haggard and rustic. It's gotten a lot more robust in recent years, but that's beside the article's point. Nginx is an apple, CloudFront is an orange.
Next you should learn what CDNs are for. Mainly it's to handle lots of traffic reliably and provide a global presence for your resources, as well as shielding your infrastructure from potential problems. Lower network latency is just a happy side effect.
Obviously the title was a little ridiculous, but I up-voted it, because it's a novel idea. If you're planning to have almost all of your static assets hosted from the CDN (which is pretty reasonable for almost everyone) then why bother with a super high-throughput low-latency web server if the only purpose is to occasionally refill the CDN? If you end up thrashing the CDN and constantly going back to refill it, you're going to have bigger problems.
From what I can tell, the rest of your comment is just super aggressive and doesn't really go anywhere. I will tell you that I have extensive experience with every piece you've mentioned here, and none of that really has any effect on the author's thesis (again, no need to optimize serving static content from your host if a CDN is going to do the legwork).
In general though, when someone works at Mozilla, I tend to give them the benefit of the doubt regarding their knowledge of elementary computing principles.
But if the CDN is serving your static assets, your origin webserver only has to generate them once[1] to populate the CDN. It almost doesn't matter how long this takes. And this works well enough that you don't need to bother setting up an Nginx or Apache instance at all. And furthermore, you don't have to copy your static files anywhere -- just use your framework's built-in webserver for everything!
This greatly simplifies your production deployments and in my books that's a huge win.
[1] well... that fudges it a bit, since each POP needs to make its own fetch, and the assets can theoretically drop out of the CDN's cache; so the truth is actually "a handful of times" instead of "once".
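To make the "a handful of times" point concrete, here is a toy simulation of a pull-through CDN. It is only a sketch: `Origin`, `Pop`, and `simulate` are hypothetical names invented for illustration, and a real CDN's eviction and revalidation behaviour is far more involved. It just counts how often the origin is actually asked for one asset.

```python
from collections import Counter


class Origin:
    """Stand-in for the framework's built-in web server; counts its own hits."""

    def __init__(self, files):
        self.files = files
        self.hits = Counter()

    def fetch(self, path):
        self.hits[path] += 1
        return self.files[path]


class Pop:
    """One CDN point of presence with its own pull-through cache (hypothetical)."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path not in self.cache:
            # Cache miss: this is the only time the origin is contacted.
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]

    def evict(self, path):
        # Models an asset dropping out of this POP's cache.
        self.cache.pop(path, None)


def simulate(num_pops=3, requests_per_pop=1000):
    """Return how many times the origin served the asset."""
    path = "/static/app.css"
    origin = Origin({path: b"body{}"})
    pops = [Pop(origin) for _ in range(num_pops)]
    for pop in pops:
        for _ in range(requests_per_pop):
            pop.get(path)
    return origin.hits[path]
```

With three POPs and a thousand requests each, the origin is hit once per POP; calling `evict` on a POP costs exactly one more origin fetch the next time that POP is asked.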
More generally: once you adopt any of the various schemes for having an inbound proxy/front-end cache (Fastly, CloudFlare, CloudFront, or an in-house varnish/squid/etc), are all the optimizing habits of moving static assets to a dedicated server now superfluous?
I think those optimizing habits are now obsolete: best practice is to have a front-end cache.
A corollary is that we usually needn't worry about a dynamic framework serving large static assets: the front-end cache ensures it happens rarely.
Unfortunately it's still the doctrine of some projects that a production project will always offload static-serving. So for example, the Django docs are filled with much traditional discouragement around using the staticfiles serving app in production, including vague 'this is insecure' intimations. In fact, once you're using a front-end cache, there's little speed/efficiency reason to avoid that practice. And, if it is specifically suspected to be insecure, any such insecurity should be fixed rather than overlooked simply because "it's not used in production".
Tools like Apache and nginx are not ONLY faster at serving files with less load on the system than a script. They are also more thoroughly audited and battle-tested. And their declarative configs won't go wrong just because the person writing them missed an unbelievably subtle corner case introduced by using a Turing-complete language.
It's important because there are so many opportunities for error in serving arbitrary files out of a filesystem with some rough and ready script.
For example, if you are serving files out of the same filesystem that holds your configs and secret keys then you should be a bit nervous. You have to get the permissions right and make sure you don't have anything improper under a directory which you are publishing as a whole.
If your users are uploading files to the same place you should feel really nervous.
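As one example of the kind of corner case meant here, a script that joins the requested path onto a directory has to make sure the result is still inside that directory. The following is a minimal sketch, not any framework's actual implementation; `resolve_static_path` is a hypothetical helper, and it only covers path traversal, not permissions or uploaded content.

```python
import os


def resolve_static_path(static_root, requested):
    """Map a request path to a filesystem path, or None if it escapes the root.

    Hypothetical helper for illustration only. The caller still has to check
    that the result is a regular file it is actually willing to publish.
    """
    root = os.path.realpath(static_root)
    # Strip leading slashes so an absolute request path cannot replace the root,
    # then resolve ".." segments and symlinks before comparing.
    candidate = os.path.realpath(os.path.join(root, requested.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```

A request for `../settings.py` resolves outside the root and is rejected, while `css/app.css` resolves underneath it. Symlinks are followed before the comparison, so a link inside the published directory that points at your secrets is rejected too.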
There are too many easy ways for people to be negligent and screw this up. In the context of designing an opinionated framework, you accept a lot of social liability and you are really dropping the ball if you are setting up tired and ignorant users to screw up this badly, without even a warning in the docs to think about what you are doing.
With n script languages and m static file serving implementations per language, there are now (n*m) obscure packages to audit. Not counting their combinations...
Your idea to just "fix the insecurity" and remove any warning from the docs means to do things which you merely believe to fix the insecurity, and then overlook the underlying risks of the approach.
I am also not sure you are right when you suggest that there cannot be any performance (or reliability) impact of pushing static serving into some script library. Just as these are not audited they are also not nearly as likely to be benchmarked and tuned.
If there is a reason to serve static files out of script, that reason will be because of some positive reason (like convenience or the need for some particular flexibility) rather than some vague sense that using Apache is "obsolete".
Regarding Django and staticfiles: rightly so, because they're not ready for production and being taken over by the CDN. You need to sprinkle some django_compressor on it first, but even that doesn't get the cache headers perfectly right. Or the gzipping.
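For reference, this is roughly what "getting the cache headers and the gzipping right" could look like for an origin that feeds a CDN. It is a sketch under assumptions: `build_static_response` is a hypothetical function, not Django or django_compressor API, the one-year max-age assumes asset URLs change whenever their content changes, and the Accept-Encoding check is deliberately naive (it ignores q-values).

```python
import gzip

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; assumes fingerprinted/versioned asset URLs


def build_static_response(body, content_type, accept_encoding=""):
    """Return (headers, body) for a static asset that a CDN may cache for a long time."""
    headers = [
        ("Content-Type", content_type),
        ("Cache-Control", f"public, max-age={ONE_YEAR}"),
        # Tell caches that the response differs by the client's Accept-Encoding.
        ("Vary", "Accept-Encoding"),
    ]
    if "gzip" in accept_encoding.lower():  # naive check, ignores q=0
        body = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
    headers.append(("Content-Length", str(len(body))))
    return headers, body
```

The returned header list is in the shape a WSGI `start_response` call expects, so it could be dropped into a small WSGI app.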
I think in Django's case, it's not so much "this is insecure", but more that runserver is simple/stupid and hasn't been built or tested for serving multiple concurrent users.
Nor should it be. There are plenty of WSGI capable web servers that can be installed into a Django project as apps.
If you want to run just Django behind a frontside cache and serve static media from the same connection, simply install an appropriate app. Gunicorn and cherrypy are both 'drop in replacements' for the built in runserver and both are well up to the task.
nginx still buys you SSI (which allows you to, for example, cache the same page for all users and have nginx swap out the username with a value stored in memcache), complex rewrite rules, fancy memcache stuff with the memc module (ex: view counters), proxying to more than ten upstream servers, fastcgi, and lots of other fancy stuff.
Cloudfront is a replacement for varnish, not nginx.
Isn't that better to do in a programming environment you're more familiar with? Like python/rails/ASP
Then you have much better tools for building unit tests and stuff too.
Does anyone have experience with using nginx as a caching proxy? I've used Varnish and swear by it, it's just an amazing piece of software. How well can nginx replace Varnish?
I think not. Requirements change, and locking myself in to a front-end cache is not appealing. I may also have things which I can't or won't let others cache for me, so I want my local stack to be optimized anyway. You won't see me serving everything out of WEBrick anytime soon just because I have a cloud cache.
It's nice to be able to defer decisions, especially optimizations, but making performance someone else's problem entirely seems like it could promote sloppy thinking and poor work. It's the difference between augmenting a solid platform when the need arises versus front-loading dependencies because it's okay to be lazy.
On several of my modern projects, there's not a single piece of static data that can't be cached forever in a CDN. That's because server-side code is now getting really good at managing the initial build of static assets and the delivery of their URL.
There's a good post from late 2011, in the context of 12-factor deployment on Heroku, where the author muses about just using a pure Python server behind a CDN to serve static content:
...and yeah, I think I should bloody use this server as a backend to serve my in production.
Sure it's obsolete, who needs databases and live, changing data. All we need is static pages. Besides, who needs to build his own infrastructure, it's 2012 right? Let's buy it.
This misses his point that originally he had app <-> nginx <-> user, then he added cloudfront so he had app <-> nginx <-> cloudfront <-> user. At that point is nginx really serving much purpose?
I think for the average use case the answer is no, nginx doesn't buy you much. However nginx is a lot more flexible than cloudfront, so if you have more complicated caching rules and such nginx is a perfect fit.
If you want to serve static files cheaply and are moving less than 10TB/mo you will find that CloudFront is an order of magnitude more expensive than a bunch of VPSes with lots of monthly bandwidth.
Viability of this depends heavily on the use but if you're moving funny pictures of cats then you won't be generating lots of income and want to optimize the bandwidth costs.
Before implementing that, be aware that CloudFront doesn't support custom SSL certificates. If you have any user-session in your app, you don't want them to log in on https://efac1bef32rf3c.cloudfront.net/login
CloudFront is pretty good, just make sure you are able to config your asset source in one line. Otherwise you have to use a tool to invalidate the cloudfront cache frequently during dev and it's not instant.
Note that you can now configure CloudFront to take query strings into account when caching files. Tweaking the query string is basically instant, unlike waiting for the invalidation tool...
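A small sketch of the query-string trick: derive the version parameter from the file's content, so the URL changes exactly when the file does and the CDN treats it as a new object. The `versioned_url` helper and the `v` parameter name are made up for illustration, and this assumes the distribution is configured to include query strings in its cache key.

```python
import hashlib


def versioned_url(base_url, content):
    """Append a short content hash as a query parameter (hypothetical helper)."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    # Use "&" if the URL already carries a query string.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}v={digest}"
```

Since the hash depends only on the bytes, rebuilding an unchanged file keeps the same URL and the cached copy stays valid.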
[+] [-] peterwwillis|13 years ago|reply
[+] [-] alexgartrell|13 years ago|reply
[+] [-] kiwidrew|13 years ago|reply
[+] [-] gojomo|13 years ago|reply
[+] [-] slurgfest|13 years ago|reply
[+] [-] peterbe|13 years ago|reply
[+] [-] ra|13 years ago|reply
[+] [-] meritt|13 years ago|reply
nginx can do a lot more than serve static files.
[+] [-] peterbe|13 years ago|reply
[+] [-] cbsmith|13 years ago|reply
[Insert Oscar winning Face of Shock here]
[+] [-] rabidsnail|13 years ago|reply
[+] [-] peterbe|13 years ago|reply
[+] [-] qw|13 years ago|reply
[+] [-] georgebarnett|13 years ago|reply
e.g.: Is Mountain Lion going to kill Windows 8? .. etc.
[+] [-] lubutu|13 years ago|reply
(Please, can this stop?)
[+] [-] sirn|13 years ago|reply
[1]: http://en.wikipedia.org/wiki/Betteridges_Law_of_Headlines
[+] [-] StavrosK|13 years ago|reply
[+] [-] mef|13 years ago|reply
[+] [-] bithive123|13 years ago|reply
[+] [-] peterbe|13 years ago|reply
[+] [-] kiwidrew|13 years ago|reply
http://tech.blog.aknin.name/2011/12/28/i-wish-someone-wrote-...
[+] [-] est|13 years ago|reply
At least you should try `sendfile`.
[+] [-] devmach|13 years ago|reply
[+] [-] mnutt|13 years ago|reply
[+] [-] peterbe|13 years ago|reply
If you need to build a toaster, you don't need to build an iron smelting plant. Certain things other folks are better at taking care of.
[+] [-] jeffbarr|13 years ago|reply
[+] [-] SeppoErviala|13 years ago|reply
[+] [-] zimbatm|13 years ago|reply
[+] [-] banana_bread|13 years ago|reply
[+] [-] peterbe|13 years ago|reply
[+] [-] eli|13 years ago|reply
[+] [-] stevewilhelm|13 years ago|reply