Tornado: FriendFeed's non-blocking Python web server is now open source

291 points| finiteloop | 16 years ago |bret.appspot.com | reply

73 comments

[+] tdavis|16 years ago|reply
This is more of a combination of a web server and a web framework, which is what I find fascinating about it. Twisted has had a web server forever, but despite a significant amount of experience in Twisted Land, I'd never write a full app using it! Add to that the (apparently) standalone low-level modules and we've got some seriously awesome tech that FriendFeed/Facebook have supplied for free.

Thanks a lot, fellas. Asynchronous network programming in Python is somewhat of a bear, so it's really nice to see a tool get released that improves it!

[+] n8agrin|16 years ago|reply
Awesome to see more Python webservers coming to light, and fast ones at that. One observation though; anyone else notice that the bar chart is a bit misleading? It compares running Tornado in 4 processes behind nginx to running Django behind Apache and Cherrypy as a single Python process. As a fellow coworker put it, "With 4 extra processes running of course it will be 4 times as fast." I'm not opposed to this kind of comparison when the intent is to show how configurations can increase req/sec, but if the intent is to compare one Pythonic webserver to another, this comparison seems a bit unfair. That said, assuming the "Tornado (1 single-threaded frontend)" measure is simply Tornado running as a single Python process, it still is plenty faster than Cherrypy.
[+] mdasen|16 years ago|reply
Apache/mod_wsgi creates multiple processes (or can). The issue is that Facebook didn't tell us what configuration settings they used. Did they limit the number of processes? Did they not have it start up enough at the start?

It's clear that Facebook wasn't using a single Apache/mod_wsgi process because that wouldn't get close to 2,000 requests per second. I'm sure they gave it the number of processes that was appropriate for the amount of RAM on the box - the issue is that every Apache process (of which I'm sure they had dozens if not hundreds) uses up memory.

It's hard to set that up properly and that's what makes benchmarking so hard. Still, it's not hard to believe that with the weight of Apache in the process, there's going to be at least 5MB of overhead for every request. If you were saving that 5MB of RAM per client times the 2,223 requests per second, that's 11GB worth of freed RAM per second. Let's say Django uses 10MB of RAM for a request: that's an extra 50% increase in capacity right there just by eliminating the 5MB Apache process overhead. And Apache tends to be on the heavier side (5MB is somewhat low as an estimate) and if you can do it asynchronously, you're not tying up RAM as you wait for clients, so you do well.
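
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate: the 5MB and 10MB figures are the guesses from this comment, not measurements, and the function names are made up for illustration. It also treats the 2,223 requests as if they were all in flight at once, which is a simplification.

```python
# Back-of-the-envelope check of the estimate above.
# All inputs are guesses from the comment, not measured values.
APACHE_OVERHEAD_MB = 5      # assumed per-request Apache overhead
DJANGO_REQUEST_MB = 10      # assumed per-request Django footprint
REQUESTS_PER_SECOND = 2223  # figure quoted from the benchmark


def freed_ram_gb(overhead_mb, requests):
    """RAM no longer consumed if the per-request overhead disappears (decimal GB)."""
    return overhead_mb * requests / 1000


def capacity_gain(overhead_mb, app_mb):
    """Fractional increase in requests that fit in the same amount of RAM."""
    return (overhead_mb + app_mb) / app_mb - 1


print(f"{freed_ram_gb(APACHE_OVERHEAD_MB, REQUESTS_PER_SECOND):.1f} GB freed")
print(f"{capacity_gain(APACHE_OVERHEAD_MB, DJANGO_REQUEST_MB):.0%} more capacity")
```

Going from 15MB per request to 10MB is where the 50% comes from: the same RAM holds 1.5 times as many requests.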

Now, put Apache/mod_wsgi behind nginx (as a proxy) and you'll also see an increase in requests per second, because you won't be tying up all that RAM while a client is downloading. You only use the RAM while actually doing work in Python, which frees that memory to be used elsewhere. However, I'm guessing that they ran this benchmark locally and so the network tying up Apache instances wasn't a factor.

Clearly, benchmarks are benchmarks, but it isn't as if Facebook and FriendFeed don't have lots of smart engineers and they wouldn't be using their own server if Apache was significantly better. They have a site of massive scale and know how to benchmark things since you need objective data to make decisions based on rather than the religious arguments some get into.

I'd guess that, if one could get Django running under Tornado, it would run similarly and that it's more of Apache being the bottleneck/RAM hog.

[+] finiteloop|16 years ago|reply
CherryPy is multi-threaded, so I am not sure what you are saying is correct. There are two types of servers: multi-threaded, multi-process. Tornado is multi-process. If your server is multi-threaded, it uses all of the cores without additional processes.

CherryPy did max out the CPU on all of the cores in the load test, so I think it was a fair test.

[+] n8agrin|16 years ago|reply
The templating system they developed looks pretty sweet http://github.com/facebook/tornado/blob/master/tornado/templ....

Just played around with it a bit and it's basically what I want:

1) Simple and clear syntax (e.g. they use 'end' not endfor, endblock, etc)

2) Assign template variables to anything (including functions)

3) Don't over-restrict the author (e.g. they allow list comprehensions in if tags)

4) Block and extends statements.

Error handling seems to be a little clearer than other template languages though still not great:

http://gist.github.com/184934

Clearly it's not going to be for everyone but after slogging around with Mako for the last year (<%namespace:def /> tags anyone?) this is like a breath of fresh air.

[+] finiteloop|16 years ago|reply
Thanks! Yah, the reason we rolled our own is because all the rest were so restrictive to authors. We didn't want a template system telling us what we should and shouldn't do in templates, so ours was a very thin layer on top of what basically translates directly to Python. It is actually one of my favorite parts of the system.
[+] RoboTeddy|16 years ago|reply
It seems like blocking calls to data sources (the database, memcache, etc) would screw with a lot of the benefits of running the framework asynchronously. If your Tornado processes freeze while making synchronous backend requests, you're gonna need a lot of them, which probably kills a lot of the value.

Now all we need are async data clients that tie into the Tornado event loop, and a clean way to yield control back and forth (perhaps with coroutines - http://www.python.org/dev/peps/pep-0342/). Unfortunately, I don't think there are any async Python mysql clients (PHP has one now - http://www.scribd.com/doc/7588165/mysqlnd-Asynchronous-Queri...)

A web app written like that would run like it's on fire. The event loop would just roll through everything asynchronously. You could even automatically batch together data calls across http requests to make things easier on the data tier.
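
Roughly what that could look like with PEP 342 generators, as a toy sketch only. Nothing here is Tornado's API: `handler`, `run`, and the dict standing in for the data tier are all hypothetical, and a real loop would wait on sockets instead of a dict lookup.

```python
from collections import deque


def handler(name, key, results):
    """Hypothetical request handler: asks for a key and pauses until it arrives."""
    value = yield key  # control goes back to the loop here
    results.append((name, value))


def run(handlers, backend):
    """Toy event loop: collect pending keys, fetch them in one batch, resume handlers.

    Returns the number of batched backend round trips that were needed.
    """
    pending = deque()
    for h in handlers:
        pending.append((h, next(h)))  # advance each handler to its first yield
    batches = 0
    while pending:
        batch = list(pending)
        pending.clear()
        batches += 1
        # One combined lookup for every handler waiting in this round.
        values = {key: backend[key] for _, key in batch}
        for h, key in batch:
            try:
                pending.append((h, h.send(values[key])))  # resume with the data
            except StopIteration:
                pass  # handler finished
    return batches


results = []
backend = {"user:1": "alice", "user:2": "bob"}  # stand-in for the data tier
batches = run(
    [handler("req-a", "user:1", results), handler("req-b", "user:2", results)],
    backend,
)
print(results, batches)
```

Two requests end up sharing a single backend round trip, which is the batching across http requests mentioned above.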

[+] DenisM|16 years ago|reply
The browser-webserver delay resulting from long polling is much larger than the frontend-backend delay. I think there is at least an order of magnitude difference there, hence an order of magnitude reduction in the amount of idle state kept on the server.
[+] extension|16 years ago|reply
If you go with an evented architecture like this, you have to use it for pretty much all I/O or the whole thing falls apart. Lack of drivers and protocols has been a large barrier to adoption.
[+] cheriot|16 years ago|reply
Definitely rough around the edges, but looks like a great foundation.

A few notes:
- template.py: "Error-reporting is currently... uh, interesting."
- url mapping looks primitive
- no form helpers
- database.py looks decent, but is mysql only
- Very nice to see the security considerations in signed cookies, auth, csrf protection.

Overall, it looks like they've done the trickier parts of building a web framework and left it to the user/community to add the parts web developers use most frequently (form & url helpers, code organization, and orm).

I love Facebook's enlightened approach to open source. If only one of their open source projects takes off, it will benefit them tremendously.

[+] xal|16 years ago|reply
I implemented the same idea in Ruby event machine: http://gist.github.com/184760

Obviously missing features (auth, nicks, scrolling) but that can all be added in a few mins.

[+] tiredandempty|16 years ago|reply
how about turning this to a full framework? or is there one already?
[+] usaar333|16 years ago|reply
What exactly does 'non-blocking' mean in the context of a webserver?
[+] cheriot|16 years ago|reply
Holding the TCP connection open does not tie up all the resources of the request handling thread. That way a large number of inactive connections can stay open.
[+] acangiano|16 years ago|reply
Non-blocking sockets. Calls to socket.read() can return no data. It's non-blocking because the program is not stuck waiting for data to be returned.
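
A small sketch of that using only the standard library. One caveat on the wording above: in Python the call is `recv()`, and on a non-blocking socket with nothing to read it raises `BlockingIOError` rather than returning empty data. The socket pair below just stands in for a client and a server.

```python
import select
import socket

# A connected pair of sockets stands in for a client and a server.
server_side, client_side = socket.socketpair()
server_side.setblocking(False)

try:
    server_side.recv(1024)  # nothing has been sent yet
    outcome = "got data"
except BlockingIOError:
    outcome = "would block"  # the call returned immediately instead of waiting

client_side.sendall(b"hello")

# Ask the OS which sockets are readable instead of blocking on any one of them.
readable, _, _ = select.select([server_side], [], [], 1.0)
data = server_side.recv(1024) if readable else b""

print(outcome, data)

server_side.close()
client_side.close()
```

A server built this way can keep many idle connections open in one thread and only touch the ones that `select` reports as ready.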
[+] ComputerGuru|16 years ago|reply
I've been looking for a good framework to begin serious web development on for someone coming from a decade-long career in desktop development (see the discussion at http://news.ycombinator.com/item?id=808191), and decided this release coincides nicely with my newly started quest, so I gave it a shot.

I realize Tornado isn't any different from the other Python frameworks with regards to coding style, etc. but the fact that this framework comes complete with a web server means I don't have to worry about that part of the equation, making developing Python-based webapps almost identical to developing a C++ library with one of the many HTML-based UI frontends :)

Having a great time playing around with it... almost done with a basic forum system built on Tornado + Storm (https://storm.canonical.com/).. I think I'm getting the hang of this whole web-development thing! :D

[+] apoirier|16 years ago|reply
For a real different coding style, more like a desktop development, take a look at Nagare (http://www.nagare.org), a continuation and components based web framework. And it also comes with an integrated HTTP server (and a fastCGI one) ;)
[+] calaniz|16 years ago|reply
This looks great. I'm likely to look at it a little more in the future. I've been using Second Life's asynchronous coroutine library called eventlet. I've implemented most of my code with a good lightweight framework that fits: restish. On top of that, I use spawning to manage my server processes. I myself have seen these kinds of numbers with my own app tests.
[+] omouse|16 years ago|reply
Does this implement all of the HTTP 1.0 and 1.1 protocol specs? I'm kinda confused by the code :S
[+] drawkbox|16 years ago|reply
This is great for Python, a proven web framework that was stressed on FriendFeed. Thanks ff team! I have been torn on my next Python server project between web2py, cherrypy and django and I think I just decided.
[+] mdipierro|16 years ago|reply
For the record. They tested web.py not web2py. It is not the same thing and they are completely unrelated.
[+] dlsspy|16 years ago|reply
I could use this in apps today if it used twisted instead of reinventing it.
[+] finiteloop|16 years ago|reply
How many deployed web apps really use Twisted? There are like 3 web packages in Twisted, most of which are really buggy, and as far as I could tell, barely used (even they acknowledge this, see http://twistedmatrix.com/trac/wiki/WebDevelopmentWithTwisted).

When we were developing this, we found that Twisted introduced as many problems as it solved in terms of incomplete features and bugs. The other protocols seem to have more attention than HTTP from what I could tell.

[+] statictype|16 years ago|reply
What would be the advantage of using this over, say, lighttpd with one of the many existing python frameworks?

Does this web server offer any specific gains? Or is it just a question of personal taste?

[+] auston|16 years ago|reply
At a first glance, this looks SERIOUSLY awesome!
[+] prakash|16 years ago|reply
Bret, quick question. Does the framework support ESI-type tags to abstract out personalized info? Thanks!
[+] finiteloop|16 years ago|reply
I am not sure what esi type tags are. Mind sending me a link?

Suffice it to say, no, we do not support that :)

[+] charlesju|16 years ago|reply
Is this similar to EventMachine in Ruby?
[+] fizx|16 years ago|reply
EM is a general-purpose library for nonblocking IO. Tornado is an HTTP server. In as much as they're both about nonblocking IO, they are similar.
[+] superjared|16 years ago|reply
I'd say it's a little more similar to Thin, which is a web server based on EM.