However, it seems worth mentioning that webservers haven't been a bottleneck for a long time now. Your bottleneck is always disk I/O, the network, or the slow application server that you're proxying to.
For reference: Wikipedia[1] serves roughly 8k pageviews/sec on average for a total of ~20 billion pageviews/month.
Assuming each pageview consists of ~10 webserver hits, we're looking at ~80k requests/sec.
This is within the realm of a single instance of either nginx or h2o on a beefy machine [on a very beefy network].
So, unless you plan to serve Wikipedia or Facebook from a single server, you're probably fine picking your webserver software on the basis of features rather than benchmarks.
Congrats on shipping! This project looks very interesting already and will hopefully pick up more contributors.
Is there already support for configuration files? For me, performance isn't the most important issue; in fact, the main reason I'm using nginx over Apache is that I don't want to deal with .htaccess any more.
I think that adding support for the nginx config file format to H2O, thus making it a drop-in replacement (provided all the features in use are actually supported), could give the project a huge boost.
Unfortunately, the configuration directives are not compatible with those of nginx. I do not think it is possible to take such an approach, considering the differences in the internals of the two servers.
nginx's configuration format leaves a lot to be desired, as evidenced by the (hopefully former) widespread use of exploitable PHP configurations.
There are also if directives in there, but they don't really work the way you think. You really need a deep understanding of its parsing rules in order to do anything remotely complicated with it. It's certainly possible to do better.
(Please don't mention Apache here and its steaming pile of faux-xml. Existence of worse does not make better.)
I'm skeptical of the performance numbers. First, like others here I don't believe nginx's performance will be a bottleneck for HTTP/2. Beyond that, I suspect there are cases in which this code is much worse than nginx.
Here's one. Look at the example request loop on <https://github.com/h2o/picohttpparser/>. It reads from a socket, appending to an initially-empty buffer. Then it tries to parse the buffer contents as an HTTP request. If the request is incomplete, the loop repeats. (h2o's lib/http1.c:handle_incoming_request appears to do the same thing.)
In particular, phr_parse_request doesn't retain any state between attempts. Each time, it goes through the whole buffer. In the degenerate case in which a client sends a large (n-byte) request one byte at a time, it uses O(n^2) CPU for parsing. That extreme should be rare when clients are not malicious, but the benchmark is probably testing the other extreme, where each request arrives in a single read. Typical conditions are probably somewhere in between.
You are incorrect: modern clients are fast, and requests typically already reside in buffers by the time event-driven webservers decide to read them. Nginx had parsing with state retention because of the simple idea of handling large numbers of slow clients, which was a problem back when nginx was born. As it turned out later, that didn't help with malicious clients at all, because the cost of retaining clients' connections and of getting and processing each new portion of data was still very high. Instead, accept filters were used, and to this day they are advised by nginx people in such situations.
Interesting article. And congratulations on the release!
Sorry, this is a bit off-topic (and it doesn't apply to H2O, which has been in the works for a while, judging by the commits), but I wonder: today, with a language like Rust (1.0 is at the door [1]) that is as performant as the equivalent C but modern and safe by design (and with an escape hatch to C/C++ if needed), what would be the advantages of starting a long-term project of this type in C?
And even if it were, it would take 3-5 years until it gets any decent adoption (if that happens at all, which remains to be seen). It doesn't even have Go-level adoption yet, and Go's adoption is not something to write home about either.
C, on the other hand, people know very well; it has tons of tooling, plays well on all platforms, and has all the libraries in the world available for it.
Because under seemingly every C project discussed here someone asks this question or claims that it is "stupid to do something like this in C" and always gets the same answers. Some users might have felt like you were trolling.
I'm writing an HTTP reverse proxy in Rust, and my main gripe so far is that I have to roll my own async I/O. Binding node's HTTP parser is going well, but it also takes a bunch of effort. Safe-by-design and close-to-the-metal are proving very enjoyable to work with for this, however.
Note: Rust 1.0 does not mean feature-complete, it means backwards-compatible. A lot of the built-in libraries and features (e.g. compiler plugins) will not be available for use in 1.0 Rust (only in the nightlies).
1.0 Rust gives an option for people wanting to use it in production, and as far as comparing it with C goes it has a lot more functionality, but there is still a long way to go before the "stable" Rust has all the awesomeness that Rust nightlies have right now.
> This doesn't look like a complete HTTP server, comparing it with nginx is not fair.
It's certainly not full-featured, but I don't think any of the omissions you mentioned should invalidate a performance comparison. I'd expect them to have little cost when not used, and I assume he's not using them for nginx in these tests.
> Instead, switching back to sending small asset files for every required element consisting the webpage being request becomes an ideal approach
This doesn't solve the other side of the problem that spritesheets are meant to solve, namely that an individual image will not be loaded yet when the first UI element using it is displayed (e.g. in a CSS rollover, or when a new section of a SPA appears). I can't see a way that new protocols are going to solve this, unless I'm missing something in how HTTP/2 is going to be handled by the browser?
I assume that once you're forced to preload everything you might need for the page, it's no longer more efficient to break up into multiple tiny requests.
Spritesheets and other forms of asset concatenation aim to reduce the impact of round-trip latency and the overhead of HTTP, i.e. repeated headers.
I've always seen the "image is already loaded" effect as a nice side benefit, but spritesheets can have issues in mobile contexts, since the whole image must be decoded just to access a single sprite. It's also unclear how effectively browsers cache the individual sprites in memory compared to individual images.
I would consider continuing to spritesheet the button itself. The main difference is that each roll-over effect can be separated from all the other spritesheets in the world.
Still, an exciting time when we can combine files based on what is the most logical grouping, rather than what is the most efficient. I look forward to the day when HTTP2 rules the world.
Looking at the tangentially linked qrintf project that H2O uses (https://github.com/h2o/qrintf), which replaces generic sprintf calls with specialised versions for a 10x speed boost - that seems like a brilliant idea; I wonder why it took so long for somebody to think of it?
People have thought of it -- there probably just aren't that many applications where sprintf is a bottleneck, especially enough of a bottleneck to justify a code-gen tool.
The OCaml community, and probably others, have noted that printf is an embedded DSL and treat it as something to be compiled rather than interpreted.
Rust borrows heavily from OCaml, and uses compile time macros for printf and regex, i.e. format! and regex! (the trailing ! means it's a macro that can be further compiled by the compiler).
One of the problems with compiling print/scanf is that a lot of the overhead comes from locale handling, which is a runtime variable. Parsing is fairly negligible for short format strings.
Socket API is a bottleneck now, right?
So, next step: roll your own HTTP-friendly TCP stack on top of netmap/dpdk and get a 10x performance increase over nginx.
Obviously it's great software. Does Kazuho work alone on this? If it's meant to replace nginx, it needs a lot of other options/functions/extensions/modules...
I am not so much into web-servers (yet), but I found this in the feature list:
> reverse proxy
>   HTTP/1 only (no HTTPS)
Are there any plans to also add HTTPS support for the reverse proxy? Right now I have to include a secondary (Tornado) web server in my stack for dynamic pages.
It also puzzled me that HTTPS is not supported there, yet in the benchmarks I found a section labeled "HTTPS/2 (reverse-proxy)". As I said, I am not so much into web servers and HTTP/2, but that was a little confusing.
HTTP and HTTPS (both versions 1 and 2) are supported for downstream connections (i.e. connections between H2O and web browsers).
Only plain-text HTTP/1 is supported for upstream connections (connections between H2O and web application servers).
That's a cool project. Performance is a fascinating topic.
However, in the real world, the number of requests per second an HTTP daemon can serve is the last thing to worry about. If the web is slow, it's not because Apache used to be bloated with threads. It's because of bad architecture: centralization of services, latency in page build times, size of static components, data store bottlenecks, etc.
Nevertheless, a very cool project. One I'll follow closely.
It seems that everything mentioned in the library could be done easily with golang. I am interested to see how H2O benchmarks against pure golang binaries.
moe | 11 years ago
[1] http://reportcard.wmflabs.org/graphs/pageviews
sparkzilla | 11 years ago
rkrzr | 11 years ago
kazuho | 11 years ago
The configuration file format is YAML, and the directives can be seen by running `h2o --help` (the output of version 0.9.0 is: https://gist.github.com/kazuho/f15b79211ea76f1bf6e5).
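For illustration, a minimal configuration in that format might look like the following. This is a sketch from memory of H2O's documented directives (`listen`, `hosts`, `paths`, `file.dir`, `proxy.reverse.url`); verify the exact names against `h2o --help` before relying on them:

```yaml
# Serve static files and proxy /api to a local application server.
listen: 8080
hosts:
  "example.com":
    paths:
      "/":
        file.dir: /var/www/htdocs
      "/api":
        proxy.reverse.url: "http://127.0.0.1:5000/"
```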
currysausage | 11 years ago
Wait, you don't use Apache because you don't like .htaccess files?
xorcist | 11 years ago
Synless | 11 years ago
scottlamb | 11 years ago
zzzcpan | 11 years ago
stephth | 11 years ago
[1] http://blog.rust-lang.org/2014/12/12/1.0-Timeline.html
coldtea | 11 years ago
detaro | 11 years ago
nathan7 | 11 years ago
hueving | 11 years ago
You could just as easily have asked why it wasn't written in Lisp. It's just not relevant.
Manishearth | 11 years ago
justincormack | 11 years ago
PythonicAlpha | 11 years ago
halayli | 11 years ago
This doesn't look like a complete HTTP server, comparing it with nginx is not fair.
- It's missing content-encoding handling on the receiving side
- No HTTP continue support
- No regex routing support
- No header rewrites
to name a few.
scottlamb | 11 years ago
robbles | 11 years ago
zub | 11 years ago
HTTP/2 server push. Your server can proactively deliver things like rollover state graphics knowing that the client will need them.
youngtaff | 11 years ago
ncallaway | 11 years ago
Shish2k | 11 years ago
chubot | 11 years ago
http://okmij.org/ftp/typed-formatting/
http://caml.inria.fr/pub/docs/manual-ocaml/libref/Printf.htm... (I have a memory of this being type safe and doing stuff at compile time but I don't see it now)
http://doc.rust-lang.org/std/fmt/
http://doc.rust-lang.org/regex/regex/index.html
Someone | 11 years ago
Also, http://www.ciselant.de/projects/gcc_printf/gcc_printf.html#p... shows gcc did something in this area 9 years ago.
http://www.cygwin.com/ml/libc-hacker/2001-08/msg00003.html indicates gcc had printf optimizations in 2001.
I expect there are older examples.
qrintf seems to be 'just' a more aggressive version of this.
I guess being more aggressive makes sense in applications that do a lot of simple string formatting.
nly | 11 years ago
zzzcpan | 11 years ago
wmf | 11 years ago
jarnix | 11 years ago
Is it getting commercial support/funds?
kazuho | 11 years ago
For myself, developing H2O is part of my job at DeNA (one of the largest smartphone game providers in Japan).
PythonicAlpha | 11 years ago
kazuho | 11 years ago
jvehent | 11 years ago
ams6110 | 11 years ago
I'm not seeing that there is a portable version yet, however.
justincormack | 11 years ago
bkeroack | 11 years ago
nnx | 11 years ago
dschiptsov | 11 years ago
Aldo_MX | 11 years ago
thresh | 11 years ago
Thanks!
haosdent | 11 years ago
huhtenberg | 11 years ago
xfalcox | 11 years ago
okpatil | 11 years ago