However, it seems worth mentioning that webservers haven't been a bottleneck for a long time now. Your bottleneck is always disk I/O, the network, or the slow application server that you're proxying to.
For reference: Wikipedia[1] serves roughly 8k pageviews/sec on average for a total of ~20 billion pageviews/month.
Assuming each pageview consists of ~10 webserver hits, we're looking at ~80k requests/sec.
This is within the realm of a single instance of either nginx or h2o on a beefy machine [on a very beefy network].
So, unless you plan to serve Wikipedia or Facebook from a single server, you're probably fine picking your webserver software on the basis of features rather than benchmarks.
Congrats on shipping! This project looks very interesting already and will hopefully pick up more contributors.
Is there already support for configuration files? For me, performance isn't the most important issue; in fact, the main reason I'm using nginx over Apache is that I don't want to deal with .htaccess any more.
I think that adding support for the nginx config file format to H2O, thus making it a drop-in replacement (provided all the features in use are actually supported), could give the project a huge boost.
Unfortunately, the configuration directives are not compatible with those of nginx. I do not think it is possible to take such an approach, considering the differences in the internals of the two servers.
nginx's configuration format leaves a lot to be desired, as evidenced by the (hopefully former) widespread use of exploitable PHP configurations.
There are also if directives in there, but they don't really work the way you think. You really need a deep understanding of its parsing rules in order to do anything remotely complicated with it. It's certainly possible to do better.
(Please don't mention Apache here and its steaming pile of faux-xml. Existence of worse does not make better.)
I'm skeptical of the performance numbers. First, like others here I don't believe nginx's performance will be a bottleneck for HTTP/2. Beyond that, I suspect there are cases in which this code is much worse than nginx.
Here's one. Look at the example request loop on <https://github.com/h2o/picohttpparser/>. It reads from a socket, appending to an initially-empty buffer. Then it tries to parse the buffer contents as an HTTP request. If the request is incomplete, the loop repeats. (h2o's lib/http1.c:handle_incoming_request appears to do the same thing.)
In particular, phr_parse_request doesn't retain any state between attempts. Each time, it goes through the whole buffer. In the degenerate case in which a client sends a large (n-byte) request one byte at a time, it uses O(n^2) CPU for parsing. That extreme should be rare when clients are not malicious, but the benchmark is probably testing the other extreme, where each request arrives in a single read. Typical conditions are probably somewhere in between.
You are incorrect: modern clients are fast, and requests typically already reside in buffers by the time event-driven webservers decide to read them. Nginx had parsing with state retention because of the simple idea of handling large numbers of slow clients, which was a problem back when nginx was born. As it turned out later, that didn't help with malicious clients at all, because the cost of retaining clients' connections and of getting and processing each new portion of data was still very high. Instead, accept filters were used, and to this day they are advised by nginx people in such situations.
Interesting article. And congratulations on the release!
Sorry, this is a bit off-topic (and it doesn't apply to H2O, which has been in the works for a while, judging by the commits), but I wonder: today, with a language like Rust (1.0 is at the door [1]) that is as performant as the equivalent C but modern and safe by design (and with an escape hatch to C/C++ if needed), what would be the advantages of starting a long-term project of this type in C?
And even if it were, it would take 3-5 years until it gets any decent adoption (if that happens at all, which remains to be seen). It doesn't even have Go-level adoption yet, and Go's adoption is not something to write home about either.
C, on the other hand, people know very well; it has tons of tooling, plays well on all platforms, and has all the libraries in the world available for it.
Because under seemingly every C project discussed here someone asks this question or claims that it is "stupid to do something like this in C" and always gets the same answers. Some users might have felt like you were trolling.
I'm writing an HTTP reverse proxy in Rust, and my main gripe so far is that I have to roll my own async I/O. Binding node's HTTP parser is going well, but it also takes a bunch of effort. Safe-by-design and close-to-the-metal are proving very enjoyable to work with for this, however.
Note: Rust 1.0 does not mean feature-complete, it means backwards-compatible. A lot of the built-in libraries and features (e.g. compiler plugins) will not be available for use in 1.0 Rust (only in the nightlies).
1.0 Rust gives an option for people wanting to use it in production, and as far as comparing it with C goes it has a lot more functionality, but there is still a long way to go before the "stable" Rust has all the awesomeness that Rust nightlies have right now.
> This doesn't look like a complete HTTP server, comparing it with nginx is not fair.
It's certainly not full-featured, but I don't think any of the omissions you mentioned should invalidate a performance comparison. I'd expect them to have little cost when not used, and I assume he's not using them for nginx in these tests.
> Instead, switching back to sending small asset files for every required element consisting the webpage being request becomes an ideal approach
This doesn't solve the other side of the problem that spritesheets are meant to solve, namely that an individual image will not be loaded yet when the first UI element using it is displayed (e.g. in a CSS rollover, or when a new section of a SPA appears). I can't see a way that new protocols are going to solve this, unless I'm missing something in how HTTP/2 is going to be handled by the browser?
I assume that once you're forced to preload everything you might need for the page, it's no longer more efficient to break up into multiple tiny requests.
Spritesheets and other forms of asset concatenation aim to reduce the impact of round-trip latency and the overhead of HTTP, i.e. repeated headers.
I've always seen the "image is already loaded" effect as a nice side benefit, but spritesheets can have issues in mobile contexts, since the whole image must be decoded just to access a single sprite. It's also unclear how effectively browsers cache the individual sprites in memory compared to individual images.
I would consider continuing to spritesheet the button itself. The main difference is that each roll-over effect can be separated from all the other spritesheets in the world.
Still, an exciting time when we can combine files based on what is the most logical grouping, rather than what is the most efficient. I look forward to the day when HTTP2 rules the world.
Looking at the tangentially linked qrintf project that H2O uses (https://github.com/h2o/qrintf), which replaces generic sprintf calls with specialised versions for a 10x speed boost - that seems like a brilliant idea; I wonder why it took so long for somebody to think of it?
People have thought of it -- there probably just aren't that many applications where sprintf is a bottleneck, especially enough of a bottleneck to justify a code-gen tool.
The OCaml community, and probably others, have noted that printf is an embedded DSL and treat it as something to be compiled rather than interpreted.
Rust borrows heavily from OCaml, and uses compile time macros for printf and regex, i.e. format! and regex! (the trailing ! means it's a macro that can be further compiled by the compiler).
One of the problems with compiling print/scanf is that a lot of the overhead comes from locale handling, which is a runtime variable. Parsing is fairly negligible for short format strings.
Socket API is a bottleneck now, right?
So, next step: roll your own HTTP-friendly TCP stack on top of netmap/dpdk and get a 10x performance increase over nginx.
Obviously it's great software. Does Kazuho work alone on this? If it's meant to replace nginx, it needs a lot of other options/functions/extensions/modules...
I am not so much into web-servers (yet), but I found this in the feature list:
> reverse proxy
>   HTTP/1 only (no HTTPS)
Are there any plans to also add HTTPS support for the reverse proxy? Right now I have to include a secondary (Tornado) web server in my stack for dynamic pages.
It also puzzled me that HTTPS is not supported there, yet in the benchmarks I found a section labeled "HTTPS/2 (reverse-proxy)". As I said, I am not so much into web servers and HTTP/2, but that was a little confusing.
HTTP and HTTPS (both versions 1 and 2) are supported for downstream connections (i.e. connections between H2O and web browsers).
Only plain-text HTTP/1 is supported for upstream connections (connections between H2O and web application servers).
That's a cool project. Performance is a fascinating topic.
However, in the real world, the number of requests per second an HTTP daemon can serve is the last thing to worry about. If the web is slow, it's not because Apache used to be bloated with threads. It's because of bad architecture: centralization of services, latency in page build times, size of static components, data store bottlenecks, etc.
Nevertheless, a very cool project. One I'll follow closely.
It seems that everything mentioned in the library could be done easily with golang. I am interested to see how H2O benchmarks against pure golang binaries.
moe | 11 years ago
[1] http://reportcard.wmflabs.org/graphs/pageviews
sparkzilla | 11 years ago
rkrzr | 11 years ago
kazuho | 11 years ago
The configuration file format is YAML, and the directives can be seen by running `h2o --help` (the output of version 0.9.0 is: https://gist.github.com/kazuho/f15b79211ea76f1bf6e5).
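For illustration, a minimal configuration in that format might look like the following. This is a sketch from memory of H2O's documented directives (`listen`, `hosts`, `paths`, `file.dir`, `proxy.reverse.url`); verify the exact names against `h2o --help` before relying on them:

```yaml
# Serve static files and proxy /api to a local application server.
listen: 8080
hosts:
  "example.com":
    paths:
      "/":
        file.dir: /var/www/htdocs
      "/api":
        proxy.reverse.url: "http://127.0.0.1:5000/"
```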
currysausage | 11 years ago
Wait, you don't use Apache because you don't like .htaccess files?
xorcist | 11 years ago
Synless | 11 years ago
scottlamb | 11 years ago
zzzcpan | 11 years ago
stephth | 11 years ago
[1] http://blog.rust-lang.org/2014/12/12/1.0-Timeline.html
coldtea | 11 years ago
detaro | 11 years ago
nathan7 | 11 years ago
hueving | 11 years ago
You could just as easily have asked why it wasn't written in Lisp. It's just not relevant.
Manishearth | 11 years ago
justincormack | 11 years ago
PythonicAlpha | 11 years ago
halayli | 11 years ago
This doesn't look like a complete HTTP server, comparing it with nginx is not fair.
- It's missing content-encoding handling on the receiving side
- No HTTP continue support
- No regex routing support
- No header rewrites
to name a few.
scottlamb | 11 years ago
robbles | 11 years ago
zub | 11 years ago
HTTP/2 server push. Your server can proactively deliver things like rollover state graphics knowing that the client will need them.
youngtaff | 11 years ago
ncallaway | 11 years ago
Shish2k | 11 years ago
chubot | 11 years ago
http://okmij.org/ftp/typed-formatting/
http://caml.inria.fr/pub/docs/manual-ocaml/libref/Printf.htm... (I have a memory of this being type safe and doing stuff at compile time but I don't see it now)
http://doc.rust-lang.org/std/fmt/
http://doc.rust-lang.org/regex/regex/index.html
Someone | 11 years ago
Also, http://www.ciselant.de/projects/gcc_printf/gcc_printf.html#p... shows gcc did something in this area 9 years ago.
http://www.cygwin.com/ml/libc-hacker/2001-08/msg00003.html indicates gcc had printf optimizations in 2001.
I expect there are older examples.
qrintf seems to be 'just' a more aggressive version of this.
I guess being more aggressive makes sense in applications that do a lot of simple string formatting.
nly | 11 years ago
zzzcpan | 11 years ago
wmf | 11 years ago
jarnix | 11 years ago
Is it getting commercial support/funds?
kazuho | 11 years ago
For myself, developing H2O is part of my job at DeNA (one of the largest smartphone game providers in Japan).
PythonicAlpha | 11 years ago
kazuho | 11 years ago
jvehent | 11 years ago
ams6110 | 11 years ago
I'm not seeing that there is a portable version yet, however.
justincormack | 11 years ago
bkeroack | 11 years ago
nnx | 11 years ago
dschiptsov | 11 years ago
Aldo_MX | 11 years ago
thresh | 11 years ago
Thanks!
haosdent | 11 years ago
huhtenberg | 11 years ago
xfalcox | 11 years ago
okpatil | 11 years ago