Ask HN: What load balancing software do you use?

[+] anotherjesse|18 years ago|reply

I'm a huge fan of both Nginx and HAProxy - used together.

[internet] <-> [Nginx] <-> [HAProxy] <-> [app servers]

Nginx is a great webserver, but isn't a good load balancer. You can install a patch that improves the balancer - http://brainspl.at/articles/2007/11/09/a-fair-proxy-balancer... - but it still isn't as nice as HAProxy.

With HAProxy the status of the system is visible. For the largest site I use HAProxy on I keep my status page public - http://userscripts.org/haproxy - but it isn't required.

HAProxy is particularly good for Rails, since you can specify that each app server handles only one request at a time. Requests then queue at the HAProxy layer, so if one app server gets a request that takes an extra long time, other requests don't sit waiting for that app server to finish; instead they wait in a FIFO queue for the next available app server.
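To make the queuing argument concrete, here is a toy model in Python. It assumes all requests arrive at once, service times are known in advance, and network cost is zero. It contrasts pinning each request to a backend in turn (plain round robin) with HAProxy-style queuing where each server takes one request at a time (the per-server `maxconn 1` setting in HAProxy). The function names are illustrative, not part of any tool.

```python
import heapq


def round_robin_finish_times(durations, n_servers):
    """Finish time of each request when request i is pinned to server i % n_servers.

    All requests are assumed to arrive at t=0; each server works through its own backlog.
    """
    busy_until = [0.0] * n_servers
    finish = []
    for i, duration in enumerate(durations):
        server = i % n_servers
        busy_until[server] += duration  # waits behind everything already pinned here
        finish.append(busy_until[server])
    return finish


def central_queue_finish_times(durations, n_servers):
    """Finish times when requests wait in one FIFO queue and each server takes one at a time."""
    free_at = [0.0] * n_servers  # min-heap: when each server next becomes free
    finish = []
    for duration in durations:
        start = heapq.heappop(free_at)  # head of the queue goes to the first free server
        end = start + duration
        finish.append(end)
        heapq.heappush(free_at, end)
    return finish


durations = [10, 1, 1, 1, 1, 1]  # one slow request followed by five fast ones
print(round_robin_finish_times(durations, 2))
print(central_queue_finish_times(durations, 2))
```

With two servers, round robin leaves two fast requests stuck behind the slow one (finishing at 11 and 12), while the shared queue routes them around it and everything is done by 10.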

Combining HAProxy with Munin gives great stats for tuning your system, whereas nginx alone with the patch gave no visibility into where bottlenecks might be.


[+] whyleyc|18 years ago|reply

Thanks - I got a basic Nginx config up and running on EC2 within a couple of hours. I'm intrigued by the idea of using it in combination with HAProxy, though, so I wondered if you'd mind a few follow-up questions. Essentially:

- Do you run the two products on the same physical machine ?

- What does Nginx do that HAProxy doesn't? (i.e., why not just stick with HAProxy?)

- I noticed that Nginx has some weighting options for load balancing (see http://wiki.codemongers.com/NginxLoadBalanceExample). Are these just not sophisticated enough for your needs ?
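For reference, a sketch of how weights in an nginx `upstream` block (the `weight=` parameter on `server` lines) can translate into a pick order. This implements smooth weighted round-robin, which newer nginx releases describe for their weighted balancing; whether the nginx version discussed here used exactly this scheme is an assumption. Backend names are hypothetical.

```python
def smooth_weighted_round_robin(weights, n_picks):
    """Return n_picks backend names chosen by smooth weighted round-robin.

    weights: dict mapping backend name -> positive integer weight.
    """
    names = list(weights)
    current = {name: 0 for name in names}  # running score per backend
    total = sum(weights.values())
    picks = []
    for _ in range(n_picks):
        # Every backend gains its weight; the highest running score wins this round.
        for name in names:
            current[name] += weights[name]
        best = max(names, key=lambda name: current[name])  # ties go to the first listed
        current[best] -= total  # the winner pays back the total, spreading its turns out
        picks.append(best)
    return picks


print(smooth_weighted_round_robin({"a": 5, "b": 1, "c": 1}, 7))
```

With weights 5/1/1 this yields `a a b a c a a`: the heavy backend gets five of every seven requests, but the light ones are interleaved rather than bunched at the end of each cycle.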

[+] jbyers|18 years ago|reply

haproxy and nginx are the best of the lot. Consider also ipvs for layer 4 load balancing.

One question that will help decide how things are structured is SSL. If you need it, putting haproxy up front gives you a bit of trouble -- haproxy doesn't support SSL (though it will blindly forward SSL at layer 4 if you want). So you'd need stunnel, nginx, Apache, or something else up front to decode.

[+] photomatt|18 years ago|reply

We recently switched all of the WordPress.com load balancers to nginx. They push a little over a gigabit of traffic right now and about half a billion requests per day, no sweat. We use Spread + Wackamole for failover; there's more info on Barry's blog:

http://barry.wordpress.com/2008/04/28/load-balancer-update/

I wouldn't recommend DNS round robin for load balancing. (We did it for a while, many problems and flaws in the approach.)

[+] whatusername|18 years ago|reply

I love how you casually mention "Half a billion requests per day"..

[+] samueladam|18 years ago|reply

http://blog.emmettshear.com/post/2008/03/03/Dont-use-Pound-f...

http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-b...

[+] thingsilearned|18 years ago|reply

We use HAProxy because we're very session-based (each user sees an entirely different thing), and HAProxy was a good choice for that. I wrote a post on setting it up a few weeks ago.
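A minimal sketch of the idea behind session-based (sticky) balancing: a new session is assigned a backend by round robin and a cookie records the choice, so later requests from that user return to the same server. The cookie name `SERVERID` is an assumption (it mirrors the name common in HAProxy `cookie` examples), and the class is hypothetical, not HAProxy's implementation.

```python
import itertools


class StickyBalancer:
    """Route a client to the same backend for as long as its session cookie stays valid."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = itertools.cycle(self.backends)  # round robin for new sessions

    def route(self, cookies):
        """Return (backend, cookies_to_set) for a request carrying `cookies` (a dict)."""
        pinned = cookies.get("SERVERID")  # hypothetical cookie name
        if pinned in self.backends:
            return pinned, {}  # existing session: stay put, nothing new to set
        # New session, or the pinned backend is no longer in the pool: pick a fresh one.
        backend = next(self._next)
        return backend, {"SERVERID": backend}


lb = StickyBalancer(["app1", "app2"])
print(lb.route({}))                     # new user gets app1 plus a cookie
print(lb.route({"SERVERID": "app1"}))   # returning user stays on app1
```

Falling back to a fresh pick when the cookie names an unknown server is what lets sessions survive a backend being removed, at the cost of that user's server-side session state.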

http://leavingcorporate.com/2008/03/03/session-based-load-ba...

[+] brianr|18 years ago|reply

I'm using nginx in a couple of different setups:

  nginx -> paste (pylons)

nginx on one machine, three other machines with 8 instances of paste each. Nginx proxies directly to the paste ports (paste is itself a threaded server, incidentally, but I've gotten the best results by running several instances per box). Volume has been as high as ~8 million dynamic requests/day.

  nginx -> lighttpd -> php-fcgi

nginx on its own box proxying to 8 app servers, each running 160 php-fcgi instances. Volume here is ~16 million dynamic requests/day.
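To put those daily volumes in per-second terms, a quick calculation. It assumes traffic is spread evenly over 24 hours, so it gives averages only; real peaks will be noticeably higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day):
    """Average requests per second over a full day (peak load will be higher)."""
    return requests_per_day / SECONDS_PER_DAY


paste_rps = average_rps(8_000_000)   # about 92.6 req/s spread over 24 paste instances
php_rps = average_rps(16_000_000)    # about 185.2 req/s across the php-fcgi setup
per_app_server = php_rps / 8         # about 23.1 req/s per app server

print(round(paste_rps, 1), round(php_rps, 1), round(per_app_server, 1))
```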

Both have worked pretty well so far. As anotherjesse said, there's not a lot of feedback, but it's done everything I need so far.

[+] DenisM|18 years ago|reply

Also consider using multiple DNS A records: clients pick among them more or less at random, which spreads the load. For an example, run "nslookup google.com".

[+] crescendo|18 years ago|reply

Don't you forfeit too much control this way? For example, the load on your servers would be determined by the caching behaviors of all the various DNS servers and clients out there. I think this scheme should only be used as a front line that leads to another layer of load balancers.

[+] jbyers|18 years ago|reply

Multiple A records are OK, but used alone - i.e. not in combination with another load balancing strategy - they can be troublesome. A few years ago, we ran a fair volume of traffic (hundreds of req/s) across three A records for a site. The distribution of requests was simply not uniform, and when a box failed -- even with low TTLs -- many, many clients just plodded away for hours or days with the old IP.
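A toy model of why the distribution comes out uneven: the authoritative server rotates the record order on each query, but caching resolvers hand one answer to every client behind them, and clients use the first address. The IPs and client counts are hypothetical, chosen only to show the skew.

```python
from collections import Counter
from itertools import cycle


def simulate_dns_round_robin(a_records, clients_per_resolver):
    """Count clients per server when each caching resolver shares one cached answer.

    a_records: IPs published for the name.
    clients_per_resolver: number of clients behind each resolver, in query order.
    Assumes the record order rotates on every query and clients use the first address.
    """
    rotation = cycle(range(len(a_records)))
    hits = Counter()
    for n_clients in clients_per_resolver:
        offset = next(rotation)
        answer = a_records[offset:] + a_records[:offset]  # rotated record list
        hits[answer[0]] += n_clients  # everyone behind this resolver gets the same first IP
    return hits


records = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(simulate_dns_round_robin(records, [1000, 5, 5, 200, 5, 5]))
```

The rotation itself is perfectly fair (each IP is handed out twice), yet one server ends up with 1,200 clients and the other two with 10 each, because load follows how many users sit behind each resolver. The same caching is what keeps clients on a dead IP after a failure.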

If you do go the multiple A record route, do so in combination with one of the load balancers you asked about, and ideally with IP failover. But probably this is a lot more complexity than you want.

[+] jamess|18 years ago|reply

+1. Round robin DNS has worked well for me in the past.

[+] SwellJoe|18 years ago|reply

I've used pen, Squid, and LVS. All useful for different situations, though LVS is just not practical for the vast majority of situations.

PerlBal looks really cool: being written in Perl gives it some of the same kind of flexibility Squid has. (A good reason to use Squid is that you can write your own balancing algorithm, in any language you like, as a redirector script. I always used Perl, or Python when I was working with the Zope guys. So you can do crazy stuff like choosing the right server by keeping servers "primed" for the content users are asking for based on URL, or you can use destination URL hashing and achieve the same effect even if you have millions of URLs.) Squid also has experimental support for ESI (Edge Side Includes), which is pretty awesome: you build a page from disparate and wholly unrelated servers using a simple templating system, and cache the pieces. I don't think any other Open Source product out there has ESI (experimental or otherwise).
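A sketch of the destination-URL hashing idea: hash the requested URL and use it to pick a backend, so the same URL always lands on the same cache-warm server. This shows only the selection logic; reading requests from and writing answers to Squid in a redirector script is not shown, since that protocol differs between Squid versions. Backend names are hypothetical.

```python
import hashlib


def pick_backend_by_url(url, backends):
    """Map a URL to a backend deterministically so repeat requests hit a warm cache."""
    # hashlib rather than built-in hash(): Python randomizes str hashes per process,
    # which would send the same URL to different servers after every restart.
    digest = hashlib.md5(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["cache1", "cache2", "cache3"]
print(pick_backend_by_url("/articles/42", backends))
```

One design caveat: with plain modulo, adding or removing a server remaps most URLs and cools every cache at once; a consistent-hashing scheme moves only a fraction of them.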

[+] drusenko|18 years ago|reply

How is LVS not practical? LVS is awesome... it's kernel-level and doesn't use any resources, plus is very simple but flexible. Forget the ultramonkey configurations and go with keepalived to handle the monitoring, failover, etc.

[+] rcoder|18 years ago|reply

I've been using Apache 2.2 with mod_proxy_balancer to do load balancing for PHP and Rails apps for over a year now, and had pretty good experiences with it. Since I support a lot of existing Apache servers, the configuration is easy for me to work with, and the ability to do authn/authz and SSL termination at the load balancer lets me keep the load down on my application servers.

It's probably not as fast as Nginx, but I haven't found our load balancers to be a bottleneck. In fact, we've been doing load balancing on a pair of really, really wimpy servers (Celeron CPU and 512MB of RAM) running Apache on OpenBSD for about a half-dozen different apps for the last year, and never seen the average load climb up over about 0.5, even while handling upwards of 300 requests per second.

[+] blader|18 years ago|reply

Nginx: http://highscalability.com/friends-sale-architecture-300-mil...

[+] swombat|18 years ago|reply

We're hosted at EngineYard, which use nginx with a fair load balancer plugin that ensures that new requests are assigned to free mongrel instances (yeah, it's RoR).

Works great.

Daniel

[+] anotherjesse|18 years ago|reply

EngineYard uses nginx at the slice level and LVS at the site level (to balance between your slices). This is for my startup (not userscripts.org, which is at ServerBeach).

[+] holygoat|18 years ago|reply

I like Pound very much: it's simple and robust.

However, I recently noticed a memory leak. We use healthchecks on our production machines, which means a consistent rate of hits every few seconds, 24/7. After about 3 months, Pound had chewed up 1.7GB of RAM, which caused memory usage alerts in our monitors.
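For context on what those periodic healthchecks feed into, a minimal sketch of a balancer's up/down bookkeeping: a backend is marked down after `fall` consecutive failed checks and back up after `rise` consecutive successes, so one flaky check doesn't flap it. The `rise`/`fall` naming and the 2/3 defaults borrow from HAProxy's server options; the class itself is hypothetical and not how Pound implements checks.

```python
class HealthTracker:
    """Mark a backend down after `fall` consecutive failures and up after `rise` successes."""

    def __init__(self, rise=2, fall=3):
        self.rise = rise
        self.fall = fall
        self.up = True
        self._streak = 0  # consecutive results that disagree with the current state

    def record(self, check_ok):
        """Feed one health-check result; return whether the backend is now considered up."""
        if check_ok == self.up:
            self._streak = 0  # result agrees with current state, so reset the counter
        else:
            self._streak += 1
            threshold = self.fall if self.up else self.rise
            if self._streak >= threshold:
                self.up = not self.up  # enough evidence: flip the state
                self._streak = 0
        return self.up


tracker = HealthTracker(rise=2, fall=3)
results = [False, False, True, False, False, False, True, True]
print([tracker.record(ok) for ok in results])
```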

Not a big deal -- you can always restart the process -- but I'm still evaluating alternatives.

[+] jrockway|18 years ago|reply

I use perlbal, which is nice-n-simple. Add a few lines describing where your backend servers are to the config file, save it, and start perlbal.

For our $WORK applications, we just have an apache that proxies to the backend FastCGI apps. We don't have a ton of load, so that works fine. (We might be switching to nginx, which is much simpler than Apache for this use case.)

[+] subwindow|18 years ago|reply

I'm using Pound right now in one large installation, and nginx in another, smaller installation that still gets decent traffic.

I definitely prefer Nginx. It seems much faster, and has definitely given fewer headaches. It seems like an issue with Pound crops up every few weeks. My Pound config file is about 600 lines long now, and it is starting to get unmaintainable.

[+] mattculbreth|18 years ago|reply

I've used nginx in Rails and Pylons environments with good success. These apps don't get much traffic, so that wasn't a consideration, but the ease and simplicity of nginx is fantastic. You never have to do anything with it once it's initially configured.

[+] azsromej|18 years ago|reply

I've used nginx and have been able to adapt old htaccess rules to get everything I used to get with apache. It's good on memory and I've never had a problem.

[+] andy|18 years ago|reply

Why use software? I have a hardware load balancer at Softlayer, and for $99 a month it's totally worth it.

[+] modoc|18 years ago|reply

Because Softlayer (as much as they rock) don't have failover for those load balancers. You may not need it, but after we had our site downed due to hardware failure of one of their load balancers, we moved to a redundant software based setup.

We used HAProxy and Heartbeat2 to provide fully redundant load balancers. On our smaller sites, we actually run the HAProxy and Heartbeat2 on two of the web servers for that cluster, so you don't need dedicated hardware if you don't want it. If you do this, and you're on softlayer, I'd recommend sending the back-channel traffic along the internal network to avoid using 2X your real bandwidth.
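A simplified model of what a redundant pair like this does: each node sends heartbeats, a node counts as alive if its last heartbeat is recent enough, and the shared virtual IP belongs to the most preferred live node. The node names, timeout, and functions are hypothetical; real Heartbeat also deals with split-brain, IP takeover on the network, and resource ordering, none of which is modeled here.

```python
def live_nodes(last_heartbeat, now, timeout):
    """Nodes whose most recent heartbeat arrived within `timeout` seconds of `now`."""
    return {node for node, seen_at in last_heartbeat.items() if now - seen_at <= timeout}


def vip_owner(nodes, alive):
    """Return the node that should hold the virtual IP, or None if every node is down.

    nodes: node names in order of preference (primary first).
    """
    for node in nodes:
        if node in alive:
            return node
    return None


heartbeats = {"lb1": 100.0, "lb2": 108.0}  # lb1 has gone quiet
alive = live_nodes(heartbeats, now=110.0, timeout=5.0)
print(vip_owner(["lb1", "lb2"], alive))  # the standby takes over
```

Because preference order decides ownership, the primary takes the IP back as soon as it reports in again; some setups prefer to leave it on the standby to avoid a second switchover.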

You can read about how to set it up here:

http://www.digitalsanctuary.com/tech-blog/debian/13-steps-to...

[+] merrick33|18 years ago|reply

ultramonkey in front of apache / php / postgresql

It was very smooth to set up on Debian, but my first experience setting it up was with Red Hat, and that was tortuous.

[+] jdavid|18 years ago|reply

What is ultramonkey like? I saw it a while back, but it looked like the project was no longer being updated.
