item 15196523

AWS Network Load Balancer

287 points | jeffbarr | 8 years ago | aws.amazon.com

118 comments

colmmacc | 8 years ago
If you're curious to see NLB in action, here's a live demo. It took about 5 minutes in the console to set up, with no changes on the targets/backends: http://nlb-34dc3b430638dc3e.elb.us-west-2.amazonaws.com/

Massive disclaimer: I work on NLB.

posnet | 8 years ago
Are there any plans to add UDP support?
knite | 8 years ago
It sounds like NLB passes through the source IP. Does that mean outbound flows go through the IGW?
irl_zebra | 8 years ago
Any idea how long for GovCloud?
mdouglass | 8 years ago
Is there any intent to add TLS termination? That’s a dealbreaker for us switching from the classic load balancer. Otherwise this looks really awesome, thanks!
stelund | 8 years ago
There is no Security Group for NLB. What's the reasoning behind that?
friendzis | 8 years ago
The demo page states "Your browser may keep a connection open for a few seconds and re-use it for a reloaded request. If it does, you'll get the same target", but when I tried abusing the power of F5, I alternated between ice cream and bumblebee.

If you're going to look into it: attempt time ~04:50 UTC, remote address in the 88.119.128.0/20 network.

chii | 8 years ago
Is there a CloudFormation template for this demo that I can look at?
mooreds | 8 years ago
I love the concept, because not being able to handle TCP traffic was one shortcoming of the new ALB.

But that pricing model:

    Bandwidth – 1 GB per LCU.
    New Connections – 800 per LCU.
    Active Connections – 100,000 per LCU.
It would be nice to have it added to the Simple Monthly Calculator: https://calculator.s3.amazonaws.com/index.html but as it is, I had to read the FAQ to find out what those were: https://aws.amazon.com/elasticloadbalancing/faqs/
runako | 8 years ago
It looks like the deep linking to the LCU page doesn't work (you have to click the tab for Network Load Balancer), so here's what an LCU is from that page:

---

An LCU is a new metric for determining how you pay for a Network Load Balancer. An LCU defines the maximum resource consumed in any one of the dimensions (new connections/flows, active connections/flows, and bandwidth) the Network Load Balancer processes your traffic.
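A rough sketch of how the three dimensions combine: an hour's usage is whichever dimension is highest, expressed in LCUs. The per-LCU limits are the ones quoted in this thread (800 new connections per second, 100,000 active connections, 1 GB of bandwidth). Reading bandwidth as GB per hour and active connections as a concurrent count are assumptions, so check the ELB pricing page before budgeting with this.

```python
# Per-LCU limits as quoted in this thread; the time units are assumptions.
NEW_CONNECTIONS_PER_SEC_PER_LCU = 800
ACTIVE_CONNECTIONS_PER_LCU = 100_000
GB_PER_HOUR_PER_LCU = 1.0


def lcus_for_hour(new_conn_per_sec: float, active_conns: float, gb_per_hour: float) -> float:
    """Return the LCUs consumed in one hour: the maximum across the three dimensions."""
    return max(
        new_conn_per_sec / NEW_CONNECTIONS_PER_SEC_PER_LCU,
        active_conns / ACTIVE_CONNECTIONS_PER_LCU,
        gb_per_hour / GB_PER_HOUR_PER_LCU,
    )


if __name__ == "__main__":
    # 400 new conn/s -> 0.5 LCU, 50,000 active -> 0.5 LCU, 2 GB/h -> 2.0 LCU
    print(lcus_for_hour(400, 50_000, 2.0))  # 2.0
```

Only the largest dimension is billed, so a bandwidth-heavy but connection-light workload pays for bandwidth alone.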

djhworld | 8 years ago
I often feel the simple monthly calculator is largely neglected by AWS.

A large number of services are not even featured on the calculator as an option (e.g. Lambda)

syncerr | 8 years ago
Seems to remarkably decrease latency (380ms -> 109ms). Running some tests:

    # ab -n 400 http://nlb-34dc3b430638dc3e.elb.us-west-2.amazonaws.com/
    Time per request: 108.779 [ms] (mean, across all concurrent requests)

    # ab -n 400 <public server via ELB>
    381.933

    # ab -n 400 <public server via ALB>
    380.632

    # (for reference) ab -n 400 https://www.google.com/
    190.536

    # (for reference) ab -n 400 https://sandbox-api.uber.com/health/
    107.680
If you're willing to terminate SSL yourself, this looks like it could be a solid improvement.
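For anyone comparing these numbers: as far as we can tell from ab's source, both "Time per request" lines are derived from the total wall-clock time of the run, as in the minimal sketch below. With ab's default concurrency of 1 (no -c flag, as in the runs above) the two figures coincide and equal total time divided by the number of requests. That is sequential round-trip latency from the client's location, and without -k each request also pays for a fresh TCP connection.

```python
def ab_time_per_request_ms(total_seconds: float, completed: int, concurrency: int = 1) -> tuple[float, float]:
    """Return ab's two "Time per request" figures in milliseconds.

    The first is the "(mean)" line, the second the
    "(mean, across all concurrent requests)" line.
    """
    across_all = total_seconds * 1000 / completed
    return concurrency * across_all, across_all


if __name__ == "__main__":
    # A 400-request run taking 43.5116 s at concurrency 1 reports ~108.8 ms on both lines.
    print(ab_time_per_request_ms(43.5116, 400))
    # Speed-up implied by the ALB vs. NLB figures above.
    print(round(380.632 / 108.779, 1))  # 3.5
```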
geocar | 8 years ago
I've seen a similar improvement using the EnableProxyProtocol policy, which required a bit of code:

    Time per request:       88.400 [ms] (mean, across all concurrent requests)
versus public server via the regular HTTP proxy:

    Time per request:       415.859 [ms] (mean, across all concurrent requests)
For reference:

    $ ab -n 400 https://www.google.com/
    Time per request:       168.438 [ms] (mean, across all concurrent requests)
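Most of the "bit of code" the proxy protocol needs lives on the backend. With the policy enabled, the classic ELB prepends a one-line text header (PROXY protocol version 1) carrying the original client address to each TCP connection, and the server must strip that header before reading the real payload. The parser below is a minimal sketch; the function and type names are our own, not from any library. A production server would also have to handle the header arriving split across several reads.

```python
from typing import NamedTuple, Optional


class ProxyHeader(NamedTuple):
    protocol: str   # "TCP4" or "TCP6"
    src_ip: str     # original client address
    dst_ip: str
    src_port: int
    dst_port: int


def parse_proxy_v1(data: bytes) -> tuple[Optional[ProxyHeader], bytes]:
    """Split a PROXY v1 header off the first bytes read from a connection.

    Returns (header, remaining_bytes); header is None for "PROXY UNKNOWN".
    Raises ValueError if the data does not start with a valid v1 header.
    """
    end = data.find(b"\r\n")
    # The v1 spec caps the header at 107 bytes including the CRLF.
    if end == -1 or end + 2 > 107:
        raise ValueError("no complete PROXY v1 header")
    line, rest = data[:end].decode("ascii"), data[end + 2:]
    parts = line.split(" ")
    if parts[0] != "PROXY":
        raise ValueError("missing PROXY signature")
    if len(parts) >= 2 and parts[1] == "UNKNOWN":
        return None, rest
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError(f"malformed PROXY header: {line!r}")
    _, proto, src, dst, sport, dport = parts
    return ProxyHeader(proto, src, dst, int(sport), int(dport)), rest


if __name__ == "__main__":
    first_read = b"PROXY TCP4 198.51.100.7 10.0.0.5 56324 80\r\nGET / HTTP/1.1\r\n\r\n"
    header, payload = parse_proxy_v1(first_read)
    print(header.src_ip, header.src_port)  # 198.51.100.7 56324
    print(payload[:5])                      # b'GET /'
```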
paulddraper | 8 years ago
Static IP, source IP, and zonality are game changing.

Unfortunately, it lacks a very significant existing feature of ELB: SSL/TLS termination. It's very convenient to manage the certs in AWS without having to deploy them to dedicated EC2 instances.

jacobr1 | 8 years ago
This is what their ALB service is for.
9point6 | 8 years ago
It won't ever be possible to do this, as the NLB runs a few network layers below where TLS runs.
g09980 | 8 years ago
This was discussed a couple of times recently, but answers seemed contradictory. ELB requires pre-warming if you expect sudden high load. But do ALB and NLB?

[1] https://news.ycombinator.com/item?id=15085863

[2] https://news.ycombinator.com/item?id=14052079

colmmacc | 8 years ago
Each NLB starts out with several gigabits of capacity per availability zone, and it scales horizontally from there (theoretically to terabits). That's more capacity than many of the busiest websites and web services in the world need.

If you expect an instantaneous load of more than about 5 Gbit/sec, we work directly with customers via AWS Support. We really try to understand the load and make sure that the right mechanisms are in place. At that scale, our internal DDoS mitigation systems also come into play. (It's not a constraint of NLB.)

The load test in the blog post was run on an NLB with no pre-provisioning or pre-warming, and it allowed us to get to 3M RPS and 30 Gbit/sec, which is when we exhausted the capacity of our test backends.
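For scale, dividing those two figures gives the average traffic per request in that test, assuming the 30 Gbit/sec is the aggregate of all request and response bytes:

$$\frac{30\ \text{Gbit/s}}{3 \times 10^{6}\ \text{requests/s}} = 10^{4}\ \text{bit/request} = 1250\ \text{bytes/request}$$

In other words, roughly a kilobyte of traffic per request on average.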

ALBs start out with less capacity, and are constrained more by requests than bandwidth. I don't have a precise number because it depends on how many rules you have configured and which TLS ciphers are negotiated by your clients, but the numbers are high enough that customers routinely use ALBs to handle real-world spiky workloads, including supporting Super Bowl ads and flash sales.

Each ALB can scale into the tens of gigabits/sec before needing to shard. ALB also has a neat trick up its sleeve: if you add backends, we scale up, even if there's no traffic. We assume the backends are there to handle expected load. So in that case it has "more" capacity than the backends behind it. That goes a long way to avoiding some of the scaling issues that impacted ELB early in its history.

If you have a workload that you're worried about, feel free to reach out to me and we'll be happy to work with you. colm _AT_ amazon.com.

discodave | 8 years ago
Not sure about ALB, but from the linked blog post for NLB:

> Beginning at 1.5 million requests per second, they quickly turned the dial all the way up, reaching over 3 million requests per second and 30 Gbps of aggregate bandwidth before maxing out their test resources.

uji | 8 years ago
It seems this limitation has been around for a long time (since 2009). I'm curious how everyone has been using ELB. To me, ELB seems like an unfinished product. It must be painful to first predict heavy load on your application and then notify AWS well in advance.
deafcalculus | 8 years ago
How does failover across zones work?

The blog post says there's one static IP per zone. I suppose www.mydomain should have multiple A records, each pointing to the Elastic IP in one zone. What happens when one zone fails entirely? Does it need a DNS change at that point? Or does the NLB have a different IP with which it can do BGP failover?

krallin | 8 years ago
AWS provides you with a number of DNS records for each NLB:

- One record per zone (which maps to the EIP for that zone)
- A top-level record that includes all active zones (these are all zones you have registered targets in, IIRC)

The latter record is health checked, so if an AZ goes down, it'll stop advertising it automatically (there will be latency of course, so you'll have some clients connecting to a dead IP, but if we're talking unplanned AZ failure, that's sort of expected).

That said, this does mean you probably shouldn't advertise the IPs directly if you can avoid it, yes.

(disclaimer: we evaluated NLB during their beta, so some of this information might be slightly outdated / inaccurate)

jon-wood | 8 years ago
I assume they intend for you to use Route53 on top of this. You could use a combination of geolocation routing and failovers to set it up so that by default people are routed to their nearest region, but if that region is currently offline send them somewhere else instead.
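A minimal sketch of the failover half of that idea. It builds the ChangeBatch shape that boto3's Route 53 change_resource_record_sets call accepts, with a PRIMARY and a SECONDARY alias record for the same name. The record name, zone IDs, and NLB DNS names are hypothetical placeholders; AliasTarget.HostedZoneId has to be the load balancer's own canonical hosted zone ID, not your zone's. The geolocation layer is left out.

```python
def failover_alias_change_batch(record_name: str,
                                primary: tuple[str, str],
                                secondary: tuple[str, str]) -> dict:
    """Build a Route 53 ChangeBatch with PRIMARY/SECONDARY failover alias A records.

    primary and secondary are (load_balancer_hosted_zone_id, load_balancer_dns_name).
    """
    changes = []
    for role, (zone_id, dns_name) in (("PRIMARY", primary), ("SECONDARY", secondary)):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                # Records sharing a name need distinct set identifiers.
                "SetIdentifier": f"{record_name}-{role.lower()}",
                "Failover": role,
                "AliasTarget": {
                    "HostedZoneId": zone_id,
                    "DNSName": dns_name,
                    # Let Route 53 use the load balancer's health to decide failover.
                    "EvaluateTargetHealth": True,
                },
            },
        })
    return {"Comment": "NLB regional failover (sketch)", "Changes": changes}
```

With boto3 installed, the result would be passed as change_resource_record_sets(HostedZoneId=<your zone ID>, ChangeBatch=batch) on a Route 53 client.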
gregmac | 8 years ago
I have just finished setting up a new front-end for a few services (we are just about to start migrating production systems to it).

I was aiming to use static IPs (for client firewall rules), and simplify networking configuration, so what I ended up with is an auto-scaling group of HAProxy systems that run a script every couple of minutes to assign themselves an elastic IP from a provided list. Route 53 is configured with health checks to only return the IP(s) that are working.
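The scripts themselves aren't shown, so the following is only an assumption about what the "assign myself an Elastic IP" step might look like. It uses just the EC2 describe_addresses and associate_address calls (via a boto3 EC2 client passed in as ec2) and is idempotent, so it is safe to run every couple of minutes. The address list and instance ID are placeholders you would supply, for example the instance ID from the instance metadata service.

```python
from typing import Optional


def pick_allocation(addresses: list[dict], instance_id: str) -> tuple[Optional[str], bool]:
    """Choose which Elastic IP this instance should hold.

    Returns (allocation_id, needs_association). If the instance already holds
    one of the addresses, that one is returned and no API call is needed.
    """
    for addr in addresses:
        if addr.get("InstanceId") == instance_id:
            return addr["AllocationId"], False
    for addr in addresses:
        if "AssociationId" not in addr:
            return addr["AllocationId"], True
    return None, False


def claim_elastic_ip(ec2, public_ips: list[str], instance_id: str) -> Optional[str]:
    """Make sure instance_id holds one Elastic IP from public_ips.

    ec2 is expected to be a boto3 EC2 client. Returns the AllocationId held,
    or None if every address in the list belongs to another instance.
    """
    resp = ec2.describe_addresses(PublicIps=public_ips)
    allocation_id, needs_association = pick_allocation(resp["Addresses"], instance_id)
    if needs_association:
        # AllowReassociation=False makes the call fail rather than steal an
        # address another instance grabbed in the meantime; the next run retries.
        ec2.associate_address(
            AllocationId=allocation_id,
            InstanceId=instance_id,
            AllowReassociation=False,
        )
    return allocation_id
```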

The HAProxy instances also continuously read their target auto-scaling groups to update the backend config, terminate SSL, and run the Let's Encrypt client. Most services are routed by host name, but a couple of older ones are path-based, and there are some 301 redirects.
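Again only a guess at the shape of the backend-update step: render an HAProxy backend section from the current members of a target auto-scaling group. Fetching the members (for example through the Auto Scaling and EC2 describe APIs) and reloading HAProxy are left out, and the backend and server names below are hypothetical.

```python
def render_backend(name: str, servers: list[tuple[str, str, int]]) -> str:
    """Render an HAProxy backend section from (server_name, ip, port) tuples."""
    lines = [f"backend {name}", "    balance roundrobin"]
    # Sorting keeps the output stable, so it only changes when membership does.
    for server_name, ip, port in sorted(servers):
        # "check" enables HAProxy's health checks for each server.
        lines.append(f"    server {server_name} {ip}:{port} check")
    return "\n".join(lines) + "\n"
```

Because the output is deterministic, a script can compare it with the current config file and reload HAProxy only when the rendered text differs.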

I think NLB could replace the Elastic IP and Route 53 part of this setup, but I'd still need to handle SSL, routing, and backends. It's too bad, because my setup could be used nearly anywhere that has more than one public-facing service, but there's not much built in to help; I had to write quite a few scripts to get everything I needed.

gnur | 8 years ago
Have you tried or evaluated Traefik? It sounds like it could do nearly everything you just mentioned.
tjholowaychuk | 8 years ago
Awesome! I have the perfect project for this haha. Does it still work with ECS integration?
Thaxll | 8 years ago
Yes, it does (via target group integration).
GeneticGenesis | 8 years ago
No chance that I'll jump into another new load balancer product from Amazon any time soon. ALB has significant deficiencies that AWS doesn't warn you about, and you only find them at tens of thousands of RPS.

Still waiting on that fix, AWS.

newhere420 | 8 years ago
If they won't warn us, could you please warn us? - Fellow ALB user.
dfischer | 8 years ago
This is perfect. Need something like this to load balance A records for dynamic domains off apex. We can most likely use the static IP address perfectly for this.

Also, did I read it wrong, or is this actually cheaper than ALB?

Awesome! Can't wait to dig in more!

davidbrownct | 8 years ago
Yes, NLB is priced the same as ALB hourly but 25% cheaper on bandwidth (LCUs).
mark242 | 8 years ago
I don't understand the pricing model. 800 new connections per hour, for $0.006? Isn't that extremely expensive? 80,000 connections for $0.60 in an hour is $432 per month for not a whole lot of traffic.

edit: Okay, it's 800 new connections per second, per the ELB pricing page under "LCU details". The cost for 80k connections in an hour is effectively constrained by the bandwidth; e.g. if there's very low bandwidth, it's $0.006/hour or $4.32/month.
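The arithmetic behind both readings, as a small sketch. It uses the $0.006 per LCU-hour figure and the 720-hour month implied by the $432 and $4.32 numbers above, and leaves out the separate hourly NLB charge. Under the per-hour misreading, 80,000 connections come to 100 LCUs; under the per-second reading, the new-connections dimension is only a few hundredths of an LCU, so another dimension such as bandwidth decides the bill.

```python
PRICE_PER_LCU_HOUR = 0.006   # USD, the figure quoted above; check current pricing
HOURS_PER_MONTH = 720        # 30-day month, matching the $432 and $4.32 figures
NEW_CONNECTIONS_PER_SEC_PER_LCU = 800


def monthly_lcu_cost(lcus_per_hour: float) -> float:
    """LCU charge for a month at a steady LCU rate (the hourly NLB charge is not included)."""
    return lcus_per_hour * PRICE_PER_LCU_HOUR * HOURS_PER_MONTH


def new_connection_lcus(connections_per_hour: float) -> float:
    """LCUs used by the new-connections dimension alone, reading the limit as per second."""
    return connections_per_hour / 3600 / NEW_CONNECTIONS_PER_SEC_PER_LCU


if __name__ == "__main__":
    print(monthly_lcu_cost(80_000 / 800))  # misreading as 800/hour: 100 LCUs, about $432
    print(new_connection_lcus(80_000))     # about 0.028 LCU when read as 800/second
    print(monthly_lcu_cost(1))             # one LCU all month, about $4.32
```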

kadiyala | 8 years ago
I was just wondering whether this was developed purely inside Amazon or is backed by an ADC like NetScaler or F5. Does anyone know any details? I'm assuming that the classic load balancer is some third-party or older framework and this is something Amazon developed internally.
Corrado | 8 years ago
I would assume that it's something developed internally at Amazon. Networking inside of AWS isn't standard fare, and I doubt products like NetScaler or F5 could be used. Generally speaking, they aren't using TCP/IP behind the curtain to move packets between nodes. AWS has even created its own routing hardware/software because no other company could do what they need at the scale they need. See this video for more information: https://www.youtube.com/watch?v=St3SE4LWhKo
irl_zebra | 8 years ago
It's not showing up in GovCloud. This stuff always takes longer for that.
inertial | 8 years ago
Feature request: please allow weighted load balancing, i.e. the ability to distribute traffic in a user-specified ratio (weights) to different-sized instances.
colmmacc | 8 years ago
For now, here's a workaround that I use: create multiple listeners/ports on the larger instances and add them as targets. Containers are a great approach here too: load up the bigger instances with more containers and register each container as a target.
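A sketch of that workaround, assuming each instance runs one listener (or container) per port starting at a hypothetical base port of 8080. It builds the Targets list that boto3's elbv2 register_targets call takes. Because NLB spreads flows across registered targets rather than instances, an instance registered on four ports should receive roughly four times the share of one registered once.

```python
def weighted_targets(instances: dict[str, int], base_port: int = 8080) -> list[dict]:
    """Build an ELBv2 register_targets Targets list, one entry per listener port.

    instances maps instance ID -> weight (e.g. 1 for a small instance, 4 for one
    four times as large). Each unit of weight becomes one more registered port,
    so the instance must actually be listening on base_port .. base_port+weight-1.
    """
    targets = []
    for instance_id, weight in sorted(instances.items()):
        for i in range(weight):
            targets.append({"Id": instance_id, "Port": base_port + i})
    return targets
```

With boto3, the list would go to register_targets(TargetGroupArn=<your target group ARN>, Targets=weighted_targets({...})) on an elbv2 client.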
bashtoni | 8 years ago
Better: distribute to the instance with the lowest % CPU by default, à la the Google Cloud NLB.
shampster | 8 years ago
I was so excited before I realized this wouldn't terminate HTTP(S) traffic. An IP-anycast-based ALB would be nice.
Cieplak | 8 years ago
I wonder what they wrote it in. I'd guess C++, Java or Erlang, or a combination of those.
sgs1370 | 8 years ago
This will motivate me to get everything into VPCs, which I should have done a while ago.
hexsprite | 8 years ago
So would a WebSocket-based application be better off using NLB?