While the new C5 instances are certainly welcome - I've been hoping for their release since their announcement in November 2016 (and they were already late for Skylake at that point) - we have encountered a number of show-stopping problems that point to this project being just a bit too ambitious.
To name a few:
1. EBS volumes attached to C5 instances show completely bogus CloudWatch metrics, over an order of magnitude higher than reality (e.g. average read/write latency reads anywhere from 100 ms to 60,000 ms depending on load)
2. C5 instances don't work - at all - behind an NLB with a Target Group that registers them by "instance". You have to put the Target Group in "IP" mode.
3. OpsWorks, as always, lags way behind the rest of AWS: you can't launch C5 instances from it. The same was true of R4 instances for a while, but at least you could switch to those via the API. Not so with C5; unless you want OpsWorks to lose track of the instance type completely, you just have to abstain for now.
3a. As a result, we have to run R4 instances for some of our web tier - despite not needing the memory - because they have the highest network allocation. To make matters worse, AWS won't tell you the network allocation. You don't know until you start dropping packets.
4. ZFS on C5 instances can behave strangely. We've been unable to resize drives (zpool online -e <pool> <drive>) if they're identified by device ID - the command fails with "unable to read disk capacity". Moving the instance back to any other type fixes the issue.
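For what it's worth, the suspect averages in (1) can be cross-checked by recomputing them from the raw counters; a minimal sketch in Python, assuming the standard EBS metric semantics (VolumeTotalReadTime in seconds, VolumeReadOps as a count - the sample numbers are made up):

```python
def avg_read_latency_ms(total_read_time_s, read_ops):
    """Average per-operation read latency in milliseconds.

    CloudWatch's derived "average latency" for an EBS volume is just
    Sum(VolumeTotalReadTime) / Sum(VolumeReadOps) over the period,
    with VolumeTotalReadTime reported in seconds.
    """
    if read_ops == 0:
        return 0.0
    return total_read_time_s / read_ops * 1000.0

# 5 s of cumulative read time across 10,000 reads is about 0.5 ms per op
print(avg_read_latency_ms(5.0, 10_000))
```

If the recomputed number disagrees with the console by an order of magnitude, the metrics pipeline - not the disk - is the likely culprit.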
As always, you expect a couple of quirks with a new architecture, but I found myself wishing they had just stuck a new board with a Skylake chip into a rack and launched it.
GCE has a far more attractive offering by comparison: even ignoring all these issues, we simply can't get the instance shape we want on AWS (a few fast CPUs + lots of memory). It just doesn't exist.
> 3a. As a result, we have to run R4 instances for some of our web tier - despite not needing the memory - because they have the highest network allocation. To make matters worse, AWS won't tell you the network allocation. You don't know until you start dropping packets.
Ugh, I hate that. They also have hidden limits on the number of incoming TCP connections you can have.
> I've been hoping for their release since their announcement in November 2016 (and they were already late for Skylake at that point)
I don't know what you mean by 'late'. Skylake Xeons were delayed and have only recently been released, and only partially at that. You may be able to get them from vendors like Dell, but you still can't go out and buy one anywhere that I'm aware of.
So I wouldn't say C5s don't work at all with NLB instance target groups - I'm running one right now.
I wouldn't be surprised if you're hitting an edge case with the tighter integration* between NLBs and instance associations. If you haven't already, please do reach out to support.
* From the docs: "If you specify targets using an instance ID, the source IP addresses of the clients are preserved and provided to your applications. If you specify targets by IP address, the source IP addresses are the private IP addresses of the load balancer nodes."
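For anyone who needs the "IP" mode workaround today, registering by private IP is straightforward with boto3; a sketch under the assumption of a target group created with TargetType='ip' (the IPs, port, and ARN below are placeholders, not real values):

```python
def ip_targets(private_ips, port=443):
    """Build the Targets parameter for elbv2.register_targets against a
    target group created with TargetType='ip' (the "IP mode" workaround)."""
    return [{"Id": ip, "Port": port} for ip in private_ips]

# With credentials configured, the actual call would look like:
#   import boto3
#   boto3.client("elbv2").register_targets(
#       TargetGroupArn=target_group_arn,  # your target group's ARN
#       Targets=ip_targets(["10.0.1.25", "10.0.2.31"]),
#   )
print(ip_targets(["10.0.1.25", "10.0.2.31"]))
```

Note the trade-off from the quoted docs: IP targets see the load balancer nodes' source addresses, not the clients'.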
You have given me a whole other perspective on another AWS customer. I spend a lot on AWS (70k/mo+), but I'd be happy if I had Core 2 Duo CPUs!
What sort of work do you have that requires the latest generation? Or rather, why do you want the latest?
I'd expect AWS to always be behind the ball - are they the right platform for you?
GCE is interesting; I'm moving half of my infra over there - but again, it's not really about their hardware offerings. It's more about being multi-cloud/redundant.
> YOU'RE BUILDING A HARDWARE FRONT-END TO EBS? You guys are insane!
It seems more likely that they've put in a software device model of NVMe as a replacement for the BlkBack software device model that the BlkFront driver talked to. Not much different from the software e1000 NIC that Xen/QEMU already supports.
That said, with their virtualizable Annapurna wonder-NIC, they could be doing it in "hardware", though even in that case a reasonable part of the device model would be software, just running on an NPU, not a CPU.
There's absolutely no way that they would get the performance I'm seeing from an emulated disk. We're talking to real hardware, exposed via PCI passthrough.
Now, exactly what form that hardware takes is an open question. I would assume it's something like "NVME interface hardware" + "ARM CPU which implements the EBS protocol" + "25 GbE PHY", but that guess is based solely on "that's how I would design it".
NVMe supports SR-IOV in much the same way that NICs do - which I suspect is how AWS is delivering "physical" NICs to VMs currently. So it's a pretty safe bet that this is how NVMe devices are being delivered to guest VMs as well.
thus spoke the manual nvme(4):
"The nvme driver creates controller device nodes in the format /dev/nvmeX and namespace device nodes in the format /dev/nvmeXnsY."
It's been a while since I've had access to NVMe gear, but it "just worked" at the time - although my use case was a daemon that accessed the block device directly to do its own horrible things to it.
kijiki | 8 years ago:
Hopefully Amazon will disclose more details.
aliguori | 8 years ago:
We will have some more details on how this all works at re:Invent in a couple weeks.