bt848 | 6 years ago
Skylake-X has 44 PCIe 3.0 lanes; that's 352 GT/s, or about 345 Gbps of application bandwidth. It's certainly more than enough to push 100 Gbps from disk to net. These guys are pushing 200 Gbps, but they're doing it with two CPUs, two sets of NVMe devices, two NICs, and a bunch of hacks to make the operating system pretend all this stuff is not in the same box. It seems way more straightforward to me if they had made it all actually NOT be in the same box!
loeg | 6 years ago
We're in total agreement :-). Their dataflow model requires something like 2x that in PCIe bandwidth and 4x in memory bandwidth in the optimal case, as covered in the slides. 2 x 200 Gbps = 400 Gbps, which is a bit more than 345 Gbps.
Maybe they could push 345/2 ≈ 172 Gbps out of a single Skylake-X, best case. For some workloads, that might be the right local optimum! They must have decided that the marginal cost of a 2P system was worth the extra ~28 Gbps to fully saturate the 200 Gbps pipe.
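The arithmetic above can be checked with a quick back-of-the-envelope sketch. The lane count and 128b/130b encoding overhead are standard PCIe 3.0 figures, not numbers from the thread itself:

```python
# Back-of-envelope check of Skylake-X PCIe 3.0 bandwidth for a
# disk -> net serving workload. Standard PCIe 3.0 figures assumed.
LANES = 44            # Skylake-X PCIe 3.0 lanes
GT_PER_LANE = 8.0     # PCIe 3.0 signaling rate: 8 GT/s per lane
ENCODING = 128 / 130  # 128b/130b line encoding (usable fraction)

raw_gt = LANES * GT_PER_LANE         # 352 GT/s raw
app_gbps = raw_gt * ENCODING         # ~346.6 Gbps usable
per_direction = app_gbps / 2         # data crosses PCIe twice:
                                     # in from NVMe, out to the NIC
print(f"raw: {raw_gt:.0f} GT/s")
print(f"usable: {app_gbps:.1f} Gbps")
print(f"disk->net ceiling: {per_direction:.1f} Gbps")
```

The halving in the last step is the same "2x PCIe bandwidth" constraint from the slides: every served byte crosses the PCIe fabric once coming off flash and once going out the NIC, so a ~346 Gbps fabric caps serving at roughly 173 Gbps per socket.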
> they're doing it with two CPUs, two sets of NVMe devices, and two NICs, and a bunch of hacks to make the operating system pretend all this stuff is not in the same box. It seems way more straight-forward to me if they had made it all be actually NOT in the same box!
I've spoken with NFLX engineers in the past, and my recollection is that in many installations, NFLX only gets to install one box. (Or something like that. Might just be a cost thing.) So they need to make that one box fast.
I guess the other factor is the IP management overhead discussed in the slides. Two boxes necessitate the costly 2nd IP, as far as I know. It's hard to imagine the cost of an IP address dominating the marginal cost of a 2-socket system and a 2nd Xeon, but I guess AWS is friggin expensive.