top | item 13812406

(no title)

throwawayish | 9 years ago

I think Naples is a very exciting development, because:

- 1S/2S is obviously where the pie is. Few servers are 4S.

- 8 DDR4 channels per socket is twice the memory bandwidth of 2011, and still more than LGA-36712312whateverthenumberwas

- First x86 server platform with SHA1/2 acceleration

- 128 PCIe lanes in a 1S system is unprecedented

All in all Naples seems like a very interesting platform for throughput-intensive applications. Overall it seems that Sun with it's Niagara-approach (massive number of threads, lots of I/O on-chip) was just a few years too early (and likely a few thousands / system to expensive ;)

discuss

order

semi-extrinsic|9 years ago

> 128 PCIe lanes in a 1S system is unprecedented

Yes, definitely drooling at this. Assuming a workload that doesn't eat too much CPU, this would make for a relatively cheap and hassle-free non-blocking 8 GPU @ 16x PCIe workstation. I wants one.

sorenjan|9 years ago

That does sound pretty spectacular, and really loud. What kind of case would you put that in? Would you work with ear protection?

AnthonyMouse|9 years ago

> 8 DDR4 channels per socket is twice the memory bandwidth of 2011, and still more than LGA-36712312whateverthenumberwas

This one will be interesting. The current Ryzen (like most of the Intel desktop range) has two channels, but everyone has been benchmarking it against the i7-6900K because they both have eight cores. The i7-6900K is the workstation LGA 2011 with four channels. If the workstation Ryzen will have eight channels...

gigatexal|9 years ago

Let's hope this isn't niagra again: it needs to have decent clock speeds as IPC is still worth something today. But yes, I totally agree, this is an exciting chip.

binarycrusader|9 years ago

It's not, not only did AMD move from CMT (clustered multi-thread) design used in the previous Bulldozer microarchitecture, they now have an SMT (simultaneous multithreading) architecture allowing for 2 threads per core.

By comparison, the performance of sparc substantially improved moving from the T1, T2 to T3+. The T1 used a round-robin policy to issue instructions from the next active thread each cycle, supporting up to 8 fine-grained threads in total. That made it more like a barrel processor.

Starting with the T3, two of the threads could be executed simultaneously. Then, starting with the T4, sparc added dynamic threading and out-of-order execution. Later versions are even faster and clock speeds have also risen considerably.

alimbada|9 years ago

Naples is based on Ryzen which, if you look at early benchmarks, is beating the competition on all fronts except gaming (suspectedly due to software optimisation and motherboard issues).

tyingq|9 years ago

32 cores in one socket may also take a bite out of some servers that are currently 2 sockets.

agumonkey|9 years ago

My shallow understanding of big servers and IBM Z series amounted to "lots of dedicated IO processors". Seems like "mainstream" caught up with big blue.

kev009|9 years ago

Sort of. It ebbs and flows, generally more maintainable to do more in CPU/kernel and less in HW/firmware for PCs and of course price runs the market so there's a race to do less. Part of the mainframe price tag is getting long term support on the whole system stack, whereas PC vendors actively abandon stuff after a few years. That is a big risk for something like TCP offload engine.

Every mainframe interface is basically an offload interface.. "computers" DMAing and processing to the CPs and each other. Every I/O device has a command processor, so it can handle channel errors and integrated pcie errors in a way PCs cannot.

A PC with Chelsio NICs doing TCP offload with direct data placement or RDMA as well as Fiber Channel storage would be mini/mainframe-ish.

PeCaN|9 years ago

Pretty much. Mainframes have been very I/O oriented from the start. Channel I/O (more or less DMA) with dedicated channel programs and processors can be very high-throughput.

mtgx|9 years ago

Intel doesn't have SHA2 acceleration? ARMv8 has had it for like 2-3 years now...

And AMD should dump SHA1 acceleration in the next generation.

drzaiusapelord|9 years ago

>And AMD should dump SHA1 acceleration in the next generation.

The cost to have that on silicon is probably close to zero. If you think SHA1 is just going to magically disappear because you want it to, well, you'll be in for a SHA1 sized surprise. Our grandkids will still have SHA1 acceleration.

>ARMv8 has had it for like 2-3 years now...

Because ARM cores don't remotely have the CPU heft an Intel x86/64 chip has, so ARM needs all this acceleration because its typically used in very low power mobile scenarios. On top of that, Intel claims AES-NI can be used to accelerate SHA1.

https://software.intel.com/en-us/articles/improving-the-perf...

throwawayish|9 years ago

ARM cores are much weaker, crypto performance without NEON is absymal across the board. Of course, compared to hardware-acceleration software always seems slow; Haswell manages AES-OCB at <1 cpb.

yuhong|9 years ago

As a side note, XOP had rotate instructions. Sadly it is no longer supported in Ryzen.

tw04|9 years ago

Intel hass had SHA1/2 acceleration for YEARS via the AES-NI instruction set.

https://en.wikipedia.org/wiki/Intel_SHA_extensions

>There are seven new SSE-based instructions, four supporting SHA-1 and three for SHA-256:

>SHA1RNDS4, SHA1NEXTE, SHA1MSG1, SHA1MSG2, SHA256RNDS2, SHA256MSG1, SHA256MSG2

throwawayish|9 years ago

This is not part of AES-NI and has never been released in a mid-range+ server/desktop CPU, only part of some Atom parts (Goldmont). Therefore software support is poor (I think OpenSSL does not support it). It is said to be included in 2018+ Cannonlake, though.