The Zen core had pretty good (but not that amazing) success in the consumer market: although people admired its multi-threaded performance, they were merely lukewarm about its single-threaded performance, which "just" almost matches Intel's. But oh boy, the Zen core in the server market is going to make a killing. Servers are all about multi-threaded performance (which is why 80% of the server market is dual socket). And it looks like a single-socket EPYC is beating a dual-socket Xeon... ouch. Finally a good kick in Intel's resting bottom.
> The Zen core had pretty good (but not that amazing) success in the consumer market: although people admired its multi-threaded performance, they were merely lukewarm about its single-threaded performance, which "just" almost matches Intel's.
That's highly variable. I was actually pretty astounded by its single-core performance and completely blown away by its multicore performance.
I built a 1700 (not even a +) for work a few weeks ago and I keep running into things where I pause and think something crashed because it can't have finished that fast...
Yesterday's was 5GB of mixed data in 18s (turns out the SSD is the bottleneck). If I wasn't busy with a new job I'd be trying that 5GB of data out of a RAM disk just to see how fast pigz (love pigz by the way: multithreaded gzip) can go with 8 cores/16 threads.
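pigz gets its speed by compressing independent blocks of the input on separate threads. A minimal Python sketch of the same idea (not pigz's actual implementation; real pigz also primes each block with the previous 32KB of input to preserve the compression ratio, which this sketch skips):

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def parallel_gzip(data: bytes, chunk_size: int = 128 * 1024, workers: int = 8) -> bytes:
    """Compress independent chunks in parallel and concatenate them.

    Concatenated gzip members form a valid multi-member gzip stream,
    which standard tools (and gzip.decompress) expand back to the
    original bytes. zlib releases the GIL while compressing, so
    threads get real parallelism here.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = pool.map(gzip.compress, chunks)
    return b"".join(members)
```

Because each chunk becomes its own gzip member, the output is still decompressible by plain `gzip -d`.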
> The Zen core had pretty good (but not that amazing) success in the consumer market
In the communities I look at, the only Intel processors still getting regularly recommended are the Pentium G4560 and the Intel i7-7700K. And that's from consumers to consumers.
Also, when I look at my meta-benchmark for games, I see Ryzen with really good results by now. That changed over time; it improved as games got optimized and RAM support got better. Before that, the i5 was still more viable.
I don't have insight into the whole market, but from my small observer position Ryzen does look like a pretty huge success.
While the raw compute performance numbers of Zen may be better than Intel's, don't underestimate other economic and business factors. AMD has always played second fiddle to Intel, and it is hard to shake that perception. I'd love to see a major cloud provider (AWS, Google, Azure) actively buying AMD chips and making them available for compute. However, I am still a bit skeptical this is ever going to happen. There is just too much risk for a cloud provider. Intel holds the market and nearly all the mindshare in mainstream cloud computing.
I've used a dual-CPU AMD Bulldozer machine for years (and still do); it's been rock solid, and the 32 threads really helped with certain workloads. At the time, the equivalent from Intel would have been far more expensive.
Well, I know what will be in the next stack of servers my company buys (in 20RU chunks). It's all Linux+Docker for dev and test, with some KVM. Right now we use 2xCPU 48-core Intel, 2x1G and 2x10G; a 1RU form factor holds two of these servers. It's all about thread scale-out for us: the more containers we can run per server, the faster our build and test throughput. Pretty cool, AMD. Happy to have you back.
TL;DR: Intel's ability to clock higher gets negated by heat in these high-density multi-core chips. 1P 32-core: $2k; 2P 32-core: $4k.
"..14% advantage of cores per rack that ship with their Naples platform compared to Intel’s. On Intel, a singular rack will consist of 4704 cores while AMD’s Zen based Naples rack will ship with 5376 cores.
There’s also a 14% advantage in VMs (Virtual Machines) per socket. Memory bandwidth sees a 33% advantage, as AMD has 8 channels while Intel’s Purley platform is configured for 6 channels per socket. The Intel platform also supports 24 DIMMs while AMD can support up to 32 DIMMs." Release: 20th of June.
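The quoted percentages check out against the raw numbers. A quick sanity check (all figures from the quote above):

```python
# All figures come from the quoted article.
amd_cores, intel_cores = 5376, 4704          # cores per rack
amd_channels, intel_channels = 8, 6          # memory channels per socket
amd_dimms, intel_dimms = 32, 24              # DIMM slots per socket

core_advantage = amd_cores / intel_cores - 1          # ~0.143, the "14%"
channel_advantage = amd_channels / intel_channels - 1  # ~0.333, the "33%"
dimm_advantage = amd_dimms / intel_dimms - 1          # ~0.333
```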
I just posted this yesterday about Naples/Epyc already having a huge advantage over Xeons for certain server workloads due to support for hardware-assisted SHA calculations: https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring... It's already supported by the Linux kernel and many open-source crypto libraries.
I honestly had no clue this reveal was right around the corner. These numbers really do give AMD a fighting chance here.
I doubt many server workloads have SHA calculations as a significant bottleneck. Doing SHA-256 5x faster (11 cycles/byte down to 2 cycles/byte [1]) isn't that groundbreaking.

[1]: https://bench.cr.yp.to/results-hash.html
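To put those cycles/byte numbers in context, per-core throughput is just clock rate divided by cycles per byte. Assuming a hypothetical 3 GHz core (the clock speed here is an illustration, not from the source):

```python
def sha256_throughput_gb_per_s(clock_ghz: float, cycles_per_byte: float) -> float:
    # (1e9 cycles/s) / (cycles/byte) = 1e9 bytes/s, so GHz in gives GB/s out.
    return clock_ghz / cycles_per_byte

software = sha256_throughput_gb_per_s(3.0, 11)  # ~0.27 GB/s per core, software
hardware = sha256_throughput_gb_per_s(3.0, 2)   # 1.5 GB/s per core, SHA extensions
speedup = hardware / software                   # 5.5x
```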
Surely these processors introduce another new level of NUMA dynamics? Each group of 8 cores has its own memory controller, own PCIe root complex, and then there is a crosslink between each group of 8 cores.
Up until now you would (potentially) have to consider which socket you are on, and where your memory or IO devices (PCIe) are.
Now you have the same considerations within a socket, as well as between sockets?
Previous Opterons were also MCMs with NUMA within the socket, although their performance was poor enough that many people probably never noticed they existed. If Intel makes cluster-on-die mode mandatory then they'll also have NUMA within the socket.
So the Epyc is basically 4 Ryzens. You get 4x the cores, 4x the memory channels, and 4x the pieces of silicon.
So think of a single-socket Epyc as a quad-socket motherboard. In either case you have clusters of cores/cache connected to memory controllers and HyperTransport. For most workloads a NUMA-aware kernel does a pretty good job of minimizing hits to pages on other controllers. And it's not a particularly big deal when you miss: typically about a 10% penalty (latency and bandwidth).
AMD makes all the I/O pins capable of HyperTransport (or whatever they call it now) and PCIe.
This isn't particularly new, btw. MCMs (multiple chips per package) go way back to the Pentium Pro, if not before. Intel Xeons are all single-chip, but have similar on-chip architectures. The 4-, 6-, and 8-core chips are pretty simple, but the larger core-count chips have a ring bus for one set of cores and another ring bus for the second.
Generally, for most workloads the NUMA issues on the newer chips aren't a particularly large hurdle to getting good performance. What I am concerned about, though, is how good Epyc's floating point is; I fear they are bragging about integer performance and not FP because they are behind on FP.
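The ~10% remote-access penalty mentioned above implies that imperfect NUMA placement costs little on average. A toy linear model (the 10% penalty figure is from the comment; the 25% remote fraction is a made-up illustration):

```python
def average_slowdown(remote_fraction: float, remote_penalty: float = 0.10) -> float:
    """Average memory-access slowdown under a linear mixing model:
    a remote_fraction of accesses each pay remote_penalty extra."""
    return remote_fraction * remote_penalty

# Even a NUMA-unaware workload leaving 25% of its pages on remote
# controllers only pays ~2.5% on average:
slowdown = average_slowdown(0.25)  # 0.025
```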
It's more than that: I think the cores are still in clusters of 4 that talk over Infinity Fabric.
But there's no avoiding some kind of complex inter-core dynamics at this level. The Intel alternative is a bunch of ring buses that have different speeds to each core from any point. And this design makes every memory access go over the Infinity Fabric, so latency might be surprisingly even.
Haven't been following this closely, but has the random segfault problem [1] been addressed? I would imagine this is a bigger problem for servers almost constantly maxing out all cores and threads compared to a desktop/laptop... Just imagine the horror when you write safe Rust code and get hit hard by heisenbugs in production...

[1]: https://community.amd.com/thread/215773
While the full root cause has not yet been found or fixed, the limited issues seen so far have been pretty reliably worked around by disabling ASLR. Whatever the root cause, it is likely the issue can be fixed through BIOS/microcode updates.
The number of people affected is low. My Ryzen machine has only ever run Linux and compiles a lot, and it has never exhibited this behavior. Also, most new platforms have issues, even new server platforms. These will be worked through during the substantial validation server OEMs will go through.
Lastly, look up the errata list for any Xeon CPUs. Intel releases microcode updates for them several times a year to fix bugs. Modern CPUs are complex and will pretty much always have bugs. Luckily some combination of BIOS or microcode updates will almost always resolve them.
If the Zen 2 / Rome series brings the power/heat down a bit, that will probably be around the time I seriously consider another upgrade... my i7-4790 desktop has been really good to me for a couple of years, but within another two I may be looking around again. The consumer variant will also be up for consideration, though.
Nice to see AMD competing again. I knew they would gain some ground in the server space after seeing the Zen benchmarks on the consumer CPUs.
The bottom of the article mentions a supposed leak of a 2018 server chip with 48 cores. I wonder if that's just an MCM with 6 chips, made feasible by the smaller die size and reduced thermals?
Kind of curious: how much lower do you want them to go on the power/heat metric? Your current CPU is an 84W TDP part. A Ryzen 7 1700 has twice the number of cores and draws 65W at stock settings. The 1700 can get hot, but really only if you overclock it.
Of course more performance per watt is always better and I too hope Zen 2 can be even better. But right now, in certain scenarios, Ryzen is already leading in both performance per watt and per dollar.
Remember that AMD's TDP is for the whole SoC, while for Intel you should include the PCH as well. At any rate, you should compare whole systems with exactly the same components apart from the CPU; the difference is minimal.
There's going to be a hit to I/O when one of those modules needs to access RAM that's on a different module's controller. The new bus seems to be really fast though, so I'm not sure how many modules you can have before it's a real problem.
Supposedly the big innovation of the Zen platform is its interconnect tech managing to scale performance nearly linearly with more chips. Or so AMD claims.
I'd guess external IO (memory and PCIe) might be a problem. How are you going to route all those wires out from the socket?
Secondly, there's probably some economic argument as well. Too few customers willing to pay for a humongous MCM, and with the attendant wiring complexity requiring more layers for the motherboard, it might be cheaper to go to more sockets instead?
It's interesting that the EPYC 7601 comes at a significant premium to the EPYC 7551P - double the price for a 200MHz base clock increase and dual-socket support. Question for those more knowledgeable on data centres - is the modest performance gain and increased density worth it for the cost?
You don't buy a CPU alone; you buy a whole server. If the CPU increases $1000 in price to gain 10% performance but the complete server already costs $20,000, it's quite worth it to go from $20,000 to $21,000 for 10% more performance.
Also, if you factor in performance-critical software costing $100,000 that runs on that single node, you will buy the fastest hardware you can get for a few thousand dollars more.
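The argument above in numbers (prices from the comment, performance normalized to the baseline server):

```python
def cost_per_unit_perf(total_system_cost: float, relative_perf: float) -> float:
    """Dollars paid per unit of delivered performance."""
    return total_system_cost / relative_perf

baseline = cost_per_unit_perf(20_000, 1.00)  # stock server
upgraded = cost_per_unit_perf(21_000, 1.10)  # +$1000 CPU, +10% performance

# The pricier CPU actually lowers the cost per unit of work
# (~$19,091 vs $20,000), before even counting per-node software licensing.
```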
It depends on so many factors (hosting costs, design/availability of motherboards/servers, workload easily parallelizable across multiple servers, etc)
The differences between the 7601, 7551 and 7501 struck me as interesting. The 7601 is at least $1300 more expensive for 200MHz more clock and 10W/25W (???) more TDP. Is that an artificial distinction, or do such high-TDP chips have a worse yield?
If there's a reason for it, I would have to guess the marketing team assumes that since many millennials are now in a position to make corporate buying decisions, the combined irony and nostalgia will entice them to look more deeply into the new server architecture? Personally, I couldn't care less what the name is, and I'm sure the majority who will be buying these feel the same as I do. Performance and price are all that count, and AMD has come out with something pretty remarkable in that respect. It's exciting to see Intel actually being caught with their pants down for once.
The marketing might seem silly, but you want to hit the people who actually build your data centres with the awe aspect. They're the ones who make the recommendations when CTOs want to expand, and even if they think the name is stupid, it's something they remember when building those cost to performance charts/spreadsheets for expansions.
What a monster. Pack this together with a couple of Quadro GPU accelerators and you've got some serious all-round performance.
For the more boring among us (like me), swap the GPUs for 24-ish NVMe SSDs and a few 10G cards, and that is one hell of a DB server...
What are some of the technical limits to AMD doing 2x, 4x, or 8x of this approach? Only power/heat? I/O shouldn't be hard, since pins on an MCM should be able to scale out easily, right?
But Intel does the same. The 2-socket http://ark.intel.com/products/91768/Intel-Xeon-Processor-E5-... is superior in every single aspect to the 4-socket https://ark.intel.com/products/93796/Intel-Xeon-Processor-E5-..., yet the 4-socket CPU is priced 1.6× higher...
For AMD there is no PCIe 4.0 chip in 2018, and PCIe 5.0 is already out in 2019.
No.

> Since changes of DDR require a change of socket.

Not necessarily, but it's usually done that way.