Does anyone actually have benchmarks of the latest-gen AMD vs the latest-gen Intel processors with all mitigations for Spectre, Meltdown, and the ten other side-channel/speculative-execution vulnerabilities applied?
I'd genuinely be curious to find out what the eventual results are, because as I understand it AMD is not too far behind Intel as a standalone processor; surely in a "real world" scenario they'd be significantly faster?
For Linux 5.0, and as of March this year, the performance impact of enabling the Linux kernel mitigations for Spectre/Meltdown on various CPUs with the latest microcode is:
Intel: -13% for the Core i9 7980XE, -17% for the 8086K.
AMD: -3% for the 2700X.
Reference: https://www.phoronix.com/scan.php?page=article&item=linux50-...
Phoronix is due to release new benchmarks tomorrow showing the full impact from Spectre/Meltdown/L1TF/MDS. There are some initial benchmarks at https://www.phoronix.com/scan.php?page=news_item&px=MDS-Zomb...
It is a moving target. A lot of "day 1" patches absolutely tanked performance, only to regain it later via smarter mitigations. Linux, Windows, and microcode updates have all clawed back some of the performance loss.
And while it would be nice to call some point "the end" and measure then, as of two weeks ago they were still finding additional vulnerabilities and patching them. So if there is an end, we aren't there yet.
That all being said, I am also curious about what the results would show. Just very hard to pin down.
I'd expect that with both machines patched, clock for clock, AMD are slightly ahead.
I've got a 2700X at home and it's a monster, and if Zen 2 is on the conservative end of the leaks/rumours it's going to be a total killer.
On realistic workloads (branchy C++ server code, not SPEC and the other crap that Phoronuts like to report) the AMD CPUs are 50% slower... Intel could pile on the mitigations and still be way ahead. The reason these mitigations cost Intel so much is that Intel has these speculation features and AMD just never had them. At least with the Intel parts you can decide for yourself whether to enable or disable the mitigations.
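(On the "decide for yourself" point: on Linux the current mitigation state is exposed under sysfs, and newer kernels, plus the stable backports, accept a unified boot parameter. A minimal sketch; the exact set of vulnerability files varies by kernel version and CPU:)

    # Show which mitigations are active on this machine
    grep . /sys/devices/system/cpu/vulnerabilities/*

    # Kernel boot parameters (e.g. on the GRUB command line):
    #   mitigations=off         -> run fast and unprotected
    #   mitigations=auto        -> default mitigations, SMT stays enabled
    #   mitigations=auto,nosmt  -> full mitigation, disables hyper-threading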
> the AMD CPUs are 50% slower
What does this comment refer to? As far as I know AMD is in pretty good shape.
@waxzce: "FYI, as a cloud provider we lost roughly 25% of CPU performance over the last 18 months due to the various CVEs and CPU issues, with mitigations via microcode limiting capacity. So we stuff in more CPUs, but prices didn't go down at all... That's a kind of upselling. #IntelFail"
We are doing perf testing of the AMD versions of HP/Dell servers. We may tech-refresh about 40k servers with Epyc 2 if performance is better than Intel's. It's about time for some new servers anyway.
It seems that they're on track to get Epyc into OEMs and cloud providers in quantity. They've been successfully growing the ecosystem in the server space over the last 2 years (Supermicro, Dell, HPE, and Lenovo are all pushing Epyc SKUs now), so they're very well poised to capitalize on it if they can keep up with supply. The power-efficiency gains on 7nm will be pretty tempting to cloud providers and colocation clients alike. If you can put 256 cores (4 Epyc 2 sockets) into a 1U server and keep it under 1200W, that's going to be a landslide win for density and running costs. Factor in PCIe 4.0 and things are looking pretty sexy.
I've read that wafer yields on Zen 2 are 70%+, which gives AMD a huge advantage on cost and production efficiency. I think Intel's Skylake yield for an equivalent 28-core die is sub-40%. If anything, the limiting factor is going to be TSMC's ability to give AMD capacity on its very crowded and in-demand 7nm fab.
If you're not sharing your hardware with other code you don't trust, it's not a problem.
Just guessing here, but suppose an attacker is able to get unprivileged code to run on your machine (by taking over some process). He is now able to extract secrets from other processes on the machine that he ordinarily wouldn't have access to. Think SSL keys etc.
But I believe you're right in that those exploits are not as dangerous in a non-shared hosting scenario.
However, when a vulnerability is found, Spectre etc. make it easier to abuse that vulnerability to do something useful.
According to Apple[1], full mitigation requires using the Terminal app to enable an additional CPU instruction and disable hyper-threading.
So, if you have a system that you can't afford to leave exposed, it's a pretty big damper on performance.
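(For reference, the Apple doc has you boot into macOS Recovery and set two NVRAM variables from Terminal. I'm quoting these from memory, so verify against [1] before running them:)

    # From Terminal in macOS Recovery, per Apple's MDS guidance:
    nvram boot-args="cwae=2"    # enables the additional CPU instruction/mitigation
    nvram SMTDisable=%01        # disables hyper-threading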
[1] https://support.apple.com/en-us/HT210107
And you also benefit from herd immunity. If 99.9% of people have mitigations, then the likelihood of your difficult timing attack working is quite low... so why bother trying?
How is that an improvement over the 'Dedicated Instances' or 'Bare Metal' offerings that cloud providers already have, where you're not colocated with other entities but still enjoy all the benefits of managed infrastructure and the flexibility to scale up and down (instead of acquiring capital assets that depreciate over 36-48 months)?
Is this a chance for ARM providers to move in and outcompete Intel, especially if they can provide similar tech but with security?
Basically an ARMs race?
You can't stop this class of vulnerabilities without sacrificing performance because the reason for the performance is also the reason for the vulnerabilities. That's the choice you have, and no brand affiliation will save you.
ARM is less examined more than it is less vulnerable. Most of their cores aren't speculative out-of-order designs, since that's expensive in power and in silicon real estate/cost, but they list 13 cores vulnerable to Spectre, including one vulnerable to the 2018 Rogue System Register Read, which, like this latest set for Intel, is design-specific. It would be wise to assume there are more vulnerabilities hiding; the name Spectre was chosen because we'll be haunted by it for a long time. Also, one of their newest designs is vulnerable to Meltdown. See the whole list here: https://developer.arm.com/support/arm-security-updates/specu...
If they want to compete with Intel and AMD on performance, they will have to include the same speculative out-of-order architecture. These bugs are actually not bugs in the sense that the hardware does something it shouldn't do. It's 100% to spec. The problem is the spec.
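(To make "100% to spec" concrete, here is a minimal sketch of the classic Spectre v1 bounds-check-bypass gadget from the Kocher et al. paper; the array names and sizes are the paper's illustrative ones, not any real product's code. Architecturally, nothing out of bounds is ever returned, yet the speculative load leaves a secret-dependent cache footprint:)

    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1_size = 16;
    uint8_t array1[16];
    uint8_t array2[256 * 512];  /* probe array: one cache line per byte value */

    void victim_function(size_t x) {
        if (x < array1_size) {
            /* The bounds check is architecturally correct, but the branch
               predictor may speculate it as taken for an out-of-bounds x.
               Then array1[x] speculatively reads a secret byte, and the
               dependent access below pulls in a cache line indexed by that
               secret, which can be recovered later with a flush+reload
               timing probe. */
            uint8_t tmp = array1[x];
            volatile uint8_t sink = array2[tmp * 512];
            (void)sink;
        }
    }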
Rowhammer is arguably more aptly described as a defect; in that case the RAM is not behaving to spec.
AMD has fewer problems, but I'm not so sure this is only because they are more secure. I think there are more eyes looking for exploits in Intel CPUs than in AMD's.
It's a little funny to see the "use AMD!" comments --- since from what I understand, Intel's optimisations that lead to these side channels are specifically for performance, so using AMD instead of Intel might mean roughly the same performance loss anyway.
Not at all. AMD does the hardware permission check for memory accesses in serial with reading the data rather than in parallel. This probably adds an FO4 or so of latency to the critical path of a read, which could mean a lot of things from a design standpoint. Maybe you reduce your L1 cache size or associativity to win back that FO4. Maybe you just say "all our stages are going to be 15 FO4s instead of 14", and you clock 5% slower but your pipeline gets shorter and you have more room for cleverness in other places. I'd be very surprised if the net performance impact was more than 1%.
By contrast, having to deal with the issue in software or, worse, by turning off SMT is much more damaging, and that gets you the tens-of-percent performance impact people are talking about.
Alternate perspective: AMD not doing these risky optimizations reflects a better engineering culture, or at least less risky priorities.
Go with Intel and risk losing a huge chunk of your performance some day, or go with AMD and know what to expect. Different people and companies have different risk tolerances, so it's not an obvious choice one way or the other.
Might have been the case during AMD's Bulldozer era. Even then, they were doing well just to survive with a massive fab disadvantage.
From Zen onward, they've been more or less on par in single-threaded performance and have had the edge in multicore performance, while being cheaper and unaffected by most of these new CPU vulnerabilities on top of that.
AMD was always better on bang for buck. Intel just had the fastest CPUs, for the rich who needed the best single-core performance. If Intel doesn't have the fastest CPUs, then AMD is just hands-down cheaper and Intel has no advantage.
Is there an End User License Agreement that users agree to before using these chips? If it is anything like software EULAs, then Intel is likely protected from lawsuits over defects.
Most cloud providers are trying to move up the value chain and provide higher-margin differentiated services (managed databases, queues, etc.) instead of staying in the race-to-the-bottom VM market.
These Intel mitigations impact those services just like everyone else.
In the immediate short term: maybe. But fundamentally, these vulnerabilities are (hyper)threatening their business model, because not sharing the infrastructure suddenly becomes far more attractive.
So to the extent that you're suggesting that cloud providers are happy, or may even have had a hand in this: nahh.
> The performance loss is not visible to our customers; we have to manage the loss ourselves. It's a kind of hidden defect.
I run a high-traffic web service in the cloud. Well over a billion requests served per month from the origin. We run our CPUs at around ~60%.
We've seen maybe a 2% hit from the Spectre & Meltdown mitigations. It's hard to even tell, because it's a small enough amount that it tends to get lost in the noise.
Other replies mentioned that the cost of the slowdown is absorbed by the cloud provider; let's just pretend it's fully passed down to the user and only nitpick at the math behind the joke:
Imagine a slowdown (performance penalty) of 50%. It would double the time to complete a task, thus doubling the cost (expressed as a cost increase, that's a 100% increase!).
Generic formula: a slowdown x cuts throughput to (1−x), so the time to complete a task scales by 1÷(1−x), and the cost increase is 1÷(1−x)−1.
For x = 25% ---> 1÷(1−0.25)−1 = 33% duration (cost) increase
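(A quick sanity check of that formula, as a C sketch with illustrative slowdown values:)

    #include <stdio.h>

    /* Cost increase from a throughput slowdown x: time (and thus cost)
       scales by 1/(1-x), so the relative increase is 1/(1-x) - 1. */
    static double cost_increase(double x) { return 1.0 / (1.0 - x) - 1.0; }

    int main(void) {
        printf("x=25%%: +%.0f%%\n", 100 * cost_increase(0.25));  /* +33%  */
        printf("x=50%%: +%.0f%%\n", 100 * cost_increase(0.50));  /* +100% */
        return 0;
    }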
I bet Google isn't so giddy now about being FIRST!![1] with Skylake in the data center a couple of years ago, or about going all-in with Intel on Chromebooks (it didn't even give AMD a chance until very recently...), despite Chrome OS being one of the very few operating systems that are truly architecture-agnostic.
Now it's paying dearly for that mistake, with up to 40% performance loss on Chromebooks due to the disabling of HT:
https://www.techrepublic.com/article/mds-vulnerabilities-lea...
Google broke one of the most basic business rules: never rely on a single supplier. You're always worse off in the end, even if the exclusivity deals seem very tempting in the short-term.
[1] https://cloud.google.com/blog/products/gcp/compute-engine-up...
Weren't Chromebooks on ARM before x86?
> Google broke one of the most basic business rules: never rely on a single supplier. You're always worse off in the end, even if the exclusivity deals seem very tempting in the short-term.
I don't think this is universally true, although I agree that it's probably prudent. Diverse options/suppliers are a risk mitigation, but they do have a cost.