Does anyone actually have benchmarks of the latest-gen AMD vs the latest-gen Intel processors with all mitigations for Spectre, Meltdown, and the ten other side-channel/speculative-execution vulnerabilities applied?
I'd genuinely be curious to find out what the eventual results are, because as I understand it AMD is not too far behind Intel as a standalone processor; surely in a "real world" scenario they'd be significantly faster?
For Linux 5.0, and as of March this year, the performance impact of enabling the Linux kernel mitigations for Spectre/Meltdown on various CPUs with the latest microcode is:
Intel: -13% for the Core i9 7980XE, -17% for the 8086K.
AMD: -3% for the 2700X.
Reference: https://www.phoronix.com/scan.php?page=article&item=linux50-...
Phoronix is due to release new benchmarks tomorrow showing the full impact from Spectre/Meltdown/L1TF/MDS. There are some initial benchmarks at https://www.phoronix.com/scan.php?page=news_item&px=MDS-Zomb...
It is a moving target. A lot of "day 1" patches absolutely tanked performance, only to regain it later via smarter mitigations. Linux, Windows, and microcode updates have all clawed back some of the performance loss.
And while it would be nice to call some point "the end" and measure then, as of two weeks ago they were still finding additional vulnerabilities and patching them. So if there is an end, we aren't there yet.
That all being said, I am also curious about what the results would show. Just very hard to pin down.
I'd expect that with both machines patched, clock for clock, AMD are slightly ahead.
I've got a 2700X at home and it's a monster, and if Zen 2 is on the conservative end of the leaks/rumours it's going to be a total killer.
On realistic workloads (branchy C++ server code, not SPEC and the other crap that Phoronuts like to report) the AMD CPUs are 50% slower... Intel could pile on the mitigations and still be way ahead. The reason these mitigations cost Intel so much is that Intel has these speculation features and AMD just never had them. At least with the Intel parts you can decide for yourself whether to enable or disable the mitigations.
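(On the "decide for yourself" point: on Linux the current mitigation state is exposed under sysfs, and newer kernels, plus the stable backports, accept a unified boot parameter. A minimal sketch; the exact set of vulnerability files varies by kernel version and CPU:)

    # Show which mitigations are active on this machine
    grep . /sys/devices/system/cpu/vulnerabilities/*

    # Kernel boot parameters (e.g. on the GRUB command line):
    #   mitigations=off         -> run fast and unprotected
    #   mitigations=auto        -> default mitigations, SMT stays enabled
    #   mitigations=auto,nosmt  -> full mitigation, disables hyper-threading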
> the AMD CPUs are 50% slower
What does this comment refer to? As far as I know AMD is in pretty good shape.
@waxzce: "FYI, as a cloud provider we lost roughly 25% of CPU performance over the last 18 months due to the various CVEs and CPU issues, with mitigations via microcode limiting capacity. So we stuff in more CPUs, but prices didn't go down at all... That's a kind of upselling. #IntelFail"
We are doing perf testing of the AMD versions of HP/Dell servers. We may tech-refresh about 40k servers with Epyc 2 if performance is better than Intel's. It's about time for some new servers anyway.
It seems that they're on track to get Epyc into OEMs and cloud providers in quantity. They've been successfully growing the ecosystem in the server space over the last 2 years (Supermicro, Dell, HPE, and Lenovo are all pushing Epyc SKUs now), so they're very well poised to capitalize on it if they can keep up with supply. The power-efficiency gains on 7nm will be pretty tempting to cloud providers and colocation clients alike. If you can put 256 cores (4 Epyc 2 sockets) into a 1U server and keep it under 1200W, that's going to be a landslide win for density and running costs. Factor in PCIe 4.0 and things are looking pretty sexy.
I've read that wafer yields on Zen 2 are 70%+, which gives AMD a huge advantage on cost and production efficiency. I think Intel's Skylake yield for an equivalent 28-core die is sub-40%. If anything, the limiting factor is going to be TSMC's ability to give AMD capacity on its very crowded and in-demand 7nm fab.
If you're not sharing your hardware with other code you don't trust, it's not a problem.
Just guessing here, but suppose an attacker is able to get unprivileged code to run on your machine (by taking over some process). He is now able to extract secrets from other processes on the machine that he ordinarily wouldn't have access to. Think SSL keys etc.
But I believe you're right in that those exploits are not as dangerous in a non-shared hosting scenario.
However, when a vulnerability is found, Spectre etc. make it easier to abuse that vulnerability to do something useful.
According to Apple[1], full mitigation requires using the Terminal app to enable an additional CPU instruction and disable hyper-threading.
So, if you have a system that you can't afford to leave exposed, it's a pretty big damper on performance.
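(For reference, the Apple doc has you boot into macOS Recovery and set two NVRAM variables from Terminal. I'm quoting these from memory, so verify against [1] before running them:)

    # From Terminal in macOS Recovery, per Apple's MDS guidance:
    nvram boot-args="cwae=2"    # enables the additional CPU instruction/mitigation
    nvram SMTDisable=%01        # disables hyper-threading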
[1] https://support.apple.com/en-us/HT210107
And you also benefit from herd immunity. If 99.9% of people have mitigations, then the likelihood of your difficult timing attack working is quite low... so why bother trying?
How is that an improvement over the 'Dedicated Instances' or 'Bare Metal' offerings that cloud providers already have, where you're not colocated with other entities but still enjoy all the benefits of managed infrastructure and the flexibility to scale up and down (instead of acquiring capital assets that depreciate over 36-48 months)?
Is this a chance for ARM providers to move in and outcompete Intel, especially if they can provide similar tech but with security?
Basically an ARMs race?
You can't stop this class of vulnerabilities without sacrificing performance because the reason for the performance is also the reason for the vulnerabilities. That's the choice you have, and no brand affiliation will save you.
ARM is less examined more than it is less vulnerable. Most of their cores aren't speculative out-of-order designs, since that's expensive in power and in silicon real estate/cost, but they list 13 cores vulnerable to Spectre, including one vulnerable to the 2018 Rogue System Register Read, which, like this latest set for Intel, is design-specific. It would be wise to assume there are more vulnerabilities hiding; the name Spectre was chosen because we'll be haunted by it for a long time. Also, one of their newest designs is vulnerable to Meltdown. See the whole list here: https://developer.arm.com/support/arm-security-updates/specu...
If they want to compete with Intel and AMD on performance, they will have to include the same speculative out-of-order architecture. These bugs are actually not bugs in the sense that the hardware does something it shouldn't do. It's 100% to spec. The problem is the spec.
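(To make "100% to spec" concrete, here is a minimal sketch of the classic Spectre v1 bounds-check-bypass gadget from the Kocher et al. paper; the array names and sizes are the paper's illustrative ones, not any real product's code. Architecturally, nothing out of bounds is ever returned, yet the speculative load leaves a secret-dependent cache footprint:)

    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1_size = 16;
    uint8_t array1[16];
    uint8_t array2[256 * 512];  /* probe array: one cache line per byte value */

    void victim_function(size_t x) {
        if (x < array1_size) {
            /* The bounds check is architecturally correct, but the branch
               predictor may speculate it as taken for an out-of-bounds x.
               Then array1[x] speculatively reads a secret byte, and the
               dependent access below pulls in a cache line indexed by that
               secret, which can be recovered later with a flush+reload
               timing probe. */
            uint8_t tmp = array1[x];
            volatile uint8_t sink = array2[tmp * 512];
            (void)sink;
        }
    }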
Rowhammer is arguably more aptly described as a defect; in that case the RAM is not behaving to spec.
AMD has fewer problems, but I'm not so sure this is only because they are more secure. I think there are more eyes looking for exploits in Intel CPUs than in AMD's.
It's a little funny to see the "use AMD!" comments --- since from what I understand, Intel's optimisations that lead to these side channels are specifically for performance, so using AMD instead of Intel might mean roughly the same performance loss anyway.
Not at all. AMD does the hardware permission check for memory accesses in serial with reading the data rather than in parallel. This probably adds an FO4 or so of latency to the critical path of a read, which could mean a lot of things from a design standpoint. Maybe you reduce your L1 cache size or associativity to win back that FO4. Maybe you just say "all our stages are going to be 15 FO4s instead of 14", and you clock 5% slower but your pipeline gets shorter and you have more room for cleverness in other places. I'd be very surprised if the net performance impact was more than 1%.
By contrast, having to deal with the issue in software or, worse, by turning off SMT is much more damaging, and that gets you the tens-of-percent performance impact people are talking about.
Alternate perspective: AMD not doing these risky optimizations reflects a better engineering culture, or at least less risky priorities.
Go with Intel and risk losing a huge chunk of your performance some day, or go with AMD and know what to expect. Different people and companies have different risk tolerances, so it's not an obvious choice one way or the other.
Might have been the case during AMD's Bulldozer era. Even then, they were doing well just to survive with a massive fab disadvantage.
From Zen onward, they've been more or less on par in single-threaded performance and have had the edge in multicore performance, while being cheaper and unaffected by most of these new CPU vulnerabilities on top of that.
AMD was always better on bang for buck. Intel just had the fastest CPUs, for the rich who needed the best single-core performance. If Intel doesn't have the fastest CPUs, then AMD is just hands-down cheaper and Intel has no advantage.
Is there an End User License Agreement that users agree to before using these chips? If it is anything like software EULAs, then Intel is likely protected from lawsuits over defects.
Most cloud providers are trying to move up the value chain and provide higher-margin differentiated services (managed databases, queues, etc.) instead of staying in the race-to-the-bottom VM market.
These Intel mitigations impact those services just like everyone else.
In the immediate short term: maybe. But fundamentally, these vulnerabilities are (hyper)threatening their business model, because not sharing the infrastructure suddenly becomes far more attractive.
So to the extent that you're suggesting that cloud providers are happy, or may even have had a hand in this: nahh.
> The performance loss is not visible to our customers; we have to manage the loss ourselves. It's a kind of hidden defect.
I run a high-traffic web service in the cloud. Well over a billion requests served per month from the origin. We run our CPUs at around ~60%.
We've seen maybe a 2% hit from the Spectre & Meltdown mitigations. It's hard to even tell, because it's a small enough amount that it tends to get lost in the noise.
Other replies mentioned that the cost of the slowdown is absorbed by the cloud provider; let's just pretend it's fully passed down to the user and only nitpick at the math behind the joke:
Imagine a slowdown (performance penalty) of 50%. It would double the time to complete a task, thus doubling the cost (expressed as a cost increase, that's a 100% increase!).
Generic formula: a slowdown x cuts throughput to (1−x), so the time to complete a task scales by 1÷(1−x), and the cost increase is 1÷(1−x)−1.
For x = 25% ---> 1÷(1−0.25)−1 = 33% duration (cost) increase
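(A quick sanity check of that formula, as a C sketch with illustrative slowdown values:)

    #include <stdio.h>

    /* Cost increase from a throughput slowdown x: time (and thus cost)
       scales by 1/(1-x), so the relative increase is 1/(1-x) - 1. */
    static double cost_increase(double x) { return 1.0 / (1.0 - x) - 1.0; }

    int main(void) {
        printf("x=25%%: +%.0f%%\n", 100 * cost_increase(0.25));  /* +33%  */
        printf("x=50%%: +%.0f%%\n", 100 * cost_increase(0.50));  /* +100% */
        return 0;
    }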
I bet Google isn't so giddy now about being FIRST!![1] with Skylake in the data center a couple of years ago, or about going all-in with Intel on Chromebooks (it didn't even give AMD a chance until very recently...), despite Chrome OS being one of the very few operating systems that are truly architecture-agnostic.
Now it's paying dearly for that mistake, with up to 40% performance loss on Chromebooks due to the disabling of HT:
https://www.techrepublic.com/article/mds-vulnerabilities-lea...
Google broke one of the most basic business rules: never rely on a single supplier. You're always worse off in the end, even if the exclusivity deals seem very tempting in the short-term.
[1] https://cloud.google.com/blog/products/gcp/compute-engine-up...
Weren't Chromebooks on ARM before x86?
> Google broke one of the most basic business rules: never rely on a single supplier. You're always worse off in the end, even if the exclusivity deals seem very tempting in the short-term.
I don't think this is universally true, although I agree that it's probably prudent. Diverse options/suppliers are a risk mitigation, but they do have a cost.