
AMD claims Arm ISA doesn't offer efficiency advantage over x86

217 points | ksec | 6 months ago | techpowerup.com

429 comments

[+] exmadscientist|6 months ago|reply
This is an entirely uncontroversial take among experts in the space. x86 is an old CISC-y hot mess. RISC-V is a new-school hyper-academic hot mess. Recent ARM is actually pretty good. And none of it matters, because the uncore and the fabrication details (in particular, whether things have been tuned to run full speed demon or full power sipper) completely dominate the ISA.

In the past x86 didn't dominate in low power because Intel had the resources to care but never did, and AMD never had the resources to try. Other companies stepped in to fill that niche, and had to use other ISAs. (If they could have used x86 legally, they might well have done so. Oops?) That may well be changing. Or perhaps AMD will let x86 fade away.

[+] torginus|6 months ago|reply
I remember reading this Jim Keller interview:

https://web.archive.org/web/20210622080634/https://www.anand...

Basically the gist of it is that the difference between ARM/x86 mostly boils down to instruction decode, and:

- Most instructions end up being simple load/store/conditional branch etc. on both architectures, where there's literally no difference in encoding efficiency

- Variable-length instruction decoding has pretty much been figured out on x86, to the point that it's no longer a bottleneck

Also, my personal addendum: today's Intel efficiency cores have more transistors and better perf than the big Intel cores of a decade ago.

[+] mort96|6 months ago|reply
This matches my understanding as well, as someone who has a great deal of interest in the field but has never worked in it professionally. CPUs all have a microarchitecture that doesn't look like the ISA at all, and an instruction decoder that translates one or more ISA instructions into zero or more microarchitectural instructions. There are some advantages to a more regular ISA, such as being able to decode multiple instructions in parallel more easily when they're all the same size, or spending fewer transistors on the instruction decoder, but for the big superscalar chips we all have in our desktops, laptops, and phones, the drawbacks are tiny.

I imagine the difference is much greater for the tiny in-order CPUs we find in MCUs, though, just because an amd64 decoder would be a comparatively much larger fraction of the transistor budget.

[+] newpavlov|6 months ago|reply
>RISC-V is a new-school hyper-academic hot mess.

Yeah... Previously I was a big fan of RISC-V, but after I had to dig slightly deeper into it as a software developer, my enthusiasm for it cooled down significantly.

It's still great that we got a mainstream open ISA, but now I view it as the Linux of the hardware world, i.e. a great achievement with a number of questionable choices baked in, which unfortunately stifles other open alternatives by virtue of being "good enough".

[+] whynotminot|6 months ago|reply
An annoying thing people have done since Apple Silicon is claim that its advantages were due to Arm.

No, not really. The advantage is Apple prioritizing efficiency, something Intel never cared enough about.

[+] Ianjit|6 months ago|reply
This paper is great:

"Our methodical investigation demonstrates the role of ISA in modern microprocessors’ performance and energy efficiency. We find that ARM, MIPS, and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant."

https://dl.acm.org/doi/10.1145/2699682 https://abdullahyildiz.github.io/files/isa_wars.pdf

[+] tliltocatl|6 months ago|reply
Nitpick: uncore and fabrication details dominate the ISA on high-end/superscalar architectures (because modern superscalar designs basically abstract the ISA away at the frontend). On smaller (i.e. MCU) cores, x86 will never stand a chance.
[+] jabl|6 months ago|reply
> Or perhaps AMD will let x86 fade away.

I agree with what you write otherwise, but not this. Why would AMD "let x86 fade away"? They are one of the two oligopolistic CPU providers of the x86 ecosystem, which is worth zillions. Why should they throw that away in order to become yet another provider of ARM (or RISC-V or whatnot) CPUs? I think that as long as the x86 market remains healthy, and AMD is in a position to compete in that market, they will continue doing so.

[+] somanyphotons|6 months ago|reply
I'd love to see what would happen if AMD put out a chip with the instruction decoders swapped out for risc-v instruction decoders
[+] wlesieutre|6 months ago|reply
VIA used to make low power x86 processors
[+] epolanski|6 months ago|reply
I have a hard time believing this fully: more custom instructions, more custom hardware, more heat.

How can you avoid it?

[+] mananaysiempre|6 months ago|reply
> x86 didn't dominate in low power because Intel had the resources to care but never did

Remember Atom tablets (and how they sucked)?

[+] pezezin|6 months ago|reply
After playing around with some ARM hardware I have to say that I don't care whether ARM is more efficient or not as long as the boot process remains the clusterfuck that it is today.

IMHO the major win of the IBM PC platform is that it standardized the boot process from the very beginning, first with the BIOS and later with UEFI, so you can grab any random ISO for any random OS and it will work. Meanwhile in the ARM world it seems that every single CPU board requires its own drivers, device tree, and custom OS build. RISC-V seems to suffer from the same problem, and until this problem is solved, I will avoid them like toxic waste.

[+] Teknoman117|6 months ago|reply
ARM systems that support UEFI are pretty fun to work with. Then there's everything else. Anytime I hear the phrase "vendor kernel" I know I'm in for an experience...
[+] Joel_Mckay|6 months ago|reply
In general, most modern ARMv8/v9 64-bit SoCs have purged a lot of the vestigial problems.

Yet most pre-compiled package builds still never enable the advanced ASIC features, for compatibility and safety concerns. AMD comparing the NERF'd ARM core features is pretty sleazy PR.

Tegra could be a budget Apple M3 Pro, but those folks chose imaginary "AI" money over awesomeness. =3

[+] freedomben|6 months ago|reply
I could not agree more. I wanted to love ARM, but after playing around with numerous different pieces of hardware, I won't touch it with a ten-foot pole anymore. The power savings is not worth the pain to me.

I hope like hell that RISC-V doesn't end up in the same boot-process toxic wasteland

[+] txrx0000|6 months ago|reply
It's not the ISA. Modern Macbooks are power-efficient because they have:

- RAM on package

- PMIC power delivery

- Better power management by OS

Geekerwan investigated this a while ago, see:

https://www.youtube.com/watch?v=Z0tNtMwYrGA https://www.youtube.com/watch?v=b3FTtvPcc2s https://www.youtube.com/watch?v=ymoiWv9BF7Q

Intel and AMD have implemented these improvements with Lunar Lake and Strix Halo. You can buy an x86 laptop with Macbook-like efficiency right now if you know which SoCs to pick.

edit: Correction. I looked at the die image of Strix Halo and thought it looked like it had on-package RAM. It does not. It doesn't use PMIC either. Lunar Lake is the only Apple M-series competitor on x86 at the moment.

[+] aurareturn|6 months ago|reply

  Intel and AMD have implemented these improvements with Lunar Lake and Strix Halo. You can buy an x86 laptop with Macbook-like efficiency right now if you know which SoCs to pick.
M4 is about 3.6x more efficient than Strix Halo when under load.[0] On a daily basis, the difference can be even larger, because Apple Silicon has true big.LITTLE cores that send low-priority tasks to the highly efficient small cores.

Against Lunar Lake, the base M4 is about 35% faster and 2x more efficient, and Lunar Lake actually has a bigger die than the M4.[1] Intel is discontinuing the Lunar Lake line because it isn't profitable for them.

I'm not sure how you can claim "Mac-like efficiency".

[0]https://imgur.com/a/yvpEpKF

[1]https://www.notebookcheck.net/Intel-Lunar-Lake-CPU-analysis-...

[+] jlei523|6 months ago|reply

  Intel and AMD have implemented these improvements with Lunar Lake and Strix Halo. You can buy an x86 laptop with Macbook-like efficiency right now if you know which SoCs to pick.
This just isn't true. Yes, Lunar Lake has great idle performance. But if you need to actually use the CPU, it's drastically slower than M4 while consuming more power.

Strix Halo battery life and efficiency is not even in the same ball park.

[+] hakube|6 months ago|reply
Windows laptops' performance while on battery is terrible, especially when you put them in power save mode. Macbooks, on the other hand, don't have that problem. It's just like you're using an iPad.
[+] mrheosuper|6 months ago|reply
By PMIC, did you mean VRM? If not, can you tell me the difference between them?
[+] SOTGO|6 months ago|reply
I'd be interested to hear someone with more experience talk about this, or if there's more recent research, but in school I read this paper: <https://research.cs.wisc.edu/vertical/papers/2013/hpca13-isa...> that seems to agree that x86 and ARM as instruction sets do not differ greatly in power consumption. They also found that GCC picks RISC-like instructions when compiling for x86, which meant the number of micro-ops was similar between ARM and x86, and that the x86 chips were well optimized for those RISC-like instructions and so were similarly efficient to ARM chips. They have a quote that "The microarchitecture, not the ISA, is responsible for performance differences."
[+] KingOfCoders|6 months ago|reply
"When RISC first came out, x86 was half microcode. So if you look at the die, half the chip is a ROM, or maybe a third or something. And the RISC guys could say that there is no ROM on a RISC chip, so we get more performance. But now the ROM is so small, you can’t find it. Actually, the adder is so small, you can hardly find it? What limits computer performance today is predictability, and the two big ones are instruction/branch predictability, and data locality."

Jim Keller

[+] nottorp|6 months ago|reply
Irrelevant.

There are two entities allowed to make x86_64 chips (and that only because AMD won the 64 bit ISA competition, otherwise there'd be only Intel). They get to choose.

The rest will use arm because that's all they have access to.

Oh, and x86_64 will be as power efficient as arm when one of the two entities stops competing on having larger numbers and actually worries about power management. Maybe provide a ?linux? optimized for power consumption.

[+] slimginz|6 months ago|reply
IIRC There was a Jim Keller interview a few years ago where he said basically the same thing (I think it was from right around when he joined Tenstorrent?). The ISA itself doesn't matter, it's just instructions. The way the chip interprets those instructions is what makes the difference. ARM was designed from the beginning for low-power devices whereas x86 wasn't. If x86 is gonna compete with ARM (and RISC-V), then the chips are gonna need to be optimized for low-power devices too, but that can break decades of compatibility with older software.
[+] gigatexal|6 months ago|reply
You’ll pry the ARM M series chips of my Mac from my cold dead hands. They’re a game changer in the space and one of the best reasons to use a Mac.

I am not a chip expert; it's just night-and-day different using a Mac with an ARM chip compared to an Intel one, from thermals to performance and battery life and everything in between. Intel isn't even in the same ballpark imo.

But competition is good, so let's hope both Intel and AMD do well, because the consumer wins.

[+] mort96|6 months ago|reply
I have absolutely no doubt in my mind that if Apple's CPU engineers got half a decade and a mandate from the higher ups, they could make an amazing amd64 chip too.
[+] kccqzy|6 months ago|reply
That's not mostly because of a better ISA. If Intel and Apple had a chummier relationship you could imagine Apple licensing the Intel x86 ISA and the M series chips would be just as good but running x86. However I suspect no matter how chummy that relationship was, business is business and it is highly unlikely that Intel would give Apple such a license.
[+] pengaru|6 months ago|reply
Your Intel Mac was stuck in the past while everyone paying attention on the PC side was already enjoying TSMC 7nm silicon in the form of AMD Zen processors.

Apple Silicon Macs are far less impressive if you came from an 8c/16t Ryzen 7 laptop, especially if you consider that the Apple parts are consistently one TSMC node ahead of AMD (e.g. 5nm (M1) vs. 7nm (Zen 2)).

What's _really_ impressive is how badly Intel fell behind and TSMC has been absolutely killing it.

[+] KingOfCoders|6 months ago|reply
I think everything depends on circumstances.

I've used laptops for 15+ years (transitioned from a Mac Cube to a white MacBook, MacBook Pro, etc.) but migrated to a desktop some years ago (first an iMac Pro, now AMD), as I work at my desk, and when I'm not at my desk I'm not working.

Some years ago I got a 3900X and a 2080TI. And they still work fine, and I don't have performance problems, and although I thought of getting PCI5/NVMe with a 9950x3d/395+ (or a Threadripper), I just don't need it. I've upgraded the SSDs several times for speed and capacity (now at the PCI4/M2 limit and don't want to go into RAID), and added solar panels and a battery pack for energy usage, but I'm fine otherwise.

Indeed I want to buy a new CPU and GPU, but I don't find enough reasons (though might get a Mac Studio for local AI).

But I understand your point if you need a laptop, I just decided I no longer need one, and get more power with faster compiling for less money.

[+] Avi-D-coder|6 months ago|reply
From what I have heard it's not the RISCy ISA per se, it's largely ARM's weaker memory model.

I'd be happy to be corrected, but the empirical core counts seem to agree.

[+] variadix|6 months ago|reply
Instruction decode for variable-length ISAs is inherently more complex, and thus requires more transistors = more power, than fixed-length instruction decode, especially parallel decode. AFAIK modern x86 cores have to speculatively decode instructions to achieve this, compared to RISC ISAs where you know where all the instruction boundaries are, and decoding N in parallel is a matter of instantiating N decoders that work in parallel. How much this explains the x86 vs ARM power gap, I don't know; what's much more likely is that x86 designs have not been hyper-optimized for power as much as ARM designs have been over the last two decades. Memory order is another non-negligible factor, but again the difference is probably more attributable to the difference in goals between the two architectures over the vast majority of their lifespans, and to the expertise and knowledge of the engineers working at each company.
[+] hereme888|6 months ago|reply
I was just window-shopping laptops this morning, and realized ARM-based doesn't necessarily hold battery life advantages.
[+] w4rh4wk5|6 months ago|reply
Ok. When will we get the laptop with AMD CPU that is on par with a Macbook regarding battery life?
[+] zuhsetaqi|6 months ago|reply
Don't claim it, just show/prove it by offering a chip to consumers that matches or, better, beats the metrics of Apple's offerings.
[+] WithinReason|6 months ago|reply
Since newer CPUs have heterogeneous cores (high performance + low power), I'm wondering if it makes sense to drop legacy instructions from the low power cores, since legacy code can still be run on the other cores. Then e.g. an OS compiled the right way can take advantage of extra efficiency without the CPU losing backwards compatibility
[+] toast0|6 months ago|reply
Like o11c says, that's setting everyone up for a bad time. If the heterogeneous cores are similar but don't all support all the instructions, it's too hard to use. You can build legacy instructions in a space-optimized way though, but there's no reason not to do that for the high-performance cores too --- if they're legacy instructions, one expects them not to run often, and perf doesn't matter that much.

Intel dropped their x86-S proposal; but I guess something like that could work for low power cores. If you provide a way for a 64-bit OS to start application processors directly in 64-bit mode, you could set up low power cores so that they could only run in 64-bit mode. I'd be surprised if the juice is worth the squeeze, but it'd be reasonable --- it's pretty rare to be outside 64-bit mode, and systems that do run outside 64-bit mode probably don't need all the cores on a modern processor. If you're running a 64-bit OS, it knows which processes are running in 32-bit mode, and could avoid scheduling them on reduced-functionality cores. If you're running a 32-bit OS, somehow or another the OS needs to not use those cores... either the ACPI tables are different and they don't show up for 32-bit, init fails and the OS moves on, or there is a firmware flag to hide them that must be set before running a 32-bit OS.

[+] devnullbrain|6 months ago|reply
Interesting but it would be pretty rough to implement. If you take a binary now and run it on a core without the correct instructions, it will SIGILL and probably crash. So you have these options:

Create a new compilation target

- You'll probably just end up running a lot of current x86 code exclusively on performance cores to a net loss. This is how RISC-V deals with optional extensions.

Emulate

- This already happens for some instructions but, like above, could quickly negate the benefits

Ask for permission

- This is what AVX code does now, the onus is on the programmer to check if the optional instructions can be used. But you can't have many dropped instructions and expect anybody to use it.

Ask for forgiveness

- Run the code anyway and catch illegal instruction exceptions/signals, then move to a performance core. This would take some deep kernel surgery for support. If this happens remotely often it will stall everything and make your system hate you.

The last one raises the question: which instructions are we considering 'legacy'? You won't get far in an x86 binary before running into an instruction operating on memory that, in a RISC ISA, would mean first a load instruction, then the operation, then a store. Surely we can't drop those.

[+] o11c|6 months ago|reply
We've seen CPU-capability differences by accident a few times, and it's always a chaotic mess leading to SIGILL.

The kernel would need to have a scheduler that knows it can't use those cores for certain tasks. Think about how hard you would have to work to even identify such a task ...

[+] sylware|6 months ago|reply
RISC-V has no PI lock like ARM or x86 and x86_64.

RISC-V has to start to seriously defend itself, because it is a death sentence for the ARM ISA and could slowly but surely start to cast shadows on x86_64 in some areas. Some people will try to bring it down, hard.

If you stick to core rva22+ (core RISC-V ISA), RISC-V is good enough to replace all of them, without PI lock, and with a global standard ISA, software may have a chance to get out of the horrible mess it is currently in (a lot of critical software code paths may end up written in assembly... no compiler lock-in, extremely hard to do planned obsolescence, etc).

RISC-V is basically ARM ISA without PI lock.

I have been writing RISC-V assembly running on x86_64 with an interpreter for much of my software projects. It is very pleasant to code in (basic assembler: no pseudo-instructions, I don't even use the compressed instructions).

I hope to get my hands on RISC-V performant implementations on near state-of-the-art silicon process some day (probably a mini-server, for all the self hosted stuff).

The 'silicon market' is saturated, so it is amazing what the RISC-V supporters have been able to achieve. There will be mistakes (some probably big) before implementations stabilize in the various domains (desktop/server/embedded/mobile/etc), and expect the others to press hard on them.

The next step for RISC-V would be a GPU ISA, and for RVAX, a standard hardware GPU programming interface... but it may be still too early for that since we kind of still don't know if we reach 'the sweet spot'.

[+] sylware|6 months ago|reply
Oh, regarding GPUs, AMD started to experiment with userland _hardware_ ring buffers... I don't know how they will handle their scarce VM id resources... the kernel may end up "only" mmaping event/command ring buffers and data dma/doorbell buffers with an "IRQ" event file descriptor.

We are talking about a near "0-driver" model... but they will have to be very confident in their GPU robustness to do that, not to mention 3D pipeline programming from those userland _hardware_ buffers will have to be really simple and directly "ready" to work.

[+] thecosmicfrog|6 months ago|reply
What is "PI lock"? A cursory web search didn't reveal much.
[+] flembat|6 months ago|reply
That is quite a confession from AMD. It's not x86 itself at all, just every implementation. It's not like the ARM processors in Macs are simple any more, that's for sure.
[+] ZuLuuuuuu|6 months ago|reply
There are a lot of theoretical articles which claim similar things but on the other hand we have a lot of empirical evidence that ARM CPUs are significantly more power efficient.

I have used laptops with both Intel and AMD CPUs, and I read/watch a lot of reviews in the thin-and-light laptop space. Although AMD became more power efficient than Intel in the last few years, the AMD alternative is only marginally more efficient (like 5-10%). And AMD is using TSMC fabs.

On the other hand, Qualcomm's recent Snapdragon X series CPUs are significantly more efficient than both Intel and AMD in most tests while providing the same or sometimes even better performance.

Some people mention the efficiency gains on Intel Lunar Lake as evidence that x86 is just as efficient, but Lunar Lake was still slightly behind in battery life and performance, while using a newer TSMC process node compared to Snapdragon X series.

So, even though I see theoretical articles like this, the empirical evidence says otherwise. Qualcomm will release their second generation Snapdragon X series CPUs this month. My guess is that the performance/efficiency gap with Intel and AMD will get even bigger.

[+] ryukoposting|6 months ago|reply
I think both can be true.

A client CPU spends most of its life idling. Thus, the key to good battery life in client computing is, generally, idle power consumption. That means low core power draw at idle, but it also means shutting off peripherals that aren't in use, turning off clock sources for said peripherals, etc.

ARM was built for low-power embedded applications from the start, and thus low-power idle states are integrated into the architecture quite elegantly. x86, on the other hand, has the SMM, which was an afterthought.

AFAICT the case for x86 ~ ARM perf equivalence is based on the argument that instruction decode, while empirically less efficient on x86, is such a small portion of a modern, high-performance pipeline that it doesn't matter. This reasoning checks out IMO. But this effect would only be visible while the CPU is under load.

[+] cptskippy|6 months ago|reply
The ISA is the contract or boundary between software and hardware. While there is a hardware cost to decode instructions, the question is how much?

As all the fanbois in the thread have pointed out, Apple's M series is fast and efficient compared to x86 for desktop/server workloads. What no one seems to acknowledge is that Apple's A series is also fast and efficient compared to other ARM implementations in mobile workloads. Apple sees the need to maintain both M and A series CPUs for different workloads, which indicates there's a benefit to both.

This tells me the ISA decode hardware isn't the bottleneck, or at least isn't the only one.