top | item 33031479

cylon13 | 3 years ago

There are fundamental issues with x86 that make it impossible to match the efficiency of ARM. Variable-length instruction encoding, for instance, means a surprising amount of power is dedicated to circuitry whose only job is to find where the instruction boundaries are for speculative execution. It made sense in the 80s when memory was scarce and execution was straightforward, but now it’s a barrier to efficiency that’s baked right into the ISA.


dontlaugh|3 years ago

And sadly this partly applies to RISC-V too. It only achieves competitive density with the (optional) instruction compression, which makes instructions vary in length. Not as big of a problem as on x86, but still a fundamental limitation.

snvzz|3 years ago

>sadly this partly applies to RISC-V too.

Not in any way that has any relevance.

>Not as big of a problem as on x86, but still a fundamental limitation.

Huge understatement. On x86, instructions can be anywhere from 1 to 15 bytes long; on RISC-V they are either 16 or 32 bits.

As with everything else in RISC-V, the architects weighed the tradeoffs and found that the code-size advantage overwhelms the (negligible by design) added decoding cost for anything but the tiniest of implementations (no on-die cache and no built-in ROM).

As it turns out, it would be difficult to even find a use for such a core, but in any event it is still possible to make such a specialized chip and simply not use the C extension.

Such a use would be deeply embedded, and the vendor would be in control of the full stack so there would be no concerns of compatibility with e.g. mainstream Linux distributions. They would still get ecosystem benefits; they'd be able to use the open source toolchains, as they support even naked RV32E with no extensions.

snvzz|3 years ago

>Variable length instruction coding for instance, which means a surprising amount of power is dedicated to circuitry which is just to find where the instruction boundaries are for speculative execution.

This does apply to x86 and m68k, where "variable" means anywhere from 1 to 15 bytes (on x86), and dealing with that means brute-forcing decode at every possible starting point. Intel and AMD have thus both found 4-wide decode to be a practical limit.
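A toy Python sketch of the problem (the length function here is made up, not real x86 encoding): finding instruction boundaries is an inherently serial walk, so a wide decoder must speculatively attempt a decode at every byte offset of the fetch window and discard the wrong ones.

```python
def toy_length(first_byte: int) -> int:
    """Hypothetical variable-length scheme: pretend the first byte
    alone determines instruction length, 1..4 bytes. Real x86 may
    need to examine many prefix/opcode/ModRM bytes to know the length."""
    return (first_byte % 4) + 1

def boundaries(window: bytes) -> list[int]:
    """True instruction start offsets, found by walking the stream
    sequentially -- each boundary depends on the previous one."""
    offs, i = [], 0
    while i < len(window):
        offs.append(i)
        i += toy_length(window[i])
    return offs

window = bytes([7, 2, 2, 5, 0, 1])
# A parallel decoder cannot know in advance which of the offsets
# 0..5 are real boundaries; only the serial walk reveals them.
print(boundaries(window))  # → [0, 4, 5]
```

The wasted work is all the decodes attempted at offsets that turn out not to be boundaries, which is the circuitry-and-power cost the parent comment describes.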

It does not apply to RISC-V, where you get either one 32-bit instruction or 2x 16-bit. The added complexity of supporting the C extension is negligible, to the point where if a chip has any cache or ROM in it, using C becomes a net benefit in area and power.
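The reason decode stays cheap: per the RISC-V spec, the length of an instruction is determined entirely by the two lowest bits of its first 16-bit parcel, so every 16-bit-aligned position can be classified independently and in parallel. A minimal sketch (base 32-bit instructions plus the C extension only, ignoring the reserved longer-than-32-bit encodings):

```python
def rv_insn_length(low16: int) -> int:
    """Length in bytes of a RISC-V instruction whose first 16-bit
    parcel is low16 (RV32/RV64 base + C extension only).

    Per the spec: if the two lowest bits are both 1, the instruction
    is 32 bits wide; otherwise it is a 16-bit compressed instruction."""
    return 4 if (low16 & 0b11) == 0b11 else 2

assert rv_insn_length(0x0001) == 2  # c.nop: low bits 01 -> compressed
assert rv_insn_length(0x0013) == 4  # addi x0,x0,0: low bits 11 -> 32-bit
```

Contrast with the x86 case above: no serial walk is needed, since each aligned parcel carries its own length marker.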

Therefore, ARMv8 AArch64 made a critical mistake in adopting a fixed 32-bit opcode size, a mistake we can see in practice in the L1 cache size Apple's M1 needed to compensate for poor code density.

L1 is never free. It is always *very* costly: its size dictates the area the cache takes, the clocks the cache itself can achieve (which in turn caps the speed of the CPU), and the power the cache draws.

renox|3 years ago

Maybe. If I remember correctly, Apple's ARM M1 can decode up to 8 instructions at the same time. Is there any RISC-V CPU with the C extension that is able to decode 8 instructions?

withinboredom|3 years ago

That sounds false to me. If that were the whole story, then the CPU could pad the incoming instructions and pretend they are all the same length.

POPOSYS|3 years ago

Thank you very much for this interesting comment! It would be great if you could provide a URL with a detailed analysis of this issue.