top | item 47014659

(no title)

fuhsnn | 16 days ago

Intel's next gen will add 16 more general purpose registers. Can't wait for the benchmarks.

discuss

Joker_vD|16 days ago

So every function call will need to spill even more call-clobbered registers to the stack!

Like, I get that leaf functions with truly huge computational cores are a thing that would benefit from more ISA-visible registers, but... don't we have GPUs for that now? And TPUs? NPUs? Whatever those things are called?

cvoss|16 days ago

With an increase in available registers, every value that a compiler might newly choose to keep in a register was a value that would previously have lived in the local stack frame anyway.

It's up to the compiler to decide how many registers it needs to preserve at a call. It's also up to the compiler to decide which registers shall be the call-clobbered ones. "None" is a valid choice here, if you wish.

jandrewrogers|16 days ago

Most function calls are aggressively inlined by the compiler such that they are no longer "function calls". More registers will make that even more effective.

adrian_b|14 days ago

ARM Aarch64, IBM POWER and most RISC CPUs have had 32 general-purpose registers for many decades, and this has always been an advantage for them, not a disadvantage.

So there already is plenty of experience with ABIs using 32 GPRs.

Even so, unnecessary spilling could be avoided, but that would require some changes in programmer habits. For instance, one could require an ordering of the compilation of source files, e.g. to always compile the dependencies of a program module before it. Then the compiler could annotate the module interfaces with register usage and then use this annotations when compiling a program that imports the modules. This would avoid the need to define an ABI that is never optimal in most cases, by either saving too many or too few registers.

throwaway17_17|16 days ago

Why does having more more registers lead to spilling? I would assume (probably) incorrectly, that more registers means less spill. Are you talking about calls inside other calls which cause the outer scope arguments to be preemptively spilled so the inner scope data can be pre placed in registers?

crest|14 days ago

Big non-trivial functions generally profit from having more registers. They don't have to saved and restored by functions that don't use them. Not every computationally intense application has just a few numerical kernels that can be pushed on to GPUs and fit the GPU uarch well. Just have a look at the difference between optimized compiler generated ia32 and amd64 code how much difference more registers can make.

You could argue that 16 (integer) registers is a sweet spot, but you failed to state a proper argument.

bjourne|16 days ago

Most modern compilers for modern languages do an insane amount of inlining so the problem you're mentioning isn't a big issue. And, basically, GPUs and TPUs can't handle branches. CPUs can.

woadwarrior01|16 days ago

I looked it up. It's called APX (Advanced Performance Extensions)[1].

[1]: https://www.intel.com/content/www/us/en/developer/articles/t...

nxobject|15 days ago

Oh my, it has three-operand instructions now. VAX vindicated...

BobbyTables2|16 days ago

How are they adding GPRs? Won’t that utterly break how instructions are encoded?

That would be a major headache — even if current instruction encodings were somehow preserved.

It’s not just about compilers and assemblers. Every single system implementing virtualization has a software emulation of the instruction set - easily 10k lines of very dense code/tables.

Joker_vD|16 days ago

The same way AMD added 8 new GPRs, I imagine: by introducing a new instruction prefix.

toast0|16 days ago

x86 is broadly extendable. APX adds a REX2 prefix to address the new registers, and also allows using the EVEX prefix in new ways. And there's new conditional instructions where the encoding wasn't really described on the summary page.

Presumably this is gated behind cpuid and/or model specific registers, so it would tend to not be exposed by virtualization software that doesn't support it. But yeah, if you decode and process instructions, it's more things to understand. That's a cost, but presumably the benefit outweighs the cost, at least in some applications.

It's the same path as any x86 extension. In the beginning only specialty software uses it, at some point libraries that have specialized code paths based on processor featurses will support it, if it works well it becomes standard on new processors, eventually most software requires it. Or it doesn't work out and it gets dropped from future processors.

vaylian|16 days ago

Those general purpose registers will also need to grow to twice their size, once we get our first 128bit CPU architecture. I hope Intel is thinking this through.

SAI_Peregrinus|16 days ago

That's a ways out. We're not even using all bits in addresses yet. Unless they want hardware pointer tagging a la CHERI there's not going to be a need to increase address sizes, but that doesn't expose the extra bits to the user.

Data registers could be bigger. There's no reason `sizeof int` has to equal `sizeof intptr_t`, many older architectures had separate address & data register sizes. SIMD registers are already a case of that in x86_64.

dlcarrier|16 days ago

Doubling the number of bits squares the number range that can be stored, so there's a point of diminishing returns.

* Four-bit processors can only count to 15,or from -8 to 7, so their use has been pretty limited. It is very difficult for them to do any math, and they've mostly been used for state machines.

* Eight-bit processors can count to 255, or from -128 to 127, so much more useful math can run in a single instruction, and they can directly address hundreds of bytes of RAM, which is low enough an entire program still often requires paging, but at least a routines can reasonably fit in that range. Very small embedded systems still use 8-bit processors.

* Sixteen-bit processors can count to 65,535, or from -32,768 to 32,767, allowing far more math to work in a single instruction, and a computer can have tens of kilobytes of RAM or ROM without any paging, which was small but not uncommon when sixteen-bit processors initially gain popularity.

* Thirty-two-bit processors can count to 4,294,967,295, or from -2,147,483,648 to 2,147,483,647, so it's rare to ever need multiple instructions for a single math operation, and a computer can address four gigabytes of RAM, which was far more than enough when thirty-two-bit processors initially gain popularity. The need for more bits in general-purpose computing plateaus at this point.

* Sixty-four-bit processors can count to 18,446,744,073,709,551,615, or from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807, so only special-case calculations need multiple instructions for a single math operation, and a computer can address up to sixteen zettabytes of RAM, which is thousands of times more than current supercomputers use. There's so many bits that programs only rarely perform 64-bit operations, and 64-bit instructions are often performing single-instruction-multiple-data operations that use multiple 8-, 16-, or 32-bit numbers stored in a single register.

We're already at the point where we don't gain a lot from true 64-bit instructions, with the registers being more-so used with vector instructions that store multiple numbers in a single register, so a 128-bit processor is kind of pointless. Sure, we'll keep growing the registers specific to vector instructions, but those are already 512-bits wide on the latest processors, and we don't call them 512-bit processors.

Granted, before 64-bit consumer processors existed, no one would have conceived that simultaneously running a few chat interfaces, like Slack and Discord, while browsing a news web page, could fill up more RAM than a 32-bit processor can address, so software using zettabytes of RAM will likely happen as soon as we can manufacture it, thanks to Wirth's Law (https://en.wikipedia.org/wiki/Wirth%27s_law), but until then there's no likely path to 128-bit consumer processors.

rwmj|16 days ago

There's a first time for everything.