top | item 47016138

(no title)

csjh | 15 days ago

You’re missing the fact that the compiler isn’t forced to fill every register in the first place. If it was less efficient to use more registers, the compiler simply wouldn’t use more registers.

The actual counter proof here would be that in either case, the temporaries have to end up on the stack at some point anyways, so you’d need to look at the total number of loads/stores in the proximity of the call site in general.

discuss

Joker_vD|15 days ago

> You’re missing the fact that the compiler isn’t forced to fill every register in the first place.

Temporaries start their lives in registers (on RISCs, at least). So if you have 40 alive values, you can use the same one register to calculate them all and immediately save all 40 of them on the stack, or e.g. keep 15 of them in 15 registers, and use the 16th register to compute 25 other values and save those on the stack. But if you keep them in the call-invariant registers, those registers need to be saved at the function's prologue, and the call-clobbered registers need to be saved and restored around inner call sites. That's why academia has been playing with register windows, to get around this manual shuffling.

> The actual counter proof here would be that in either case, the temporaries have to end up on the stack at some point anyways, so you’d need to look at the total number of loads/stores in the proximity of the call site in general.

Would you be willing to work through that proof? There may very well be less total memory traffic for machine with 31 registers than with 16; but it would seem to me that there should be some sort of local optimum for the number of registers (and their clobbered/invariant assignment) for minimizing stack traffic: four registers is way too few, but 192 (there's been CPUs like that!) is way too many.

adrian_b|13 days ago

32 registers have been used in many CPUs since the mid seventies until today, i.e. for a half of century.

During all this time there has been a consensus that 32 registers are better than less registers.

A few CPUs, e.g. SPARC and Itanium have tried to use more registers than that, and they have been considered unsuccessful.

There have been some inconclusive debates about whether 64 architectural registers might be better than 32 registers, but the fact the 32 are better than 16 has been pretty much undisputed. So there are chances that 32 is close to an optimal number of GPRs.

Using 32 registers is something new only for Intel-AMD, due to their legacy, but it is something that all their competition has used successfully for many decades.

I have written many assembly language programs for ARM, POWER or x86. Whenever I had 32 registers, this made it much easier for me to avoid most memory accesses, for spilling or for other purposes. It is true that a compiler is dumber than a human programmer and it frequently has to adhere to a rigid ABI, but even so, I expect that on average even a compiler will succeed to reduce the register spilling when using 32 registers.