top | item 47021391

(no title)

dahart | 15 days ago

Okay, I see what you’re saying. I was assuming the compiler or programmer knows the call graph, and you’re assuming it’s a function call in the middle of a potentially large call stack with no knowledge of its surroundings. Your assumption is for sure safer and more common for a compiler compiling a function that’s not a leaf and not inlined.

So I can see why it might seem at first glance like having more registers would mean more spilling for a single function. But if your requirement is that you must save/spill all registers used, then isn’t the amount of spilling purely dependent on the function’s number of simultaneous live variables, and not on the number of hardware registers at all? If your machine has fewer general purpose registers than live state footprint in your function, then the amount of function-internal spill and/or remat must go up. You have to spill your own live state in order to compute other necessary live state during the course of the function. More hardware registers means less function-internal spill, but I think under your function call assumptions, the amount of spill has to be constant.

For sure this topic makes it clear why inlining is so important and heavily used, and once you start talking about inlining, having more registers available definitely reduces spill, and this happens often in practice, right? Leaf calls and inlined call stacks and specialization are all a thing that more regs help, so I would expect perf to get better with more registers.

discuss

order

Joker_vD|15 days ago

Thanks for actually engaging with my argument.

> assuming it’s a function call in the middle of a potentially large call stack with no knowledge of its surroundings.

Most of the decision logic/business logic lives exactly in functions like this, so while I wouldn't claim that 90% of all of the code is like that... it's probably at least 50% or so.

> then isn’t the amount of spilling purely dependent on the function’s number of simultaneous live variables

Yes, and this ties precisely back to my argument: whether or not larger number of GPRs "helps" depends on what kind of code is usually being executed. And most of the code, empirically, doesn't have all that many scalar variables alive simultaneously. And the code that does benefit from more registers (huge unrolled/interleaved computational loops with no function calls or with calls only to intrinsics/inlinable thin wrappers of intrinsics) would benefit even more from using SIMD or even better, being off-loaded to a GPU or the like.

I actually once designed a 256-register fantasy CPU but after playing with it for a while I realised that about 200 of its registers go completely unused, and that's with globals liberally pinned to registers. Which, I guess, explains why Knuth used some idiosyncratic windowing system for his MMIX.

dahart|14 days ago

It took me a minute, but yes I completely agree that whether more GPRs helps depends on the code & compiler, and that there’s plenty of code you can’t inline. Re: MMIX Yes! Theoretically it would help if the hardware could dynamically alias registers, and automatically handle spilling when the RF is full. I have heard such a thing physically exists and has been tried, but I don’t know which chip/arch it is (maybe AMD?) nor how well it works. I would bet that it can’t be super efficient with registers, and maybe the complexity doesn’t pay off in practice because it thwarts and undermines inlining.