Big, non-trivial functions generally profit from having more registers, and the extra registers don't have to be saved and restored by functions that don't use them. Not every computationally intense application consists of just a few numerical kernels that can be pushed onto GPUs and fit the GPU uarch well. Just compare optimized, compiler-generated ia32 and amd64 code to see how much difference more registers can make. You could argue that 16 (integer) registers is a sweet spot, but you failed to state a proper argument.