dsharlet|2 years ago
What about algorithms where register pressure is an issue?
I think the problem with LMUL is that it assumes you always want to unroll the innermost dimension (where the vector loads are stride 1). That's usually the last dimension I try to unroll, and only if there are any registers left over. If data is shared across any other dimension in the algorithm, it's better to tile/unroll those dimensions first.
Of course, for a simple algorithm, there will be registers left over. But I think more interesting algorithms will struggle on RVV if you must use LMUL > 1 for performance.
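To put rough numbers on the register-pressure concern, here is a minimal sketch of a register budget. It assumes a hypothetical outer-product style kernel (not anything from a specific library): `rows` x `cols` accumulator register groups, plus `cols` groups for the stride-1 loads, with the other operand held in scalar registers. The function names are made up for illustration, and the model ignores fractional LMUL and the fact that masked code also needs v0.

```python
# Rough register-budget sketch for a hypothetical outer-product micro-kernel.
# Assumed kernel shape: rows * cols accumulator groups, plus cols groups for
# the stride-1 vector loads. The other operand is assumed to live in scalar
# registers, so it costs no vector registers in this model.

NUM_VECTOR_REGS = 32  # RVV has 32 architectural vector registers, v0..v31


def register_groups(lmul: int) -> int:
    """Number of register groups available at an integer LMUL."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("integer LMUL must be 1, 2, 4 or 8")
    return NUM_VECTOR_REGS // lmul


def groups_needed(rows: int, cols: int) -> int:
    """Register groups used by the assumed rows x cols tile."""
    return rows * cols + cols


def max_rows(cols: int, lmul: int) -> int:
    """Largest unroll factor in the non-stride-1 dimension that still fits."""
    budget = register_groups(lmul)
    return max((budget - cols) // cols, 0)


if __name__ == "__main__":
    for lmul in (1, 2, 4, 8):
        print(f"LMUL={lmul}: {register_groups(lmul)} groups, "
              f"max rows with cols=1: {max_rows(1, lmul)}")
```

Under these assumptions, every doubling of LMUL halves the number of groups left for tiling the other dimensions, which is the trade-off described above.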
adgjlsfhk1|2 years ago
camel-cdr|2 years ago
Then you'll probably saturate the processor without using a larger LMUL. That said, I think many algorithms can work with LMUL=2 without running out of registers.
brucehoult|2 years ago
Being able to use LMUL as a way to get the effect of unrolling and hide the pointer bumps and loop control on simple loops on narrow processors, without expanding the code, is just a bonus.
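As a rough illustration of the loop-overhead point, the sketch below counts strip-mined loop iterations using the spec's VLMAX = LMUL * VLEN / SEW. The function names and the example sizes (VLEN=128, SEW=32, n=1024) are assumptions picked for illustration, not measurements from any real core.

```python
# Sketch: how many strip-mined loop iterations (and therefore how many rounds
# of pointer bumps and loop control) a simple loop needs at each LMUL.
# The VLEN/SEW/n values below are illustrative assumptions only.


def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Maximum elements per vector instruction: VLMAX = LMUL * VLEN / SEW."""
    return lmul * vlen_bits // sew_bits


def strip_mine_iterations(n: int, vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Iterations of a vsetvli-style loop over n elements (ceiling division)."""
    per_iter = vlmax(vlen_bits, sew_bits, lmul)
    return -(-n // per_iter)


if __name__ == "__main__":
    n, vlen_bits, sew_bits = 1024, 128, 32
    for lmul in (1, 2, 4, 8):
        iters = strip_mine_iterations(n, vlen_bits, sew_bits, lmul)
        print(f"LMUL={lmul}: VLMAX={vlmax(vlen_bits, sew_bits, lmul)}, "
              f"iterations={iters}")
```

With these assumed sizes the loop runs 256 iterations at LMUL=1 and 32 at LMUL=8, with the same code size in both cases.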