physicsguy | 1 month ago
for (int i = 0; i < N; i += SIMD_WIDTH) {
    for (int j = 0; j < SIMD_WIDTH; j++) {
        // do the work on element i + j
    }
}
But if the compiler fails to optimise that, you can do it more like:
for (int i = 0; i < N; i += SIMD_WIDTH) {
    float mask[SIMD_WIDTH];
    // do the work into mask, then find the max of the mask
}
That's effectively what you're doing anyway in the hand-written SIMD code, but it's more readable for mere mortals. And because SIMD_WIDTH is a single constant, it's also slightly easier to change if a new instruction set comes along; you're not maintaining multiple kernels.
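Here's a rough, compilable sketch of the second pattern applied to a max reduction. It assumes SIMD_WIDTH is 8 (one 256-bit AVX register of floats) and reads `mask` as a per-lane buffer of running maxima. The names `max_strided` and `lane_max` are made up for illustration. Unlike the loops above, it also handles the leftover n % SIMD_WIDTH elements with a scalar tail, so N doesn't need to be a multiple of the width.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Assumed width: 8 floats = one 256-bit AVX register. Changing this one
   constant retargets the code, e.g. 4 for SSE/NEON or 16 for AVX-512. */
#define SIMD_WIDTH 8

/* Hypothetical helper: max of x[0..n) using SIMD_WIDTH independent
   running maxima ("lanes"), the role played by `mask` in the comment. */
float max_strided(const float *x, size_t n)
{
    assert(n > 0); /* the max of an empty array is undefined */

    float lane_max[SIMD_WIDTH];
    for (int j = 0; j < SIMD_WIDTH; j++)
        lane_max[j] = -FLT_MAX;

    size_t i = 0;
    /* Main loop over whole chunks. Each lane only ever compares against
       its own element, so the fixed-width inner loop is a candidate for
       a single vector max instruction. */
    for (; i + SIMD_WIDTH <= n; i += SIMD_WIDTH) {
        for (int j = 0; j < SIMD_WIDTH; j++) {
            float v = x[i + j];
            lane_max[j] = v > lane_max[j] ? v : lane_max[j];
        }
    }

    /* Horizontal step: collapse the lanes into one value. */
    float m = lane_max[0];
    for (int j = 1; j < SIMD_WIDTH; j++)
        m = lane_max[j] > m ? lane_max[j] : m;

    /* Scalar tail for the n % SIMD_WIDTH leftover elements. */
    for (; i < n; i++)
        m = x[i] > m ? x[i] : m;

    return m;
}
```

Whether GCC or Clang actually turns the inner loop into vector instructions depends on the flags and target (e.g. `-O3 -march=native`), so it's worth checking the generated assembly rather than assuming it did.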