newgre | 11 months ago

Why did the compiler even choose to fetch only DWORDs in the first place? It's unclear to me why the accumulator (apparently) determines the vectorization width.

TinkersW | 11 months ago

The accumulator is a vector type. With a 64-bit sum you can only fit 4 lanes into a 256-bit register, so each loop iteration can only consume 4 input elements, however narrow they are.

After the loop it will do a horizontal add across the vector register to produce the final scalar result.
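
Roughly, the shape of the generated loop looks like the plain C sketch below. It uses no intrinsics and just models the 256-bit register as an array of four 64-bit lanes. The name `sum_u32_4lanes` is made up for illustration, and the 32-bit input type is an assumption based on the DWORD loads mentioned in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums 32-bit inputs into a 64-bit total, modelling
 * a 256-bit vector accumulator as four independent 64-bit lanes. */
uint64_t sum_u32_4lanes(const uint32_t *data, size_t n)
{
    uint64_t acc[4] = {0, 0, 0, 0};  /* 4 lanes x 64 bits = 256 bits */
    size_t i = 0;

    /* Main loop: 4 inputs per iteration, one per lane. Each 32-bit
     * value is widened to 64 bits before it is added to its lane. */
    for (; i + 4 <= n; i += 4) {
        for (size_t lane = 0; lane < 4; ++lane)
            acc[lane] += (uint64_t)data[i + lane];
    }

    /* Horizontal add: collapse the four lanes into one scalar. */
    uint64_t total = acc[0] + acc[1] + acc[2] + acc[3];

    /* Scalar tail for the leftover 0..3 elements. */
    for (; i < n; ++i)
        total += (uint64_t)data[i];

    return total;
}
```

The lane count comes from the accumulator width (256 / 64 = 4), not from the input width, which is why only four DWORDs are loaded per iteration in this model.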