top | item 36630014 (no title) fuber2018 | 2 years ago If I convert the unrolled-64-bit SWAR function to use 32-bit chunks instead, average runtime almost doubles, approx. 0.1s now.Need sleep now. discuss order hn newest fuber2018|2 years ago If I unroll the 64-bit SWAR version by 8x instead of 4x, the runtime is reduced by another 10% over the 4x-unrolled SWAR version. Diminishing returns...
fuber2018|2 years ago If I unroll the 64-bit SWAR version by 8x instead of 4x, the runtime is reduced by another 10% over the 4x-unrolled SWAR version. Diminishing returns...
fuber2018|2 years ago