top | item 32681972

aqrit | 3 years ago

The AVX2 version is using an 8 KiB table...? Even the range check is inefficient. I'd bet that the AVX2 version could be 50% faster.

becurious|3 years ago

Why apply the 255 bitmask? You're going to look it up in a table anyway. I'm pretty sure I've implemented the compact operation before in AVX-256 without needing a massive table, and I don't think you need a gather either, but that was at a previous job. You can do a lot with PEXT / PDEP in combination with multiplies to make masks.
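For readers who haven't seen the trick, here is a minimal sketch of how PDEP, a multiply, and PEXT can produce compaction indices for eight 32-bit lanes without a lookup table. It is not the code from the article or from either commenter. To keep it runnable on any machine, `pdep64` and `pext64` are portable stand-ins written for this example; on hardware with BMI2 they would correspond to the `_pdep_u64` and `_pext_u64` intrinsics, and the final scalar loop would correspond to a lane permute such as `_mm256_permutevar8x32_epi32`. The names `compact_indices` and `compact8` are hypothetical.

```c
#include <stdint.h>

/* Portable stand-in for BMI2 PDEP (_pdep_u64): scatter the low bits of
   src to the positions of the set bits in mask, lowest first. */
static uint64_t pdep64(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* lowest set bit of mask */
        if (src & bit) out |= lowest;
        mask &= mask - 1;                     /* clear that bit */
    }
    return out;
}

/* Portable stand-in for BMI2 PEXT (_pext_u64): gather the bits of src
   selected by mask and pack them into the low bits of the result. */
static uint64_t pext64(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);
        if (src & lowest) out |= bit;
        mask &= mask - 1;
    }
    return out;
}

/* Turn an 8-bit keep mask (bit i set = keep lane i) into eight packed
   index bytes: the indices of the kept lanes, in order, low byte first. */
static uint64_t compact_indices(uint8_t keep) {
    /* One bit per lane -> one full byte per lane. PDEP drops each mask
       bit into the low bit of its byte; multiplying by 0xFF fills the
       byte (no carries, since each byte is 0 or 1 before the multiply). */
    uint64_t byte_mask = pdep64(keep, 0x0101010101010101ULL) * 0xFFULL;
    /* The constant holds lane numbers 0..7, one per byte. PEXT keeps
       only the bytes of selected lanes and packs them together. */
    return pext64(0x0706050403020100ULL, byte_mask);
}

/* Scalar model of the final permute: writes the kept elements to the
   front of dst and returns how many were kept. Slots past the returned
   count get src[0], because the unused index bytes are zero. */
unsigned compact8(const uint32_t src[8], uint8_t keep, uint32_t dst[8]) {
    uint64_t idx = compact_indices(keep);
    unsigned count = 0;
    for (unsigned m = keep; m != 0; m &= m - 1) count++;
    for (unsigned i = 0; i < 8; i++)
        dst[i] = src[(idx >> (8 * i)) & 7];
    return count;
}
```

With a keep mask of `0xA5` (lanes 0, 2, 5 and 7), `compact_indices` yields the index bytes 0, 2, 5, 7 in its low four bytes, so `compact8` moves those four elements to the front. Whether this beats a table lookup in practice depends on the CPU, since PDEP and PEXT are slow on some microarchitectures.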