glangdale | 1 year ago

Eight instructions seems very solid. I'm guessing AND/PSHUFB for the low nibble, SHIFT/AND/PSHUFB for the high nibble, and OR to combine, plus load/store?

If you have AVX-512, GFNI is faster for this task, but obviously there are many situations where you can't use it.
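
For readers who want to see the data flow, here is a minimal sketch of the guessed sequence. It is not the code discussed in the thread. It assumes the task is reversing the bits within each byte, which the comments do not state explicitly, and it models PSHUFB with a plain loop so that it runs on any CPU. The names `Vec16`, `pshufb_model`, and `reverse_bits_in_bytes` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One 16-byte "register": a scalar stand-in for __m128i.
using Vec16 = std::array<std::uint8_t, 16>;

// Scalar model of PSHUFB: each output byte is table[index & 0x0F],
// or zero when the index byte has its top bit set.
Vec16 pshufb_model(const Vec16& table, const Vec16& idx) {
    Vec16 out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0F];
    return out;
}

// ASSUMED task: reverse the bit order inside every byte of v.
Vec16 reverse_bits_in_bytes(const Vec16& v) {
    // kRev[n] is the 4-bit value n with its bits reversed.
    static const Vec16 kRev = {0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
                               0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF};
    // The same table, pre-shifted into the high nibble.
    static const Vec16 kRevHi = {0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0,
                                 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0};

    Vec16 lo{}, hi{};
    for (std::size_t i = 0; i < 16; ++i) {
        lo[i] = static_cast<std::uint8_t>(v[i] & 0x0F);         // AND
        hi[i] = static_cast<std::uint8_t>((v[i] >> 4) & 0x0F);  // SHIFT, AND
    }

    const Vec16 fromLo = pshufb_model(kRevHi, lo);  // PSHUFB: low nibble -> high half
    const Vec16 fromHi = pshufb_model(kRev, hi);    // PSHUFB: high nibble -> low half

    Vec16 out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = static_cast<std::uint8_t>(fromLo[i] | fromHi[i]);  // OR
    return out;
}
```

In a real SSSE3 or AVX2 version, each loop would collapse into one vector instruction, giving six arithmetic instructions plus a load and a store, which is consistent with the count of eight mentioned above.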

Const-me | 1 year ago

> guessing AND/PSHUFB for low nibble, SHIFT/AND/PSHUFB for high nibble, OR to combine

Yeah, that’s exactly what I did in my C++ code with intrinsics.

About ISA extensions, I’m lucky to work on professional CAM/CAE software. We specify AVX2 in the system requirements, so I’m guaranteed to have it on our customers’ computers. However, very few of them have AVX-512 CPUs, so we are ignoring it so far.