You get widening operations (e.g. 16x16->32-bit multiplications) and can trade the number of available registers for longer vectors, but among the really interesting features are the fault-only-first loads and masked instructions, which let the vector unit work on things like null-terminated strings. The specification includes vectorized strlen/strcmp/strcpy/strncpy implementations as examples. Most existing (packed) SIMD instruction sets aren't well suited to these common functions.
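To make the fault-only-first idea concrete, here is a rough scalar model in plain C of how a strlen loop built on it behaves. This is only a sketch, not the code from the specification: `VLMAX`, `load_ff`, `first_zero`, `strlen_ff` and the `limit` pointer (standing in for the end of the readable page) are all made-up names for illustration. On real hardware the three steps would correspond roughly to `vle8ff.v`, a compare against zero, and `vfirst.m`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VLMAX 8 /* hypothetical vector length in bytes */

/* Model of a fault-only-first load: reads up to VLMAX bytes from src but
   stops early at `limit` (one past the last readable byte) instead of
   faulting. Returns how many elements were actually loaded (the new vl). */
static size_t load_ff(uint8_t *dst, const char *src, const char *limit) {
    size_t vl = 0;
    while (vl < VLMAX && src + vl < limit) {
        dst[vl] = (uint8_t)src[vl];
        vl++;
    }
    return vl;
}

/* Index of the first zero element among the vl loaded ones, or -1. */
static long first_zero(const uint8_t *v, size_t vl) {
    for (size_t i = 0; i < vl; i++)
        if (v[i] == 0) return (long)i;
    return -1;
}

/* strlen that never reads at or beyond `limit`. */
size_t strlen_ff(const char *s, const char *limit) {
    const char *p = s;
    uint8_t v[VLMAX];
    for (;;) {
        size_t vl = load_ff(v, p, limit);
        if (vl == 0) abort(); /* element 0 unreadable: a real load would trap */
        long idx = first_zero(v, vl);
        if (idx >= 0) return (size_t)(p - s) + (size_t)idx;
        p += vl; /* advance only by what was really loaded */
    }
}
```

The point is that the loop can ask for a full vector without knowing where the string ends: the load shortens itself at the boundary, and the loop just advances by however many elements it actually got.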
crest|4 years ago
jcranmer|4 years ago