RaisingSpear | 2 years ago
PMULUDQ is in SSE2, though I haven't checked if that's usable for the problem here. There's also PMULLD in SSE4.1 if you only need a 32-bit result. But for summing digits, perhaps SSE2's PMADDWD could be sufficient?
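As a rough illustration of the PMADDWD idea, here is a minimal C sketch that turns four ASCII digits into an integer using only SSE2 intrinsics. The function name `parse4` and the fixed four-digit input are assumptions made for the example, not something from the thread; a scalar fallback is included so it also builds on non-x86 targets.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: combine four ASCII digits, e.g. "1234", into 1234.
   Assumes s points at exactly four characters in '0'..'9'. */
uint32_t parse4(const char *s) {
#if defined(__SSE2__)
    uint32_t raw;
    memcpy(&raw, s, 4);                          /* unaligned-safe 4-byte load */
    __m128i v = _mm_cvtsi32_si128((int)raw);     /* digits in bytes 0..3, rest zero */
    v = _mm_and_si128(v, _mm_set1_epi8(0x0F));   /* ASCII '0'..'9' -> 0..9 */
    v = _mm_unpacklo_epi8(v, _mm_setzero_si128()); /* widen bytes to 16-bit lanes */

    /* PMADDWD: multiply 16-bit lanes by weights, add adjacent pairs into 32-bit lanes.
       32-bit lane 0 = d0*1000 + d1*100, lane 1 = d2*10 + d3, lanes 2..3 = 0. */
    const __m128i weights = _mm_setr_epi16(1000, 100, 10, 1, 0, 0, 0, 0);
    __m128i p = _mm_madd_epi16(v, weights);

    /* Add the two partial sums. */
    int lo = _mm_cvtsi128_si32(p);
    int hi = _mm_cvtsi128_si32(_mm_srli_si128(p, 4));
    return (uint32_t)(lo + hi);
#else
    /* Scalar fallback for targets without SSE2. */
    return (uint32_t)((s[0] - '0') * 1000 + (s[1] - '0') * 100 +
                      (s[2] - '0') * 10 + (s[3] - '0'));
#endif
}
```

Calling `parse4("1234")` should give 1234. Real temperature parsing would also have to deal with the sign, the decimal point, and variable length, all of which this sketch ignores.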
dzaima | 2 years ago
Completely forgot about pmuludq; that works too for SSE2. But a 32-bit result is insufficient for the magic number method, which needs at least 36 bits. I originally used vpmaddubsw+vpmaddwd, but switched to vpmuldq for the reduced register pressure. I was already only parsing 4 numbers at a time in ymm registers, so the 64-bit result didn't affect me (after parsing 4 temperatures I immediately did the respective hashmap stuff for each).
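To show why a 32-bit product can fall short, here is a scalar C sketch of one multiply-by-magic-constant layout. The digit positions, the constant `MAGIC`, and the helper names are all assumptions picked for illustration; this is not necessarily the layout used above (which only needs 36 bits). In this variant the answer lands in bits 32 to 41 of the product.

```c
#include <stdint.h>

/* Hypothetical layout: hundreds digit in byte 1, tens digit in byte 2,
   units digit in byte 4; every other byte is zero. */
uint64_t pack_digits(unsigned h, unsigned t, unsigned u) {
    return ((uint64_t)h << 8) | ((uint64_t)t << 16) | ((uint64_t)u << 32);
}

/* Assumed magic constant: moves h*100, t*10 and u*1 to bit 32 so they add up there. */
static const uint64_t MAGIC = 100ull * 0x1000000ull + 10ull * 0x10000ull + 1ull;

/* One 64-bit multiply, then pick out the 10-bit field holding h*100 + t*10 + u.
   The stray t*100 term starts at bit 40, but t*100 is a multiple of 4, so its
   two lowest bits are zero and it does not disturb bits 40..41 of the field. */
unsigned combine_magic(uint64_t packed) {
    return (unsigned)((packed * MAGIC) >> 32) & 0x3FF;
}
```

For example, `combine_magic(pack_digits(1, 2, 3))` evaluates to 123. With this layout, keeping only the low 32 bits of the product would discard the result entirely, which is the kind of problem a 32-bit-only multiply runs into.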