rcgorton|3 years ago
Another divot: asymmetric functional units. Some versions of Alpha supported a PopCount instruction, but it could only execute in a single functional unit, which made scheduling a pain, especially if you had to write in assembly language.
I'm not convinced that AVX 256 and AVX 512 are useful for non-matrix operations. Most strings (more importantly, tokens bounded by whitespace when parsing) are much shorter than 512 bits (64 bytes), or even 256 bits (32 bytes). In English, I cannot come up with many words longer than 16 bytes (some place names, antidisestablishmentarianism, chemical compound names, and some other stuff).
loup-vaillant|3 years ago
I've observed that, compared to regular x86-64 code without SIMD, using AVX 256 speeds up the Chacha20 cipher by a factor of 5. That's for long messages, so they can be processed in 512-byte chunks (8 blocks of 64 bytes). Network packets easily exceed 1 KB, and files are usually much bigger.
Matrix operations aren't the only viable niche.
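To illustrate why this kind of code vectorizes well, here is a minimal sketch, not the actual implementation measured above. It assumes the standard ChaCha20 quarter round from RFC 8439 and applies it to 8 independent blocks at once, one 32-bit word per lane, so each array fits one 256-bit register. The names `LANES`, `rotl32` and `quarter_round_x8` are made up for the example, and it is plain C that a compiler may or may not auto-vectorize.

```c
#include <stdint.h>

/* Hypothetical name: 8 lanes x 32 bits = 256 bits, one lane per ChaCha20 block. */
#define LANES 8

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* ChaCha20 quarter round (RFC 8439, section 2.1), applied lane by lane.
 * Lane i holds the words a, b, c, d of block i. The lanes never interact,
 * so the loop body maps onto 8-wide add/xor/rotate vector operations. */
void quarter_round_x8(uint32_t a[LANES], uint32_t b[LANES],
                      uint32_t c[LANES], uint32_t d[LANES])
{
    for (int i = 0; i < LANES; i++) {
        a[i] += b[i]; d[i] ^= a[i]; d[i] = rotl32(d[i], 16);
        c[i] += d[i]; b[i] ^= c[i]; b[i] = rotl32(b[i], 12);
        a[i] += b[i]; d[i] ^= a[i]; d[i] = rotl32(d[i], 8);
        c[i] += d[i]; b[i] ^= c[i]; b[i] = rotl32(b[i], 7);
    }
}
```

The point is that the 8 blocks only differ in their counter word, so nothing has to be shuffled between lanes until the keystream is written out. The width of individual data items doesn't matter as long as there is enough independent work to fill the register.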
sitkack|3 years ago
https://simdjson.org/