(no title)
jorvi | 11 days ago
AVX512 is so kludgy that it usually leads to a detriment in performance due to the extreme power requirements triggering thermal throttling.
jorvi | 11 days ago
AVX512 is so kludgy that it usually leads to a detriment in performance due to the extreme power requirements triggering thermal throttling.
kimixa|11 days ago
But there's plenty in avx512 the really helps real algorithms outside the 512-wide registers - I think it would be perceived very differently if it was initially the new instructions on the same 256-wide registers - ie avx10 - in the first place, then extended to 512 as the transistor/power budgets allowed. AVX512 was just tying too many things together too early than "incremental extensions".
otherjason|11 days ago
AVX512 leading to thermal throttling is a common myth that from what I can tell traces its origins to a blog post about clock throttling on a particular set of low-TDP SKUs from the first generation of Xeon CPUs that supported it (Skylake-X), released over a decade ago: https://blog.cloudflare.com/on-the-dangers-of-intels-frequen...
The results were debated shortly after that by well-known SIMD authors that were unable to duplicate the results: https://lemire.me/blog/2018/08/25/avx-512-throttling-heavy-i...
In practice, this has not been an issue for a long time, if ever; clock frequency scaling for AVX modes has been continually improved in subsequent Intel CPU generations (and even more so in AMD Zen 4/5 once AVX512 support was added).
adrian_b|11 days ago
All AMD Zen 4 and Zen 5 and all of the Intel CPUs since Ice Lake that support AVX-512, benefit greatly from using it in any application.
Moreover the AMD Zen CPUs have demonstrated clearly that for vector operations the instruction-set architecture really matters a lot. Unlike the Intel CPUs, the AMD CPUs use exactly the same execution units regardless whether they execute AVX2 or AVX-512 instructions. Despite this, their speed increases a lot when executing programs compiled for AVX-512 (in part for eliminating bottlenecks in instruction fetching and decoding, and in part because the AVX-512 instruction set is better designed, not only wider).
badgersnake|11 days ago
SecretDreams|11 days ago
corysama|11 days ago
So, in order to make use of users new fancy hardware without abandoning other users old and busted hardware, you have to support multiple back-ends. Same as it ever was.
Actually, a lot easier than it ever was today. Doom 3 famously required Carmack to reimplement the rendering 6 times to get the same results out of 6 different styles of GPUs that were popular at the time.
ARB Basic Fallback (R100) Multi-pass Minimal effects, no specular.
NV10 GeForce 2 / 4 MX, 5 Passes, Used Register Combiners.
NV20 GeForce 3 / 4 Ti, 2–3 Passes, Vertex programs + Combiners.
R200 Radeon 8500–9200, 1 Pass, Used ATI_fragment_shader.
NV30 GeForce FX Series, 1 Pass, Precision optimizations (FP16).
ARB2 Radeon 9500+ / GF 6+, 1 Pass, Standard high-end GLSL-like assembly.
https://community.khronos.org/t/doom-3/37313