bXVsbGVy|3 years ago
The CPU has an overhead of roughly 10 µs to enable the AVX-512 units.
It also dramatically reduces the clock on other cores.
For more information, see: https://stackoverflow.com/a/56861355
Timing information: https://www.agner.org/optimize/microarchitecture.pdf
dragontamer|3 years ago
It's all about AMD Zen 4 or Xeon Ice Lake and later, which have no clock reduction and no warm-up overhead.
bXVsbGVy|3 years ago
On Alder Lake (pg 172):
> The reader is referred to the timings for Tiger Lake and Gracemont.
On Tiger Lake (pg 167):
> Warm-up period for ZMM vector instructions
> The processor puts the upper parts of the 512 bit vector execution units into a low power mode when they are not used.
> Instructions with 512-bit vectors have a throughput that is approximately 4.5 times slower than normal during an initial warm-up period of approximately 50,000 clock cycles.
I'm not saying you are wrong. I just haven't heard about that.
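For scale, here is a rough back-of-envelope sketch of what the quoted 50,000-cycle warm-up means in wall-clock time. The clock speeds are my own assumptions, not figures from the manual, and the helper name `warmup_microseconds` is made up for illustration.

```python
def warmup_microseconds(cycles: int, clock_ghz: float) -> float:
    """Convert a cycle count to microseconds at a given core clock."""
    # 1 GHz is 1000 cycles per microsecond.
    return cycles / (clock_ghz * 1000.0)


# Warm-up length quoted above for Tiger Lake.
WARMUP_CYCLES = 50_000

# Assumed core clocks; real chips vary with turbo and power state.
for ghz in (2.0, 3.0, 4.0, 5.0):
    print(f"{ghz:.1f} GHz: {warmup_microseconds(WARMUP_CYCLES, ghz):.1f} us")
```

At an assumed 5 GHz this works out to 10 µs, and to 25 µs at 2 GHz, so the 50,000-cycle figure is in the same ballpark as the ~10 µs I mentioned earlier.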