item 33780276

bXVsbGVy | 3 years ago

> It takes literally 1 to 10 microseconds (10,000 nanoseconds) to talk to a GPU over PCIe.

The CPU has an overhead of roughly 10 µs to enable the AVX-512 units.

It also dramatically reduces the clock frequency on other cores.

For more information, see: https://stackoverflow.com/a/56861355

Timing information: https://www.agner.org/optimize/microarchitecture.pdf

dragontamer | 3 years ago

Skylake-X is a processor that's 7 years old. Intel's first implementation always kinda sucks, but the newer implementations have no such restrictions.

It's all about AMD Zen 4 or Xeon Ice Lake+, which have no clock reduction and no overheads.

bXVsbGVy | 3 years ago

From microarchitecture.pdf:

On Alder Lake (pg 172):

> The reader is referred to the timings for Tiger Lake and Gracemont.

On Tiger Lake (pg 167):

> Warm-up period for ZMM vector instructions

> The processor puts the upper parts of the 512 bit vector execution units into a low power mode when they are not used.

> Instructions with 512-bit vectors have a throughput that is approximately 4.5 times slower than normal during an initial warm-up period of approximately 50,000 clock cycles.

I'm not saying you are wrong. I just haven't heard about that.
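For scale, here is a rough conversion of the Tiger Lake warm-up figures quoted above (about 50,000 cycles at roughly 4.5x lower 512-bit throughput) into wall-clock time. The clock frequencies are assumptions picked for illustration, not numbers from the manual, and the "lost time" estimate assumes the code issues 512-bit instructions continuously for the whole warm-up window.

```python
# Back-of-the-envelope arithmetic for the quoted ZMM warm-up figures.
# WARMUP_CYCLES and WARMUP_SLOWDOWN come from the quoted text; the clock
# speeds in ASSUMED_CLOCKS_GHZ are illustrative assumptions only.

WARMUP_CYCLES = 50_000       # "approximately 50,000 clock cycles"
WARMUP_SLOWDOWN = 4.5        # "approximately 4.5 times slower" throughput
ASSUMED_CLOCKS_GHZ = (2.0, 3.0, 5.0)


def cycles_to_us(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to microseconds; 1 GHz = 1,000 cycles per microsecond."""
    return cycles / (clock_ghz * 1e3)


def lost_fraction(slowdown: float) -> float:
    """Fraction of full-rate 512-bit throughput missing while warming up."""
    return 1.0 - 1.0 / slowdown


if __name__ == "__main__":
    for ghz in ASSUMED_CLOCKS_GHZ:
        window_us = cycles_to_us(WARMUP_CYCLES, ghz)
        # Time "wasted" relative to running at full rate for the same window,
        # assuming the vector units are kept busy the entire time.
        lost_us = window_us * lost_fraction(WARMUP_SLOWDOWN)
        print(f"{ghz:.1f} GHz: warm-up ~{window_us:.1f} us, ~{lost_us:.1f} us of work lost")
```

Under these assumptions the warm-up window comes out to roughly 10 to 25 µs, the same order of magnitude as both the ~10 µs AVX-512 enable overhead and the 1 to 10 µs PCIe round trip mentioned upthread.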