top | item 22873708

chapplap | 5 years ago

That's not exactly true once you have to deal with vector extensions like AVX-512. It's quite a pain to write by hand (C intrinsics), and many of the ways to abstract it away end up giving you a GPU-like programming model (e.g. Intel ISPC).
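To illustrate the "pain to write by hand" point, here's what even a trivial array sum looks like with AVX-512 intrinsics (a sketch, guarded with a scalar fallback so it compiles on machines without AVX-512; `sum_array` is just a name chosen for the example):

```c
#include <stddef.h>

#ifdef __AVX512F__
#include <immintrin.h>

/* Sum a float array 16 lanes at a time, then reduce and handle the tail. */
float sum_array(const float *a, size_t n) {
    __m512 acc = _mm512_setzero_ps();
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        acc = _mm512_add_ps(acc, _mm512_loadu_ps(a + i));
    float total = _mm512_reduce_add_ps(acc);
    for (; i < n; i++)          /* scalar tail loop */
        total += a[i];
    return total;
}
#else
/* Scalar fallback when AVX-512 is unavailable. */
float sum_array(const float *a, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
#endif
```

And this is the easy case: masked loads, shuffles, and gather/scatter are where the intrinsics code really gets hairy.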

Plus, this has largely been tried before with Xeon Phi and it didn't end so well.

Huge vector units like AVX-512 are mainly useful for workloads that need huge amounts of RAM that you just can't get with a GPU, or for workloads that are very latency-sensitive and incompatible with GPU task scheduling because they are embedded in other CPU-bound code.

imtringued | 5 years ago

> Huge vector units like AVX-512 are mainly useful for workloads that need huge amounts of RAM that you just can't get with a GPU, or for workloads that are very latency-sensitive and incompatible with GPU task scheduling because they are embedded in other CPU-bound code.

There are a lot of tasks that a GPU can do faster than a CPU, but only after batching a large amount of work before you can gain a speedup. EPYC CPUs do not suffer that limitation. If all you have is an array with 4 elements, you can straight up run the vector instructions and then immediately switch back to scalar code. Meanwhile, with a GPU you probably need an array with at least 10,000 elements or more.
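As a sketch of that point: with CPU SIMD, even a 4-element operation is a handful of inline instructions with no batching or kernel launch (SSE shown, with a scalar fallback so it builds on any target; `add4` is a name made up for this example):

```c
/* Add two 4-element float arrays with a single vector instruction,
   then return straight to scalar code -- no batching, no kernel launch. */
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>

void add4(float *out, const float *a, const float *b) {
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
#else
/* Scalar fallback on non-SSE targets. */
void add4(float *out, const float *a, const float *b) {
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}
#endif
```

On a GPU, the same operation would pay for a host-to-device copy and a kernel launch, which is why the break-even batch size is so much larger.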

jiggawatts | 5 years ago

> It's quite a pain to write by hand

And we all know that autovectorisation is hit-and-miss at best.

I wonder if there will be a new C-like language that has portable SIMD-like capabilities in the same sense that "C is a portable assembly language".
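Something in that direction already exists as a compiler extension: GCC and Clang vector types, which the compiler lowers to SSE, AVX, or NEON depending on the target, so the same source acts as "portable assembly" for SIMD (a sketch; `v4f` and `dot4` are names local to this example):

```c
#include <string.h>

/* A 4-lane float vector; the compiler maps it onto the target's SIMD ISA. */
typedef float v4f __attribute__((vector_size(16)));

float dot4(const float *x, const float *y) {
    v4f a, b;
    memcpy(&a, x, sizeof a);   /* unaligned load into vector registers */
    memcpy(&b, y, sizeof b);
    v4f p = a * b;             /* element-wise vector multiply */
    return p[0] + p[1] + p[2] + p[3];   /* horizontal reduction in scalar */
}
```

It's still far from a full language for this (no portable shuffles or masks without more extensions), which is roughly the gap ISPC tries to fill.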

DeathArrow | 5 years ago

> The first observation is that in modern compilers, the resulting performance from auto-vectorization optimization is still far from the architectural peak performance.

https://dl.acm.org/doi/fullHtml/10.1145/3356842

Maybe we can get more benefits if we invest more resources in optimizing compilers than in inventing yet another JavaScript framework?