chapplap | 5 years ago
Plus, this has largely been tried before with Xeon Phi and it didn't end so well.
Huge vector units like AVX-512 are mainly useful for workloads that need more RAM than you can get on a GPU, or for latency-sensitive workloads that sit inside otherwise CPU-bound code and therefore can't tolerate GPU task scheduling.
imtringued|5 years ago
There are a lot of tasks that a GPU can do faster than a CPU, but only after batching up a large amount of work. EPYC CPUs don't suffer that limitation. If all you have is an array with 4 elements, you can straight up run the vector instructions and then immediately switch back to scalar code. Meanwhile, with a GPU you probably need an array of at least 10,000 elements before the transfer and kernel-launch overhead pays off.
jiggawatts|5 years ago
And we all know that autovectorisation is hit-and-miss at best.
I wonder if there will be a new C-like language that has portable SIMD-like capabilities in the same sense that "C is a portable assembly language".
DeathArrow|5 years ago
Maybe we could get more benefit from investing resources in optimizing compilers than in inventing yet another JavaScript framework?