marcan_42 | 3 years ago

> which are programmed somewhat like regular computers but just with an astonishingly high number of threads.

But they aren't that; they are actually wide vector processors, which means groups of threads need to be doing the same thing for it to perform properly! Branches and divergent control flow kill GPU performance.

I'm sure you already know this, but I'm just pointing out for other folks reading. If GPUs were just CPUs with stupidly high core counts then things would be way easier, but it's more complicated than that.
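
A toy way to picture the "wide vector processor" point (this is a plain-Python sketch of the execution model, not real GPU code): lanes in a group run in lockstep, so when they branch, the hardware executes *both* sides and uses a mask to select which lanes keep each result. Every lane pays for every path any lane takes.

```python
# Toy model of a 4-lane SIMD group hitting a branch.
# Lanes run in lockstep: BOTH sides of the branch execute,
# and a per-lane mask decides which result each lane keeps.

def simd_branch(xs):
    steps = 0
    mask = [x > 0 for x in xs]           # per-lane predicate
    then_vals = [x * 2 for x in xs]      # every lane runs the "then" side
    steps += 1
    else_vals = [x - 1 for x in xs]      # every lane also runs the "else" side
    steps += 1
    # mask selects the result each lane actually commits
    ys = [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]
    return ys, steps

ys, steps = simd_branch([3, -1, 5, -2])
print(ys, steps)  # both paths cost time, though each lane needed only one
```

With uniform control flow (all lanes taking the same side), the other side could be skipped; divergence is what forces the duplicated passes.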

derefr | 3 years ago

But any Turing-complete operation can be mapped mechanistically into a branchless ISA, can’t it? One of those “one-instruction” ISAs, for example, where every instruction is also a jump. Vector processors would compute on those just fine, just like they compute matrix-multiplication problem isomorphisms just fine.

Or, for a more obvious/less arcane restatement: can't the shader cores just be given a shader that's an interpreter, and a texture that's a spritesheet of bytecode programs?

Jasper_ | 3 years ago

Yes, we can make GPU programs that render vector images this way, but they tend to be slower than an equivalent CPU program. Branches are not the problem, GPUs handle those just fine now actually. The problem is duplicated work. GPUs have cores that are individually much, much slower than a CPU, but make up for this by having lots and lots of them running in parallel. Having those cores all run the same serial interpreter does not give you increased parallelism, so the result is slower.
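
The duplicated-work problem shows up directly if you model the interpreter idea. Sketch below (an assumed lockstep model, illustrative names): when lanes of one SIMD group sit at different bytecode opcodes, the group has to make one masked pass per distinct opcode, so one "interpreter step" costs nearly as much as running the lanes serially.

```python
# Toy model: one SIMD group executing one step of a bytecode interpreter.
# Lanes at different opcodes force one masked pass per distinct opcode.

def step_simd_interpreter(opcodes, values):
    handlers = {
        "inc": lambda v: v + 1,
        "dbl": lambda v: v * 2,
        "neg": lambda v: -v,
    }
    passes = 0
    out = list(values)
    for op in sorted(set(opcodes)):       # one lockstep pass per distinct opcode
        passes += 1
        for lane, (o, v) in enumerate(zip(opcodes, values)):
            if o == op:                   # mask: only matching lanes commit
                out[lane] = handlers[op](v)
    return out, passes

out, passes = step_simd_interpreter(["inc", "dbl", "inc", "neg"], [1, 2, 3, 4])
print(out, passes)  # 3 passes for one step: the lanes barely ran in parallel
```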

Designing algorithms for the GPU requires rethinking your dataflow and structure to exploit the parallel nature of the GPU. GPUs are not just a "go fast" button.
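
One common shape that rethinking takes (a hypothetical sketch, not from the thread): instead of letting each work item branch on its own kind, bucket the items by kind up front, so each batch is uniform work that a SIMD group could run divergence-free.

```python
# Sketch of restructuring for uniformity: sort work by kind first,
# then process each bucket with a single uniform operation.
# Operation names here are illustrative.

def run_batched(items):
    ops = {"inc": lambda v: v + 1, "dbl": lambda v: v * 2}
    buckets = {}
    for idx, (kind, v) in enumerate(items):       # group items by kind
        buckets.setdefault(kind, []).append((idx, v))
    out = [None] * len(items)
    for kind, group in buckets.items():
        f = ops[kind]                             # each bucket is uniform work:
        for idx, v in group:                      # no per-lane divergence needed
            out[idx] = f(v)
    return out

print(run_batched([("inc", 1), ("dbl", 2), ("inc", 3)]))  # [2, 4, 4]
```

The extra bucketing pass is itself parallelizable (it is essentially a sort or stream compaction), which is why this trade tends to win on GPUs.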