As far as I know, GPUs execute code pretty fast as long as all threads in a warp follow the same execution path. Branching within a warp causes performance degradation. But executing exactly the same code for multiple coroutines seems to me to be practically impossible. So, can good performance be reached with such an approach at all?
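To make the divergence cost concrete, here is a rough toy model in Python. It is not a GPU API and not a measurement. It only assumes that a warp executes each distinct branch path once, with the lanes on other paths masked off. The names `warp_issue_slots` and `path_cost`, and the instruction counts, are made up for illustration. The warp size of 32 matches NVIDIA hardware.

```python
# Toy cost model of SIMT execution (an illustration, not a real GPU API).
WARP_SIZE = 32  # lanes per warp on NVIDIA hardware


def warp_issue_slots(lane_paths, path_cost):
    """Count instruction issue slots spent by one warp.

    lane_paths: the branch each lane takes, one entry per lane.
    path_cost:  hypothetical instruction count for each branch.

    Every distinct path taken by at least one lane is executed once
    while the other lanes are masked off, so the costs add up.
    """
    assert len(lane_paths) == WARP_SIZE
    return sum(path_cost[p] for p in set(lane_paths))


path_cost = {"a": 100, "b": 100}

# All 32 lanes take branch "a": only one path is executed.
uniform = warp_issue_slots(["a"] * WARP_SIZE, path_cost)

# Lanes alternate between "a" and "b": both paths are executed in turn.
divergent = warp_issue_slots(["a", "b"] * (WARP_SIZE // 2), path_cost)
```

Under this model the divergent warp spends twice the issue slots of the uniform one for the same amount of useful work per lane, which is the degradation the question is about.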

zozbot234|11 days ago

GPU 'threads' are SIMD lanes and this post does not discuss those. It's about running multiple GPU 'warps', i.e. hardware threads with separate control flow and code execution.

(Beyond that, "executing the same code" on multiple instances of a single coroutine ought to be sometimes possible on an opportunistic basis.)

ablob|12 days ago

The short answer is yes. The post literally has an example of coroutines (think C-style: possible, but ugly). The difference here is how easy it is to write. I'd wager the question is not whether it can be achieved, but for which use cases it can be ergonomic.
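As a rough illustration of the "possible, but ugly" point, here is the same tiny coroutine written twice in Python: once as a hand-rolled state machine, in the spirit of C-style coroutines, and once with language support. This is only a sketch of the ergonomics gap, not the post's code. `CounterStateMachine`, `counter`, and `drain` are hypothetical names.

```python
class CounterStateMachine:
    """Hand-rolled coroutine: the resume point and locals are stored explicitly."""

    def __init__(self, limit):
        self.limit = limit
        self.state = 0  # which resume point runs next
        self.i = 0      # loop variable that must survive between resumes

    def resume(self):
        if self.state == 0:          # first entry
            self.state = 1
            return "start"
        if self.state == 1:          # inside the loop
            if self.i < self.limit:
                value = self.i
                self.i += 1
                return value
            self.state = 2           # loop finished
        return None                  # coroutine is done


def counter(limit):
    """The same logic with language-level coroutine support (a generator)."""
    yield "start"
    for i in range(limit):
        yield i


def drain(machine):
    """Resume the state machine until it reports completion."""
    out = []
    value = machine.resume()
    while value is not None:
        out.append(value)
        value = machine.resume()
    return out


manual = drain(CounterStateMachine(2))
generated = list(counter(2))
```

Both versions produce the same sequence. The difference is that the first has to spell out its saved state and resume points by hand, which is the part that gets ugly as the control flow grows.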