ImprobableTruth | 11 months ago
b = foo(a)
c = bar(b)
d = baz(c)
synchronize()
With coroutines/async await, something like this
b = await foo(a)
c = await bar(b)
d = await baz(c)
would synchronize after every step, which is much less efficient.
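To make the difference concrete, here is a rough Python sketch that counts how often the host blocks in each style. It is a toy model, not CUDA: `ToyDevice`, `launch`, `host_waits` and the arithmetic inside `foo`/`bar`/`baz` are all made up for illustration, and a single worker thread stands in for the device queue.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyDevice:
    """Hypothetical stand-in for a device queue: one worker, FIFO order."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._last = None
        self.host_waits = 0  # how many times the host blocked

    def launch(self, fn, arg):
        # Enqueue work and return a handle immediately, without blocking.
        def task():
            # An earlier handle is already finished here, because the
            # single worker runs tasks strictly in submission order.
            value = arg.result() if isinstance(arg, Future) else arg
            return fn(value)

        self._last = self._pool.submit(task)
        return self._last

    def synchronize(self):
        # Block the host until everything enqueued so far is done.
        self.host_waits += 1
        if self._last is not None:
            self._last.result()

    def close(self):
        self._pool.shutdown()


# Placeholder "kernels"; the real foo/bar/baz are not given in the thread.
def foo(x):
    return x + 1


def bar(x):
    return x * 2


def baz(x):
    return x - 3


def run_deferred(a):
    """Enqueue all three steps, then synchronize once."""
    dev = ToyDevice()
    b = dev.launch(foo, a)
    c = dev.launch(bar, b)
    d = dev.launch(baz, c)
    dev.synchronize()
    result = d.result()
    dev.close()
    return result, dev.host_waits


def run_per_step(a):
    """Synchronize after every step, like awaiting each call."""
    dev = ToyDevice()
    b = dev.launch(foo, a)
    dev.synchronize()
    c = dev.launch(bar, b)
    dev.synchronize()
    d = dev.launch(baz, c)
    dev.synchronize()
    result = d.result()
    dev.close()
    return result, dev.host_waits


print(run_deferred(5))  # (9, 1)
print(run_per_step(5))  # (9, 3)
```

Both versions compute the same value; the only difference in this model is that the per-step version blocks the host three times instead of once.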
hackernudes|11 months ago
halter73|11 months ago
alanfranz|11 months ago
ImprobableTruth|11 months ago
The 'trick' for CUDA is that you declare all of this using buffers as inputs/outputs rather than values, and that ordering is enforced automatically through CUDA's stream mechanism. Marrying that with the coroutine mechanism just doesn't really make sense.
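A minimal Python sketch of that idea, under heavy assumptions: `ToyStream` is a hypothetical model of a stream (a FIFO queue drained by one worker thread), not the CUDA API, and the element-wise bodies of `foo`/`bar`/`baz` are invented. The point is only that a launch takes input and output buffers, returns nothing, and relies on enqueue order for correctness.

```python
import queue
import threading


class ToyStream:
    """Hypothetical model of a stream: kernels run in enqueue order."""

    def __init__(self):
        self._work = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # Drain the queue forever, one kernel at a time, in FIFO order.
        while True:
            kernel, src, dst = self._work.get()
            kernel(src, dst)
            self._work.task_done()

    def launch(self, kernel, src, dst):
        # Returns immediately and hands no value back to the caller.
        self._work.put((kernel, src, dst))

    def synchronize(self):
        # Block until every kernel enqueued so far has finished.
        self._work.join()


# Placeholder kernels: each reads from src and writes into dst.
def foo(src, dst):
    for i, x in enumerate(src):
        dst[i] = x + 1


def bar(src, dst):
    for i, x in enumerate(src):
        dst[i] = x * 2


def baz(src, dst):
    for i, x in enumerate(src):
        dst[i] = x - 3


def run_pipeline(a):
    # Output buffers are allocated up front, before anything runs.
    b = [0] * len(a)
    c = [0] * len(a)
    d = [0] * len(a)

    stream = ToyStream()
    stream.launch(foo, a, b)
    stream.launch(bar, b, c)  # safe: runs only after foo has filled b
    stream.launch(baz, c, d)
    stream.synchronize()      # the single host-side wait
    return d


print(run_pipeline([1, 2, 3]))  # [1, 3, 5]
```

Because nothing is returned from `launch`, there is no value to `await` between steps; the dependency between `foo`, `bar` and `baz` lives entirely in the shared buffers and the queue order.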