top | item 43658760

(no title)

keldaris | 10 months ago

Technically, OpenCL can also include inline PTX assembly in kernels (unlike any compute shader API I've ever seen), which is relevant for targeting things like tensor cores. You're absolutely right about the language limitation, though.

discuss

pjmlp|10 months ago

At which point why bother, PTX is CUDA.

keldaris|10 months ago

Generally, the reason to bother with this approach is if you have a project that only needs tensor cores in a tiny part of the code and otherwise benefits from the cross platform nature of OpenCL, so you have a mostly shared codebase with a small vendor-specific optimization in a kernel or two. I've been in that situation and do find that approach valuable, but I'll be the first to admit the modern GPGPU landscape is full of unpleasant compromises whichever way you look.