top | item 40007920


keldaris | 1 year ago

It depends on what you're doing. For writing FP32 number-crunching code from scratch (meaning you don't care about something like Torch, or even cuBLAS/cuDNN), I haven't encountered cases where I couldn't match CUDA performance, and when I did, I could always drop in a bit of PTX assembly where absolutely necessary (which OpenCL lets you do, whereas Vulkan does not). This also gets me good performance on macOS without rewriting the whole thing in Metal. There is no native FP16 support, and there are other limitations that may matter to your use case or be completely irrelevant.
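As a minimal sketch of the escape hatch described above: NVIDIA's OpenCL compiler accepts CUDA-style `asm()` statements containing PTX inside OpenCL C kernels (this is NVIDIA-specific and will not compile on other vendors' implementations). The kernel name and the fused multiply-add are illustrative, not from the original comment.

```
// OpenCL C kernel (device code). Inline PTX only works on NVIDIA's
// OpenCL implementation; other vendors will reject the asm() block.
__kernel void fma_ptx(__global float* out,
                      __global const float* a,
                      __global const float* b,
                      __global const float* c) {
    size_t i = get_global_id(0);
    float r;
    // Hand-written PTX fused multiply-add, round-to-nearest:
    // equivalent to r = fma(a[i], b[i], c[i]);
    asm("fma.rn.f32 %0, %1, %2, %3;"
        : "=f"(r)
        : "f"(a[i]), "f"(b[i]), "f"(c[i]));
    out[i] = r;
}
```

In practice you would guard such a kernel behind a vendor check at runtime and fall back to plain OpenCL C elsewhere.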

I'm definitely not saying OpenCL is any sort of a reasonable default for cross-platform GPGPU work. In truth, I don't think there is any reasonable "general" default for that sort of thing. Vulkan has its own issues (it only works via a compatibility layer on macOS, implementation quality varies widely, extension hell, boilerplate hell, some low-level things are just impossible, etc.), and everything else is a higher-level approach that by definition can't work for everything.

It's a pretty sad situation overall and every solution has severe tradeoffs. Personally, I just write CUDA when I can get away with it and try to stick to OpenCL otherwise, but everyone needs to make that choice for their own set of tradeoffs.

VHRanger | 1 year ago

Yeah, TBH I'm kind of sad about where OpenCL ended up, because it "should have" been what CUDA was used for from 2011-2021. AlexNet, TF, PyTorch, etc. "should have" been written with OpenCL backends.

But inconsistent driver implementations, version-support issues, etc. meant people used CUDA instead.

I agree Vulkan has its own issues, and having written some MoltenVK stuff, you clearly know the quality-of-life pains of developing with it. That said, at least from the user side it works and performs well.