top | item 40007286


keldaris | 1 year ago

While this is completely true, it is also true that OpenCL 1.2 is the one compute API that just works on every major platform, and the drivers aren't unusably bad (though I'm not claiming experience with every platform here, just Nvidia on Windows/Linux and Apple Silicon on macOS). Writing a limited dialect of C and sticking to 1.2's limitations is far from ideal, but it does at least work reliably. Sadly, that is more than can be said of most competitors.
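For readers who haven't seen it, a minimal sketch of what that limited C dialect looks like (a plain SAXPY kernel; the function and argument names are just illustrative):

```c
// OpenCL C 1.2 kernel: y = alpha * x + y, one work-item per element.
// The restrictions mentioned above show up in the dialect itself: no
// recursion, no function pointers, explicit address-space qualifiers
// (__global etc.), and each work-item indexes itself via get_global_id().
__kernel void saxpy(const float alpha,
                    __global const float* x,
                    __global float* y)
{
    size_t i = get_global_id(0);
    y[i] = alpha * x[i] + y[i];
}
```

The host side (creating a context, building the program, enqueueing the kernel) is plain C against the OpenCL runtime API and is the same on every platform, which is the portability point being made here.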



VHRanger | 1 year ago

Yes, 1.2 is OK.

The problem is that the drivers are merely OK. Presumably if you're using OpenCL you care about performance (otherwise why would you?), and in that case OpenCL is the best choice on no platform; for any given set of platforms there's an alternative that does better.

I think OpenCL is sadly on its way out, and it's mostly Apple's fault (and Nvidia's a little). Vulkan compute is much more interesting if you're looking to leverage iGPUs/mobile/other random GPUs.

If you're targeting workstation/server workloads only, it makes sense to restrict yourself to a subset of accelerator types and code for that (e.g. Torch or JAX for GPUs, highway for SIMD, etc.).

keldaris | 1 year ago

It depends on what you're doing. For writing FP32 number-crunching code from scratch (meaning you don't care about something like Torch, or even cuBLAS/cuDNN), I haven't encountered cases where I couldn't match CUDA performance, and when I did, I could always drop in a bit of PTX assembly where absolutely necessary (which OpenCL lets you do, whereas Vulkan does not). This also gets me good performance on macOS without rewriting the whole thing in Metal. There is no native FP16 support, and there are other limitations that may matter to your use case or be completely irrelevant.
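A hedged sketch of the inline-PTX escape hatch being referred to: Nvidia's OpenCL implementation compiles through the same backend as CUDA and accepts CUDA-style `asm()` statements, but this is an Nvidia-specific, non-portable extension, not standard OpenCL C, and the kernel name here is just illustrative.

```c
// Non-portable: builds only on Nvidia's OpenCL driver, which tolerates
// CUDA-style inline PTX. A fused multiply-add expressed directly in PTX:
__kernel void fma_ptx(__global float* out,
                      __global const float* a,
                      __global const float* b,
                      __global const float* c)
{
    size_t i = get_global_id(0);
    float r;
    asm volatile ("fma.rn.f32 %0, %1, %2, %3;"
                  : "=f"(r)
                  : "f"(a[i]), "f"(b[i]), "f"(c[i]));
    out[i] = r;
}
```

In practice you'd guard a snippet like this behind a build-time check and fall back to plain OpenCL C on other vendors' drivers.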

I'm definitely not saying OpenCL is any sort of reasonable default for cross-platform GPGPU work. In truth, I don't think there is any reasonable "general" default for that sort of thing. Vulkan has its own issues (it only works via a compatibility layer on macOS, implementation quality varies widely, extension hell, boilerplate hell, some low-level things are just impossible, etc.), and everything else is a higher-level approach that by definition can't cover everything.

It's a pretty sad situation overall, and every solution has severe tradeoffs. Personally, I just write CUDA when I can get away with it and stick to OpenCL otherwise, but everyone has to weigh those tradeoffs for themselves.

pjmlp | 1 year ago

It is Intel, AMD, and Google's fault for never supporting OpenCL as they should have, and Khronos's fault for pissing off Apple with how it took ownership of OpenCL.