Exactly. CUDA is a huge moat, and all competitors should be adopting a software-first approach similar to what tinycorp is trying to do.
Find one single thing that makes CUDA bad to use and TRIPLE DOWN on that.
Why doesn't AMD make a framework similar to CUDA? Is it really that big of a task? If it increased their market share, it should be financially viable, no?
ROCm is their CUDA equivalent, and imo it's been a buggy mess, and I'm talking bugs that make your entire system lock up until you hard reboot. Same with their media encoders. Vulkan compute is starting to receive support from projects like llama.cpp and ollama, and I've had way better luck with it on non-Nvidia hardware. Probably for the best that we have a single cross-vendor standard for this.
Intel focused on SYCL, which not many people seem to actually care about. It looks far enough removed from CUDA that you'd have to think hard about porting things as well. From what I understand, ROCm looks very close to CUDA.
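To illustrate why ROCm porting is considered mechanical while SYCL porting is not, here's a rough sketch of a SAXPY kernel in CUDA with the HIP (ROCm) equivalent noted in comments. The kernel itself is a generic textbook example, not taken from either vendor's samples:

```cuda
#include <cuda_runtime.h>

// CUDA: a simple SAXPY kernel (y = a*x + y).
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// CUDA launch:
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);

// HIP: the kernel source above is character-for-character identical; only
// the header (<hip/hip_runtime.h>) and the host API prefixes change
// (cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy, and so on), which is
// why AMD's hipify tools can translate most CUDA code mechanically.
// HIP launch (HIP also supports the triple-angle-bracket syntax):
//   hipLaunchKernelGGL(saxpy, dim3((n + 255) / 256), dim3(256), 0, 0,
//                      n, 2.0f, d_x, d_y);
```

A SYCL port, by contrast, restructures the same computation around queues, buffers or USM pointers, and `parallel_for` lambdas, which is the kind of rewrite-requiring distance the parent comment is pointing at.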
It's also complicated by the fact that raster performance doesn't directly translate to tensor performance. Apple and AMD both make excellent raster GPUs, but both still lose in efficiency to Nvidia's CUDA architecture in rendering and compute.
I'd really like AMD and Apple to start from scratch with a compute-oriented GPU architecture, ideally standardized with Khronos. The NPU/tensor coprocessor architecture has already proven itself to be a bad idea.
That may be true, but assuming you meant "within 30% of the performance" ... can we just acknowledge that is a rather significant handicap, even ignoring CUDA.
The customers are players that can throw money at the software stack; hell, they're even throwing lots of money at the hardware side too, with proprietary tensor units and such.
And the big players don't necessarily care about the full software stack; they are likely to optimize the hardware for a single use case (e.g. inference, or specific steps of training).
Which no one cares about. When you're the 1% player with a convoluted C++-centric stack and the 99% player has something different, such that porting requires critical thinking, no one gives a damn about it.
ZLUDA has more interest than SYCL, and that should say it all right there.