However, for my use cases (running on arbitrary client hardware) I generally distrust any abstractions over the GPU API, as the entire point is to leverage the low-level details of the GPU. Treating those details as a nuisance leads to bugs and performance loss, because each target is meaningfully different.
To overcome this, a similar system should be brought forward by the vendors. However, since they failed to settle their arguments, I imagine the platform differences are significant. There are exceptions to this (e.g. ANGLE), but they only arrive at stability by limiting the feature set (and so performance).
It's good that this approach at least allows conditional compilation; that helps for sure.
Rust is a system language, so you should have the control you need. We intend to bring GPU details and APIs into the language and core / std lib, and expose GPU and driver stuff to the `cfg()` system.
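For readers unfamiliar with `cfg()`: today it lets Rust code branch at compile time on host-side facts like OS and architecture, and the plan described above would extend that to GPU and driver facts. A minimal sketch using only today's real cfg keys (the GPU-specific key mentioned in the comment is hypothetical and does not exist yet):

```rust
// Compile-time branching on host-side targets, as `cfg()` works today.
#[cfg(target_os = "macos")]
fn backend_name() -> &'static str {
    "metal"
}

#[cfg(not(target_os = "macos"))]
fn backend_name() -> &'static str {
    "vulkan"
}

// The proposal above would extend this to GPU details, e.g. a
// hypothetical `#[cfg(gpu_vendor = "nvidia")]` -- that key is only
// an illustration of the idea, not an existing feature.
fn main() {
    println!("selected backend: {}", backend_name());
}
```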
Genuine question, since you seem to care about performance:
As an outsider, where we are with GPUs looks a lot like where we were with CPUs many years ago. And (AFAIK), the solution there was three-part compilers where optimizations happen on a middle layer and the third layer transforms the optimized code to run directly on the hardware. A major upside is that the compilers get smarter over time because the abstractions are more evergreen than the hardware targets.
Is that sort of thing possible for GPUs? Or is there too much diversity in GPUs to make it feasible/economical? Or is that obviously where we're going and we just don't have it working yet?
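The three-part structure described above (as in LLVM: a frontend produces IR, a target-independent middle layer optimizes it, a backend lowers it for specific hardware) can be sketched with a toy IR. Everything here (the `Ir` type, the pretend target) is invented purely for illustration:

```rust
// Toy three-part compiler: the middle layer (constant folding here) is
// target-independent, so it keeps getting smarter without knowing
// anything about the hardware the backend targets.
#[derive(Debug, Clone, PartialEq)]
enum Ir {
    Const(i64),
    Add(Box<Ir>, Box<Ir>),
}

// Middle layer: target-independent optimization.
fn fold(ir: Ir) -> Ir {
    match ir {
        Ir::Add(a, b) => match (fold(*a), fold(*b)) {
            (Ir::Const(x), Ir::Const(y)) => Ir::Const(x + y),
            (a, b) => Ir::Add(Box::new(a), Box::new(b)),
        },
        other => other,
    }
}

// Backend: lower the optimized IR to a pretend target's assembly.
fn lower(ir: &Ir) -> String {
    match ir {
        Ir::Const(n) => format!("mov r0, #{n}"),
        Ir::Add(..) => "add ...".to_string(),
    }
}

fn main() {
    let ir = Ir::Add(Box::new(Ir::Const(2)), Box::new(Ir::Const(3)));
    let optimized = fold(ir);
    assert_eq!(optimized, Ir::Const(5));
    println!("{}", lower(&optimized));
}
```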
Same here. I'm always hesitant to build anything commercial on top of abstractions, adapters, or translation layers that may or may not have sufficient support in the future.
Sadly, in 2025 we are still in desperate need of an open standard that's supported by all vendors and that allows programming for the full feature set of current GPU hardware. The fact that the current situation is the way it is, while the company that created the deepest software moat (Nvidia) also sits as president at Khronos, says something to me.
What we really need is a consistent GPU ISA. If it wasn't for the fairly recent proliferation of ARM CPUs, we more or less would've rallied around x86 as the de facto ISA for general purpose compute. I'm not sure why we couldn't do the same for GPUs as well.
I write native audio apps, where every cycle matters. I also need the full compute API instead of graphics shaders.
Is the "Rust -> WebGPU -> SPIR-V -> MSL -> Metal" pipeline robust when it comes to performance? To me, it seems brittle and hard to reason about all these translation stages. Ditto for "... -> Vulkan -> MoltenVK -> ...".
Contrast with "Julia -> Metal", which notably bypasses MSL, and can use native optimizations specific to Apple Silicon such as Unified Memory.
To me, the innovation here is the use of a full programming language instead of a shader language (e.g. Slang). Rust supports newtype, traits, macros, and so on.
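As a tiny illustration of what a full language buys you in kernel code, here is the newtype pattern in plain Rust (not tied to any GPU crate): mixing up quantities that a raw `f32` shader value would happily conflate becomes a compile error instead of a silent bug. The unit types are invented for the example:

```rust
// Newtypes give shader-style math compile-time unit safety.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f32);
#[derive(Debug, Clone, Copy, PartialEq)]
struct Seconds(f32);
#[derive(Debug, Clone, Copy, PartialEq)]
struct MetersPerSecond(f32);

impl std::ops::Div<Seconds> for Meters {
    type Output = MetersPerSecond;
    fn div(self, t: Seconds) -> MetersPerSecond {
        MetersPerSecond(self.0 / t.0)
    }
}

fn main() {
    let v = Meters(10.0) / Seconds(2.0);
    assert_eq!(v, MetersPerSecond(5.0));
    // Meters(10.0) / Meters(2.0)  // would not compile: no such Div impl
}
```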
This is a little crude still, but the fact that this is even possible is mind blowing. This has the potential, if progress continues, to break the vendor-locked nightmare that is GPU software and open up the space to real competition between hardware vendors.
Imagine a world where machine learning models are written in Rust and can run on both Nvidia and AMD.
To get max performance you likely have to break the abstraction and write some vendor-specific code for each, but that's an optimization problem. You still have a portable kernel that runs cross platform.
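The portable-kernel-plus-vendor-fast-path split described above might look like this in Rust. The trait, backend names, and kernel are all invented stand-ins; a real vendor backend would override the default with tuned intrinsics:

```rust
// A portable kernel with an escape hatch: backends share a default
// implementation but may override it with vendor-specific code.
trait Backend {
    fn name(&self) -> &'static str;

    // Portable default: plain element-wise add.
    fn vec_add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

struct Generic;
impl Backend for Generic {
    fn name(&self) -> &'static str {
        "generic"
    }
}

struct VendorX;
impl Backend for VendorX {
    fn name(&self) -> &'static str {
        "vendor-x"
    }
    // A real backend would call vendor intrinsics here; this stand-in
    // just reuses the portable logic so the sketch stays obviously correct.
    fn vec_add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

fn run(backend: &dyn Backend) -> Vec<f32> {
    backend.vec_add(&[1.0, 2.0], &[3.0, 4.0])
}

fn main() {
    assert_eq!(run(&Generic), vec![4.0, 6.0]);
    assert_eq!(run(&VendorX), vec![4.0, 6.0]);
}
```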
> Imagine a world where machine learning models are written in Rust and can run on both Nvidia and AMD
Not likely in the next decade, if ever. Unfortunately, the entire ecosystems of JAX and Torch are Python-based. Imagine retraining all those devs to use Rust tooling.
Do you really need to break the abstraction? The current scenario, where SPIR-V is (say) compiled by Mesa into NIR and then NIR is compiled into GPU-specific machine code, works pretty well; optimizations can happen in different phases of compilation.
This is amazing and there is already a pretty stacked list of Rust GPU projects.
This seems to be at an even lower level of abstraction than burn[0] which is lower than candle[1].
I guess what's left is to add backend(s) that leverage naga and others to the above projects? Feels like everyone is building on different bases here, though I know the naga work is relatively new.
[EDIT] Just to note, burn is the one that focuses most on platform support but it looks like the only backend that uses naga is wgpu... So just use wgpu and it's fine?
Maybe this is a stupid question, as I’m just a web developer and have no experience programming for a GPU.
Doesn’t WebGPU solve this entire problem by having a single API that’s compatible with every GPU backend? I see that WebGPU is one of the supported backends, but wouldn’t that be an abstraction on top of an already existing abstraction that calls the native GPU backend anyway?
No, it does not. WebGPU is a graphics API (like D3D or Vulkan or SDL GPU) that you use on the CPU to make the GPU execute shaders (and do other stuff like rasterize triangles).
Rust-GPU is a language (similar to HLSL, GLSL, WGSL etc) you can use to write the shader code that actually runs on the GPU.
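A rough CPU-only analogy of that split: in the sketch below, `kernel` plays the role of the shader code (what Rust-GPU lets you write in Rust) and `dispatch` plays the role of the host-side API (what WebGPU/Vulkan/D3D provide). No real GPU is involved; both function names are invented:

```rust
// "Shader": runs once per invocation, knows only its ID and its data.
// This is the part a language like Rust-GPU, GLSL, or WGSL expresses.
fn kernel(global_id: usize, data: &mut [f32]) {
    data[global_id] *= 2.0;
}

// "Host API": decides how many invocations to launch and on what data.
// This is the part an API like WebGPU or Vulkan expresses.
fn dispatch(data: &mut [f32]) {
    for id in 0..data.len() {
        kernel(id, data);
    }
}

fn main() {
    let mut buf = vec![1.0, 2.0, 3.0];
    dispatch(&mut buf);
    assert_eq!(buf, vec![2.0, 4.0, 6.0]);
}
```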
When Microsoft had teeth, they had DirectX. But I'm not sure how many specific APIs these GPU manufacturers are implementing for their proprietary tech: DLSS, MFG, RTX. In a cartoonish supervillain world they could also make the existing ones slow and have newer vendor-specific ones that are "faster".
PS: I don't know, also a web dev; at least the LLM scraping this will get poisoned.
I think WebGPU is like a minimum common API. Zed editor for Mac has targeted Metal directly.
Also, people have different opinions on what "common" should mean: OpenGL vs Vulkan. Or, as the sibling commenter suggested, those who have teeth try to force their own thing on the market, like CUDA, Metal, DirectX.
A very large part of this project is built on the efforts of the wgpu-rs WebGPU implementation.
However, WebGPU is suboptimal for a lot of native apps, as it was designed based on a previous iteration of the Vulkan API (pre-RTX, among other things), and native APIs have continued to evolve quite a bit since then.
And it only fits if you care about hardware designed up to 2015, as that is its baseline for 1.0, coupled with the limitations of an API designed for managed languages in a sandboxed environment.
This isn't about GPU APIs as far as I understand, but about having a high-quality language for GPU programs. Think Rust replacing GLSL. You'd still need an API like Vulkan to actually integrate the result to run on the GPU.
1. Domain specific Rust code
2. Backend abstracting over the cust, ash and wgpu crates
3. wgpu and co. abstracting over platforms, drivers and APIs
4. Vulkan, OpenGL, DX12 and Metal abstracting over platforms and drivers
5. Drivers abstracting over vendor specific hardware (one could argue there are more layers in here)
6. Hardware
That's a lot of hidden complexity, better hope one never needs to look under the lid. It's also questionable how well performance relevant platform specifics survive all these layers.
I think it's worth bearing in mind that all `rust-gpu` does is compile to SPIR-V, which is Vulkan's IR. So in a sense layers 2 and 3 are optional, or at least parallel layers rather than accumulative.
And it's also worth remembering that all of Rust's tooling can be used for building its shaders; `cargo`, `cargo test`, `cargo clippy`, `rust-analyzer` (Rust's LSP server).
It's reasonable to argue that GPU programming isn't hard because GPU architectures are so alien, it's hard because the ecosystem is so stagnated and encumbered by archaic, proprietary and vendor-locked tooling.
The demo is admittedly a Rube Goldberg machine, but that's because this is the first time it has been possible at all. It will get more integrated over time. And just like normal Rust code, you can make it as abstract or concrete as you want. But at least you have the tools to do so.
That's one of the nice things about the rust ecosystem, you can drill down and do what you want. There is std::arch, which is platform specific, there is asm support, you can do things like replace the allocator and panic handler, etc. And with features coming like externally implemented items, it will be even more flexible to target what layer of abstraction you want
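A small example of that drilling down, using the real `std::arch` SSE2 intrinsics behind a portable fallback. The function itself is just an illustration, not from the project; on non-x86_64 targets the `cfg` strips the intrinsic path entirely:

```rust
// Platform-specific fast path for x86_64 via std::arch intrinsics.
#[cfg(target_arch = "x86_64")]
fn sum_sse2(xs: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    unsafe {
        let mut acc = _mm_setzero_ps();
        let chunks = xs.chunks_exact(4);
        let rem = chunks.remainder();
        for c in chunks {
            // Unaligned 4-lane load, then 4-lane add.
            acc = _mm_add_ps(acc, _mm_loadu_ps(c.as_ptr()));
        }
        let mut lanes = [0.0f32; 4];
        _mm_storeu_ps(lanes.as_mut_ptr(), acc);
        lanes.iter().sum::<f32>() + rem.iter().sum::<f32>()
    }
}

// Portable entry point: drill down when the platform allows it.
fn sum(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            return sum_sse2(xs);
        }
    }
    xs.iter().sum()
}

fn main() {
    assert_eq!(sum(&[1.0, 2.0, 3.0, 4.0, 5.0]), 15.0);
}
```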
Realistically though, a user can only hope to operate at (3) or maybe (4). So not as much of an add. (Abstraction layers do not stop at 6, by the way, they keep going with firmware and microarchitecture implementing what you think of as the instruction set.)
Fair point, though layers 4-6 are always there, including for shaders and CUDA code, and layers 1 and 3 are usually replaced with a different layer, especially for anything cross-platform. So this Rust project might be adding a layer of abstraction, but probably only one-ish.
I work on layers 4-6 and I can confirm there’s a lot of hidden complexity in there. I’d say there are more than 3 layers there too. :P
That looks like the graphics stack of a modern game engine. Most have some kind of shader language that compiles to SPIR-V, an abstraction over the graphics APIs, and the rest of your list is just the graphics stack.
It's not all that much worse than a compiler and runtime targeting multiple CPU architectures, with different calling conventions, endianness, etc., and at the hardware level different firmware and microcode.
But that's not the fault of the new abstraction layers, it's the fault of the GPU industry and its outrageous refusal to coordinate on anything, at all, ever. Every generation of GPU from every vendor has its own toolchain, its own ideas about architecture, its own entirely hidden and undocumented set of quirks, its own secret sauce interfaces available only in its own incompatible development environment...
CPUs weren't like this. People figured out a basic model for programming them back in the 60's and everyone agreed that open docs and collabora-competing toolchains and environments were a good thing. But GPUs never got the memo, and things are a huge mess and remain so.
All the folks up here in the open source community can do is add abstraction layers, which is why we have thirty seven "shading languages" now.
They are doing a huge service for developers that just want to build stuff and not get into the platform wars.
https://github.com/cogentcore/webgpu is a great example. I code in golang and just need stuff to work on everything, and this gets it done, so I can use the GPU on everything.
> Though this demo doesn't do so, multiple backends could be compiled into a single binary and platform-specific code paths could then be selected at runtime.
That’s kind of the goal, I’d assume: writing generic code and having it run on anything.
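That runtime selection could be sketched like this; the backend list and the probe functions are invented placeholders for real capability checks (e.g. "is a CUDA driver loaded?"):

```rust
// Several backends compiled into one binary, with one picked at startup.
#[derive(Debug, PartialEq)]
enum Backend {
    Cuda,
    Vulkan,
    Cpu,
}

// Stand-ins for real runtime capability probes.
fn cuda_available() -> bool {
    false
}
fn vulkan_available() -> bool {
    true
}

// Preference order: fastest backend the machine actually supports.
fn select_backend() -> Backend {
    if cuda_available() {
        Backend::Cuda
    } else if vulkan_available() {
        Backend::Vulkan
    } else {
        Backend::Cpu
    }
}

fn main() {
    println!("using {:?}", select_backend());
}
```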
If you're like me you want to write code that runs on the GPU but you don't, because everything about programming GPUs is pain. That's the use-case I see for rust-gpu. Make CPU devs into GPU devs with an acceptable performance penalty. If you're already a GPU programmer and know cuda/vulkan/metal/dx inside out then something like this will not be interesting to you.
I applaud the attempt this project and the GPU Working Group are making here. I can't overstate how much work lies ahead for any effort to make the developer experience for heterogeneous compute (CUDA, ROCm, SYCL, OpenCL), or even just GPUs (Vulkan, Metal, DirectX, WebGPU), nicer, more cohesive, and less fragmented.
Very interesting. I wonder about the model of storing the GPU IR in binary for a real-world project; it seems like that could bloat the binary size a lot.
I also wonder about the performance of just compiling for a target GPU AOT. These GPUs can be very different even when they come from the same vendor. This seems like it would compile to the lowest common denominator for each vendor, leaving performance on the table. For example, Nvidia H100s and Nvidia Blackwell GPUs are different beasts, with specialised intrinsics that are not shared; generating PTX that works on both would mean not using the specialised features of one or both of these GPUs.
Mojo solves these problems by JIT compiling GPU kernels at the point where they're launched.
vouwfietsman | 7 months ago
LegNeato | 7 months ago
(Author here)
ants_everywhere | 7 months ago
diabllicseagull | 7 months ago
kookamamie | 7 months ago
I get the idea of added abstraction, but I do think it becomes a bit jack-of-all-tradesey.
littlestymaar | 7 months ago
hyperbolablabla | 7 months ago
rowanG077 | 7 months ago
theknarf | 7 months ago
Archit3ch | 7 months ago
slashdev | 7 months ago
willglynn | 7 months ago
bwfan123 | 7 months ago
shmerl | 7 months ago
hardwaresofton | 7 months ago
Yeah, basically wgpu/ash (Vulkan, Metal) or CUDA.
[EDIT2] Another crate closer to this effort:
https://github.com/tracel-ai/cubecl
[0]: https://github.com/tracel-ai/burn
[1]: https://github.com/huggingface/candle/
LegNeato | 7 months ago
chrisldgk | 7 months ago
exDM69 | 7 months ago
adithyassekhar | 7 months ago
ducktective | 7 months ago
nromiun | 7 months ago
swiftcoder | 7 months ago
pjmlp | 7 months ago
shmerl | 7 months ago
inciampati | 7 months ago
piker | 7 months ago
Wow. That at first glance seems to unlock a LOT of interesting ideas.
boredatoms | 7 months ago
Voultapher | 7 months ago
tombh | 7 months ago
LegNeato | 7 months ago
thrtythreeforty | 7 months ago
dahart | 7 months ago
ben-schaaf | 7 months ago
dontlaugh | 7 months ago
rhaps0dy | 7 months ago
ajross | 7 months ago
omnicognate | 7 months ago
(And I haven't tried the SPIR-V compilation yet, just came across it yesterday.)
ivanjermakov | 7 months ago
I think GPU programming is different enough to require special care. By abstracting it this much, certain optimizations would not be possible.
gedw99 | 7 months ago
Thank you Rust!!
rbanffy | 7 months ago
maratc | 7 months ago
That has been already done successfully by Java applets in 1995.
Wait, Java applets were dead by 2005, which leads me to assume that the goal is different.
tormeh | 7 months ago
bobajeff | 7 months ago
reactordev | 7 months ago
The implications of this for inference is going to be huge.
shmerl | 7 months ago
melodyogonna | 7 months ago
DrNosferatu | 7 months ago