CUDA Tile Open Sourced

Let's see if developers sleepwalk into another trap to keep us locked into nvidia's hardware for the next decade.

pjmlp|2 months ago

It is up to AMD, Intel and Khronos to offer APIs and tools that are actually nice to use.

They have had about 15 years to move beyond C99, stone age workflows to compile GLSL and C99 with their drivers, no libraries ecosystem, and printf debugging.

Eventually some of the issues have been fixed, after they started seeing only hardliners would put with such development experience, and then it was too late.

the__alchemist|2 months ago

IMO it's not Nvidia's fault the competing APIs are high friction.

OneDeuxTriSeiGo|2 months ago

CUDA Tile is an open source MLIR Dialect so it wouldn't take much to write MLIR transforms to map it from the Tile IR to TOSA or gpu + vector + some amdgpu or other specialty dialects.

The Tile dialect is pretty much independent of the nvidia ecosystem so all it takes is one good set of MLIR transform passes to run anything on the CUDA stack that compiles to tile out of the nvidia ecosystem prison.

So if anything this is actually a massive opportunity to escape vendor lock in if it catches on in the CUDA ecosystem.

trueismywork|2 months ago

TileIR is Apache licensed so AMD can implement it as well.

RicoElectrico|2 months ago

Obviously they will, as with the mainframe and cloud.

jauntywundrkind|2 months ago

Will be interesting to see if Nvidia and other have any interest & energy getting this used by others, if there actually is an ecosystem forming around it.

Google leading XLA & IREE, with awesome intermediate representations, used by lots of hardware platforms, and backing really excellent Jax & Pytorch implementations, having tools for layout & optinization folks can share: they really build an amazing community.

There's still so much room for planning/scheduling, so much hardware we have yet to target. RISC-V has really interesting vector instructions, for example, and it seems like there's so much exploration / work to do to better leverage that.

Nvidia has partners everywhere now. Nvlink is used by Intel, AWS Tritanium, others. Yesterday the Groq exclusive license that Nvidia paid to give to Groq?! Seeing how and when CUDA Tiles emerges: will be interesting. Moving from fabric partnerships, up up up the stack.

pjmlp|2 months ago

For NVidia it suffices this is a Python JIT allowing programming CUDA compute kernels directly in Python instead of C++, yet another way how Intel and AMD, alongside Khronos APIs, lag behind in great developer experiences for GPU compute programming.

Ah, and Nsight debugging also supports Python CUDA Tiles debugging.

https://developer.nvidia.com/blog/simplify-gpu-programming-w...

Moosdijk|2 months ago

> There's still so much room for planning/scheduling, so much hardware we have yet to target

this is nicely illustrated by this recent article:

https://news.ycombinator.com/item?id=46366998

turtletontine|2 months ago

On the RISC-V vector instructions, could you elaborate? Are the vector extensions substantially different from those in ARM or x86?

nl|2 months ago

> Groq exclusive license

non-exclusive license actually.

almostgotcaught|2 months ago

> Google leading XLA & IREE

IREE hasn't been at G for >2 years.

opan|2 months ago

>The CUDA Tile IR project is under the Apache License v2.0 with LLVM Exceptions

gaogao|2 months ago

The compiler for CUDA Tile being Blackwell only is a baffling decision. I wanted to try it out, but it's only really easy to grab H100s quickly right now. I guess maybe I'll try it out on my 5070 Ti after traveling, but am more likely to stick to an IR that targets multiple platforms, since they couldn't be bothered.

robobsolete|2 months ago

I was keen to try it too, but oh well

unknown|2 months ago

[deleted]

boywitharupee|2 months ago

shouldn't the title be "CUDA Tile IR Open Sourced"?

OneDeuxTriSeiGo|2 months ago

It's more or less the same thing. CUDA TIle is the name of the IR, cuTile is the name of the high level DSLs.

pyuser583|2 months ago

I’m glad CUDA and “open source” are in the same sentence again.

We’d all prefer cross platform programming, but if you’re going to do platform specific, I prefer open source to closed source.

Thank you NVIDIA!

0-_-0|2 months ago

This is basically the nvidia equivalent of cooperative_matrix_2 in Vulkan which is vendor agnostic and should get much more hype that it's getting.

pjmlp|2 months ago

Maybe Vulkan could provide native support for Python, C++20, and a graphical debugging experience.

It is surely not equivalent as of today.

xmorse|2 months ago

Writing this in Mojo would have been so much easier

3abiton|2 months ago

It's barely gaining adoption though. The lack of buzz is a chicken and egg issue for Mojo. I fiddled shortly with it (mainly to get it working some of my pythong scripts), and it was suprisingly easy. It'll shoot up one day for sure if Latner doesn't give up early on it.

ipsum2|2 months ago

Mojo is not open source and would not get close to the performance of cuTile.

I'm tired of people shilling things they don't understand.

bigyabai|2 months ago

Use-cases like this are why Mojo isn't used in production, ever. What does Nvidia gain from switching to a proprietary frontend for a compiler backend they're already using? It's a legal headache.

Second-rate libraries like OpenCL had industry buy-in because they were open. They went through standards committees and cooperated with the rest of the industry (even Nvidia) to hear-out everyone's needs. Lattner gave up on appealing to that crowd the moment he told Khronos to pound sand. Nobody should be wondering why Apple or Nvidia won't touch Mojo with a thirty-nine and a half foot pole.

llmslave2|2 months ago

I really want Mojo to take off. Maybe in a few years. The lack of an stdlib holds it back more than they think, and since their focus is narrow atm it's not useful for the vast majority of work.

pjmlp|2 months ago

It would help if they were not so much macOS and Linux focused.

Julia, Python GPU JITs work great on Windows, and many people only get Windows systems as default at work.

CamperBob2|2 months ago

Fun game: see how many clicks it takes you to learn what MLIR stands for.

I lost count at five or six. Define your acronyms on first use, people.

ipnon|2 months ago

GPU programming definitely is not beginner friendly. There's a much higher learning curve than most open source projects. To learn basic Python you need to know about definitions and loops and variables, but to learn CUDA kernels you need to know maybe an order of magnitude more concepts to write anything useful. It's just not worth the time to cater to people who don't RTFM, the README would be twice as long and be redundant to the target audience of the library.

saagarjha|2 months ago

This is a GitHub repo for compiler engineers.

fragmede|2 months ago

I did it in three. I selected it in your comment, and then had to hit "more" to get to the menu to ask Google about it, which brought me to https://www.google.com/search?q=MLIR which says: MLIR is an open-source compiler infrastructure project developed as a sub-project of the LLVM project. Hopefully

Get better at computers and stop needing to be spoon-fed information, people!

rswail|2 months ago

Based on the use of LLVM I guessed "Machine Learning Intermediate Representation"?

How close was I?

roughly|2 months ago

The ol’ TMA problem.

piskov|2 months ago

If only there was a chat-based app that you could ask questions to.

toolboxg1x0|2 months ago

NVIDIA tensor core units, where the second column in kernel optimization is producing a test suite.

106 comments