davidatbu | 5 months ago
> Whether it's 1, 2, or N kernels is irrelevant.
I'm not sure what you mean here, but new kernels are written all the time (flash-attn is a great example), and one can't do that in plain Python. E.g., flash-attn was originally written in CUDA C++, and there are now Triton implementations as well.
bjourne | 5 months ago
What I mean is that DNN code is written at a much higher level than kernels. Kernels are just building blocks you use to instantiate your dataflow.
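
To make the "building blocks" point concrete, here is a rough sketch in plain Python. Everything in it is hypothetical: `matmul`, `softmax_rows`, `attention_unfused`, `attention_fused`, and `model_block` are stand-ins invented for illustration, not any framework's API. The fused version uses an online softmax, which is loosely the idea behind flash-attn, without the tiling or GPU memory handling a real kernel needs. The model-level function stays the same whichever kernel is plugged in.

```python
import math

# Hypothetical "kernels": in a real framework these would be CUDA or Triton
# routines. Here they are plain Python stand-ins working on lists of lists.
def matmul(a, b):
    """Return the matrix product of a and b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention_unfused(q, k, v):
    """Attention composed of separate matmul and softmax kernels."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def attention_fused(q, k, v):
    """Single-pass attention using an online softmax (assumed simplification)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        running_max = -math.inf
        denom = 0.0
        acc = [0.0] * len(v[0])
        for kj, vj in zip(k, v):
            s = scale * sum(x * y for x, y in zip(qi, kj))
            new_max = max(running_max, s)
            # Rescale what has been accumulated so far to the new maximum.
            correction = math.exp(running_max - new_max)
            p = math.exp(s - new_max)
            denom = denom * correction + p
            acc = [a * correction + p * x for a, x in zip(acc, vj)]
            running_max = new_max
        out.append([a / denom for a in acc])
    return out


def model_block(x, attention):
    """Model-level dataflow: self-attention plus a residual connection."""
    attended = attention(x, x, x)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, attended)]


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
    print(model_block(x, attention_unfused))
    print(model_block(x, attention_fused))
```

Swapping `attention_unfused` for `attention_fused` changes how the work is done but not what `model_block` expresses, and the two should agree up to floating-point rounding.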