(no title)
reasonableklout | 4 months ago
What's interesting is that the MGPU team has achieved SOTA Blackwell GEMM performance before Triton (which IIUC is trying to bring up Gluon to reach the same level). All the big players are coming up with their own block-based low-level-ish DSLs for CUDA: OpenAI, NVIDIA, and now Google.
flakiness|4 months ago
saagarjha|4 months ago