All three (GB10, GB200 and GB300) are part of the Blackwell family, which means they have Compute Capability >= 10.X. You could potentially develop kernels to optimize MoE inference (given the large available unified memory, 128Gb, it makes the most sense to me) with CUDA >= 12.9 then ship the fatbins to the "big boys". As many people have pointed out across the thread, the spark doesn't really has the best perf/$, it's rather a small portable platform for experimentation and development
egeres|4 months ago
xs83|4 months ago