totalperspectiv|5 months ago
I have used Mojo quite a bit. It’s fantastic and lives up to every claim it makes. Once the compiler becomes open source, I fully expect it to really take off for data science.
Modular also has its paid platform for serving models, called Max. I haven’t used it, but I’ve heard good things.

saagarjha|5 months ago
TLDR: to get good performance you need to use vendor-specific extensions, which results in the same lock-in Modular has been claiming it will let you avoid.

imtringued|5 months ago
Correct. There is too much architectural divergence between GPU vendors. If they really wanted to keep vendor-specific extensions out of user-level code, they would have gone with an approach loosely inspired by tinygrad (which isn’t ready yet): you give the compiler a good description of the hardware, and it automatically generates a state-of-the-art GEMM kernel.
Maybe it’s 20% worse than Nvidia’s hand-written kernels, but you can switch hardware vendors or build arbitrary fused kernels at will.

subharmonicon|5 months ago
There seem to be enthusiasts who have experimented a bit and like what they see, but I haven’t seen much else.
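To make the "hardware description drives the kernel" idea concrete, here is a minimal sketch in plain NumPy. Everything here is hypothetical illustration — `HardwareDescription`, `make_gemm`, and the tile parameters are invented names, not Mojo, MAX, or tinygrad APIs. It shows the shape of the idea: a blocked GEMM specialized to per-vendor tile sizes, with an elementwise epilogue (e.g. ReLU) fused into the same pass over each output tile.

```python
# Hypothetical sketch only: not a real Mojo/Modular/tinygrad API.
from dataclasses import dataclass
import numpy as np

@dataclass
class HardwareDescription:
    """Toy stand-in for a vendor-neutral hardware model.

    A real description would cover register counts, shared-memory size,
    tensor-core shapes, etc.; here, just blocking factors.
    """
    tile_m: int  # rows of the output computed per block
    tile_n: int  # columns of the output computed per block
    tile_k: int  # reduction-dimension chunk size

def make_gemm(hw: HardwareDescription, epilogue=None):
    """Return a blocked GEMM specialized to `hw`.

    `epilogue` is an optional elementwise function fused into the pass
    over each output tile, so no second sweep over C is needed.
    """
    def gemm(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        m, k = a.shape
        k2, n = b.shape
        assert k == k2, "inner dimensions must match"
        c = np.zeros((m, n), dtype=a.dtype)
        for i0 in range(0, m, hw.tile_m):
            for j0 in range(0, n, hw.tile_n):
                # Accumulator for one output tile (edge tiles may be smaller).
                acc = np.zeros((min(hw.tile_m, m - i0),
                                min(hw.tile_n, n - j0)), dtype=a.dtype)
                for k0 in range(0, k, hw.tile_k):
                    acc += (a[i0:i0 + hw.tile_m, k0:k0 + hw.tile_k]
                            @ b[k0:k0 + hw.tile_k, j0:j0 + hw.tile_n])
                if epilogue is not None:
                    acc = epilogue(acc)  # fused: applied while the tile is hot
                c[i0:i0 + acc.shape[0], j0:j0 + acc.shape[1]] = acc
        return c
    return gemm
```

Usage would look like swapping in a different `HardwareDescription` per vendor while the calling code stays the same, e.g. `make_gemm(hw, epilogue=lambda t: np.maximum(t, 0))` for a fused GEMM+ReLU. A real compiler would additionally search over blocking factors and emit vendor machine code, which is where the "maybe 20% worse than hand-written" trade-off comes in.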