Matmul on Blackwell: Part 2 – Using Hardware Features to Optimize Matmul

23 points | robertvc | 5 months ago | modular.com

13 comments

saagarjha | 5 months ago

Is anyone using Modular? Curious how you find it compares against the competitors in this space.

totalperspectiv | 5 months ago

I have used Mojo quite a bit. It’s fantastic and lives up to every claim it makes. When the compiler becomes open source, I fully expect it to really start taking off for data science.

Modular also has a paid platform for serving models, called Max. I haven’t used it myself, but I’ve heard good things.

subharmonicon | 5 months ago

I’ve also been curious to see actual users compare and contrast their experiences with other options, but so far I haven’t seen that.

There seem to be enthusiasts who have experimented a bit and like what they see but I haven’t seen much else.

subharmonicon | 5 months ago

TLDR: To get good performance you need vendor-specific extensions, which results in the same lock-in Modular has been claiming it will help you avoid.

imtringued | 5 months ago

Correct. There is too much architectural divergence between GPU vendors. If they really wanted to avoid vendor-specific extensions in user-level code, they would have gone with something loosely inspired by tinygrad (which isn't ready yet).

Basically, you need a good description of the hardware, and the compiler automatically generates a state-of-the-art GEMM kernel from it.

Maybe it's 20% slower than Nvidia's hand-written kernels, but you can switch hardware vendors or build arbitrary fused kernels at will.
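The approach described in the comment above can be sketched as a generic kernel whose blocking is driven entirely by a per-device description. This is a minimal, illustrative model only: `HardwareDesc` and its tile fields are hypothetical names, not Modular's or tinygrad's actual API, and a real compiler would emit specialized device code rather than loop in Python.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HardwareDesc:
    # Hypothetical per-device parameters a kernel generator might consume,
    # e.g. derived from shared-memory size and register-file width.
    tile_m: int
    tile_n: int
    tile_k: int

def tiled_gemm(A: np.ndarray, B: np.ndarray, hw: HardwareDesc) -> np.ndarray:
    """Naive tiled GEMM: the same generic loop nest, specialized per device
    only by the blocking parameters in the hardware description."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, hw.tile_m):
        for j in range(0, N, hw.tile_n):
            for k in range(0, K, hw.tile_k):
                # Accumulate one tile of C from tiles of A and B.
                C[i:i + hw.tile_m, j:j + hw.tile_n] += (
                    A[i:i + hw.tile_m, k:k + hw.tile_k]
                    @ B[k:k + hw.tile_k, j:j + hw.tile_n]
                )
    return C
```

The point of the sketch is that nothing vendor-specific appears in the kernel itself; swapping GPUs would mean swapping the `HardwareDesc`, not rewriting user code.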

totalperspectiv | 5 months ago

I don’t follow your logic. Mojo can target multiple GPU vendors. What is the Modular-specific lock-in?