ilirium|6 months ago
@nabla9 has tried to tell you that on DGX Spark you can also use optimized models, which means Spark can also be used for inference with bigger models, such as those exceeding 200B parameters.
Please compare the same things: carrots VS carrots, not apples VS eggs.
artemisart|6 months ago
I don't understand what's supposed to be unoptimized on the 5090. If you compare against Apple chips or AMD Strix Halo, then yes, the hardware and software support are very different (no FP4, etc.). But here everything is CUDA, Blackwell vs Blackwell, with the same FP4 structured sparsity. So how would it be honest to compare a quantized FP4 model on Spark with an unoptimized FP16 model on a 5090?
hnuser123456|6 months ago
You and nabla9 are both comparing apples and eggs. 4x more RAM means 4x larger models when everything else is held the same; that's the fair comparison.
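For concreteness, a minimal back-of-the-envelope sketch of the memory argument (assuming the nominal spec-sheet capacities of 128 GB unified memory for DGX Spark and 32 GB VRAM for the 5090, and ignoring KV cache, activations, and other runtime overhead):

    # Weight footprint only: params * bits per parameter / 8, in GB.
    # Capacities are nominal; real headroom is lower once the KV cache
    # and activations are accounted for.
    def weights_gb(params_billions: float, bits: int) -> float:
        return params_billions * bits / 8

    for bits in (16, 4):
        gb = weights_gb(200, bits)  # a 200B-parameter model
        print(f"FP{bits}: {gb:.0f} GB -> "
              f"Spark (128 GB): {gb <= 128}, 5090 (32 GB): {gb <= 32}")

At FP16 a 200B model needs roughly 400 GB for weights alone and fits on neither device; at FP4 it needs roughly 100 GB, which fits in Spark's 128 GB but not the 5090's 32 GB (which tops out around a ~60B model at FP4).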