ilirium|6 months ago
@nabla9 has tried to tell you that on DGX Spark you can also use optimized models, which means Spark can also be used for inference with bigger models, such as those exceeding 200B parameters.
Please compare the same things: carrots VS carrots, not apples VS eggs.
artemisart|6 months ago
I don't understand what's supposed to be unoptimized on the 5090. If you compare against Apple chips or AMD Strix Halo, then yes, the hardware and software support are very different (no FP4, etc.). But here everything is CUDA, Blackwell vs Blackwell, with the same FP4 structured sparsity. So how would it be honest to compare a quantized FP4 model on Spark with an unoptimized FP16 model on a 5090?
hnuser123456|6 months ago
You and nabla9 are both comparing apples and eggs. 4x more RAM means 4x larger models when everything else is held the same; that's the fair comparison.
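For concreteness, a minimal back-of-the-envelope sketch of the memory argument (assuming the nominal spec-sheet capacities of 128 GB unified memory for DGX Spark and 32 GB VRAM for the 5090, and ignoring KV cache, activations, and other runtime overhead):

    # Weight footprint only: params * bits per parameter / 8, in GB.
    # Capacities are nominal; real headroom is lower once the KV cache
    # and activations are accounted for.
    def weights_gb(params_billions: float, bits: int) -> float:
        return params_billions * bits / 8

    for bits in (16, 4):
        gb = weights_gb(200, bits)  # a 200B-parameter model
        print(f"FP{bits}: {gb:.0f} GB -> "
              f"Spark (128 GB): {gb <= 128}, 5090 (32 GB): {gb <= 32}")

At FP16 a 200B model needs roughly 400 GB for weights alone and fits on neither device; at FP4 it needs roughly 100 GB, which fits in Spark's 128 GB but not the 5090's 32 GB (which tops out around a ~60B model at FP4).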