It isn't that good for local LLM inferencing. It isn't designed for that.
It's designed to be a local dev machine for Nvidia server products. It has the same software and hardware stack as enterprise Nvidia hardware. That's what it is designed for.
Wait for M5 series Macs for good value local inferencing. I think the M5 Pro/Max are going to be very good values.
Given that most of Nvidia's enterprise software products, like NIMs, are single-server designs meant to run on DGX boxes, this makes sense.
I am still amazed at how many companies buy a ton of DGX boxes and then are surprised that Nvidia does not have any Kubernetes native platform for training and inferencing across all the DGX machines. The Run.ai acquisition did not change anything, as you leave all the work to the user to integrate with distributed training frameworks like Ray or scalable inference platforms, like KServe/vLLM.
This is insanely slow given its 200+GB/s memory bandwidth. As a comparison, I've tested GPT OSS 120B on Strix Halo and it obtains 420tps prefill and >40tps decode.
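A quick sanity check on those decode numbers — a minimal bandwidth-bound sketch, assuming GPT-OSS 120B activates roughly 5.1B parameters per token at ~4-bit weights and Strix Halo has ~256 GB/s of memory bandwidth (round figures, not measurements):

```python
# Bandwidth-bound decode ceiling for an MoE model (rough, assumed figures).
active_params = 5.1e9          # assumed active params/token for GPT-OSS 120B
bytes_per_param = 0.5          # ~4-bit quantized weights
bandwidth = 256e9              # assumed Strix Halo memory bandwidth, B/s

bytes_per_token = active_params * bytes_per_param   # ~2.55 GB read per token
ceiling_tps = bandwidth / bytes_per_token
print(f"theoretical decode ceiling ~ {ceiling_tps:.0f} tok/s")  # ~100 tok/s
```

Against a ~100 tok/s ceiling, 40 tok/s measured is around 40% efficiency, which is plausible once you count KV cache reads and shared weights; the Spark's ~273 GB/s bandwidth puts it in the same bandwidth-bound regime for decode.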
Probably the quants have higher perplexity, but the Spark's performance seems lacklustre. The reviewer videos I've seen so far try their best not to offend Nvidia or, rather, not to break their contracts.
You guys that continue to compare DGX Spark to the Mac Studios, please remember two things:
1. Virtually every model that you'd run was developed on Nvidia gear and will run on Spark.
2. Spark has fast-as-hell interconnects. The sort of interconnects that one would want to use in an actual AI DC, so you can use more than one Spark at the same time, and RDMA, and actually start to figure out how and why things work the way they do. You can do a lot with 200 Gb/s of interconnect.
Also remember that the Mx Ultras have 2-3x the memory bandwidth. Looking at the benchmarks, even Strix Halo seems to beat the Spark. A 200 Gbps switch costs $10k-$100k+, so don't imagine anyone will actually use the interconnect. The logical thing for Nvidia would be to sell a kit with three machines and cabling, making a ring out of the dual ports per machine. That helps for some scenarios but not others, given the network is ten times slower than the memory bandwidth.
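The "ten times slower" claim checks out on the back of an envelope, assuming one 200 Gb/s link per hop and the Spark's quoted 273 GB/s of LPDDR5X bandwidth:

```python
# One 200 Gb/s network link vs. the Spark's local memory bandwidth.
link_gbps = 200                 # ConnectX link speed, gigabits/s
link_gbs = link_gbps / 8        # -> 25 GB/s
mem_gbs = 273                   # Spark's quoted memory bandwidth, GB/s

print(f"link: {link_gbs:.0f} GB/s, memory: {mem_gbs} GB/s, "
      f"ratio ~ {mem_gbs / link_gbs:.1f}x")  # ~10.9x
```

So any tensor-parallel scheme that ships activations over the wire every layer pays roughly an order of magnitude penalty versus staying on one box, which is why pipeline or expert parallelism tends to be the only sane multi-box split at this link speed.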
They're in a different ballpark in memory bandwidth. The right comparison is the Ryzen AI Max 395 with 128GB DDR5-8000, which can be bought for around $1800 / 1750€.
$4,000 is actually extremely competitive. Even for an at-home enthusiast setup this price is not out of reach; I was expecting something far higher. That said, Nvidia's MSRPs have been something of a pipe dream recently, so we'll see the actual price and availability when it's released. Curious also to see how they might scale together.
Well, that’s disappointing since the Mac Studio 128GB is $3,499. If Apple happens to launch a Mac Mini with 128GB RAM it would eat Nvidia Sparks’ lunch every day.
I wonder why they didn't test against the broadly available Strix Halo with 128GB of 256 GB/s memory bandwidth, 16 core full-fat Zen5 with AVX512 at $2k... it is a mystery...
Strix Halo has the problem that prefill is incredibly slow if your context is not very small.
The only thing that might be interesting about this DGX Spark is that its prefill manages to be faster due to better compute. I haven't compared the numbers yet, but they are included in the article.
That memory bandwidth choked out its performance. How can you claim 1000 TFLOPS if it's not capable of delivering it? Seems they chose to sandbag the Spark in favour of the RTX Pro 6000.
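A roofline-style sketch shows why the headline TFLOPS can never show up during decode — assuming the ~1000 TFLOPS figure is the sparse FP4 marketing number and 273 GB/s memory bandwidth:

```python
# Roofline check: arithmetic intensity needed to be compute-bound on Spark.
peak_flops = 1000e12            # assumed sparse FP4 marketing figure, FLOP/s
bandwidth = 273e9               # memory bandwidth, B/s

ridge = peak_flops / bandwidth  # FLOP per byte needed to saturate compute
# Single-token decode is essentially GEMV: ~2 FLOPs per weight,
# 0.5 bytes per weight at 4-bit quantization.
decode_intensity = 2 / 0.5
print(f"ridge point ~ {ridge:.0f} FLOP/B, decode ~ {decode_intensity:.0f} FLOP/B")
```

Decode sits roughly three orders of magnitude below the ridge point, so it is purely bandwidth-bound; only batched prefill, with its large matrix-matrix multiplies, can get anywhere near the compute ceiling.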
I guess the next one I'm looking out for is the Orange Pi AI Studio Pro. It should have 192GB of RAM, so it's able to run Qwen3 235B, and even though it's DDR4, it's nearly double the bandwidth of the Spark.
Good luck with any kind of coherent ecosystem and support. Also, if you're in the U.S., there is a good chance you'll get hit with tariffs, which would wipe out any potential value. I'd much rather stick with Nvidia, which has an ecosystem (or even Apple, for that matter), than touch a system like this off of Alibaba.
I think my 2021 MBP M1 Pro has ~200GB/s memory bandwidth, but it handles qwen3:32b quite nicely, albeit maxed out at ~70W.
I somehow expected the Spark to be the 'God in a Box' moment for local AI, but it feels like they went for trying to sell multiple units instead.
I'd be more tempted by a 2nd-hand 128GB M2 Ultra at ~800GB/s, but the prices here are still high, and I'm not sure the Spark is going to convince people to part with those, unless we see some gluttonous-RAM M5 boxes soon. An easy way for Apple to catch up again.
All three (GB10, GB200 and GB300) are part of the Blackwell family, which means they have Compute Capability >= 10.X. You could potentially develop kernels to optimize MoE inference (given the large available unified memory, 128GB, it makes the most sense to me) with CUDA >= 12.9, then ship the fatbins to the "big boys". As many people have pointed out across the thread, the Spark doesn't really have the best perf/$; it's rather a small portable platform for experimentation and development.
teleforce|4 months ago
[1] (Updated) NVIDIA Jetson AGX Thor Developer Kit to Launch in Mid-August with 2070 TFLOPS AI Performance, Priced at $3499:
https://linuxgizmos.com/updated-nvidia-jetson-agx-thor-devel...
[2] AAEON Announces BOXER-8741AI with NVIDIA Jetson Thor T5000 Module:
https://linuxgizmos.com/aaeon-announces-boxer-8741ai-with-nv...
pixelpoet|4 months ago
tl;dr it gets absolutely smashed by Strix Halo, at half the price.
andrewgleave|4 months ago
It would be interesting to swap out Ollama for LM Studio and use their built-in MLX support and see the difference.
ta12653421|4 months ago
a) what is the noise level? In that small box, it should be immense?
b) how many frames do we get in Q3A at max. resolution and will it be able to run Crysis? ;-) LOL (SCNR)
OliverGuy|4 months ago
Could I write code that runs on Spark and effortlessly run it on a big GB300 system with no code changes?
incomingpain|4 months ago
DGX Spark
pp - 1723.07/s
tg - 38.55/s
Ryzen AI Max+ 395
pp - 711.67/s
tg - 40.25/s
Is it worth the money?
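Putting those two benchmark lines side by side (pp = prompt processing/prefill, tg = token generation/decode, both in tokens per second):

```python
# Ratio of the quoted benchmark numbers: DGX Spark vs. Ryzen AI Max+ 395.
spark = {"pp": 1723.07, "tg": 38.55}
strix = {"pp": 711.67, "tg": 40.25}

for metric in ("pp", "tg"):
    print(f"{metric}: Spark/Strix = {spark[metric] / strix[metric]:.2f}x")
# pp: 2.42x   tg: 0.96x
```

So on these numbers the Spark is about 2.4x faster at prefill but marginally slower at decode, which matches the thread's read: the extra compute shows up in prompt processing, while generation stays pinned by memory bandwidth.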
andrewstuart|4 months ago
No doubt that’s present here too somehow.
Gotta cut off something important so you’ll spend more on the next more expensive product.