It isn't that good for local LLM inferencing. It isn't designed for that.
It's designed to be a local dev machine for Nvidia server products. It has the same software and hardware stack as enterprise Nvidia hardware. That's what it is designed for.
Wait for M5 series Macs for good value local inferencing. I think the M5 Pro/Max are going to be very good values.
Given that most of Nvidia's enterprise software products, like NIMs, are single-server designs meant to run on DGX boxes, this makes sense.
I am still amazed at how many companies buy a ton of DGX boxes and then are surprised that Nvidia does not have any Kubernetes native platform for training and inferencing across all the DGX machines. The Run.ai acquisition did not change anything, as you leave all the work to the user to integrate with distributed training frameworks like Ray or scalable inference platforms, like KServe/vLLM.
This is insanely slow given its 200+GB/s memory bandwidth. As a comparison, I've tested GPT OSS 120B on Strix Halo and it obtains 420tps prefill and >40tps decode.
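A quick sanity check on those decode numbers — a minimal bandwidth-bound sketch, assuming GPT-OSS 120B activates roughly 5.1B parameters per token at ~4-bit weights and Strix Halo has ~256 GB/s of memory bandwidth (round figures, not measurements):

```python
# Bandwidth-bound decode ceiling for an MoE model (rough, assumed figures).
active_params = 5.1e9          # assumed active params/token for GPT-OSS 120B
bytes_per_param = 0.5          # ~4-bit quantized weights
bandwidth = 256e9              # assumed Strix Halo memory bandwidth, B/s

bytes_per_token = active_params * bytes_per_param   # ~2.55 GB read per token
ceiling_tps = bandwidth / bytes_per_token
print(f"theoretical decode ceiling ~ {ceiling_tps:.0f} tok/s")  # ~100 tok/s
```

Against a ~100 tok/s ceiling, 40 tok/s measured is around 40% efficiency, which is plausible once you count KV cache reads and shared weights; the Spark's ~273 GB/s bandwidth puts it in the same bandwidth-bound regime for decode.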
Probably the quants have higher perplexity, but the Spark's performance seems lacklustre. The reviewer videos I've seen so far try their best not to offend Nvidia or, rather, not to break their contracts.
You guys that continue to compare DGX Spark to the Mac Studios, please remember two things:
1. Virtually every model that you'd run was developed on Nvidia gear and will run on Spark.
2. Spark has fast-as-hell interconnects. The sort of interconnects that one would want to use in an actual AI DC, so you can use more than one Spark at the same time, and RDMA, and actually start to figure out how and why things work the way they do. You can do a lot with 200 Gb/s of interconnect.
Also remember that the Mx Ultras have 2-3x the memory bandwidth. Looking at the benchmarks, even Strix Halo seems to beat the Spark. A 200 Gbps switch costs $10k-$100k+, so don't imagine anyone will actually use the interconnect. The logical thing for Nvidia would be to sell a kit with three machines and cabling, making a ring out of the dual ports per machine. That helps for some scenarios but not others, given the network is ten times slower than the memory bandwidth.
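The "ten times slower" claim checks out on the back of an envelope, assuming one 200 Gb/s link per hop and the Spark's quoted 273 GB/s of LPDDR5X bandwidth:

```python
# One 200 Gb/s network link vs. the Spark's local memory bandwidth.
link_gbps = 200                 # ConnectX link speed, gigabits/s
link_gbs = link_gbps / 8        # -> 25 GB/s
mem_gbs = 273                   # Spark's quoted memory bandwidth, GB/s

print(f"link: {link_gbs:.0f} GB/s, memory: {mem_gbs} GB/s, "
      f"ratio ~ {mem_gbs / link_gbs:.1f}x")  # ~10.9x
```

So any tensor-parallel scheme that ships activations over the wire every layer pays roughly an order of magnitude penalty versus staying on one box, which is why pipeline or expert parallelism tends to be the only sane multi-box split at this link speed.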
They're in a different ballpark in memory bandwidth. The right comparison is the Ryzen AI Max 395 with 128GB DDR5-8000, which can be bought for around $1800 / 1750€.
$4,000 is actually extremely competitive. Even for an at-home enthusiast setup this price is not out of reach; I was expecting something far higher. That said, Nvidia's MSRPs have been something of a pipe dream recently, so we'll see the actual price and availability when it's released. Curious also to see how they might scale together.
Well, that’s disappointing since the Mac Studio 128GB is $3,499. If Apple happens to launch a Mac Mini with 128GB RAM it would eat Nvidia Sparks’ lunch every day.
I wonder why they didn't test against the broadly available Strix Halo with 128GB of 256 GB/s memory bandwidth, 16 core full-fat Zen5 with AVX512 at $2k... it is a mystery...
Strix Halo has the problem that prefill is incredibly slow if your context is not very small.
The only thing that might be interesting about this DGX Spark is that its prefill manages to be faster due to better compute. I haven't compared the numbers yet, but they are included in the article.
That memory bandwidth choked out its performance. How can you claim 1000 TFLOPS if it's not capable of delivering it? Seems they chose to sandbag the Spark in favour of the RTX Pro 6000.
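A roofline-style sketch shows why the headline TFLOPS can never show up during decode — assuming the ~1000 TFLOPS figure is the sparse FP4 marketing number and 273 GB/s memory bandwidth:

```python
# Roofline check: arithmetic intensity needed to be compute-bound on Spark.
peak_flops = 1000e12            # assumed sparse FP4 marketing figure, FLOP/s
bandwidth = 273e9               # memory bandwidth, B/s

ridge = peak_flops / bandwidth  # FLOP per byte needed to saturate compute
# Single-token decode is essentially GEMV: ~2 FLOPs per weight,
# 0.5 bytes per weight at 4-bit quantization.
decode_intensity = 2 / 0.5
print(f"ridge point ~ {ridge:.0f} FLOP/B, decode ~ {decode_intensity:.0f} FLOP/B")
```

Decode sits roughly three orders of magnitude below the ridge point, so it is purely bandwidth-bound; only batched prefill, with its large matrix-matrix multiplies, can get anywhere near the compute ceiling.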
I guess the next one I'm looking out for is the Orange Pi AI Studio Pro. It should have 192GB of RAM, so it's able to run Qwen3 235B, and even though it's DDR4, it's nearly double the bandwidth of the Spark.
Good luck with any kind of coherent ecosystem and support. Also, if you're in the U.S., there is a good chance you'll get hit with tariffs, which would wipe out any potential value. I'd much rather stick with Nvidia, which has an ecosystem (or even Apple, for that matter), than touch a system like this off of Alibaba.
I think my 2021 MBP M1 Pro has ~200GB/s memory bandwidth, but it handles qwen3:32b quite nicely, albeit maxed out at ~70W.
I somehow expected the Spark to be the 'God in a Box' moment for local AI, but it feels like they went for trying to sell multiple units instead.
I'd be more tempted by a 2nd-hand 128GB M2 Ultra at ~800GB/s, but the prices here are still high, and I'm not sure the Spark is going to convince people to part with those, unless we see some gluttonous-RAM M5 boxes soon. An easy way for Apple to catch up again.
All three (GB10, GB200 and GB300) are part of the Blackwell family, which means they have Compute Capability >= 10.X. You could potentially develop kernels to optimize MoE inference (given the large available unified memory, 128GB, it makes the most sense to me) with CUDA >= 12.9, then ship the fatbins to the "big boys". As many people have pointed out across the thread, the Spark doesn't really have the best perf/$; it's rather a small portable platform for experimentation and development.
teleforce|4 months ago
[1] (Updated) NVIDIA Jetson AGX Thor Developer Kit to Launch in Mid-August with 2070 TFLOPS AI Performance, Priced at $3499:
https://linuxgizmos.com/updated-nvidia-jetson-agx-thor-devel...
[2] AAEON Announces BOXER-8741AI with NVIDIA Jetson Thor T5000 Module:
https://linuxgizmos.com/aaeon-announces-boxer-8741ai-with-nv...
pixelpoet|4 months ago
tl;dr it gets absolutely smashed by Strix Halo, at half the price.
andrewgleave|4 months ago
It would be interesting to swap out Ollama for LM Studio and use their built-in MLX support and see the difference.
ta12653421|4 months ago
a) what is the noise level? In that small box, it should be immense?
b) how many frames do we get in Q3A at max. resolution and will it be able to run Crysis? ;-) LOL (SCNR)
OliverGuy|4 months ago
Could I write code that runs on Spark and effortlessly run it on a big GB300 system with no code changes?
incomingpain|4 months ago
DGX Spark
pp - 1723.07/s
tg - 38.55/s
Ryzen AI Max+ 395
pp - 711.67/s
tg - 40.25/s
Is it worth the money?
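Putting those two benchmark lines side by side (pp = prompt processing/prefill, tg = token generation/decode, both in tokens per second):

```python
# Ratio of the quoted benchmark numbers: DGX Spark vs. Ryzen AI Max+ 395.
spark = {"pp": 1723.07, "tg": 38.55}
strix = {"pp": 711.67, "tg": 40.25}

for metric in ("pp", "tg"):
    print(f"{metric}: Spark/Strix = {spark[metric] / strix[metric]:.2f}x")
# pp: 2.42x   tg: 0.96x
```

So on these numbers the Spark is about 2.4x faster at prefill but marginally slower at decode, which matches the thread's read: the extra compute shows up in prompt processing, while generation stays pinned by memory bandwidth.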
andrewstuart|4 months ago
No doubt that’s present here too somehow.
Gotta cut off something important so you’ll spend more on the next more expensive product.