3abiton|4 months ago
As someone who got in early on the Ryzen AI 395+, is there any added value to the DGX Spark besides having CUDA (compared to ROCm/Vulkan)? I feel Nvidia fumbled the marketing, either making it sound like an inference miracle or a dev toolkit (then again, not enough to differentiate it from the superior AGX Thor).
I am curious where you find its main value, and how it would fit within your tooling and use cases compared to other hardware.
From the inference benchmarks I've seen, an M3 Ultra always comes out on top.
justinclift|4 months ago
Installation instructions: https://github.com/comfyanonymous/ComfyUI#nvidia
It's a webUI that'll let you try a bunch of different, super powerful things, including easily doing image and video generation in lots of different ways.
It was really useful to me when benching stuff at work on various gear, e.g. L4 vs A40 vs H100 vs 5th-gen EPYC CPUs.
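For anyone who wants to try it, the quick start from that README is roughly this (a sketch; pick the CUDA-enabled PyTorch build the linked instructions recommend for your platform):

    git clone https://github.com/comfyanonymous/ComfyUI
    cd ComfyUI
    pip install -r requirements.txt    # plus a matching PyTorch build, per the README
    python main.py                     # then open http://127.0.0.1:8188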
rcarmo|4 months ago
About what I expected. The Jetson series had the same issues, mostly, at a smaller scale: deviate from the anointed versions of YOLO, and nothing runs without a lot of hacking. Being beholden to CUDA is both a blessing and a curse, but what I really fear is how long it will take for this to become an unsupported golden brick.
Also, the other reviews I’ve seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of “unified” memory. Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if ROCm is… well… not all there yet).
Curious to compare this with cloud-based GPU costs, or (if you really want on-prem and fully private) the returns from a more conventional rig.
3abiton|4 months ago
> Also, the other reviews I’ve seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of “unified” memory.
It's not comparable to 4090 inference speed; it's significantly slower, partly because of the lack of MXFP4 models out there. Even compared to the Ryzen AI 395 (ROCm/Vulkan) on gpt-oss-120B mxfp4, somehow the DGX manages to lose on token generation (prompt processing is faster, though).
> Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if ROCm is… well… not all there yet).
ROCm (v7) for APUs has actually come a long way, mostly thanks to community effort; it's quite competitive and more mature now. It's still not totally user-friendly, but it doesn't break between updates (I know the bar is low, but that was the status a year ago). In comparison, the Strix Halo offers lots of value for your money if you need a cheap, compact inference box.
Haven't tested fine-tuning/training yet, but in theory it's supported. Not to forget that the APU is extremely performant for "normal" tasks (Threadripper level) compared to the CPU of the DGX Spark.
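For anyone wanting to reproduce that kind of head-to-head, the usual tool is llama.cpp's bench utility, built with the Vulkan or ROCm backend on the Strix Halo and the CUDA backend on the Spark (the model file name here is just an assumed local mxfp4 GGUF build):

    # reports pp (prompt processing) and tg (token generation) rates
    ./llama-bench -m gpt-oss-120b-mxfp4.gguf -ngl 99 -p 512 -n 128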
physicsguy|4 months ago
A few years ago I worked on an ARM supercomputer, as well as a POWER9 one. x86 is so assumed for anything other than trivial things that it is painful.
What I found to be a good solution was using Spack:
https://spack.io/
That allows you to download/build the full toolchain of stuff you need for whatever architecture you are on - all dependencies, compilers and runtimes (GCC, CUDA, MPI, etc.), compiled Python packages, and so on - and if you need to add a new recipe for something, it is really easy (a sketch follows below).
For the fellow Brits - you can tell this was named by Americans!!!
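As a minimal sketch of that flow on a fresh arm64 node (package names and versions are illustrative):

    git clone --depth=1 https://github.com/spack/spack.git
    . spack/share/spack/setup-env.sh
    spack compiler find               # register whatever compilers the box already has
    spack install gcc@13              # bootstrap a newer toolchain from source
    spack install openmpi +cuda       # builds MPI with CUDA support, plus all dependencies
    spack load openmpi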
teleforce|4 months ago
It's good that you've mentioned Spack, and outside of HPC work at that; very interesting.
This is a high-level overview by one of the Spack authors, from an HN post back in 2023 (the top comment of 100), including a link to the original Spack paper [1]:
At a very high level, Spack has:
* Nix's installation model and configuration hashing
* Homebrew-like packages, but in a more expressive Python DSL, and with more versions/options
* A very powerful dependency resolver that doesn't just pick from a set of available configurations -- it configures your build according to possible configurations.
You could think of it like Nix with dependency resolution, but with a nice Python DSL. There is more on the "concretizer" (resolver) and how we've used ASP for it here:
* "Using Answer Set Programming for HPC Dependency Solving", https://arxiv.org/abs/2210.08404
[1] Spack – scientific software package manager for supercomputers, Linux, and macOS (100 comments): https://news.ycombinator.com/item?id=35237269
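To see the concretizer in action, you can ask Spack to resolve a fully constrained spec; this example spec (not from the post) mixes a version, a variant, a compiler, and a dependency, and the concretizer fills in everything left unspecified:

    spack spec hdf5@1.14 +mpi %gcc@13 ^openmpi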
speedgoose|4 months ago
Depending on the kind of project and its data agreements, it's sometimes much easier to run computations on premises than in the cloud, even though the cloud is somewhat more secure.
For example, I have some healthcare research projects with personally identifiable data, and in these times it's simpler for the users to trust my company than to trust my company plus some overseas company and its associated government.
For me as an employee in Australia, I could buy this and write it off on my tax as a work expense myself. Renting would be much more cumbersome, as it would involve the company. That's 45% off (our top marginal tax rate).
Is 128 GB of unified memory enough? I've found that the smaller models are great as a toy but useless for anything realistic. Will 128 GB hold any model that you can do actual work with, or query for answers that return useful information?
simonw|4 months ago
There are several 70B+ models that are genuinely useful these days.
I'm looking forward to GLM 4.6 Air - I expect that one should be pretty excellent, based on experiments with a quantized version of its predecessor on my Mac. https://simonwillison.net/2025/Jul/29/space-invaders/
128 GB of unified memory is enough for pretty good models, but honestly, for the price of this it is better to just go with a few 3090s or a Mac, due to the memory bandwidth limitations of this card.
The question is: how does prompt processing time on this compare to the M3 Ultra? Because that one sucks at RAG, even though it can technically handle huge models and long contexts...
Despite the large memory capacity, its memory bandwidth is very low, so I'd guess decode speed will be very slow. Of course, this design is very well suited to the inference needs of MoE models, which only read a small fraction of their weights per generated token.
How would this fare alongside the new Ryzen chips, out of interest? From memory it seems to be getting about the same tok/s, but would the Ryzen box be more useful for other computing, not just AI?
From reading reviews (I don't have either yet): the Nvidia actually has unified memory, while on AMD you have to specify the allocation split. Nvidia may have some form of GPU partitioning so you can run multiple smaller models, but no one has got it working yet. The Ryzen is also very different from AMD's pro GPUs, so its software support won't benefit from work done there, while Nvidia's is the same. On the other hand, you can play games on the Ryzen.
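For context on the AMD side, the BIOS split is only part of the story: on Linux the iGPU can also map system RAM through GTT, and older kernels expose a module parameter to size it (the parameter is real but deprecated on newer kernels, and the value below is an assumption for a 128 GB box):

    # /etc/modprobe.d/amdgpu.conf -- let GTT grow to ~96 GiB (value assumed)
    options amdgpu gttsize=98304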
Is there like an affiliate link or something where I can just buy one? Nvidia’s site says sold out, PNY invites you to find a retailer, and the other links from Nvidia didn’t seem to go anywhere. Can one just click to buy it somewhere?
My local reseller has them in stock in the EU, with a markup... Directly from Nvidia, probably not for quite some time: I have some friends who put in preorders, and they didn't get any from the first batch.
I’m kind of surprised at the issues everyone is having with the arm64 hardware. PyTorch has been building official wheels for several months already as people get on GH200s. Has the rest of the ecosystem not kept up?
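For what it's worth, checking is a one-liner; the aarch64 CUDA wheels install from the standard PyTorch index these days (cu128 assumed here as the current CUDA build):

    pip install torch --index-url https://download.pytorch.org/whl/cu128
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"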
Are the ASUS Ascent GX10 and similar machines from Lenovo etc. 100% compatible with the DGX Spark, and can they be chained together with the same functionality (i.e. an ASUS together with a Lenovo for 256 GB inference)?
And yet CUDA has looked way better than ATi/AMD offerings in the same area, despite ATi/AMD technically being first to deliver GPGPU. (The major difference is that CUDA arrived a year later but supported everything from the G80 up and evolved nicely, while AMD managed to have multiple platforms with patchy support and total rewrites in between.)
jasonjmcghee|4 months ago
Except the performance people are seeing is way below expectations. It seems to be slower than an M4, which kind of defeats the purpose: it was advertised as 1 petaflop on your desk.
But maybe this will change? Software issues, perhaps?
It also runs CUDA, which is useful.
I'm hopeful this makes Nvidia take aarch64 seriously for Jetson development. For the past several years Mac-based developers have had to run the flashing tools in unsupported ways, in virtual machines with strange QEMU options.
simonw|4 months ago
I'm running vLLM on it now and it was as simple as:
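(The command block itself was stripped from the page; a plausible reconstruction based on the linked NGC catalog entry, with the image tag and mount flags being assumptions:)

    docker run --gpus all -it --rm \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      -p 8000:8000 \
      nvcr.io/nvidia/vllm:25.09-py3    # tag assumed; use whatever the catalog page lists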
(That recipe is from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?v... ) And then in the Docker container:
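(Also stripped; given the default model mentioned next, presumably something along the lines of:)

    vllm serve                       # container default model (assumed)
    # or explicitly:
    vllm serve Qwen/Qwen3-0.6B --host 0.0.0.0 --port 8000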
The default model it loads is Qwen/Qwen3-0.6B, which is tiny and fast to load.
EnPissant|4 months ago
I have no immediate numbers for prefill, but the memory bandwidth is ~4x greater on a 4090 (roughly 1,000 GB/s vs ~273 GB/s), which will lead to ~4x faster decode.
smallnamespace|4 months ago
For inference decode, memory bandwidth is the main limitation, so if running LLMs is your use case you should probably get a Mac instead.
B1FF_PSUVM|4 months ago
P.S. exploded view from the horse's mouth: https://www.nvidia.com/pt-br/products/workstations/dgx-spark...
magicalhippo|4 months ago
The 120B model is better but too slow since I only have 16GB VRAM. That model runs decent[1] on the Spark.
[1]: https://news.ycombinator.com/item?id=45576737
triwats|4 months ago
You CAN build your own, but for people wanting to get started this could be a really viable option.
Perhaps less so though with Apple's M5? Let's see...
https://flopper.io/gpu/nvidia-dgx-spark
triwats|4 months ago
Management becomes layers upon layers of bash scripts, which end up calling a final script written by Mellanox.
They'll catch up soon, but you end up having to stay strictly on their release cycle always.
Lots of effort.
amelius|4 months ago
Can anyone explain this? Does this machine have multiple CPU architectures?
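For what it's worth: no, the box is a single arm64 (Grace) CPU paired with a Blackwell GPU; the pain upthread is about the x86-centric software ecosystem, not mixed CPU architectures. A quick check on the machine itself:

    uname -m      # prints aarch64 on the Spark
    nvidia-smi    # the Blackwell GPU shows up as usual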
matt3210|4 months ago
I should be allowed to do stupid things when I want. Give me an override!
monster_truck|4 months ago
I'd be pissed if I paid this much for hardware and the performance was this lacklustre while also being kneecapped for training
_ache_|4 months ago
Obviously, even with connectx, it's only 240Gi of VRAM, so no big models can be trained.
rvz|4 months ago
The DGX Spark is completely overpriced for its performance compared to a single RTX 5090.
sailingparrot|4 months ago
It's a DGX dev box, for those (not consumers) that will ultimately need to run their code on large DGX clusters, where a failure or a ~3% slowdown of training ends up costing tens of thousands of dollars.
That's the use case, not running LLMs efficiently, and you can't do that with an RTX 5090.
_ache_|4 months ago
I don't think the 5090 could do that with only 32 GB of VRAM, could it?