j_not_j|1 year ago
Their paper (arXiv:2412.1947) says they used 2048 H800s. A compute cluster built on 2048 such GPUs would have cost around $400M about two years ago, when they built it. (Give or take; feel free to post corrections.)
The point is they got it done cheaper than OpenAI/Google/Meta/... etc.
But not cheaply.
I believe the markets are overreacting. Time to buy (tinfa).
segmondy|1 year ago
"Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."
V3 was released a bit over a month ago, and V3 is not what took the world by storm; R1 is. The price everyone is talking about is the price for V3.
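You can back out what that quoted $5.576M implies. A minimal sketch, using only the numbers from the quote above ($2 per H800 GPU-hour, 2048 GPUs; the 24-hour day conversion is mine):

```python
# Back out the GPU-hours implied by the quoted training cost.
TOTAL_COST_USD = 5.576e6       # quoted official V3 training cost
RATE_USD_PER_GPU_HOUR = 2.0    # assumed H800 rental rate from the paper
NUM_GPUS = 2048

gpu_hours = TOTAL_COST_USD / RATE_USD_PER_GPU_HOUR  # ~2.788M GPU-hours
wall_clock_days = gpu_hours / NUM_GPUS / 24         # ~57 days on the full cluster

print(f"{gpu_hours:,.0f} GPU-hours, ~{wall_clock_days:.0f} days on {NUM_GPUS} GPUs")
```

That works out to roughly two months of wall-clock time on the whole cluster, consistent with the ~60-day run length mentioned elsewhere in this thread.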
kamaal|1 year ago
This is still quite impressive, given that most people are more likely to buy cloud infrastructure from AWS or Azure than to build their own datacenter. So the math checks out.
I don't think the compute capacity already built will go to waste; more and bigger things will likely get built in the coming years, so most of it will be used for that purpose.
aftbit|1 year ago
If they paid $70,000 per GPU[2] plus $5,000 per 4-GPU compute node (a random guess), the hardware would have cost about $150M. If you add network hardware and other datacenter costs, I could see it reaching the $200M range. IMO $400M might be a bit of a stretch, but it's not wildly off base.
To reach parity with the rental price, they would have needed to re-train about 70 times (i.e. over 12 years). They obviously did not do that, so I agree it's a bit unfair to cost this based on $2/GPU-hour rentals. Why did they buy instead of rent? Probably because it's not actually that cheap to get 2048 concurrent, high-performance-interconnected GPUs for 60 days. Or maybe just because they had the cash for capex.
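The comment's arithmetic can be checked directly. A sketch using the commenter's own guesses ($70k per GPU, $5k per 4-GPU node, ~$5.576M per training run, ~60 days per run; the $400M parity baseline is the figure from the top of the thread):

```python
# Reproduce the hardware-cost and rent-vs-buy parity estimates above.
num_gpus = 2048
gpu_price = 70_000            # per-GPU price guess [2]
node_price = 5_000            # per 4-GPU node (random guess, per the comment)
run_cost = 5.576e6            # one V3 training run priced at $2/GPU-hour

hardware = num_gpus * gpu_price + (num_gpus // 4) * node_price  # ~$146M
retrains_for_parity = 400e6 / run_cost    # vs. the thread's $400M cluster figure
years = retrains_for_parity * 60 / 365    # at ~60 days of wall-clock per run

print(f"hardware ~${hardware / 1e6:.0f}M, parity after "
      f"{retrains_for_parity:.0f} retrains (~{years:.0f} years)")
```

The GPUs alone come to ~$143M, nodes add ~$2.6M, and ~72 retrains at 60 days each is indeed just under 12 years.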
1: https://stratechery.com/2025/deepseek-faq/
2: https://www.tomshardware.com/news/price-of-nvidia-compute-gp...
dathinab|1 year ago
Note that these are China prices, with a high markup due to export controls etc.
The price of an H800 80GiB in the US today is more like ~$32k USD.
But to use H800 clusters well you also need the fastest possible interconnects, enough motherboards, enough fast storage, cooling, a building, uninterrupted power, etc. So the cost of building an H800-focused datacenter is much higher than GPU price times GPU count.
Still, $400M seems unlikely.
317070|1 year ago
[0] https://www.tomshardware.com/news/price-of-nvidia-compute-gp...
cpldcpu|1 year ago
The number does not include the cost of personnel, experiments, data preparation, or chasing dead ends, and most importantly, it does not include the reinforcement-learning step that made R1 good.
Furthermore, it is not factored in that both R1 and V3 are built on top of an enormous amount of synthetic data that was generated by other LLMs.
jgalt212|1 year ago
OK, but does this quant fund have this amount of spare resources to take a flyer on a vanity project?
ahzhou|1 year ago
See: https://planetbanatt.net/articles/v3fermi.html
K0balt|1 year ago
Calling the training load for DeepSeek 6% of the value of that cluster seems generous. It probably used less of the recoverable value than that.
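One way to sanity-check that: price the run as its share of cluster depreciation rather than as rental. A hedged sketch, where the ~$150M build cost and 4-year straight-line lifetime are assumptions (this thread's numbers, not the paper's):

```python
# What fraction of the cluster's value does a ~60-day run consume
# under straight-line 4-year depreciation? (Assumed inputs.)
cluster_cost = 150e6      # assumed build cost from elsewhere in the thread
lifetime_days = 4 * 365   # assumed useful life
run_days = 60             # ~60-day training run

share = run_days / lifetime_days    # ~4.1% of the cluster's value
consumed = share * cluster_cost     # ~$6.2M

print(f"~{share:.1%} of cluster value, ~${consumed / 1e6:.1f}M")
```

On those assumptions a single run consumes roughly 4% of the cluster's value, a bit under the 6% figure the comment calls generous.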
SR2Z|1 year ago
I think the salient point here is that the "price to train" a model is a flashy number that's difficult to evaluate out of context. American companies list the public cloud price to make it seem expensive; Deepseek has an incentive to make it sound cheap.
The real conclusion is that world-class models can now be trained even if you're banned from buying Nvidia cards (because they've already proliferated), and that open-source has won over the big tech dream of gatekeeping the technology.
aftbit|1 year ago
/s TINFA -> this is not financial advice