top | item 42852229

j_not_j | 1 year ago

I don't for a minute believe Deepseek v3 was built with a $6M rental.

Their paper (arXiv:2412.19437) explains they used 2048 H800s. A computer cluster built on 2048 GPUs would have cost around $400M about two years ago when they built it. (Give or take; feel free to post corrections.)

The point is they got it done cheaper than OpenAI/Google/Meta/... etc.

But not cheaply.

I believe the markets are overreacting. Time to buy (tinfa).

segmondy|1 year ago

They pointed out that the cost calculation is based on what those GPUs would cost if rented at $2/hr. They are not factoring in the prior cost of buying the H800s, because they didn't buy them just to build R1. They are not factoring in the cost to build V2 or V2.5; the figure is the cost to build V3. Building R1-Zero and R1 on top of V3 seems far cheaper still, and they didn't break that out. They are also not factoring in the cost of building out their datacenter, or salaries. Just the training cost. They made it clear: if you could rent equivalent GPUs at $2/hr, it would cost you about $6 million.

"Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."

V3 was released a bit over a month ago, but V3 is not what took the world by storm; R1 is. The price everyone is talking about is the price for V3.
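As a quick sanity check, the headline figure falls straight out of the two inputs DeepSeek states (the GPU-hour total is the one quoted elsewhere in this thread; the $2/hr rate is their stated assumption):

```python
# Reproduce the paper's headline training-cost figure from its two stated inputs.
gpu_hours = 2_788_000   # total H800 GPU-hours reported for the official V3 run
rate = 2.00             # assumed rental price, $ per GPU-hour

total = gpu_hours * rate
print(f"${total:,.0f}")  # -> $5,576,000
```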

ecocentrik|1 year ago

If this weren't an attempt to sell a false equivalence, at least one story would give details on the equivalent rental cost of the compute used to train closed-source frontier models from OpenAI, Anthropic, Mistral... The lack of clarity is what makes it a story.

kamaal|1 year ago

>>Just the training cost. They made it clear. If you could rent equivalent GPUs at $2/hr, it would cost you about $6million.

This is still quite impressive, given that most people are more likely to rent cloud infrastructure from AWS or Azure than to build their own datacenter. So the math checks out.

I don't think compute capacity that's already been built will go to waste; more and bigger things will likely get built in the coming years, and most of it will be used for that.

achempion|1 year ago

Thanks for the explanation. These facts are completely overlooked in mass media in favour of catchy headlines.

shusaku|1 year ago

You’re confusing the metric for reality. The point is to compare the cost of training in terms of node hours with a given configuration. That’s how you get apples to apples. Of course it doesn’t cover building the cluster, housing the machine, the cleaning staff’s pension, or whatever.

aftbit|1 year ago

The math they gave was 2,788,000 H800 GPU hours[1], with a rental price of $2/GPU-hour[1], which works out to $5.6M. If they did that on a cluster of 2048 H800s, then they could re-train the model every ~1400 hours (~2 months).

If they paid $70,000 per GPU[2] plus $5,000 per 4-GPU compute node (a random guess), the hardware alone would have cost about $150M to build. Add in network hardware and other datacenter things, and I could see it reaching the $200M range. IMO $400M might be a bit of a stretch, but not wildly off base.

To reach parity with the rental price (against the OP's $400M figure), they would have needed to re-train about 70 times, i.e. over 12 years of cluster time. They obviously did not do that, so I agree it's a bit unfair to cost this based on $2/hr GPU rentals. Why did they buy instead of rent? Probably because it's not actually that cheap to get 2048 concurrent, high-performance-interconnected GPUs for 60 days. Or maybe just because they had cash for capex.

1: https://stratechery.com/2025/deepseek-faq/

2: https://www.tomshardware.com/news/price-of-nvidia-compute-gp...
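The arithmetic above can be reproduced in a few lines. All inputs are the commenter's own estimates (the node price is explicitly a guess), not confirmed figures:

```python
# Capex vs. rental break-even, using the parent comment's estimates.
gpu_hours = 2_788_000          # GPU-hours for one V3 training run [1]
cluster = 2048                 # H800s in the cluster
rate = 2.00                    # $/GPU-hour rental assumption [1]

run_cost = gpu_hours * rate            # ~$5.6M per run at rental rates
run_time_h = gpu_hours / cluster       # ~1361 wall-clock hours (~2 months)

gpu_price = 70_000                     # per-H800 price estimate [2]
node_price = 5_000                     # guessed cost per 4-GPU compute node
capex = cluster * gpu_price + (cluster // 4) * node_price   # ~$146M

print(capex / run_cost)        # ~26 runs to break even at ~$150M capex
print(400e6 / run_cost)        # ~72 runs against the OP's $400M estimate
```

Note the break-even count of ~70 holds only against the $400M figure; at the ~$150M hardware estimate it's closer to 26 runs.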

aisio|1 year ago

2048 GPUs cost $400m? pretty sure the GPUs don't cost 200k each?

bottlelion|1 year ago

And a pile of GPUs doesn't really do much good without servers, racks, networking, power, cooling, and a building to house it all in.

dathinab|1 year ago

Looking around a bit, the price was ~$70k USD _in China_ around the time they were released in 2023; cheaper bulk sales were a thing, too.

Note that these are China prices, with a high markup due to export controls etc.

The price of an H800 80GiB in the US today is more like ~$32k USD.

But to use H800 clusters well you also need interconnects that are as fast as possible, enough motherboards, enough fast storage, cooling, a building, uninterrupted power, etc. So the cost of building an H800-focused datacenter is much, much higher than the GPU price multiplied by the GPU count.

Still, $400M seems unlikely.

InkCanon|1 year ago

To clarify, a legitimate benchmark for training is the running cost, not the capex cost, because the latter obviously drops dramatically with the number of models you train on the same hardware. But to put it into context: Meta wants to spend $50B on AI this year alone, and it already has ~150x the compute of DeepSeek. The very real math going through investors' heads is: what's stopping Zuck from taking $10B of that and mailing a $100 million signing bonus to every name on the R1 paper?

cpldcpu|1 year ago

The $6M that gets thrown around is from the DeepSeek-V3 paper and is the cost of a single training run for DeepSeek-V3, the base model that R1 is built on.

The number does not include costs for personnel, experiments, data preparation, or chasing dead ends, and most importantly, it does not include the reinforcement learning step that made R1 good.

Furthermore, it does not factor in that both V3 and R1 are built on top of an enormous amount of synthetic data that was generated by other LLMs.

dtech|1 year ago

Comparing the cost of buying with the cost of running is weird. It's not like they built a new cluster, trained just this one model, and then incinerated everything.

jgalt212|1 year ago

> A computer cluster based on 2048 GPUs would have cost around $400M about two years ago when they built it.

OK, but does this quant fund have that amount of spare resources to take a flyer on a vanity project?

TypingOutBugs|1 year ago

They bought between 10k and 50k of them before the US restrictions came into effect. Sounds like DeepSeek gets to use them for training, as the fund was profitable (could still be, not sure).

K0balt|1 year ago

If they bought them outright, they might have paid $60M (GPUs only). After infrastructure, maybe $100M.

Calling the training load for DeepSeek 6% of the value of that cluster seems generous; it probably consumed less of the recoverable value than that.
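For what it's worth, the ~6% figure lines up with the rental-equivalent training cost quoted upthread (the $100M cluster value is the parent's own estimate):

```python
training_cost = 5_576_000     # rental-equivalent cost of the V3 run
cluster_value = 100_000_000   # parent's estimate, GPUs plus infrastructure
print(training_cost / cluster_value)  # -> 0.05576, i.e. ~5.6%
```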

SR2Z|1 year ago

Electricity in China, even at residential rates, is 1/10th the cost it is in CA.

I think the salient point here is that the "price to train" a model is a flashy number that's difficult to evaluate out of context. American companies list the public cloud price to make it seem expensive; Deepseek has an incentive to make it sound cheap.

The real conclusion is that world-class models can now be trained even if you're banned from buying Nvidia cards (because they've already proliferated), and that open-source has won over the big tech dream of gatekeeping the technology.

TechDebtDevin|1 year ago

Over the last few days people have asked me whether I think NVIDIA is fkd. It still takes two H100s to run inference on DeepSeek V3 671B at <200 tokens per second.

htrp|1 year ago

only 2 ? what kind of h100s do you have?

aftbit|1 year ago

I couldn't find TINFA on Yahoo Finance but I bought INFA assuming that was close enough. Thanks for the financial advice. :P

/s TINFA -> this is not financial advice

baal80spam|1 year ago

Now that you bought, he will dump! :-P