
DeepSeek-V3 Technical Report

132 points | signa11 | 11 months ago | arxiv.org | reply

34 comments

[+] Centigonal|11 months ago|reply
The GPU-hours stat here allows us to back out some interesting figures around electricity usage and carbon emissions if we make a few assumptions.

2,788,000 GPU-hours * 350W TDP of H800 = 975,800,000 GPU Watt-hours

975,800,000 GPU Wh * (1.2 to account for non-GPU hardware) * (1.3 PUE [1]) = 1,522,248,000 Total Wh, or 1,522,248 kWh to train DeepSeek-V3

(1,522,248 kWh) * (0.582 kg CO2eq/kWh in China [2]) = 885,948 kg CO2 equivalents to train DeepSeek-V3

A typical US passenger vehicle emits about 4.6 metric tons of CO2 per year. [3]

885,948 kg CO2 per DeepSeek / 4,600 kg CO2 per car = 192.6 cars per DeepSeek

So, the final training run for DeepSeek-V3 emitted about as much greenhouse gas as running roughly 193 additional cars on the road for a year.

I also did some more math and found that this training run used about as much electricity as 141 US households would use over the course of a year. [4]

[1] https://enviliance.com/regions/east-asia/cn/report_10060

[2] https://ourworldindata.org/grapher/carbon-intensity-electric...

[3] https://www.epa.gov/greenvehicles/greenhouse-gas-emissions-t...

[4] divided total kWh by the value here: https://www.eia.gov/tools/faqs/faq.php?id=97&t=3
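
For anyone who wants to poke at the assumptions, here is the same back-of-the-envelope calculation as a small Python sketch. The overhead factor, PUE, grid intensity, and car figure are the ones cited above; the ~10,800 kWh/year household figure is my reading of the EIA link in [4], so treat it as approximate.

    # Rough estimate of DeepSeek-V3 training energy and emissions (same assumptions as above).
    GPU_HOURS = 2_788_000            # reported H800 GPU-hours for the final training run
    GPU_TDP_W = 350                  # H800 TDP in watts
    NON_GPU_OVERHEAD = 1.2           # assumed factor for CPUs, NICs, cooling, etc.
    PUE = 1.3                        # assumed data-center power usage effectiveness [1]
    GRID_KG_CO2_PER_KWH = 0.582      # carbon intensity of electricity in China [2]
    CAR_KG_CO2_PER_YEAR = 4_600      # typical US passenger vehicle [3]
    HOUSEHOLD_KWH_PER_YEAR = 10_800  # approximate US household annual use, per [4]

    gpu_wh = GPU_HOURS * GPU_TDP_W                       # watt-hours drawn by the GPUs alone
    total_kwh = gpu_wh * NON_GPU_OVERHEAD * PUE / 1000   # whole-cluster energy in kWh
    co2_kg = total_kwh * GRID_KG_CO2_PER_KWH             # kg CO2eq

    print(f"{total_kwh:,.0f} kWh")                                       # ~1,522,248 kWh
    print(f"{co2_kg:,.0f} kg CO2eq")                                     # ~885,948 kg
    print(f"{co2_kg / CAR_KG_CO2_PER_YEAR:.0f} car-years")               # ~193
    print(f"{total_kwh / HOUSEHOLD_KWH_PER_YEAR:.0f} household-years")   # ~141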

[+] hugs|11 months ago|reply
the nice thing about ai's energy usage is that no one complains about bitcoin's energy usage anymore. (i'm kidding, people still complain.)
[+] pogue|11 months ago|reply
Are the stats from training ChatGPT, Claude or other models public? It would be interesting to see a comparison to them.
[+] skummetmaelk|11 months ago|reply
The fact that you can unironically put the "only" modifier on a training time of 2.8 million GPU hours is nuts.
[+] chvid|11 months ago|reply
If they have a cluster with 2,000 H800 GPUs (which is what they have stated in public), training would take 2,800,000 / (2,000 × 24 × 30) ≈ 2 months.

A cluster of 2,000 GPUs is what a second-tier AI lab has access to. And it shows that you can play in the state-of-the-art LLM game with some capital and a lot of brains.
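
As a quick sanity check on that estimate (a trivial sketch, assuming all 2,000 GPUs stay busy around the clock and using the reported 2,788,000 GPU-hour figure):

    GPU_HOURS = 2_788_000   # reported GPU-hours for pre-training
    CLUSTER_GPUS = 2_000    # cluster size assumed in the parent comment
    HOURS_PER_DAY = 24

    days = GPU_HOURS / (CLUSTER_GPUS * HOURS_PER_DAY)
    print(f"{days:.0f} days, ~{days / 30:.1f} months")   # ~58 days, ~1.9 months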

[+] andai|11 months ago|reply
Can someone put this into perspective? I'm finding heterogeneous data on other models, i.e. number of tokens, number of GPUs used, cost, etc. It's hard to compare it all.
[+] danielhanchen|11 months ago|reply
Re DeepSeek-V3-0324: I made some 2.7-bit dynamic quants (230 GB in size) for those interested in running them locally via llama.cpp! Tutorial on getting and running them: https://docs.unsloth.ai/basics/tutorial-how-to-run-deepseek-...
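
Not part of the linked tutorial, but roughly, fetching the GGUF shards from Hugging Face before pointing llama.cpp at them looks like the sketch below. The repo id and filename pattern are illustrative guesses on my part; the Unsloth docs above have the exact names.

    # Sketch: download the ~2.7-bit dynamic quant shards with huggingface_hub.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="unsloth/DeepSeek-V3-0324-GGUF",   # hypothetical repo id, check the docs
        allow_patterns=["*UD-Q2_K_XL*"],           # hypothetical pattern for the 2.7-bit quant
        local_dir="DeepSeek-V3-0324-GGUF",
    )
    # Then run llama.cpp (llama-cli or llama-server) against the first shard in local_dir.
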
[+] behohippy|11 months ago|reply
These articles are gold, thank you. I used your Gemma one from a few weeks back to get Gemma 3 performing properly. I know you guys are all GPU, but do you do any testing on CPU/GPU mixes? I'd like to see the prompt processing (pp) and tokens/s on a pure 12-channel EPYC, and the same with a 24 GB GPU accelerating the prompt processing.
[+] kristjansson|11 months ago|reply
Hasn't been updated for the -0324 release unfortunately, and diff-pdf shows only a few small additions (and consequent layout shift) for the updated arxiv version on Feb 18.
[+] gdiamos|11 months ago|reply
Nice to see a return to open source in models and training systems.
[+] benob|11 months ago|reply
I like that they give advice to hardware manufacturers:
- offload communication to a dedicated co-processor
- implement decent precision for accumulating FP8 operations
- finer-grained quantization
...
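
For readers unfamiliar with the last point: finer-grained quantization means giving each small block of a tensor its own scale instead of one scale per tensor, so a single outlier only degrades its own block. A minimal NumPy sketch of the idea, using symmetric int8 as a stand-in for FP8 (NumPy has no FP8 type) and an illustrative block size of 128:

    import numpy as np

    def blockwise_quantize(x, block=128):
        # One scale per `block` values; int8 stands in for FP8 here.
        pad = (-len(x)) % block
        xp = np.pad(x, (0, pad)).reshape(-1, block)             # split into blocks
        scales = np.abs(xp).max(axis=1, keepdims=True) / 127    # one scale per block
        scales[scales == 0] = 1.0                               # avoid divide-by-zero
        q = np.clip(np.round(xp / scales), -127, 127).astype(np.int8)
        return q, scales

    def dequantize(q, scales, n):
        return (q.astype(np.float32) * scales).reshape(-1)[:n]

    x = np.random.randn(1000).astype(np.float32)
    x[3] = 50.0                        # an outlier only hurts its own 128-value block
    q, s = blockwise_quantize(x)
    print(f"max abs error: {np.abs(dequantize(q, s, len(x)) - x).max():.4f}")

The report's suggestion, as the comment summarizes, is that hardware should support this kind of per-block scaling (and higher-precision FP8 accumulation) natively instead of leaving it to software.
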
[+] system2|11 months ago|reply

[deleted]

[+] 0x008|11 months ago|reply
This model is open source and beats all proprietary models in benchmarks. How is this stagnant?
[+] litbear2022|11 months ago|reply
Yeah! Just steal the new Boeing 6th-gen stealth fighter from slides.
[+] nurettin|11 months ago|reply
You mean invent something new, publish the entire process and watch everyone rename and implement it next week like <think> blocks?
[+] hulitu|11 months ago|reply
> OpenAI is making announcements

That's what they are good at. /s