Can someone put this into perspective? I'm finding heterogeneous data on other models (number of tokens, number of GPUs used, cost, etc.), so it's hard to compare them.
These articles are gold, thank you. I used your Gemma one from a few weeks back to get Gemma 3 performing properly. I know you guys are all-GPU, but do you do any testing on CPU/GPU mixes? I'd like to see the prompt processing (pp) and tokens per second (t/s) on a pure 12-channel EPYC, and the same with a 24 GB GPU accelerating the prompt processing.
Hasn't been updated for the -0324 release, unfortunately, and diff-pdf shows only a few small additions (and the consequent layout shift) in the arXiv version updated on Feb 18.
I like that they give advice to hardware manufacturers (a rough sketch of the quantization idea follows this list):
- offload communication to a dedicated co-processor
- implement decent precision when accumulating FP8 operations
- finer-grained quantization
...
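For the quantization bullets, here is a minimal NumPy sketch of the block-wise idea: one scale per 128-value block instead of one scale per tensor, so a single outlier only degrades the dynamic range of its own block. The 128 block size and the E4M3 maximum of 448 match the FP8 recipe the report describes; the integer rounding is only a stand-in for a real FP8 cast, and all names here are illustrative, not DeepSeek's implementation.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3
BLOCK = 128           # per-block granularity, as in the report

def quantize_blockwise(x: np.ndarray):
    """Quantize a 1-D tensor in blocks of 128 values, one scale per block."""
    pad = (-len(x)) % BLOCK
    blocks = np.pad(x, (0, pad)).reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    scales[scales == 0] = 1.0           # avoid divide-by-zero on all-zero blocks
    q = np.round(blocks / scales)       # integer rounding stands in for the FP8 cast
    return q, scales, pad

def dequantize_blockwise(q, scales, pad):
    x = (q * scales).reshape(-1)
    return x[:len(x) - pad] if pad else x

x = np.random.randn(1000).astype(np.float32)
x[3] = 120.0  # one outlier now only hurts its own 128-value block
err = np.abs(dequantize_blockwise(*quantize_blockwise(x)) - x).max()
print(f"max round-trip error: {err:.4f}")
```

With a single per-tensor scale, that one outlier would stretch the scale for all 1000 values; per-block scales confine the damage, which is the point of the "finer-grained" suggestion.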
Centigonal | 11 months ago
2,788,000 GPU-hours * 350 W TDP of an H800 = 975,800,000 GPU watt-hours
975,800,000 GPU Wh * 1.2 (to account for non-GPU hardware) * 1.3 (PUE [1]) = 1,522,248,000 total Wh, or 1,522,248 kWh to train DeepSeek-V3
1,522,248 kWh * 0.582 kg CO2eq/kWh (grid carbon intensity in China [2]) = 885,948 kg CO2 equivalents to train DeepSeek-V3
A typical US passenger vehicle emits about 4.6 metric tons of CO2 per year. [3]
885,948 kg CO2 per DeepSeek / 4,600 kg CO2 per car = 192.6 cars per DeepSeek
So the final training run for DeepSeek-V3 emitted about as much greenhouse gas as running roughly 193 more cars on the road for a year.
I also did some more math and found that this training run used about as much electricity as 141 US households would use over the course of a year. [4]
[1] https://enviliance.com/regions/east-asia/cn/report_10060
[2] https://ourworldindata.org/grapher/carbon-intensity-electric...
[3] https://www.epa.gov/greenvehicles/greenhouse-gas-emissions-t...
[4] divided total kWh by the value here: https://www.eia.gov/tools/faqs/faq.php?id=97&t=3
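The chain of estimates above is easy to reproduce and tweak. Here is a short script with the same figures; the only number filled in is the ~10,791 kWh/year household average, which is the EIA value the comment's [4] points at, so treat it as an assumption:

```python
# Reproduces the arithmetic above, so the assumptions (1.2x non-GPU
# overhead, 1.3 PUE, China's grid intensity) are easy to swap out.
gpu_hours = 2_788_000        # reported H800 GPU-hours for the final run
tdp_w = 350                  # H800 TDP in watts
overhead = 1.2               # non-GPU hardware (CPU, RAM, networking)
pue = 1.3                    # datacenter power usage effectiveness [1]
grid_kg_per_kwh = 0.582      # China grid carbon intensity [2]

kwh = gpu_hours * tdp_w * overhead * pue / 1000
kg_co2 = kwh * grid_kg_per_kwh
print(f"{kwh:,.0f} kWh, {kg_co2:,.0f} kg CO2eq")
print(f"= {kg_co2 / 4600:.1f} car-years [3]")         # 4.6 t CO2 per car per year
print(f"= {kwh / 10791:.1f} US household-years [4]")  # assumed EIA average kWh/yr
```

Running it prints 1,522,248 kWh, 885,948 kg CO2eq, 192.6 car-years, and 141.1 household-years, matching the figures in the comment.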
chvid | 11 months ago
If they have a cluster of 2,000 H800 GPUs (which is what they have stated in public), training would take 2,788,000 GPU-hours / (2,000 GPUs * 24 h/day * 30 days/month) ~ 2 months.
A cluster of 2,000 GPUs is what a second-tier AI lab has access to, and it shows that you can play in the state-of-the-art LLM game with some capital and a lot of brains.
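A quick back-of-envelope check of that two-month figure, assuming near-perfect cluster utilization; the 16,000-GPU comparison cluster is a hypothetical, not anything DeepSeek has stated:

```python
# GPU-hours divided by cluster size gives wall-clock hours, assuming
# the cluster is kept busy for the whole run.
gpu_hours = 2_788_000
cluster = 2_000            # H800s, per DeepSeek's public statements
months = gpu_hours / cluster / 24 / 30
print(f"~{months:.1f} months on {cluster:,} GPUs")   # ~1.9 months
# On a hypothetical 16,000-GPU cluster, the same run takes ~7 days.
```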
hulitu | 11 months ago
That's what they are good at. /s