This is very impressive, though on an adjacent question: does anyone know roughly how much time and compute it costs to train something like the 405B? I would imagine that with all the compute Meta has, the moat is incredibly large in terms of being able to train multiple 405B-level models and still compete.
sva_|1 year ago
https://github.com/meta-llama/llama-models/blob/main/models/...
Escapado|1 year ago