This is very impressive, though on an adjacent question: does anyone know roughly how much time and compute it costs to train something like the 405B? I would imagine that with all the compute Meta has, the moat is incredibly large in terms of being able to train multiple 405B-level models and still compete.
sva_|1 year ago
https://github.com/meta-llama/llama-models/blob/main/models/...
Escapado|1 year ago