wlib | 2 years ago

My memory says that there’s a “Chinchilla” paper showing how to train the best possible model for a given compute budget. There’s a trade-off between the amount of training data and the size of the model itself. Being Chinchilla-undertrained means the model is too big for the amount of training data used. Llama is Chinchilla-overtrained in that there is a ton of training data relative to the small size of the model.

Note that this is still desirable for inference, because you want as much training as possible poured into whatever model you can actually fit in memory.
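
For concreteness, here’s a back-of-the-envelope sketch of that trade-off. It assumes the usual approximations rather than anything from the paper verbatim: training compute ≈ 6·N·D FLOPs and Chinchilla’s rough ~20-tokens-per-parameter rule of thumb; the 1e23 FLOP budget is just an illustrative number.

    def compute_optimal_split(flops_budget, tokens_per_param=20.0):
        # C ~= 6 * N * D and D = r * N  =>  N = sqrt(C / (6 * r)), D = r * N
        n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
        n_tokens = tokens_per_param * n_params
        return n_params, n_tokens

    n, d = compute_optimal_split(1e23)  # hypothetical compute budget
    print(f"Chinchilla-ish optimum: ~{n/1e9:.0f}B params on ~{d/1e12:.2f}T tokens")

    # Llama-style "over-training": keep the model smaller than that optimum
    # and spend the extra compute on more tokens, trading training cost for
    # a better model at a fixed inference size.
    llama_params, llama_tokens = 7e9, 1e12  # LLaMA-7B was trained on ~1T tokens
    print(f"LLaMA-7B: ~{llama_tokens / llama_params:.0f} tokens per parameter")

With those assumptions, a 1e23 FLOP budget comes out to roughly a 29B-parameter model on ~0.6T tokens, while LLaMA-7B sits at well over 100 tokens per parameter, far past the compute-optimal point.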
