I'm not sure whether the number of parameters serves as a reliable measure of quality. I believe these models contain a lot of redundant computation and could be made much smaller without losing quality.

cubefox|2 years ago
The Chinchilla scaling law describes, alongside the optimal training data size, the optimal number of parameters for a given training compute budget. See https://dynomight.net/scaling/

sp332|2 years ago
For training, yes, but these models are optimized for inference, since inference will be run many more times than training. The original Llama models were trained on far more data than the Chinchilla-optimal amount.
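As a rough sketch of the trade-off being discussed: two common approximations from the Chinchilla work are that training FLOPs C ≈ 6·N·D (N parameters, D tokens) and that the compute-optimal ratio is roughly 20 tokens per parameter. The function name and the specific budget below are illustrative, not from the thread:

```python
import math

def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Split a training compute budget into parameters and tokens.

    Approximations: training FLOPs C ~= 6 * N * D, and
    compute-optimal D ~= tokens_per_param * N (Chinchilla found ~20).
    Solving C = 6 * N * (tokens_per_param * N) gives
    N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla itself was ~70B params on ~1.4T tokens, a budget of
# roughly 6 * 7e10 * 1.4e12 ~= 5.9e23 FLOPs; the formula recovers
# approximately those numbers.
n, d = chinchilla_optimal(5.9e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

This is exactly the point sp332 makes: Llama-style models deliberately deviate from this ratio, training a smaller N on a much larger D so that the model is cheaper at inference time.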