
marcusf | 3 years ago

Parameters and training corpus size both matter. Last year a paper on compute-optimal LLMs (Google "Chinchilla optimality") found that we'd been under-training models -- i.e. you could wring more performance out of smaller models by training them on more data. But (as far as I understand it - interested layman here) for a given amount of data, model performance still seems to scale more or less linearly with parameter count.
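
To put rough numbers on the "under-training" point (back-of-the-envelope only - the ~20 tokens per parameter rule of thumb and the 6 * params * tokens FLOP estimate are figures commonly quoted from the Chinchilla work, not exact values from the paper):

    # Rough Chinchilla-style sketch (assumption: ~20 training tokens
    # per parameter, train FLOPs roughly 6 * params * tokens).
    def chinchilla_optimal_tokens(n_params):
        return 20 * n_params

    n_params = 70e9                                  # a 70B-parameter model
    tokens = chinchilla_optimal_tokens(n_params)     # ~1.4e12 tokens
    flops = 6 * n_params * tokens                    # ~5.9e23 train FLOPs
    print(f"tokens ~ {tokens:.2e}, train FLOPs ~ {flops:.2e}")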

Now, we could see an architecture other than the currently reigning transformer upend this (much work is ongoing on breaking the quadratic attention term that computationally bounds the transformer's performance on long sequences - an example is the Hyena paper published just the other day).
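
To illustrate why that quadratic term hurts (toy numbers of my own, not anything from the Hyena paper): standard self-attention cost grows with sequence length squared, while the subquadratic operators being proposed aim for something closer to n log n:

    import math

    def attention_cost(seq_len, d_model):
        # Standard self-attention: cost grows as O(n^2 * d)
        return seq_len ** 2 * d_model

    def subquadratic_cost(seq_len, d_model):
        # Target of subquadratic operators: roughly O(n log n * d)
        return seq_len * math.log2(seq_len) * d_model

    # The gap widens fast as context length grows.
    for n in (1_000, 10_000, 100_000):
        print(n, attention_cost(n, 1024) / subquadratic_cost(n, 1024))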

Biggest compute budget and most data wins is still the paradigm here.
