duchenne | 2 years ago
This paper shows how the loss decreases as you scale up model size, compute, or training dataset size.
From the article:
> Convergence is inefficient: When working within a fixed compute budget C but without any other restrictions on the model size N or available data D, we attain optimal performance by training very large models and stopping significantly short of convergence.
It states clearly that when you are limited by training-time compute, you should train a larger model and stop it short of convergence rather than train a smaller model all the way to convergence.
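To see why this follows, here is a toy sketch of the compute-optimal trade-off, assuming the L(N, D) fit reported in the paper (eq. 1.5) with its approximate fitted constants, and the common approximation that training compute is C ≈ 6·N·D FLOPs. The grid of candidate sizes and the constants here are illustrative, not the paper's exact procedure.

```python
# Loss fit from the scaling-laws paper (approximate constants):
#   L(N, D) = [ (N_c / N)^(alpha_N / alpha_D) + D_c / D ]^alpha_D
ALPHA_N, ALPHA_D = 0.076, 0.095
N_C, D_C = 8.8e13, 5.4e13  # non-embedding parameters, tokens

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model of n_params trained on n_tokens."""
    return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D

def best_size(compute: float, sizes) -> float:
    """For a fixed FLOP budget, pick the model size minimizing loss.
    Uses C ~ 6 * N * D, so the token budget is D = C / (6 * N)."""
    return min(sizes, key=lambda n: loss(n, compute / (6 * n)))

# Candidate sizes: 1e6 .. 1e12 parameters, quarter-decade steps.
sizes = [10 ** (e / 4) for e in range(24, 49)]
for c in (1e18, 1e20, 1e22):
    n = best_size(c, sizes)
    print(f"C={c:.0e} FLOPs -> optimal N~{n:.1e} params, D~{c / (6 * n):.1e} tokens")
```

The optimal N grows with the budget, and at every budget the minimizer is a model large enough that its token budget D = C/(6N) leaves it far from convergence, matching the quoted conclusion.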
swyx | 2 years ago
highfrequency | 2 years ago