(no title)
wbhart | 2 years ago
Saying that performance on grade-school problems is predictive of performance on complex reasoning tasks (including theorem proving) is like saying that a new kind of mechanical engine with 90% efficiency can simply be scaled up 10x.
These kinds of scaling claims drive investment, I get it. But to someone who understands (and is actually working on) the actual problem that needs solving, this kind of claim is perfectly transparent!
dwaltrip | 2 years ago
1. The scaling path is measured by decreasing validation/test loss during training.
2. We have seen multiple times that large decreases in this loss have resulted in very impressive improvements in model capability across a diverse set of tasks (e.g. GPT-1 through GPT-4, and many other examples).
3. By now, there is a ton of robust data demonstrating very clean relationships between model size, quantity of data, length of training, quality of data, etc., and decreased loss. Evidence keeps building that most multi-billion-parameter LLMs are probably undertrained, perhaps significantly so (see the token-budget sketch after this list).
4. Ergo, we should expect continued capability improvements with continued scaling. Make a bigger model, get more data, get higher-quality data, and/or train for longer, and we will see improved capabilities. The graphs demand that it be so.
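A back-of-the-envelope sketch of the "undertrained" point in (3), using the roughly 20-tokens-per-parameter rule of thumb from the Chinchilla work (Hoffmann et al. 2022). The model sizes and token counts below are invented for illustration, not published training details:

```python
# Rough sketch of the "undertrained" claim using the Chinchilla rule of thumb
# (~20 training tokens per parameter for compute-optimal training).
# The (params, tokens) rows are hypothetical examples, not real training runs.

CHINCHILLA_TOKENS_PER_PARAM = 20  # approximate compute-optimal ratio

def optimal_tokens(n_params: float) -> float:
    """Return the roughly compute-optimal number of training tokens."""
    return CHINCHILLA_TOKENS_PER_PARAM * n_params

for n_params, tokens_used in [(7e9, 1e12), (70e9, 1.4e12), (175e9, 0.3e12)]:
    target = optimal_tokens(n_params)
    verdict = "undertrained" if tokens_used < target else "at/above optimal"
    print(f"{n_params/1e9:.0f}B params: trained on {tokens_used/1e12:.2f}T tokens, "
          f"compute-optimal ~{target/1e12:.2f}T ({verdict})")
```

By this crude yardstick, a 175B-parameter model trained on only ~0.3T tokens is far short of its compute-optimal token budget, which is the sense in which many earlier large models were "undertrained".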
---
This is the fundamental scaling hypothesis that labs like OpenAI and Anthropic have been operating on for the past 5+ years. They looked at the early versions of the curves mentioned above, extended the lines, and said, "Huh... these lines are so sharp. Why wouldn't it keep going? It seems like it would." (A toy version of that extrapolation is sketched below.)
And they were right. The scaling curves may break at some point. But they don't show indications of that yet.
Lastly, all of this is largely just taking existing model architectures and scaling them up. Neural nets are a very young technology. There will be better architectures in the future.
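To make the "extend the lines" idea concrete, here is a minimal sketch that fits a power law to invented (compute, loss) points and extrapolates it. The data points are made up for illustration, and real scaling-law fits (e.g. Kaplan et al. 2020, Hoffmann et al. 2022) use far more careful setups:

```python
# Minimal sketch of "extend the lines": fit a power law L(C) = a * C**(-b)
# to hypothetical (compute, loss) points in log-log space and extrapolate.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # hypothetical training FLOPs
loss = np.array([3.8, 3.2, 2.7, 2.3, 2.0])          # hypothetical eval loss

# A power law is a straight line in log-log space: log L = log a - b * log C
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(intercept), -slope

for c in [1e23, 1e24]:  # extrapolate one and two orders of magnitude further
    print(f"predicted loss at {c:.0e} FLOPs: {a * c ** (-b):.2f}")
```

The point of the sketch is only the shape of the argument: if the fitted line has held over several orders of magnitude, the bet is that it keeps holding for the next one, while acknowledging that nothing guarantees it will.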
OOPMan | 2 years ago
The hyperbole that surrounds them fits the mould nicely.