I don’t think people understand the point Sutton was making: general, simple systems that improve with scale tend to outperform hand-engineered systems that don’t. The subtle part is the implicit claim that hand engineering inhibits scale because it inhibits generality. He says nothing about the rate of progress, and he doesn’t claim LLMs or gradient descent are the best system; in fact I’d guess he thinks there’s likely an even more general approach that would be better. He’s comparing two classes of approaches, not commenting on the merits of particular systems.
joe_the_user|7 months ago
1. General-purpose algorithms that scale will beat algorithms that aren't those.
2. The simplest general-purpose, scaling algorithm will win, at least over time.
3. Neural networks will win.
4. LLMs will reach AGI with just more resources.
xpe|7 months ago
This is your reading of Sutton. When I read his original post, I don't extract this level of nuance. The very fact that he calls it a "lesson" rather than something else, such as a "tendency", suggests Sutton may not hold the idea lightly*. In other words, it might have become more than a testable theory; it might have become a narrative.
* Holding an idea lightly is usually a good thing in my book. Very few ideas are foundational.
visarga|7 months ago
I think at the moment the best source of data is the chat log, with 1B users and over 1T daily tokens across all LLMs. These chat logs sit at the intersection of human interests and LLM execution errors; they are on-policy for the model, exactly what it needs to improve in the next iteration.