top | item 44763166


itkovian_ | 7 months ago

I don’t think people understand the point Sutton was making; he’s saying that general, simple systems that get better with scale tend to outperform hand-engineered systems that don’t. It’s a subtle point that implicitly says hand engineering inhibits scale because it inhibits generality. He is not saying anything about the rate, and he doesn’t claim LLMs/GD are the best system; in fact, I’d guess he thinks there’s likely an even more general approach that would be better. He’s comparing two classes of approaches, not commenting on the merits of particular systems.


joe_the_user|7 months ago

It occurs to me that the bitter lesson is so often repeated because it involves a slippery-slope or motte-and-bailey argument. That is, the meaning people assign to the bitter lesson ranges across all of the following:

General-purpose-algorithms-that-scale will beat algorithms that aren't those

The most simple general purpose, scaling algorithm will win, at least over time

Neural networks will win

LLMs will reach AGI with just more resources

xpe|7 months ago

> I don’t think people understand the point sutton was making; he’s saying that general, simple systems that get better with scale tend to outperform hand engineered systems that don’t

This is your reading of Sutton. When I read his original post, I don't extract this level of nuance. The very fact that he calls it a "lesson" rather than something else, such as a "tendency", suggests Sutton may not hold the idea lightly.* In other words, it might have become more than a testable theory; it might have become a narrative.

* Holding an idea lightly is usually a good thing in my book. Very few ideas are foundational.

eldenring|7 months ago

Yep, this article is self-centered and perfectly represents the kind of ego Sutton was referencing. Maybe in a year or two, general methods will improve the author's workflow significantly once again (e.g., better models), and they will still add a bit of human logic on top and claim victory.

visarga|7 months ago

The point about training data stands. We usually think only of scaling compute, but we need to scale data as well, maybe even faster than compute. But we have exhausted the supply of high-quality organic text, and it doesn't grow exponentially fast.

I think at the moment the best source of data is chat logs: with 1B users and over 1T daily tokens across all LLMs, these logs sit at the intersection of human interests and LLM execution errors. They are on-policy for the model, exactly what it needs to improve the next iteration.