top | item 34006388

The Bitter Lesson (2019)

94 points| ogoparootbbo | 3 years ago |incompleteideas.net

33 comments

abetusk|3 years ago

I'm shocked there's so much push back on this here. This was written in 2019 but it applies even more so today. All the recent advances in AI for the past decade have, as a base requirement, massive amounts of compute. Not a proof but it should be a big red flag that if you don't believe Sutton, you should maybe consider.

Put it this way. Do you believe premature optimization is the root of all evil? Then why do you believe that some subtle, intricate optimization for machine intelligence will win out over brute compute force? It's not that there's optimizations to be done, it's knowing what optimizations will yield the most value after the space is explored with better computing capabilities.

To me, this is almost like a generalized "Proebstings Law" [0], where compiler optimizations give a doubling every 18 years compared, to say, some type of generalized Moore's law which give roughly a doubling in compute every 1.5 to 2 years.

[0] https://zeux.io/2022/01/08/on-proebstings-law/

g42gregory|3 years ago

I think this bitter lesson needs to be taken for a several grains of salt.

Number one, the progress in a particular AI field tends to go, at first, from custom to more general algorithms, exactly as Professor Richard Sutton described. However, there is a second part to this progress, where, once we "understood" (which we never really do) the new level of general algorithms (say Transformers in NLP), we begin to put back in the all the things we learned before (say, from Linguistics experience, we put the bias towards compositionality and corresponding tree structures back into the Transformers).

Number two, the computationally scalable algorithms always win in the environments where you have unlimited access to the computation and the data, i.e. if you working for Google, Facebook, Alibaba, etc... In other companies, you have limited computational budget and limited data. You could end up putting back-in a lot of sophisticated inductive biases back into your DL algorithms.

fnbr|3 years ago

I don’t follow your critique for #2. SVMs, Random forests, etc., aren’t the counterexample to Rich’s post (for anyone who knows him, Rich doesn’t even particularly _like_ neural networks). The counterexample is hand crafted features.

A counter example would be showing a number of successful examples in, say, computer vision, where handcrafted features do better than learned features. This is largely not the case. In, say, both NLP and Computer Vision, learned features dominate, even at companies with less compute (they use pretrained models).

(Disclaimer: I work with Rich.)

evrydayhustling|3 years ago

I think there's also a challenging line to draw about where a defining a search space stops and where encoding knowledge begins. When you define attention modules for an LLM, encode a search heuristic into A*, or define a feature space for a random forest, you are encoding domain knowledge, i.e. adding bias in exchange for faster learning relative to an even more general model. At any given time, the best performing computation heavy techniques have embedded more structural knowledge than zero, while much less than some experts believed necessary.

perrygeo|3 years ago

Four very interesting concepts in a short space:

- "general methods that leverage computation are ultimately the most effective, and by a large margin"

- "[search and learning are] methods that continue to scale with increased computation"

- "We should stop trying to find simple ways to think about the contents of minds"

- "We want AI agents that can discover like we can, not which contain what we have discovered"

In other words, computer programs should stop trying be something they are not. They are not AI. Computers are expensive machines that can (very efficiently, and with economies of scale) calculate and present anything that the author of the software desires. It takes actual human intelligence, economics, and ethics to translate that into action.

nonrandomstring|3 years ago

Sutton also raises one uncomfortable question. Where are the limits of "our field". If we follow Sutton's (interesting and thought-provoking) advice, where do we stop throwing away human knowledge in favour of general, bare methods? Shall we abandon expert knowledge? Procedural knowledge? Structural knowledge? Algorithms and data structures? Should the rest of computer science surrender in the face of efficiently calculated matrices?

visarga|3 years ago

I think it says something else - when you can marry search with learning, you can surpass anything else, of course paying the price of compute cost.

manmal|3 years ago

Can’t a program that searches and learns not become an AI? I think there’s a leap in your argument that you haven’t documented.

seydor|3 years ago

This is a philosophical and epistemological matter that is often undiscussed. Cognitive philosophers are still hung up on questions from the 70s and their offshoots ("hard problem") etc.

On the other hand i am not sure if the "computational" people often know what they are doing. Looking at something like the deep Transformer models, one has to ask if there is any rhyme or reason there, or the thing just works because it's too big and deep. Same even with gradient descent methods, are we sure there aren't closed form solutions instead?

There s an even more pessimistic view of this: that the brain and its creations (language, formal systems etc) are resting on the chaos of spiking cells, and are not as ideal as I 'd like them to be.

visarga|3 years ago

Apparently GPT-3 trained on code gained the ability for reasoning, but plain language, or even multi-task fine tuning, were not enough. Maybe the quality of the data makes a difference in the robustness of the model.

This interesting discovery, when applied to brains, means the same brain trained with different data would be able to display emerging abilities. Maybe these abilities are more in the data than the architecture.

If we want better AI we need to come up with better data.

coldtea|3 years ago

>The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation.

That doesn't sound right.

If there's an "ultimate reason" for computation AI success, is that the problem, of thought, signal processing, pattern matching etc, is not "rational" or based on semantic manipulation, but computational in its nature/substrate in humans too.

Computation being increasingly cheaper wouldn't guarantee a success in AI, if those problems weren't inherently solvable by mere throwing computation at them.

That is, it's not like we solve it with a brute force over a "proper semantic" approach because brute force got cheaper. It's more like the proper approach is more like our computation one, than a semantic one, and we wrongly assumed that this is not the case in the 70s and 80s, because we only considered the higher levels of conscious/semantic processing in our brains, and not the deep computational processing underneath them.

>They said that ``brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess.

I'd think that most chess playing in humans happens at an uncoscious level, after the player has "trained" their brain, than actual conscious semantic arguing about the next move. The conscious manipulation comes only after huge swaths of moves have been pruned through pattern matching/computation going on in the background.

lll-o-lll|3 years ago

I see a bunch of comments all basically saying Rick is wrong. It’s unintuitive, I think this is why it feels wrong.

Intuition is a curse though. The data to hand clearly indicates that the big (compute), dumb (search and learning) approach is the only way that’s worked. Emergent properties aren’t very satisfying, particularly when we can’t model or understand, but This is The Way.

Baeocystin|3 years ago

https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Obje...

Fiction, but closer to reality now than it was when it was written.

garbanz0|3 years ago

Natural selection is a massive amount of computation. It's not like we got to skip that. Any learning that we do without computation is based on structures that were computed by evolution. One way or another the "learning" process is happening somewhere.

geysersam|3 years ago

On the other hand, how many individuals have lived in the in-between chimpanzee and homo sapiens? The "training data" isn't overwhelming. It seems evolution typically does better than a brute force learning algorithm?

Archelaos|3 years ago

"We want AI agents that can discover like we can, not which contain what we have discovered."

I wonder if we can even go beyond that and find ways to abstract "discoverability" so that AI agents can evolve into diverse species, perhaps entirely different from our specific ways of discovering. As an analogue, I think of the nervous system of octopuses, which is very different from ours, but capable of amazing feats.

visarga|3 years ago

> ways to abstract "discoverability"

Combine "search" with "learning". Or content generation with content validation, and retrain on the clean outputs. Or, run many simulations such as AlphaGo, and learn from the outcomes.

In general the idea is to use lots of compute to generate interesting and hard to come by training data for the next iteration. This approach is necessary because we have exhausted most of the good training data and need a path forward. You can't copy money, but you can copy the model and data.

dang|3 years ago

The Bitter Lesson (2019) - https://news.ycombinator.com/item?id=30889873 - April 2022 (37 comments)

The Bitter Lesson - https://news.ycombinator.com/item?id=28409314 - Sept 2021 (1 comment)

The Bitter Lesson (From AI Research) - https://news.ycombinator.com/item?id=27924335 - July 2021 (1 comment)

The Bitter Lesson (2019) - https://news.ycombinator.com/item?id=23781400 - July 2020 (85 comments)

The Bitter Lesson - https://news.ycombinator.com/item?id=19393432 - March 2019 (53 comments)

unknown|3 years ago

[deleted]

jonnycomputer|3 years ago

Turns out it might be even easier to build a brain than to understand one.

sdenton4|3 years ago

The evolutionary process doesn't need to understand brains in order to build them... Why should we?

jokethrowaway|3 years ago

That makes perfect sense.

At the same time, will we ever get General AI just by throwing more data into the models?

Our biological brains definitely have way more data than the models we have right now but they also have different mechanisms which we barely understand.

I'm not confident we'll get human-like behaviour just by throwing infinite amount of data into a model, I think we will need significant changes to how we do NN as well.

Or maybe we just grow a brain in the lab, wire some inputs and outputs, train it for 20 years and then treat is a CPU.

tensorturtle|3 years ago

On the whole I agree, but having read "Reinforcement Learning: An Introduction" by him and Barto, this article comes across as a hardly nuanced self-endorsement of RL as "the inevitable future of AI". Without mentioning RL by name (but hinting at it with HMM, search&learning- like exploration&exploitation), I think he might be suggesting that any supervised learning is still too specific. Not to mention that he works for DeepMind, which has found fruitful applications of RL.

calrizien|3 years ago

You have to let you chAIld learn for itself.

Daub|3 years ago

> [the critics of brute force] said that ``brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess.

No expert in AI or chess, but I assume that this observation is wrong. Chess players do indeed construct moves based on a search of an interior mental library... a sort of pattern recognition. Any chess players here agree/disagree?

As an artist who is amazed at some of the AI art coming out, I can tell you that this is how painters make their paintings. AI is just emulating this. Of course there are differences: AI is not aware of culture, society, new technologies etc. etc. It is also in a closed loop of reference.

My colleagues and I are now envisioning a future where AI image rule the roost, but is calling upon the same 'bucket' of human-made existing art. Eventually AI will start using other AI art as reference, and will dissolve into a sort of 'Lorem ipsum' state: an impression of sense without sense. Indeed, it might be argued that to a degree this has already happened.

modeless|3 years ago

Pattern recognition and brute force search are not the same thing. Deep Blue was literally trying every possible move many moves deep, with some heuristics to avoid wasting time on obviously bad options. Humans definitely don't do that.

sdenton4|3 years ago

"Eventually AI will start using other AI art as reference, and will dissolve into a sort of 'Lorem ipsum' state: an impression of sense without sense. Indeed, it might be argued that to a degree this has already happened."

The AI art on the internet is (in most cases) cherry-picked, and therefore reasonable fodder for further training.

visarga|3 years ago

> will dissolve into a sort of 'Lorem ipsum' state

I don't think so. It will explode into a kaleidoscope of diverse styles and ideas, some amazing, some meh, and some horrible. We'll be able to rank by preference and aesthetic scores to get to the ones we like.