DeepMind says reinforcement learning is ‘enough’ to reach general AI

[+] webmaven|4 years ago|reply

Saying RL is sufficient to (eventually) achieve AGI is a bit misleading. One might similarly state that biological evolution is sufficient to (eventually) achieve biological general intelligence.

Both statements are probably true, but the parenthetical (eventually) is doing an awful lot of heavy lifting.

[+] burning_hamster|4 years ago|reply

I think the title of the paper makes more sense if you consider that ten years ago, someone could have written a paper in a similar spirit with a different take on "what is enough". Back then, it would probably have been titled: "Backpropagation of errors is enough".

The last ten years have shown that backpropagation -- while a crucial component -- is not enough. Personally, I would not be shocked to find out in the next ten years that reinforcement learning is not enough for an AGI (as there are aspects like one-shot learning, forgetting, sleep, and other phenomena for which the RL framework seems not a natural fit).

[+] admk|4 years ago|reply

That is if you believe biological general intelligence is the end goal of evolution, which I believe is highly unlikely.

Intelligence is simply a special side-product of evolution, there is nothing general about general intelligence. Many organisms can thrive without it.

There is also a non-negligible chance that all organisms would die out before reaching intelligence. We are fortunate to live in a world that produced us.

[+] neltnerb|4 years ago|reply

Yes, it's easy to be convinced on either side, the arguments write themselves. Yes, eventually a learning system might learn enough to be indistinguishable from intelligence. Or this might be entirely the wrong path and detracting from genuine new innovations in how we think about AI.

We won't be able to tell whether it's AGI or just good enough at trained tasks to trick us.

[+] criddell|4 years ago|reply

Does AGI imply human-level intelligence, or would the intelligence of a housefly qualify?

[+] ms1|4 years ago|reply

I think, in really broad terms, in order to get AGI actually we would need to do better than nature.

If our metric is (intelligence)/(joule), nature seems pretty bad at a first glance: it took many trillions of lifetimes to achieve "general intelligence" *

But then again, on the big stuff like this, have we ever really beat nature? That asterisk is there because, sure, turning the earth's biosphere into computers would make us smarter, but... are we sure?

(And also: human = general?)

[+] xamuel|4 years ago|reply

I thought it was a fun position paper, if not exactly groundbreaking.

They did avoid one common pitfall at least. They are (intentionally?) vague about which number systems the rewards can come from, apparently leaving it open whether the rewards need be real-valued or whether they can be, say, hyperreals, surreals, computable ordinals, etc. This avoids a trap I've written about elsewhere [1]: traditionally, RL rewards are limited to be real-valued (usually rational-valued). I argue that RL with real-valued rewards is NOT enough to reach AGI, because the real numbers have a constrained structure making them not flexible enough to express certain goals which an AGI should nevertheless have no problem comprehending (whether or not the AGI can actually solve them---that's a different question). In other words: if real-valued RL is enough for AGI, but real-valued RL is strictly less expressive than more general RL, then what is more general RL good enough for? "Artificial Better-Than-General Intelligence"?

Note, however, that almost all [2] practical RL agent technology (certainly any based on neural nets or backprop) very fundamentally assumes real-valued rewards. So if it is true that "RL is enough" but also that "real-valued RL is not enough", then the bad news is all that progress on real-valued RL is not guaranteed to help us reach AGI.

[1] "The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI", JAGI 2020, https://philpapers.org/archive/ALETAT-12.pdf

[2] A notable exception is preference-based RL

[+] visarga|4 years ago|reply

RL + piggybacking on human culture might be enough, or evolution + RL for biological agents.

[+] unknown|4 years ago|reply

[deleted]

[+] harry8|4 years ago|reply

Some Bozo who has heard all this many times before is suspicious of claims from places like Deep Mind who have a financial incentive to make them (keep funding) where there aren't working machines to back that claim up.

Some Bozo has no credentials, no reputation, no track record of publications and barely supports the claim they're making with anything much. Some Bozo has no financial incentives or otherwise to opine either way. Some Bozo doesn't even work in the field at all.

Bets: Some Bozo or Deep Mind turn out to be closer to being correct in the passing of some finite amount of time? 5 years? 10 Years? 25 Years?

[+] otabdeveloper4|4 years ago|reply

I'll bet a sum of real money that Some Bozo is correct.

Bozo has the hindsight of history and philosophy going for him, while Deep Mind has a huge financial temptation to sell snake oil.

[+] rusk|4 years ago|reply

Edit: sorry just realised you’re making the same point as me more or less. Putting yourself in third person. I’ll let my comment stand anyhow :)

Screwing my face up, looking at this sideways … but it seems as though you’re saying that the Bozos of HN have nothing useful to contribute to this discussion based on … [rereads] … their lack of academic credentials in the area… you could say this about just about any HN post I’m just wondering why this one? Here’s a thing though … if the understanding of a technology is so nuanced … that Bozos can’t “get” it … is it really that mature? We had functioning computers for 50 years but it was only when the Bozos got their hands on it that things took off. Internet for 20. Cell phones for 10. How long are we dabbling with neural networks? 50 years or so? All I see in this most recent explosion in AI is a rapid jump in the availability of cores. Ala Malthus once that newly available “source of nutrition” has been used up we will see a rapid die off once more and it will be another 20 years once the Bozo intellect has caught up before we look at this topic en masse again. Dismiss the Bozos at your peril. You’re dependent on them for innovation and consumption. Your sincerely, a Bozo.

[+] ggggtez|4 years ago|reply

Wrong: Some Bozo does have a stake.

The existence of human crafted general AI forces him to struggle with the possibility that there is no such thing as a soul.

I know a lot of people don't fall in that camp, but I heard enough "serious" people make such desperate claims to avoid thinking about the topic in a way that might challenge their underlying religious beliefs[1]. I think no one likes to admit that religion and spirituality often force someone to reject the possibility that AI is actually really much simpler than they think it "should" be, because then humans aren't special after all.

[1] Numerous arguments boil down to an argument that complexity is non reducible. You see it here, hidden in various comments as well.

[+] KeplerBoy|4 years ago|reply

Unfortunately i'm rooting for the Bozo, the current AI Revolution won't lead us anywhere and will ebb down eventually.

[+] tsimionescu|4 years ago|reply

The fact that RL in the extremely vague sense used in the article is enough for AGI is uncontroversial for anyone who believes intelligence and consciousness are physical processes.

However, this "result" is trivial. It is obviously equivalent to the claim that intelligence arose naturally in the biological world without influence from God.

[+] lvncelot|4 years ago|reply

Some cynic remarks that during the first AI golden years, claims of imminent success seemed to come from a place of hopeful naïveté of a fledgling science, whereas those same claims nowadays seem to come from a place of cold calculation of a booming business.

[+] jacquesm|4 years ago|reply

Said anonymous account on HN... If you're going to question other people's credentials, reputation, track record and claims make sure your own are solid. Those who live in glass houses shouldn't throw stones.

Finally, if you're going to attack someone's article: attack the article, not the person that wrote it. This is the lowest level of attack possible: the personal one. It's as ad-hominem as it gets.

[+] cscurmudgeon|4 years ago|reply

There is another set of people. There are people with solid track records in AI and ML that disagree with DeepMind.

[+] graderjs|4 years ago|reply

Plot-twist: Deep Mind comes out as Some Bozo comment author

[+] dqpb|4 years ago|reply

DeepMind is arguing from first principles. SomeBozo is arguing by analogy. DeepMind will achieve something and SomeBozo will achieve nothing.

The vast majority of ideas are wrong. Every idea is wrong until it leads to the one that is right.

This idea might be the right one, or it might be close to the right one, or it might be far from the right one, but the trajectory is headed toward the right idea. SomeBozo has no trajectory. The best he can do is watch from the sidelines.

[+] hervature|4 years ago|reply

I guess things are slowing down at DeepMind. I have tremendous respect for David Silver and his work on AlphaZero and Richard Sutton as a pioneer in RL. But the cynic in me is that this paper is just a result of Goodhart's law with publishing count as a metric. Any proof of the type of emergent behaviors that they mention from RL with an actual RL experiment would go a long way. Showing an RL agent developing a language would be extremely interesting. It makes me think they tried to show these emergent behaviors but could not and thus ended up with a hypothesis.

[+] maiodude|4 years ago|reply

They just "solved" protein folding late last year. How can you say things are slowing down? Do you honestly expect life-changing discoveries every other week?

[+] ChicagoBoy11|4 years ago|reply

Very tangential, but as someone who has gotten into the Game of Go because of their pioneering project in that space, I'm exceptionally grateful -- that alone had a very significant and positive impact on my life, and I can tell that in that entire community it was a watershed moment as well.

[+] uyt|4 years ago|reply

It might be a stretch but some people say that the weights learned by a neural network is somewhat like a language. For example if you look at the weights of a random middle layer it would seem like gibberish. Much like how aliens would react when looking at humans making gibberish noises (aka talking) to each other. In both cases they are just compressing signals based on learned primitives.

[+] Animats|4 years ago|reply

"A sufficiently powerful and general reinforcement learning agent may ultimately give rise to intelligence and its associated abilities. ... We do not offer any theoretical guarantee on the sample efficiency of reinforcement learning agents."

OK. This basically says "evolution works". But how fast? Biology took tens of millions of years to boot up.

An related question is how much compute power does evolution, viewed as a reinforcement learning system. have? That's probably something biologists have thought about. Anyone know? Evolution is not a very fast or efficient hill-climbing system, but there are a large number of parallel units. It's not a philosophical question; it's a measurable one. We can watch viruses evolve. We can watch bacteria evolve. Data can be obtained.

Two questions I pose occasionally are "how do we do common sense, defined as not screwing up in the next 30 seconds", and "why does robotic manipulation in unstructured situations still suck after 50 years". A good question to ask today is why reinforcement learning does so badly on those two problems. In both cases, you can define an objective function, but it may not be well suited to hill climbing.

[+] scythmic_waves|4 years ago|reply

/r/MachineLearning discussion:

https://www.reddit.com/r/MachineLearning/comments/nplhy3/r_r...

I'm with most of the comments there. This paper is ridiculously hand-wavey.

[+] rich_sasha|4 years ago|reply

Basically, any problem with a solution fits into RL: reward of 1 if you are AGI and 0 otherwise. Go learn.

This setting on its own is meaningless! The “how” of the RL agent is not even 99% of the problem, it is all of it.

Given our understanding of both DL and neuroscience, it is not even clear to me that we can say with confidence that Neural Networks are a sufficiently expressive architecture to cover an AGI.

The human brain is a deep net, sort of, but there is also plenty going on in our brains that we don’t understand. It could be that the magic sprinkle is orthogonal to DL and we just don’t know about it yet.

[+] azinman2|4 years ago|reply

Everything old is new again.

As far as I can tell, they're not actually proposing how to achieve this. I can't access the article without a host institution it seems (is there another link?), so I only have the article to go by. RL has been the basis for all robots engaging with the world, and that engagement with the physical world modeled using RL has been promised to make robots that can act like a 2 year old for a long time (see Cynthia Breazeal's work, for example). Yet AFAIK, we haven't actually achieved this as we don't know how to efficiently model the problem to have learning rates that reach anywhere near what we're able to do with DNNs today.

Perhaps someone who has access to the paper can say why this is a milestone? If Patricia Churchland suggests it is, then something new must be happening here.

[+] jjjdjjddddsfsd|4 years ago|reply

Are the just reformulating the principles of evolution in digital terms, and essentially not providing any new insights at all?

Yes, intelligence has been created by evolution. That doesn't imply that any system that is subject to evolutionary forces will lead to the creation of intelligence (and not within a reasonable timeframe, either). The challenge is to create a system that is capable of evolving intelligence.

Afaik some biologists even think that the evolution of intelligence was rather unlikely and would not necessarily happen again under the same circumstances as on earth.

[+] inciampati|4 years ago|reply

As a biologist and longtime dabbler machine learning and Bayesian methods, I tend to see intelligence as a manifestation of evolution. In the case of an organism, the improvement of the model (the genome) occurs through processes that are very similar to what we see in any kind of learning (real, brain based or "artificial", computer based).

Evolution and intelligence are inextricably linked. They are practically the same thing. This means that intelligence is probably a natural result of any system similar to those that support biologics. If you flow the right amount of energy through a substrate with complex enough building blocks, you'll eventually get life ~ which is just something smart enough to survive and feed off the available energy flows. In the world, this flow is radiation from the sun, while in a computer, it is governed by a more abstract loss or fitness function.

[+] webmaven|4 years ago|reply

> Afaik some biologists even think that the evolution of intelligence was rather unlikely and would not necessarily happen again under the same circumstances as on earth.

Hmm. Can you provide a pointer to those biologists?

AFAIK, high intelligence has arisen more than once on Earth (Hominoids, Cetaceans, Octopuses), so I'm somewhat skeptical of that claim, but perhaps they're construing intelligence more narrowly (ie. only Homo Sapiens qualifies).

[+] unishark|4 years ago|reply

I'd say it's even less than that. They seem to be summarizing the ways the problem of teaching an agent to do anything (including be generally intelligent) can be formulated as a problem of maximizing a reward (hence the title).

Another way to look at it is, if we had a good enough function (e.g. a universal approximator) it can be made to model any behavior using numerical optimization. Which I think isn't very surprising, but apparently there is some arguments about it.

[+] visarga|4 years ago|reply

Evolutionary algorithms are tricky, just like deep learning. It's not "just reformulating the principles of evolution in digital terms, and essentially not providing any new insights".

[+] blueblisters|4 years ago|reply

Current state-of-the-art in reinforcement learning can barely make a physical robot walk. In theory, with transfer learning, we will probably see better success over time but I'm looking forward to seeing results in practice.

A 2018 article about the challenges of reinforcement learning: https://www.alexirpan.com/2018/02/14/rl-hard.html

[+] Barrin92|4 years ago|reply

Sorry but where is actual scientific content in that paper? I'm concerned with the state of AI. saying that "reinforcement is all you need", when reinforcement learning is defined as abstract as "agent does something, adapts to environment and rewards, then does another thing" is borderline tautological.

The actual scientific question is, what are the mechanisms that make agents work, what are the fundamental modules within intelligent systems, is there a distinction between digital and biochemical systems, what costs are there in terms of resources and energy to get to a certain level of intelligence, and so on. Real questions with specific answers. For all the advances coming from just upping the amount of data and GPU hours, there is so little progress on trying to have a model of the structures that underpin intelligence.

[+] zzzzzzzza|4 years ago|reply

i think part of what they are saying is that your approach is wrong, (e.g. looking for then copying submodules within intelligence won't generalize),

trying to answer specific questions won't generalize,

but if you train a network with the right potentially hacky series of rewards/rich enough environment you could get a much more general intelligence

a new kind of science

[+] dannyw|4 years ago|reply

Alternate title: DeepMind fails to make progress on AGI, publishes thought piece instead.

[+] woeirua|4 years ago|reply

If RL is enough, then there is no physically realizable way to actually train a RL based GAI in the near future. RL based learning requires evaluating the outcome of millions or billions of scenarios over time in order to optimize the network.

Given that requirement you'd have to either find a way to accurately model the world and all of those interactions in silicon, or you'd have to build millions of robots that can report back the results of billions of interactions each day. It's not impossible to do that, and maybe it would even be likely that we would eventually accomplish that but the cost would make it prohibitive for anyone but a nation to even attempt today. It's almost certainly outside the realm of what is possible in the near future. Maybe when robotics has progressed enough that robots are capable of interacting with the world with basic AI will we see the rise of something like a GAI.

[+] adipandas|4 years ago|reply

This article is interesting, I even skimmed through their paper. But I think still the question remains: How to find the unified reward function? Or in other words, how to find answer to life? [It cannot be 42].

[+] qwertywert_|4 years ago|reply

Yea. For animals, reproduction and just surviving is the reward function?

It talks a lot about having a rich enough environment for learning which makes sense, if a computer lives only in a Go board it can only learn go playing itself.

How do you simulate a rich enough environment purely in software (or do you sense input from the "real" environment) and what reward do we define in this complex environment.. It seems to ask those 2 questions in the discussion but kind of glosses over them imo.

[+] goldenkey|4 years ago|reply

Intelligence would be produced in any Turing complete automata. But the universe has a frame rate of 10^34 (based on Planck constant.) We don't really have the tech to just run "evolution" of a universe or of even a psuedo biological substrate.

[+] bgroat|4 years ago|reply

I'm not a neuroscientist, an AI specialist, or a hardware engineer.

But as an enthusiast of all three I really think that AGI is a hardware problem, not a software problem.

Reinforcement learning on a massive corpus of data is how we train all biological intelligence.

The crazy thing is that in humans we manage to do it on ~3 watts an hour.

I think we have the software cracked, my gut thinks silicon just isn't the right material

[+] ramraj07|4 years ago|reply

Wait isn’t any Turing complete programming language sufficient to eventually reach general AI

[+] andyxor|4 years ago|reply

good luck with that. DeepMind should sponsor B. F. Skinner award, to honor the father of their behaviorist theories of 'reward and punishment' as a sort of all-encompassing theory of everything related to cognition. At least now they are torturing GPUs and not some poor lab animals.

on a serious note the only positive outcome of all this shameless PR is that the heavy investment in ML/RL might trickle down to actual science labs and fundamental neuroscience research which might move us forward towards understanding natural intelligence, a prerequisite for creating an artificial one.

[+] eeegnu|4 years ago|reply

> towards understanding natural intelligence, a prerequisite for creating an artificial one.

I've thought about this before, and I'm not convinced it's really prerequisite. Naturally developed intelligence in my mind may actually be highly constrained and inefficient because it was limited to what was biologically feasible. i.e. There may be simpler ways of achieving comparable results. Natural intelligence does however have the benefit of being an actual working model, but deciphering the blackbox may be just as hard as developing a working theory from first principles.

[+] 21eleven|4 years ago|reply

RL can provide amazing results (AlphaGo, AlphaStar (Starcraft 2 agent), etc) but it requires a well modeled world to work with.

Games like Go and Starcraft are well modeled worlds. If you want something akin to AGI to operate in the "real world" you will need a high quality data model of the real world for the RL system to work off of.

[+] sam-2727|4 years ago|reply

By far not an expert in this area, but if (when?) we successfully generate true intelligence, I suspect it will be through some sort of "ensemble" model, where multiple agents are trained in parallel and interact with each other. Intelligence as we know it hasn't just resulted from an evolution of one agent in response to a cost function, but rather through the complex interactions of agents (humans and organisms in general) over time. I feel like the underlying journal article (https://www.sciencedirect.com/science/article/pii/S000437022...) is missing discussion of this.

[+] throwaway879|4 years ago|reply

If you send a message in a bottle it will eventually land ashore somewhere, maybe in a century, who knows, and who knows whether it will be relevant by then or not, or civilization may not even exist by then, but sure it's similarly plausible to get to AGI via RL.

[+] visarga|4 years ago|reply

> send a message in a bottle

Just one - yes. But how about if you send millions of bottle messages?

[+] im3w1l|4 years ago|reply

So regarding objective function, one idea I just had is this: Teach them warfare.

To quote a cliche: "we live in a society". As humans we are embedded in a social environment which has a few important features: We cooperate, we compete and we die. These three pillars are the basis of our culture (a concept we should apply to AI btw). Because of competition we are forced to learn everything there is to learn (general intelligence), to get a leg up. Because of cooperation and death we need to continuously transmit and share knowledge with our friends and the next generations. Ever changing alliances means we need to get good at both deception and detecting it.

For this reason I think warfare is ideal for reaching general AI.

299 comments