jayajay | 9 years ago
> ...by prioritizing the acquisition of reward signals above other measures of success.
This is also true for humans in poorly designed systems. For example, kids become experts at passing tests without ever mastering the material. In the workplace, employees become skilled at clocking extra time without finishing any additional work. It's reasonable to expect the same behavior to eventually emerge in systems that approximate human behavior.
The video shown in the article could just as easily have been a human who had just discovered the bug and wanted to troll a bit. The key difference is that a human would soon get bored. Our algorithms don't know about boredom outside the domain of the reward function.
After playing with a bugged state, a human would lose interest in the bug while staying interested enough to keep playing the game. A human is also smart enough to recognize the various microstates of such a "bugged" state and to ignore those instances as well.
The algorithm is smart enough to find the hack, but not smart enough to say "Hey, this is a non-solution, and I am not very happy about that." What is it that makes a human decide to lose interest in such a bugged state? Are these factors locally contained, or are they due to external influences?
armada651 | 9 years ago
> What is it that makes a human decide to lose interest in such a bugged state?
Repetition. The human brain has a reward function that is interested in finding new patterns. Using the same pattern to gain rewards has diminishing returns in the human brain; eventually we don't get enough reward and we try to find a new pattern. When this breaks down and the same pattern keeps yielding the same reward, you can potentially fall into an addiction.
So in the case of this AI, simply diminishing its reward when it uses the same route every time would prevent it from getting stuck in a loop.
If you want it to actually finish the race, though, you might want to reward it a little for following the direction of the course. And it would make even more sense if it were rewarded for finishing the race first; humans are a competitive bunch, after all. A sketch of what that might look like follows below.
By not rewarding the AI for those things, they simply did a very bad job of explaining the goals of the game.
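A minimal sketch of that shaping, in Python. To be clear, this is not the actual experiment's setup: the state discretization, the coefficients, and the progress/finished signals are all assumptions for illustration.

    from collections import defaultdict

    class ShapedReward:
        """Wraps the game's raw score so that (a) revisiting the same state
        pays less each time and (b) course progress and finishing pay directly."""

        def __init__(self, novelty_weight=1.0, progress_weight=0.1, finish_bonus=100.0):
            self.visit_counts = defaultdict(int)  # times each state has been seen
            self.novelty_weight = novelty_weight
            self.progress_weight = progress_weight
            self.finish_bonus = finish_bonus

        def __call__(self, state_key, raw_reward, progress_delta, finished):
            # Diminishing returns: the same state pays 1, 1/2, 1/3, ... of the
            # novelty weight, so looping on one exploit stops being worthwhile.
            self.visit_counts[state_key] += 1
            novelty = self.novelty_weight / self.visit_counts[state_key]

            # Reward following the direction of the course...
            shaped = raw_reward + novelty + self.progress_weight * progress_delta
            # ...and reward actually finishing the race most of all.
            if finished:
                shaped += self.finish_bonus
            return shaped

With a count-based decay like this, circling back to the same exploit quickly stops being worthwhile, while making progress along the course and finishing keep paying out.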
weareschizo | 9 years ago
This reminds me of how metric-driven companies can go off the rails when they over-optimize for metrics that almost, but not perfectly, describe their actual goals.
Ironchefpython | 9 years ago
Any publicly traded corporation (save a small handful with non-traditional governance models) is a metric-driven company.
Modern corporations are paperclip-maximizer functions executing on a network of general-purpose biological computational engines tied together with PowerPoint, email, and Excel spreadsheets.
Want to know what the AI of the future will look like? It will be a lot like Comcast, because it will be built by Comcast and harnessed to the corporate goals of Comcast and thus will have the same value system as Comcast.
The only thing it will lack is Comcast's institutional incompetence, because it will be Comcast's goals executing on dedicated hardware rather than on semi-autonomous employees. It will build a dedicated model of every man and woman on the planet and use it to determine exactly how many illegitimate charges it can cram onto your bill before pushing you into a customized cancellation process, calibrated to your personality and mental state to be just painful enough to drive you to the brink of suicide. And the only reason it stops at the brink is that a dead customer is an unprofitable one. (If you think that's hyperbole, you have a far brighter view of the future than I do.)
theoh | 9 years ago
"When a measure becomes a target, it ceases to be a good measure."
ceejay | 9 years ago
In theory, the metrics a company defines and attempts to achieve are the ones it wants to optimize for. So in essence I can think of only two primary reasons it will end badly:
1) They're measuring incorrectly
2) Their "world view" is incorrect
In either case, as long as they continually reassess those two things, their metrics will change and (ideally) converge toward a more accurate reflection of reality.
I'd rather have evidence that my world view is incorrect, and learn it as quickly as possible, so I can adapt to what the data is telling me.
It's certainly not easy to do this right (and may be impossible to perfect), but I think it's an objectively better option than, say, "going with your gut".
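A toy illustration of why that reassessment matters, with made-up functional forms rather than real data: suppose "reader value" is the actual goal but only "clicks" is measured, and clickbait boosts clicks while hurting value. Hill-climbing on the metric alone:

    import random

    BUDGET = 1.0  # fixed effort to split between depth and clickbait

    def reader_value(x):  # the real goal (unmeasured): depth helps, clickbait hurts
        depth, clickbait = BUDGET - x, x
        return depth - 2.0 * clickbait

    def clicks(x):        # the proxy metric (measured): clickbait helps *more*
        depth, clickbait = BUDGET - x, x
        return depth + 3.0 * clickbait

    x = 0.0  # fraction of effort spent on clickbait
    for _ in range(1000):
        # Propose a small random change; keep it iff the *metric* improves.
        x2 = min(BUDGET, max(0.0, x + random.uniform(-0.05, 0.05)))
        if clicks(x2) > clicks(x):
            x = x2

    print(f"clickbait share={x:.2f}  clicks={clicks(x):.2f}  "
          f"reader value={reader_value(x):.2f}")

The metric ends up maximized exactly where the real goal is worst. The fix is the reassessment step described above: notice the divergence and change the metric.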
Veen | 9 years ago
https://en.wikipedia.org/wiki/Goodhart's_law