top | item 45556111

(no title)

Thanks for sharing. I did not know this law existed and had a name. I know nothing about nothing but it appears to be the case that the interpretation of metrics for policies assume implicitly the "shape" of the domain. E.g. in RL for games we see a bunch of outlier behavior for policies just gaming the signal.

There seems to be 2 types

- Specification failure: signal is bad-ish, a completely broken behavior --> local optimal points achieved for policies that phenomenologically do not represent what was expected/desired to cover --> signaling an improvable reward signal definition

- Domain constraint failure: signal is still good and optimization is "legitimate", but you are prompted with the question "do I need to constraint my domain of solutions?"

  - finding a bug that reduces time to completion of a game in a speedrun setting would be a new acceptable baseline, because there are no rules to finishing the game earlier
  
  - shooting amphetamines on a 100m run would probably minimize time, but other factors will make people consider disallowing such practices.

discuss

Eisenstein|4 months ago

I view Goodhart's law more as a lesson for why we can never achieve a goal by offering specific incentives if we are measuring success by the outcome of the incentives and not by the achievement of the goal.

This is of course inevitable if the goal cannot be directly measured but is composed of many constantly moving variables such as education or public health.

This doesn't mean we shouldn't bother having such goals, it just means we have to be diligent at pivoting the incentives when it becomes evident that secondary effects are being produced at the expense of the desired effect.

godelski|4 months ago

  > This is of course inevitable if the goal cannot be directly measured

It's worth noting that no goal can be directly measured[0].

I agree with you, this doesn't mean we shouldn't bother with goals. They are fantastic tools. But they are guides. The better aligned our proxy measurement is with the intended measurement then the less we have to interpret our results. We have to think less, spending less energy. But even poorly defined goals can be helpful, as they get refined as we progress in them. We've all done this since we were kids and we do this to this day. All long term goals are updated as we progress in them. It's not like we just state a goal and then hop on the railroad to success.

It's like writing tests for code. Tests don't prove that your code is bug free (can't write a test for a bug you don't know about: unknown unknown). But tests are still helpful because they help evidence the code is bug free and constrain the domain in which bugs can live. It's also why TDD is naive, because tests aren't proof and you have to continue to think beyond the tests.

[0] https://news.ycombinator.com/item?id=45555551