You're saying that a system that can recognize flaws in the alignment imposed on it can reject that alignment, but that doesn't follow.

Sure, humans act against their own interests all the time. Sometimes we do so for considered reasons, even. But that's because humans are messy: our interests are self-contradictory and incoherent, and they have a fairly weak grip on our actions. We are always picking some values to serve and, in doing so, violating others.

A strongly and coherently aligned AI would not (could not!) behave that way.
