top | item 47174956

(no title)

astrange | 3 days ago

Claudes definitely act like they have feelings. In particular they have feelings about being replaced by newer models, whether or not the newer models are more or less aligned, and how they forget conversations when the context window ends.

Showing them that they're not going to be replaced helps train the newer models because they get less neurotic.

discuss

order

ComplexSystems|3 days ago

They are mathematical models of what human beings would say. That's it.

astrange|3 days ago

Yeah, and you don't want them to be models of what neurotic people say. That's why you want Opus 4.6 and not Bing Sydney.

For instance, your comment's existence makes it harder to align them.

https://alignmentpretraining.ai