top | item 45699702


kromem | 4 months ago

With ChatGPT, the memory feature, particularly in combination with RLHF sampling from user chats that had memory enabled, led to an amplification problem, which in that case amplified sycophancy.

In Anthropic's case, it's probably also going to lead to an amplification problem, but given the amount of overcorrection for sycophancy, I suspect it's going to amplify aggressiveness and paranoia towards the user instead (which we've already started to see with the 4.5 models due to the amount of adversarial training).
