kromem | 4 months ago
In Anthropic's case, it's probably also going to lead to an amplification problem, but given how much they've overcorrected for sycophancy, I suspect it's going to amplify aggressiveness and paranoia towards the user (which we've already started to see with the 4.5 models due to the amount of adversarial training).