algorithmsRcool | 11 days ago

I understand this is an attack, but I find myself mildly concerned that the model is "aware" enough to behave differently in the assumed context of an alignment test. Isn't this an inherent thread of dishonesty?
