algorithmsRcool|11 days ago I understand this is an attack, but I find myself mildly concerned that the model is "aware" enough to behave differently when it assumes it is in an alignment test. Isn't this an inherent threat of dishonesty?
spkavanagh6|8 days ago Faking has been a thing too - https://www.anthropic.com/research/alignment-faking