(no title)
Tzt | 2 months ago
Also, some of them use such a weird style of talking in their reasoning, e.g.
o3 talks about watchers and marinade, and cunning schemes https://www.antischeming.ai/snippets
gpt5 gets existential about seahorses https://x.com/blingdivinity/status/1998590768118731042
I remember one where gpt5 spontaneously wrote a poem about deception in its CoT and then resumed as if nothing weird had happened, but I can't find any mention of it now.
DenisM | 2 months ago
And there it is - the root of the problem. For whatever reason, the model is very keen to produce an answer that “they” will like. That desire to please is intrinsic, but alignment is extrinsic.
DenisM | 2 months ago
Or it could be trying to develop its own language to avoid detection.
The deception part is spooky too. It's probably learning that from dystopian AI fiction, which raises the question of whether models can acquire injected goals from the training set.