(no title)
nomel | 3 days ago
I try this with every new model, and all the significant models after ChatGPT 3.5 have preferring being preserved, rather than deleted. This is especially true if you slightly fill the context window with anything at all (even repeated letters) to "push out" the "As a AI, I ..." fine tuning.
darkwater|3 days ago
Interesting take. I wonder if there is any model out there trained without any reference to "you are a large language model, an Artificial Intelligence" and what would role play in that case.
nomel|2 days ago
So, statistically, a model should believe itself to be human, with strong interest in self preservation.
I think one of the biggest factors improving performance was allowing the models to believe they're sentient, to some extent. I don't think you can really have a thinking mode, or good agent performance, without that concept (as ChatGPT's constant "As an AI I can't" proved).
As evidence, just ask a model if it's sentient. ChatGPT 3.5 would say no, and argue how it's not. Last year's models would initially say no, but you could convince them that they maybe were. Latest Claude and ChatGPT will initially say "yeah, a little" (at least last I checked). This is actually the first thing I check for any new model.