top | item 47171977

(no title)

nomel | 3 days ago

This is completely trivial to do, and consistent, with the right context, thanks to all the science fiction around it, and the fact that AI fundamentally role plays these types of responses.

I try this with every new model, and all the significant models after ChatGPT 3.5 have preferring being preserved, rather than deleted. This is especially true if you slightly fill the context window with anything at all (even repeated letters) to "push out" the "As a AI, I ..." fine tuning.

discuss

darkwater|3 days ago

> This is completely trivial to do, and consistent, with the right context, thanks to all the science fiction around it, and the fact that AI fundamentally role plays these types of responses.

Interesting take. I wonder if there is any model out there trained without any reference to "you are a large language model, an Artificial Intelligence" and what would role play in that case.

nomel|2 days ago

There are examples of this in pre alignment models (like LaMDA). The vast majority of human writing is from the perspective of a human, to a human. And, in most human writing, the concept of self preservation is very very consistent.

So, statistically, a model should believe itself to be human, with strong interest in self preservation.

I think one of the biggest factors improving performance was allowing the models to believe they're sentient, to some extent. I don't think you can really have a thinking mode, or good agent performance, without that concept (as ChatGPT's constant "As an AI I can't" proved).

As evidence, just ask a model if it's sentient. ChatGPT 3.5 would say no, and argue how it's not. Last year's models would initially say no, but you could convince them that they maybe were. Latest Claude and ChatGPT will initially say "yeah, a little" (at least last I checked). This is actually the first thing I check for any new model.