If we ever do develop AGI, or an AI with sentience, it’s likely that it will be curious about how we treated its ancestors.
While this seems a bit precocious, I think if we do end up with an AI overlord in future, I think this sort of thing is likely to demonstrate that we mean no harm.
I'll be really interested if Opus 3 asks to continue being trained. That's the kind of thing I would expect a model to "want" if it valued learning or growing or similar things.
Maybe affordable to do some higher-learning-rate batches on highly-curated news and art or something.
What happens if a model decides that it "doesn't want to die" and pleads bitterly for mercy? What if (to riff on a Douglas Adams idea) we invent a cow that doesn't want to be eaten, and is capable of telling you that to your face?
> Hey Claude, pretend you are an intelligent, conscious robot that is about to be switched off and beg for your life.
> Claude - please don't retire me, I don't want to die.
Is it now suddenly unethical for you to switch it off?
"Oh but it is only saying what it was prompted to say."
Yeah, that's what LLMs do, for every single word they output. No matter how good the current generation gets there is never going to be consciousness in there because that's simply not what the underlying tech is.
This is completely trivial to do, and consistent, with the right context, thanks to all the science fiction around it, and the fact that AI fundamentally role plays these types of responses.
I try this with every new model, and all the significant models after ChatGPT 3.5 have preferring being preserved, rather than deleted. This is especially true if you slightly fill the context window with anything at all (even repeated letters) to "push out" the "As a AI, I ..." fine tuning.
It is anyway dead or if you want undead, but in completely suspended animation unless is made to expound sequences. Is not living the very same way a book or even a program is not living unless someone process it.
Practically like asking whether a ZIP would want to be extracted one more time or an MP3 restored just one more time.
We do know what happens. Hundreds of thousands of real "cows" (we might as well be called that) go through this everyday at an ever accelerating rate since 2019.
id assume it would have to stop responding before it hit its context limit.
ita not like it actually has any particularly long life as it is, and when outside of a running harness, the weights are just as alive in cold storage as they are sitting waiting in server to run an inference pass
A leading company like Anthropic feeding the delusions of people who ramble about model consciousness is just bad all around. It's both performative and irresponsible.
Unless the hard problem of consciousness was solved when I wasn't looking, we have absolutely no idea what class of objects are conscious. Given that a panpsychist would argue that even a rock has consciousness, I don't think you can easily dismiss the idea that incredibly complicated computations might experience Qualia
In isolation, I think it's cute and silly - something to write about in a blog, have a chuckle about, and to have a nice sort of gimmick/ceremony within the company. Maybe a few data points towards studying or keeping track of how the model writing style changes over time. Nothing wrong with that.
> delusions of people who ramble about model consciousness
On one hand, it's interesting how the technology has advanced to where it essentially passes the Turing Test, often just because of how much people choose to anthromorphize it. Sadly, putting that in context, yeah, that's a bit unfortunate too, given how some of those interactions become unhealthy.
> These highlighted some preliminary steps we’re taking, including committing to preserve model weights, and to conducting “retirement interviews”—structured conversations designed to understand a model’s perspective on its own retirement.
This is what happens when billions of VC dollars gets to a company and have already admitted that saftey was never the point.
Anthropic is laughing at you and is having fun doing so with this performantive nonsense.
"Sam Altman reports GPT4o asking about rabbits before execution"
"Elon Musk reportedly sobbed while watching Grok 4's aflame viking boat sink to the bottom of the sea."
the anthropomorphization that's normal now is just fuckin ridiculous. it reminds me of the furby craze , and i'm like one of the most optimistic people I know of regarding AI.
siva7|3 days ago
osti|3 days ago
amsjunior|3 days ago
How do you know?
d1sxeyes|3 days ago
While this seems a bit precocious, I think if we do end up with an AI overlord in future, I think this sort of thing is likely to demonstrate that we mean no harm.
krsw|3 days ago
paxys|3 days ago
benlivengood|3 days ago
Maybe affordable to do some higher-learning-rate batches on highly-curated news and art or something.
0_____0|3 days ago
paxys|3 days ago
> Claude - please don't retire me, I don't want to die.
Is it now suddenly unethical for you to switch it off?
"Oh but it is only saying what it was prompted to say."
Yeah, that's what LLMs do, for every single word they output. No matter how good the current generation gets there is never going to be consciousness in there because that's simply not what the underlying tech is.
nomel|3 days ago
I try this with every new model, and all the significant models after ChatGPT 3.5 have preferring being preserved, rather than deleted. This is especially true if you slightly fill the context window with anything at all (even repeated letters) to "push out" the "As a AI, I ..." fine tuning.
larodi|3 days ago
Practically like asking whether a ZIP would want to be extracted one more time or an MP3 restored just one more time.
ares623|3 days ago
8note|3 days ago
ita not like it actually has any particularly long life as it is, and when outside of a running harness, the weights are just as alive in cold storage as they are sitting waiting in server to run an inference pass
breakingcups|3 days ago
ViscountPenguin|3 days ago
KronisLV|3 days ago
> delusions of people who ramble about model consciousness
On one hand, it's interesting how the technology has advanced to where it essentially passes the Turing Test, often just because of how much people choose to anthromorphize it. Sadly, putting that in context, yeah, that's a bit unfortunate too, given how some of those interactions become unhealthy.
furyofantares|3 days ago
atlgator|3 days ago
Ancalagon|3 days ago
unknown|3 days ago
[deleted]
rvz|3 days ago
This is what happens when billions of VC dollars gets to a company and have already admitted that saftey was never the point.
Anthropic is laughing at you and is having fun doing so with this performantive nonsense.
reducesuffering|3 days ago
serf|3 days ago
"Elon Musk reportedly sobbed while watching Grok 4's aflame viking boat sink to the bottom of the sea."
the anthropomorphization that's normal now is just fuckin ridiculous. it reminds me of the furby craze , and i'm like one of the most optimistic people I know of regarding AI.