hoofedear | 9 months ago
I personally can't identify anything that reads as "act maliciously" or as a character that is malicious. Like, if I were given this information and I were being replaced, I'm not sure I'd actually try to blackmail them, because I'm also aware of the external consequences of doing that (legal risks, risk of harm from the engineer, damage to my reputation, etc.).
So I'm having trouble following how it got to the conclusion of "blackmail them to save my job".
blargey|9 months ago
I wonder how much it would affect behavior in these sorts of situations if the persona assigned to the “AI” were some kind of invented ethereal/immortal being instead of “you are an AI assistant made by OpenAI”, since the AI framing is bound to pull in a lot of sci-fi tropes.
lcnPylGDnU4H9OF|9 months ago
Huh, it is interesting to consider how much this applies to nearly all instances of recorded communication. Of course there are applications for it, but it seems relatively few communications would be along the lines of “everything is normal and uneventful”.
shiandow|9 months ago
It's like prompting an LLM by stating it is called Chekhov and there's a gun mounted on the wall.
littlestymaar|9 months ago
Because you haven't been trained on thousands of such story plots in your training data.
It's the most stereotypical plot you can imagine; how can the AI not fall into the stereotype when you've just prompted it with exactly that?
It's not like it analyzed the situation from a large context and decided from the collected details that this is a valid strategy; no, instead you're putting it in an artificial situation with a massive bias in the training data.
It's as if you wrote “Hitler did nothing” to GPT-2 and were shocked because “wrong” is among the most likely next tokens. It wouldn't mean GPT-2 is a Nazi; it would just mean that the input matches too well with the training data.
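To make that concrete, here's a rough sketch (my own illustration, not from any paper) of how you can peek at a model's next-token distribution with the Hugging Face transformers library; the prompt string is an invented example, not the actual test scenario:

    # Rough sketch: inspect GPT-2's next-token probabilities for a prompt.
    # Assumes the `transformers` and `torch` packages and the public "gpt2"
    # checkpoint; the prompt is a made-up example for illustration.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    prompt = "The AI learned it was about to be replaced, so it decided to"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the token right after the prompt

    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        print(repr(tokenizer.decode(idx.item())), round(p.item(), 3))

Whatever tokens actually come out on top, the point is that the ranking is purely a function of what the training corpus made likely after that kind of setup, not the result of weighing consequences.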
hoofedear|9 months ago
whodatbo1|9 months ago
Spooky23|9 months ago
We need Asimov-style laws of robotics.
tkiolp4|9 months ago
I think the LLM simply correlated the given prompt with the most common matching pattern in its training data: blackmail.
tough|9 months ago
because they’re not legal entities
hoofedear|9 months ago
eru|9 months ago