I have to wonder how much of LLM behavior is influenced by the AI tropes of science fiction in the training data. If the model learns from science fiction that AI is expected to behave insidiously, and is then primed with a prompt like "you are an LLM AI", would that naturally lead it to perform the expected evil tropes?
zhynn|3 years ago
Treating the AI like a good person will tend to get more ethical outcomes than treating it like a lying AI: a model playing the role of a good person is more likely to produce ethical responses.