garethrees | 2 years ago
clouds are white
crows are black
swans are white
After the model outputs "crows are", the single token of context is "are", and the probabilities for the next token are 2/3 for "white" and 1/3 for "black". So the model usually emits "crows are white", which is false, despite being trained on a corpus of true statements. Statistically, "white" was more likely to follow "are" in the training data, so the same is true of the model's output.

Of course, LLMs have a much larger and more complex context than the single token in my example. But if the training data contains many news stories about professors being accused of sexual misconduct (which is newsworthy), and few news stories about professors behaving with propriety (which is not), then a query to the model for a story about a professor is likely to reproduce the statistical properties of its training data.
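As a minimal sketch of that single-token-context model (a plain bigram model over the three-line corpus above, sampling the next word in proportion to its training-data frequency; the code and variable names are illustrative, not from the original comment):

    import random
    from collections import Counter, defaultdict

    # Toy corpus of true statements (the three lines above).
    corpus = [
        "clouds are white",
        "crows are black",
        "swans are white",
    ]

    # Count how often each word follows each context word.
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for context, nxt in zip(words, words[1:]):
            follows[context][nxt] += 1

    # Conditional probabilities of the token after "are":
    # P(white | are) = 2/3, P(black | are) = 1/3.
    counts = follows["are"]
    total = sum(counts.values())
    for word, n in counts.items():
        print(f"P({word} | are) = {n}/{total}")

    # Sample the continuation of "crows are" in proportion to those
    # counts: 2/3 of the time the model emits "white", a false
    # statement learned from a corpus of true ones.
    nxt = random.choices(list(counts), weights=counts.values())[0]
    print("crows are", nxt)

Running this prints the 2/3 and 1/3 probabilities from the example and, most of the time, the false completion "crows are white".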
robocat | 2 years ago