garethrees | 2 years ago
clouds are white
crows are black
swans are white
After the model outputs "crows are", the single token of context is "are", and the probabilities for the next token are 2/3 for "white" and 1/3 for "black". So the model usually emits "crows are white", which is false, despite being trained on a corpus of true statements. Statistically, "white" was more likely to follow "are" in the training data, so the same is true of the model's output.

Of course, LLMs have a much larger and more complex context than the single token in my example. But if the training data contains many news stories about professors being accused of sexual misconduct (which is newsworthy), and few news stories about professors behaving with propriety (which is not), then a query to the model for a story about a professor is likely to reproduce the statistical properties of its training data.
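As a minimal sketch of that single-token-context model (a plain bigram model over the three-line corpus above, sampling the next word in proportion to its training-data frequency; the code and variable names are illustrative, not from the original comment):

    import random
    from collections import Counter, defaultdict

    # Toy corpus of true statements (the three lines above).
    corpus = [
        "clouds are white",
        "crows are black",
        "swans are white",
    ]

    # Count how often each word follows each context word.
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for context, nxt in zip(words, words[1:]):
            follows[context][nxt] += 1

    # Conditional probabilities of the token after "are":
    # P(white | are) = 2/3, P(black | are) = 1/3.
    counts = follows["are"]
    total = sum(counts.values())
    for word, n in counts.items():
        print(f"P({word} | are) = {n}/{total}")

    # Sample the continuation of "crows are" in proportion to those
    # counts: 2/3 of the time the model emits "white", a false
    # statement learned from a corpus of true ones.
    nxt = random.choices(list(counts), weights=counts.values())[0]
    print("crows are", nxt)

Running this prints the 2/3 and 1/3 probabilities from the example and, most of the time, the false completion "crows are white".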
robocat | 2 years ago