>> A popular LLM joke at the moment is "How many Rs in strawberry?"
> ChatGPT answer: There are two "R"s in the word "strawberry."
Given enough instances of this "LLM joke" in a training data set, with the joke having a consistent form (sequence of tokens) and likely being followed by an answer of a similarly consistent form (sequence of tokens), the probability of that answer being produced as quoted is high.
There is a lot of broken English on the internet and yet LLMs are better at English than the average native speaker. This failure mode has nothing to do with the training data.
AdieuToLogic|1 year ago
> ChatGPT answer: There are two "R"s in the word "strawberry."
Given enough instances of this "LLM joke" in a training data set, with the joke having a consistent form (sequence of tokens) and likely being followed by an answer of a similarly consistent form (sequence of tokens), the probability of that answer being produced as quoted is high.
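A toy sketch of that frequency argument, in Python. The corpus and its 8-to-2 split are made up for illustration, and `most_likely_answer` is a hypothetical helper; a real LLM predicts tokens with a learned model rather than a lookup table, so this only shows the "most common continuation wins" intuition next to the actual letter count.

```python
from collections import Counter

# Hypothetical corpus of (prompt, answer) pairs; the counts are invented.
PROMPT = "How many Rs in strawberry?"
corpus = [(PROMPT, "two")] * 8 + [(PROMPT, "three")] * 2


def most_likely_answer(prompt, pairs):
    """Return the most frequent answer seen after `prompt` and its relative frequency."""
    answers = Counter(answer for p, answer in pairs if p == prompt)
    answer, count = answers.most_common(1)[0]
    return answer, count / sum(answers.values())


# Most common continuation in the toy corpus.
answer, prob = most_likely_answer(PROMPT, corpus)

# Ground truth from counting characters directly.
actual = "strawberry".count("r")

print(f"most frequent answer: {answer} (p={prob:.1f}), actual count: {actual}")
```

Under these assumed counts the frequent answer and the character count disagree, which is the situation the quoted ChatGPT reply illustrates.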
imtringued|1 year ago