billti | 10 months ago
I would imagine very little of the training data consists of a question followed by an answer of “I don’t know”, thus making it statistically very unlikely as a “next token”.
nullc | 10 months ago
One could imagine a fine-tuning procedure that gives a model better knowledge of its own limits: test it, and on prompts where its most probable completions are wrong, fine-tune it to say "I don't know" instead. Though "are wrong" is doing some really heavy lifting, since it wouldn't be simple to determine that without a better model that already knew the right answers.
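A minimal sketch of the dataset-construction step this describes, assuming you somehow have reference answers to grade against (the function name, the exact-match grader, and the sample data are all hypothetical, and exact-match grading is exactly the oversimplification the comment flags):

```python
# Hypothetical sketch: build a fine-tuning set that teaches a model to
# say "I don't know" on prompts it currently answers incorrectly.
# The hard part noted above -- knowing an answer "is wrong" -- is
# played here by `reference` answers a real pipeline would rarely have.

def build_idk_dataset(samples):
    """samples: list of (prompt, model_answer, reference_answer) tuples.

    Returns (prompt, target) fine-tuning pairs: correct answers are
    kept as-is; wrong ones are replaced with an explicit refusal.
    """
    dataset = []
    for prompt, model_answer, reference in samples:
        # Naive grader: case-insensitive exact match against the reference.
        correct = model_answer.strip().lower() == reference.strip().lower()
        target = model_answer if correct else "I don't know."
        dataset.append((prompt, target))
    return dataset

samples = [
    ("What is the capital of France?", "Paris", "Paris"),
    ("Who wrote 'Middlemarch'?", "Charles Dickens", "George Eliot"),
]
print(build_idk_dataset(samples))
# → [("What is the capital of France?", "Paris"),
#    ("Who wrote 'Middlemarch'?", "I don't know.")]
```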