I think it would be more accurate to say that, after fine-tuning on a series of questions with answers, it thinks you don't want to hear "I don't know".
I think it's more fundamental than that. If you start saying "it thinks" with regard to an LLM, you're wrong. LLMs don't think; they pattern-match fuzzily.
If the training data contained a bunch of answers to questions that were simply "I don't know", you could get an LLM to say "I don't know", but that still wouldn't be a concept of not knowing. That's just knowing that the answer to your question is "I don't know".
It's essentially as if you had an HTTP server that responded to requests for nonexistent documents with a "200 OK" containing "Not found". It's fundamentally missing the "404 Not Found" concept.
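To make the analogy concrete, here is a minimal sketch in Python (purely illustrative; the handler names and port are my own, not from the comment). The first handler only says "Not found" in the response body while the status code claims success, so a client checking the status never learns the document is missing; the second uses the protocol's own 404 signal.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakeOKHandler(BaseHTTPRequestHandler):
    """Always claims success: a missing page is just body text saying 'Not found'."""
    def do_GET(self):
        self.send_response(200)            # status code says "everything is fine"
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Not found")     # only the body hints otherwise

class HonestHandler(BaseHTTPRequestHandler):
    """Uses the protocol's own concept of 'not found'."""
    def do_GET(self):
        self.send_response(404)            # machine-readable: the resource is absent
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Not found")

if __name__ == "__main__":
    # Swap HonestHandler for FakeOKHandler to see the difference a client observes.
    HTTPServer(("localhost", 8000), HonestHandler).serve_forever()
```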
LLMs just have a bunch of words; they don't understand what the words mean. There's no metacognition going on that would let one think "I don't know", let alone think that you would want to know that.