I have a thought that whilst LLM providers can say "Sorry", there is little incentive to do so, and it would expose the reality that they are not very accurate, nor can their accuracy be properly measured.
That said, there are clearly use cases where, if the LLM can't reach a certain level of confidence, it should refer the question back to the user rather than guessing.
Rudybega|4 months ago
E.g. most current benchmarks have a scoring scheme of:

1 - Correct answer
0 - No answer or incorrect answer

But what they need is something more like:

1 - Correct answer
0.25 - No answer
0 - Incorrect answer
You need benchmarks (particularly those used in training) to incentivize the models to acknowledge when they're uncertain.
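
As a minimal sketch of what that scoring rule might look like (the 0.25 abstention reward comes from the example above; the function name and the convention of `None` meaning "declined to answer" are just illustrative):

    def score_item(answer: str | None, gold: str, abstain_reward: float = 0.25) -> float:
        """Abstention-aware scoring: 1.0 for a correct answer,
        abstain_reward for an explicit non-answer, 0.0 for a wrong answer."""
        if answer is None:  # model explicitly declined to answer
            return abstain_reward
        return 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0

Under this rule, guessing with confidence p earns an expected score of p, so abstaining strictly beats guessing whenever p < 0.25. That's exactly the incentive the flat 1/0 scheme fails to create, since there guessing never scores worse than staying silent.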