One issue right now is that a lot of ML benchmarks reward models for guessing on multiple-choice questions: a guess has some probability of being right, while "I don't know" typically scores nothing. In addition to that, people have tuned models via RLHF to be very confident, because people think confident responses sound good. Paired together, these two effects resemble bluffing: models guess at answers very confidently rather than saying "I don't know".
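To put rough numbers on the guessing incentive, here is a minimal sketch. It assumes a simplified grading scheme (1 point for a correct answer, 0 for "I don't know", and an optional penalty for a wrong answer) and a uniformly random guess. The `expected_score` helper and the penalty values are hypothetical illustrations, not any particular benchmark's rules.

```python
from fractions import Fraction


def expected_score(num_choices, wrong_penalty=Fraction(0)):
    """Expected score of a uniform random guess on one question.

    Assumed grading (hypothetical): +1 for a correct answer,
    -wrong_penalty for a wrong one, 0 for abstaining.
    """
    p_right = Fraction(1, num_choices)
    return p_right - (1 - p_right) * wrong_penalty


# Accuracy-only grading: a blind guess on a 4-choice question
# is worth more than abstaining (which is worth 0).
print(expected_score(4))  # 1/4

# With a penalty of 1/(k-1) per wrong answer, a blind guess
# is worth exactly as much as abstaining.
print(expected_score(4, Fraction(1, 3)))  # 0
```

Under accuracy-only grading the guess always has positive expected value, so a model optimized against that score has no reason to ever say "I don't know".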
