top | item 42560826

x_may | 1 year ago

The LMSYS leaderboards are crowdsourced and would be hard to fake, and it's showing pretty strong performance there in terms of human preference.

paxys|1 year ago

Crowdsourced data is the easiest to fake unless you can somehow ensure that you have a completely unbiased population (which is impossible). There's a reason why certain models do so well on upvote-based leaderboards but rank nowhere on objective tests.

CGamesPlay|1 year ago

Which ones? I think fine-tunes are where I see most of this (I'll just call it) "model spam", but the base models don't seem to exhibit this behavior. I do see some models perform way below the curve compared to their benchmark performance, though (Phi family being the most famous).