nathanasmith | 11 months ago
Unfortunately, that wouldn't help as much as you think, since talented AI labs can just watch the public leaderboard, note which models move up and down, and use that to deduce and target whatever the hidden benchmark is testing.