item 46533605 (no title)

sharkjacobs | 1 month ago
Any metric that can be targeted can be gamed

  kelseyfrog | 1 month ago
  Then target it with metrics worth solving[1].

  [1] Ex https://mppbench.com/

    falcor84 | 1 month ago
    But that seems to be measuring "superintelligence" rather than just AI, no?

    itemize123 | 1 month ago
    useless benchmark if all it shows will be fail right; At least it's a very lagging benchmark

  positron26 | 1 month ago
  If the metric is a latent variable summarizing subjective judgements, yes.