sharkjacobs | 1 month ago

Any metric that can be targeted can be gamed

kelseyfrog|1 month ago

Then target it with metrics worth solving[1].

[1] E.g. https://mppbench.com/

falcor84|1 month ago

But that seems to be measuring "superintelligence" rather than just AI, no?

itemize123|1 month ago

A benchmark is useless if all it ever shows is failure, right? At the very least, it's a very lagging benchmark.

positron26|1 month ago

If the metric is a latent variable summarizing subjective judgements, yes.