top | item 44521169

Ocha | 7 months ago

Nobody believes Elon anymore.

fumblebee|7 months ago

Hm, impartial benchmarks are independent of Elon's claims?

ben_w|7 months ago

Impartial benchmarks are great, unless (1) you have so many to choose from that you can game them (which is still true even if the benchmark makers themselves are absolutely beyond reproach), or (2) there's a difference between what you're testing and what you care about.

Goodhart's Law means (2) is approximately always true.

As it happens, we also have a lot of AI benchmarks to choose from.
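To put a rough number on point (1), here is a minimal sketch. It assumes the benchmarks are independent and that each one has the same small chance of flattering a model purely by luck. Both are simplifying assumptions, and the function name and the 5% figure are made up for illustration, not taken from any real evaluation.

```python
def chance_of_a_flattering_result(n_benchmarks: int, p_single: float) -> float:
    """Probability that at least one of n benchmarks favours a model by luck.

    Assumes (hypothetically) that the benchmarks are independent and that each
    has the same probability p_single of producing a flattering result by chance.
    """
    # P(at least one) = 1 - P(none), and P(none) = (1 - p) ** n under independence.
    return 1.0 - (1.0 - p_single) ** n_benchmarks


if __name__ == "__main__":
    # Illustrative 5% per-benchmark chance; the trend matters, not the exact value.
    for n in (1, 5, 20, 50):
        print(n, round(chance_of_a_flattering_result(n, 0.05), 3))
```

Under those assumptions the chance of finding at least one flattering benchmark climbs quickly as the pool grows, and nobody has to act in bad faith for that to happen. Real benchmarks are correlated, so treat this as an upper-end illustration rather than an estimate.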

Unfortunately this means every model basically has a vibe score right now, as the real independent tests are rapidly saturated into the "ooh shiny" region of the graph. Even the people working on e.g. the ARC-AGI benchmark don't think their own test is the last word.

irthomasthomas|7 months ago

Likely they trained on the test set. Grok 3 had similarly remarkable benchmark scores but fell flat in real use.

bigyabai|7 months ago

"Impartial" how? Do you have the training data? Are you auditing to make sure they're not few-shotting the benchmarks?

DonHopkins|7 months ago

The latest independent benchmark results consistently output "HEIL HITLER!"