top | item 46933061

(no title)

versteegen | 22 days ago

> The only valid ARC AGI results are from tests done by the ARC AGI non-profit using an unreleased private set. I believe lab-conducted ARC AGI tests must be on public sets and taken on a 'scout's honor' basis that the lab self-administered the test correctly

Not very accurate. For each of ARC-AGI-1 and ARC-AGI-2 there is training set and three eval sets: public, semi-private, and private. The ARC foundation runs frontier LLMs on the semi-private set, and the labs give them pre-release API access so they can report release-day evals. They mostly don't allow anyone else to access the semi-private set (except for live Kaggle leaderboards which use it), so you see independent researchers report on the public eval set instead, often very dubious. The private is for Kaggle competitions only, no frontier LLMs evals are possible.

(ARC-AGI-1 results are now largely useless because most of its eval tasks became the ARC-2 training set. However some labs have said they don't train LLMs on the training sets anyway.)

discuss

order

No comments yet.