(no title)
versteegen | 22 days ago
Not very accurate. For each of ARC-AGI-1 and ARC-AGI-2 there is training set and three eval sets: public, semi-private, and private. The ARC foundation runs frontier LLMs on the semi-private set, and the labs give them pre-release API access so they can report release-day evals. They mostly don't allow anyone else to access the semi-private set (except for live Kaggle leaderboards which use it), so you see independent researchers report on the public eval set instead, often very dubious. The private is for Kaggle competitions only, no frontier LLMs evals are possible.
(ARC-AGI-1 results are now largely useless because most of its eval tasks became the ARC-2 training set. However some labs have said they don't train LLMs on the training sets anyway.)
No comments yet.