mNovak | 17 days ago

I'm excited for the big jump in ARC-AGI scores from recent models, but no one should think for a second this is some leap in "general intelligence".

I joke to myself that the G in ARC-AGI is "graphical". I think what's held back models on ARC-AGI is their terrible spatial reasoning, and I'm guessing that's what the recent models have cracked.

Looking forward to ARC-AGI 3, which focuses on trial and error and exploring a set of constraints via games.

causal|17 days ago

Agreed. I love the elegance of ARC, but it always felt like a gotcha to give spatial reasoning challenges to token generators, and the fact that the token generators are somehow beating it anyway really says something.

throw310822|17 days ago

The average ARC-AGI-2 score for a single human is around 60%.

"100% of tasks have been solved by at least 2 humans (many by more) in under 2 attempts. The average test-taker score was 60%."

https://arcprize.org/arc-agi/2/

modeless|17 days ago

Worth keeping in mind that in this case the test takers were random members of the general public. The score of e.g. people with bachelor's degrees in science and engineering would be significantly higher.

imiric|17 days ago

What is the point of comparing performance of these tools to humans? Machines have been able to accomplish specific tasks better than humans since the industrial revolution. Yet we don't ascribe intelligence to a calculator.

None of these benchmarks prove these tools are intelligent, let alone generally intelligent. The hubris and grift are exhausting.

colordrops|17 days ago

Wouldn't you deal with spatial reasoning by giving it access to a tool that structures the space in a way it can understand, or that is simply a sub-model that can do spatial reasoning? These "general" models would serve as the frontal cortex while other models do specialized work. What is missing?

causal|17 days ago

That's a bit like saying just give blind people cameras so they can see.

amelius|17 days ago

They should train more on sports commentary; perhaps that could give spatial reasoning a boost.