fchollet | 11 months ago
By the time OpenAI attempted ARC in 2024, a colossal amount of resources had already been expended trying to beat the benchmark. The OpenAI run itself cost several million dollars in inference compute alone.
ARC was the only benchmark that highlighted o3 as having qualitatively different abilities compared to all models that came before. o3 is a case of a good approach meeting an appropriate benchmark, rather than an effort to beat ARC specifically.
YeGoblynQueenne | 11 months ago
Which top lab was that? What did they try?
>> ARC was the only benchmark that highlighted o3 as having qualitatively different abilities compared to all models that came before.
Unfortunately, the observations support a simpler hypothesis: o3 was trained on sufficient data about ARC-1 to solve it well. There is currently insufficient data on ARC-2, so o3 can't solve it. No magical, mysterious abilities qualitatively different from all the models that came before are required.
Indeed, that is a common pattern in machine learning research: newer models perform better on benchmarks than earlier models not because their capabilities have genuinely advanced, but because they're bigger models, trained on more data with more compute. They're just bigger, slower, more expensive, and just as dumb as their predecessors.
That's 90% of deep learning research in a nutshell.
bubblyworld | 11 months ago
To be honest, your argument sounds like an attempt to motivate a predetermined conclusion.