fchollet | 11 months ago
By the time OpenAI attempted ARC in 2024, a colossal amount of resources had already been expended trying to beat the benchmark. The OpenAI run itself cost several million dollars in inference compute alone.
ARC was the only benchmark that highlighted o3 as having qualitatively different abilities compared to all models that came before. o3 is a case of a good approach meeting an appropriate benchmark, rather than an effort to beat ARC specifically.
YeGoblynQueenne | 11 months ago
Which top lab was that? What did they try?
>> ARC was the only benchmark that highlighted o3 as having qualitatively different abilities compared to all models that came before.
Unfortunately, the observations support a simpler hypothesis: o3 was trained on sufficient data about ARC-1 to solve it well. There is currently insufficient data on ARC-2, so o3 can't solve it. No magical, mysterious abilities qualitatively different from all the models that came before are required.
Indeed, that is a common pattern in machine learning research: newer models perform better on benchmarks than earlier models not because their capabilities have genuinely advanced, but because they're bigger models, trained on more data with more compute. They're just bigger, slower, more expensive, and just as dumb as their predecessors.
That's 90% of deep learning research in a nutshell.
bubblyworld | 11 months ago
To be honest, your argument sounds like an attempt to motivate a predetermined conclusion.