
ottaborra | 1 year ago

Given how o3 cracked the ARC bench (and I'm probably sounding like a broken record), this isn't as far-fetched as some of you may think it is. ML models will very likely continue to scale regardless of how many bets are placed against it. I'm not sure why a lot of people aren't concerned about the ARC bench being cracked so fast. Our grand delusions of specialness have been shown to be just that: delusions.

"Humanity is just a small step in the giant staircase of intelligence" - Geoffrey Hinton

crackrook | 1 year ago

I have no clue whether AGI will look anything like today's LLMs, but I don't think the information we have about o3 so far suggests that it's particularly earth-shaking, or even a significant step towards AGI.

From the ARC announcement: "a large ensemble of low-compute Kaggle solutions can now score 81% on the private eval." If I understand this correctly, o3's performance is not a grand leap beyond the capabilities of many-times-cheaper models with similarly privileged information. The ARC news seems more likely to be evidence that the benchmark needs tweaking than proof that scaling works (although OpenAI's marketing team would very much like us to interpret it as the latter).

There has also been a bit of imprecision and hand-waving around other benchmarks, which bolsters my skepticism. For instance, the Codeforces benchmark results were touted with no meaningful description of the methodology, and what little we do know suggests (to me, at least) that comparing o3's Elo to that of a human is an apples-to-oranges comparison: https://codeforces.com/blog/entry/137539