(no title)
nipah | 8 months ago
Either way, there's something fishy about this presentation. It says "ARC-AGI-1 WAS EASILY BRUTE-FORCIBLE", but when o3 initially "solved" most of it, the co-founder of ARC Prize said: "Despite the significant cost per task, these numbers aren't just the result of applying brute force compute to the benchmark. OpenAI's new o3 model represents a significant leap forward in AI's ability to adapt to novel tasks. This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs. o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain." He was saying confidently that the result did not come from brute-forcing the problems. And it was not the first time: "ARC-AGI-1 consists of 800 puzzle-like tasks, designed as grid-based visual reasoning problems. These tasks, trivial for humans but challenging for machines, typically provide only a small number of example input-output pairs (usually around three). This requires the test taker (human or AI) to deduce underlying rules through abstraction, inference, and prior knowledge rather than brute-force or extensive training."
Now they are saying ARC-AGI-2 is not brute-forcible. What is happening there? They didn't provide any reasoning for why one was brute-forcible and the other is not, nor for how they are so sure about that. They had "recognized" before that it could be brute-forced, but in far weaker terms, stating that it would need "unlimited resources and time" to solve. And in this presentation they are using non-brute-forceability as a point in its favor.
--- Also, I mentioned mammals because these are problems of a kind that mammals, and even other animals, need to solve in reality across a diversity of cases. I'm not saying that they could literally take the test and solve it, or even understand that it is a test, but that they need to solve problems of a similar nature in reality. Naturally this point has its own limits, but it's not as easily discarded as you tried to make it.
viraptor|8 months ago
You told someone that their reasoning is so bad they should get checked by a doctor. Because they didn't find the test easy, even though it averages 60% score per person. You've been a dick to them while significantly misrepresenting the numbers - just stop digging.
nipah|8 months ago
In the extreme case, if he was not able to understand them at all, and this was not just trouble grasping the task, my point was that this could indicate a neurological or developmental problem, given the nature of the puzzles. It's not a question of "you need to get all of them right"; his point was that he was unable to understand them at all, that they confused him at the level of basic understanding.