Incredibly impressive. Still can't shake the feeling that this is o3 gaming the system more than actually reasoning. If the reasoning capabilities are there, there's no reason it should achieve 90% on one version and 30% on the next. If a human maintains the same performance across the two versions, an AI that can reason should too.
kmacdough|1 year ago
If you look at the ARC tasks failed by o3, they're really not well suited to humans. They lack the living context humans thrive on, and have relatively simple, analytical outcomes that are readily processed by simple structures. We're unlikely to see AI as "smart" until it can be asked to accomplish useful units of productive professional work at a "seasoned apprentice" level. Right now these models consume ungodly amounts of power just to pass some irritating, sterile SAT-style questions. Train a human for a few hours a day over a couple of weeks and they'll ace this no problem.
tintor|1 year ago
It works the same with humans. If they spend more time on the puzzle they are more likely to solve it.
cornholio|1 year ago
While beyond current models, that would be the final test of AGI capability.
pkphilip|1 year ago
If you disagree with me, state why instead of opting to downvote me