From firsthand experience, this simply cannot be true. I can give them totally novel and unique physics problems I just made up - problems that require tracking the movement of objects through a series of events - and they answer most correctly. Moreover, they find analogies between disparate concepts and fields of study and make useful suggestions based on them - which is arguably the same process as human creativity.
Their methodology shows they can create an infinite variety of problems.
jokethrowaway|1 year ago
Study on the topic: https://arxiv.org/html/2406.15992v1
This would explain o1's poor performance on problems with variations. o3 seems to be doing expensive brute forcing in latent space followed by verification, which should yield better results - but I don't think we can call it generalisation.
I think we need to go back to the drawing board.
UniverseHacker|1 year ago
I think ultimately the disconnect is people theorizing about what it can or cannot do with an incorrect mental model of what it is, and then assuming it cannot do things that it can in fact do. The irony of discussions about LLMs is that they mostly showcase the limits of humans' own ability to reason about novel situations.
red75prime|1 year ago
s1mplicissimus|1 year ago
mupuff1234|1 year ago
Lerc|1 year ago
This is the same thing as synthetic training data.
It doesn't matter whether models were trained on the generated data or not. If a model ends up being able to solve newly generated variations, you'd have to admit that it understands the underlying problem.
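For illustration only (not from the thread or the linked paper): the kind of "infinite variety of problems" discussed above can come from something as simple as a template with randomized entities and numbers, so that every draw is a surface form the model has never seen verbatim. All names and templates below are hypothetical, a minimal sketch of the idea.

```python
import random

# Hypothetical template-based variation generator: each draw produces a
# fresh surface form of the same underlying arithmetic problem. If a
# model solves arbitrary draws, it plausibly handles the structure
# rather than a memorized instance.
NAMES = ["Ava", "Ben", "Chen", "Dara"]
ITEMS = ["apples", "marbles", "coins"]

def generate_problem(rng):
    name = rng.choice(NAMES)
    item = rng.choice(ITEMS)
    start = rng.randint(10, 50)
    given = rng.randint(1, 9)
    bought = rng.randint(1, 9)
    question = (f"{name} has {start} {item}, gives away {given}, "
                f"then buys {bought} more. How many {item} does "
                f"{name} have now?")
    answer = start - given + bought  # ground-truth for verification
    return question, answer

rng = random.Random(0)  # seeded so draws are reproducible
q, a = generate_problem(rng)
print(q)
print(a)
```

Because the generator also produces the ground-truth answer, each sampled variation can be checked automatically, which is what makes this usable both as a benchmark and as synthetic training data.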
sirolimus|1 year ago