obblekk | 1 year ago

I hope someone reruns this on o1 and eventually o3.

If o1-preview was the start, like GPT-1 was, then we should expect generalization to improve quickly.

jokethrowaway|1 year ago

I don't think LLMs generalise much; that's why they're not creative and can't solve novel problems. It's pattern matching over a huge amount of data.

Study on the topic: https://arxiv.org/html/2406.15992v1

This would explain o1's poor performance on problem variations. o3 seems to be expensive brute forcing in latent space followed by verification, which should yield better results - but I don't think we can call it generalisation.
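
Roughly, "brute forcing followed by verification" is best-of-N sampling with a checker. A minimal sketch of that idea, with made-up stand-ins for the model and the verifier (an illustration only, not anything OpenAI has described):

    import random

    def sample_candidate(problem, rng):
        # Stand-in for one (possibly wrong) model attempt at the problem.
        return rng.randint(0, 10)

    def verify(problem, answer):
        # Stand-in for a checker that can confirm a correct answer.
        return answer == 7

    def best_of_n(problem, n=64, seed=0):
        # Sample many candidate answers and return the first one that verifies.
        rng = random.Random(seed)
        for _ in range(n):
            candidate = sample_candidate(problem, rng)
            if verify(problem, candidate):
                return candidate
        return None  # nothing passed verification

    print(best_of_n("toy problem"))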

I think we need to go back to the drawing board.

UniverseHacker|1 year ago

From firsthand experience, this simply cannot be true. I can give them totally novel and unique physics problems I just made up, ones that require tracking the movement of objects through a series of events, and they answer most of them correctly. Moreover, they find analogies between disparate concepts and fields of study and make useful suggestions based on them, which is arguably the same process as human creativity.

I think ultimately the disconnect is people theorizing about what it can or cannot do with an incorrect mental model of what it is, and then assuming it cannot do things that it can in fact do. The irony of discussions on LLMs is that they mostly showcase the limits of humans' ability to reason about novel situations.

red75prime|1 year ago

Don't worry, there are thousands of researchers at the drawing boards right now.

s1mplicissimus|1 year ago

The fact that this (and tons of other legitimate critique) got downvoted into greytext speaks far louder to me than all the benchmarks in the world.

mupuff1234|1 year ago

You're assuming that OpenAI isn't just gonna add the new questions to the training data.

Lerc|1 year ago

Their methodology shows they can create an infinite variety of problems.

This is the same thing as synthetic training data.

It doesn't matter whether models are trained on the generator's output or not. If the model ends up being able to solve newly generated variations, you'd have to admit that it understands the underlying problems.
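
For illustration, "newly generated variations" can be as simple as re-sampling a problem template with fresh parameters. A minimal sketch (the template and numbers here are made up, not the benchmark's actual generator):

    import random

    def make_variant(rng):
        # Sample fresh parameters for the same underlying problem.
        a, b = rng.randint(2, 99), rng.randint(2, 99)
        question = f"Alice has {a} apples and buys {b} more. How many does she have?"
        return question, a + b

    rng = random.Random(42)
    for _ in range(3):
        question, answer = make_variant(rng)
        # A model that was never shown these exact variants is scored on them.
        print(question, "->", answer)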

sirolimus|1 year ago

Exactly. The naivety is just sky-high.