This is very interesting, but a couple of things to note: 1. o1 still achieves > 40% on the varied Putnam problems, which is a feat most math students would not achieve. 2. o3 solved 25% of the Epoch AI dataset. There was an interesting post that calls into question how difficult some of those problems actually are, but it still seems very impressive.

I think a fair conclusion here is that reasoning models are still really good at solving very difficult math and competitive programming problems, just better at ones they have seen before.

empath75|1 year ago

The comments in this thread are completely disconnected from the contents of the paper, and the thread title is rage bait that doesn't reflect the contents of the paper, either. Being able to solve a significant fraction of those problems is a pretty amazing achievement, even if the model is sometimes tricked by minor variations. People are throwing around words like "fraud" or "hoax", and it's just wishcasting or whistling past the graveyard.