I don't think 2 is true: when OpenAI model won a gold medal in the math olympiads, it did so without tools or web search, just pure inference. Such a feat definitely would not have happened with o1.
> Just to spell it out as clearly as possible: a next-word prediction machine (because that's really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies.
> For OpenAI, the models had access to a code execution sandbox, so they could compile and test out their solutions. That was it though; no internet access.
We still have next to no real information on how the models achieved the gold medal. It’s a little early to be confirming anything, especially when the main source is a Twitter thread initiated by a company known for “exaggerating” the truth.
True, but aren't the math (and competitive programming) achievements a bit different? They're specific models heavily RL'd on competition math problems. Obviously still ridiculously impressive, but if you haven't done competition math or programming before it's much more memorization of techniques than you might expect and it's much easier to RL on.
simonw|4 months ago
Here's OpenAI's tweet about this: https://twitter.com/SebastienBubeck/status/19465776504050567...
> Just to spell it out as clearly as possible: a next-word prediction machine (because that's really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies.
My notes: https://simonwillison.net/2025/Jul/19/openai-gold-medal-math...
They DID use tools for the International Collegiate Programming Contest (ICPC) programming one though: https://twitter.com/ahelkky/status/1971652614950736194
> For OpenAI, the models had access to a code execution sandbox, so they could compile and test out their solutions. That was it though; no internet access.
emp17344|4 months ago
MoltenMan|4 months ago