OpenAI’s systems haven’t been pure language models since the o models though, right? Their RL approach may very well still generalize, but it’s not just a big pre-trained model that is one-shotting these problems.
The key difference is that they claim to have not used any verifiers.
What do you mean by “pure language model”? The reasoning step is still just the LLM spitting out tokens and this was confirmed by Deepseek replicating the o models. There’s not also a proof verifier or something similar running alongside it according to the openai researchers.
If you mean pure as in there’s not additional training beyond the pretraining, I don’t think any model has been pure since gpt-3.5.
beering|7 months ago
If you mean pure as in there’s not additional training beyond the pretraining, I don’t think any model has been pure since gpt-3.5.
gallerdude|7 months ago