(no title)
nbardy | 2 months ago
They would have taken some time to calculate the efficiency gains of pretraining vs RL, resumed GPT-4.5 training for whatever budget made sense, and then spent the rest on RL.
Sure they chose to not serve the large base models anymore for cost reasons.
But I’d guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model scale, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized and make little sense to serve at inference without distillation.
In RL the efficiency is very different because you have to run inference on the model to draw online samples, so smaller models start to make more sense to scale.
Big model => distill => RL
That makes the most theoretical sense for efficient training spend nowadays.
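A rough back-of-envelope sketch of that compute argument, assuming the standard ~6·N·D FLOPs estimate for pretraining and ~2·N FLOPs per generated token for sampling; the model sizes and RL budget below are purely illustrative, not anything OpenAI or Google has published:

```python
# Illustrative compute comparison (hypothetical numbers only).
# Pretraining is commonly estimated at ~6 * N * D FLOPs (N = params, D = tokens);
# autoregressive sampling costs ~2 * N FLOPs per generated token.

def pretrain_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

def rl_rollout_flops(n_params: float, n_rollouts: float, tokens_per_rollout: float) -> float:
    # Online RL has to sample from the policy, so rollout cost scales
    # linearly with the size of the model you are training.
    return 2 * n_params * n_rollouts * tokens_per_rollout

big, small = 1e12, 1e11        # hypothetical 1T-param base vs 100B-param distilled model
rollouts, toks = 1e7, 4e3      # hypothetical RL budget: 10M rollouts, 4k tokens each

print(f"RL rollouts on big model:   {rl_rollout_flops(big, rollouts, toks):.2e} FLOPs")
print(f"RL rollouts on small model: {rl_rollout_flops(small, rollouts, toks):.2e} FLOPs")
# A 10x smaller policy cuts per-rollout cost 10x, which is why
# big pretrain => distill => RL on the distilled model is attractive.
```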
So they already did train a big model, 4.5. Not using it would have been absurd, and they have a known recipe they could return to scaling if the returns justified it.
tim333|2 months ago
It kind of explains a coding issue I had with TradingView, who update their Pine Script thing quite frequently. ChatGPT seemed to have issues with v4 vs v5.
copedetector|2 months ago
[deleted]