(no title)
nbardy | 2 months ago
They would have taken some time to calculate the efficiency gains of pretraining vs RL, resumed GPT-4.5 training for whatever budget made sense, and then spent the rest on RL.
Sure they chose to not serve the large base models anymore for cost reasons.
But I’d guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model scale, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized and make little sense to serve at inference without distillation.
In RL the efficiency is very different because you have to run inference on the model to draw online samples, so smaller models start to make more sense to scale.
Big model => distill => RL
That makes the most theoretical sense for efficient training spend nowadays.
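A rough back-of-envelope sketch of that compute argument, assuming the standard ~6·N·D FLOPs estimate for pretraining and ~2·N FLOPs per generated token for sampling; the model sizes and RL budget below are purely illustrative, not anything OpenAI or Google has published:

```python
# Illustrative compute comparison (hypothetical numbers only).
# Pretraining is commonly estimated at ~6 * N * D FLOPs (N = params, D = tokens);
# autoregressive sampling costs ~2 * N FLOPs per generated token.

def pretrain_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

def rl_rollout_flops(n_params: float, n_rollouts: float, tokens_per_rollout: float) -> float:
    # Online RL has to sample from the policy, so rollout cost scales
    # linearly with the size of the model you are training.
    return 2 * n_params * n_rollouts * tokens_per_rollout

big, small = 1e12, 1e11        # hypothetical 1T-param base vs 100B-param distilled model
rollouts, toks = 1e7, 4e3      # hypothetical RL budget: 10M rollouts, 4k tokens each

print(f"RL rollouts on big model:   {rl_rollout_flops(big, rollouts, toks):.2e} FLOPs")
print(f"RL rollouts on small model: {rl_rollout_flops(small, rollouts, toks):.2e} FLOPs")
# A 10x smaller policy cuts per-rollout cost 10x, which is why
# big pretrain => distill => RL on the distilled model is attractive.
```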
So they already did train a big model, 4.5. Not using it would have been absurd, and they have a known recipe they could return to scaling if the returns justified it.
tim333|2 months ago
It kind of explains a coding issue I had with TradingView, who update their Pine Script thing quite frequently. ChatGPT seemed to have issues with v4 vs v5.
copedetector|2 months ago
[deleted]