sfmike|2 months ago

Everything is still based on 4o, right? Is training a new model just too expensive? Maybe they could consult the DeepSeek team about cost-constrained new models.

elgatolopez|2 months ago

Where did you get that from? The cutoff date says August 2025. It looks like a newly pretrained model.

FergusArgyll|2 months ago

> This stands in sharp contrast to rivals: OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024, highlighting the significant technical hurdle that Google’s TPU fleet has managed to overcome.

- https://newsletter.semianalysis.com/p/tpuv7-google-takes-a-s...

It's also plainly obvious from using it. The "broadly deployed" qualifier is presumably referring to GPT-4.5.

SparkyMcUnicorn|2 months ago

If the pretraining rumors are true, they're probably using continued pretraining on the older weights. Right?

verdverm|2 months ago

Apparently they have not had a successful pre-training run in 1.5 years.

fouronnes3|2 months ago

I want to read a short sci-fi story set in 2150 about how, mysteriously, no one has been able to train a better LLM for 125 years. The binary weights are studied with unbelievably advanced quantum computers, but no one can really train a new AI from scratch. This spawns cults, wars, and legends, and ultimately (by the third book) leads to the main protagonist learning to code by hand, something that no human left alive still knows how to do. Could this be the secret to making a new AI from scratch, more than a century later?

ijl|2 months ago

What kind of issues could prevent a company with such resources from doing that?

Wowfunhappy|2 months ago

I thought that whenever the knowledge cutoff increased, it meant they'd trained a new model. I guess that's completely wrong?

rockinghigh|2 months ago

They add new data to the existing base model via continued pre-training. You save on pre-training from scratch (the next-token-prediction task) but still have to re-run the mid- and post-training stages: context-length extension, supervised fine-tuning, reinforcement learning, safety alignment, and so on.
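
To make that concrete, here is a minimal sketch of continued pre-training with a Hugging Face-style loop. The model and dataset are tiny stand-ins, not anyone's actual frontier setup:

    # Continued pre-training: load the existing checkpoint and keep
    # running the plain next-token-prediction objective on newer data,
    # rather than re-initializing and pre-training from scratch.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    # gpt2 stands in for the "old" base model whose weights are reused.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Stand-in for a corpus of text past the old knowledge cutoff.
    raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    raw = raw.filter(lambda x: len(x["text"].strip()) > 0)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    train = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="ckpt",
                               per_device_train_batch_size=2,
                               max_steps=100),
        train_dataset=train,
        # mlm=False selects the causal-LM objective: labels are the
        # inputs shifted by one position.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    # The mid/post-training stages listed above still have to be
    # re-run on top of this checkpoint.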

brokencode|2 months ago

Typically, I think, but you could also continue pre-training your previous model on new data.

I don’t think it’s publicly known for sure how different the models really are. You can improve a lot just by improving the post-training set.
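
At its simplest, improving the post-training set means re-running supervised fine-tuning on the same base weights with better (prompt, response) pairs, with the loss masked to the response tokens. A minimal sketch with a toy placeholder pair:

    # Supervised fine-tuning: same base weights, new (prompt, response)
    # pairs, loss computed only on the response tokens.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Toy stand-in for a post-training set; curating better pairs is
    # one of the cheapest levers for improving the final model.
    pairs = [("What is the capital of France?", "Paris.")]

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    for prompt, response in pairs:
        prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
        full_ids = tokenizer(prompt + " " + response + tokenizer.eos_token,
                             return_tensors="pt").input_ids
        labels = full_ids.clone()
        labels[:, :prompt_len] = -100  # mask the prompt so only the
                                       # response contributes to the loss
        loss = model(input_ids=full_ids, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()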

catigula|2 months ago

The irony is that DeepSeek is still running with a distilled 4o model.
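
For context, claims like this usually refer to black-box, sequence-level distillation: sample completions from the teacher and fine-tune the student on them as ordinary supervised data, since a closed API exposes no logits. A minimal sketch with local models standing in for both sides; nothing here is DeepSeek's actual pipeline:

    # Black-box distillation: the teacher's sampled outputs become
    # ordinary supervised training data for the student.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    teacher = AutoModelForCausalLM.from_pretrained("gpt2-medium")  # stand-in teacher
    student = AutoModelForCausalLM.from_pretrained("gpt2")         # smaller student

    prompt = "Explain continued pre-training in one sentence."
    ids = tok(prompt, return_tensors="pt").input_ids

    # Step 1: sample a completion from the teacher. Against a closed
    # API this would be a chat-completions call instead.
    with torch.no_grad():
        out = teacher.generate(ids, max_new_tokens=40, do_sample=True,
                               pad_token_id=tok.eos_token_id)

    # Step 2: train the student to imitate the completion with the
    # ordinary next-token loss.
    labels = out.clone()
    labels[:, :ids.shape[1]] = -100  # learn only the sampled completion
    loss = student(input_ids=out, labels=labels).loss
    loss.backward()  # an optimizer step would follow in a real loop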