Everything is still based on 4o, right? Is training a new model just too expensive? Maybe they could consult the DeepSeek team about cost-constrained new models.
> This stands in sharp contrast to rivals: OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024, highlighting the significant technical hurdle that Google’s TPU fleet has managed to overcome.
I want to read a short sci-fi story set in 2150 about how, mysteriously, no one has been able to train a better LLM for 125 years. The binary weights are studied with unbelievably advanced quantum computers, but no one can really train a new AI from scratch. This sparks cults, wars, and legends, and ultimately (by the third book) leads to the main protagonist learning to code by hand, something no human left alive still knows how to do. Could this be the secret to making a new AI from scratch, more than a century later?
They add new data to the existing base model via continual pre-training. You save on pre-training (the next-token-prediction task) but still have to re-run the mid- and post-training stages: context length extension, supervised fine-tuning, reinforcement learning, safety alignment, and so on.
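For concreteness, here is a minimal sketch of what the continual pre-training step itself looks like, assuming a HuggingFace-style stack; the model name, corpus, and hyperparameters are illustrative stand-ins, not anyone's actual recipe:

```python
# Continual pre-training sketch: resume next-token prediction from an
# existing base checkpoint on new data. "gpt2" stands in for the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # start from trained weights
model.train()

new_corpus = ["Documents the base model has never seen before ..."]  # new data
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # low LR vs. from-scratch

for text in new_corpus:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Setting labels = input_ids gives the standard next-token-prediction loss.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.save_pretrained("base-model-cpt")  # mid/post-training stages still follow
```

The much lower learning rate than a from-scratch run is the usual guard against catastrophically forgetting the original corpus.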
- https://newsletter.semianalysis.com/p/tpuv7-google-takes-a-s...
It's also plainly obvious from using it. The "broadly deployed" qualifier is presumably referring to 4.5.
I don’t think it’s publicly known for sure how different the models really are. You can improve a lot just by improving the post-training set.
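As a rough illustration of that lever, here is a sketch of a single supervised post-training step where only the response tokens contribute to the loss, so a better set of (prompt, response) pairs directly reshapes behavior without touching the base model's pre-training. The data, model, and masking here are illustrative assumptions:

```python
# Post-training (SFT) sketch: train only on the response, masking the prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One (prompt, response) pair from a hypothetical improved post-training set.
prompt = "Q: Why is the sky blue?\nA:"
response = " Rayleigh scattering: shorter wavelengths scatter more strongly."

prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
full = tokenizer(prompt + response, return_tensors="pt")
labels = full["input_ids"].clone()
labels[:, :prompt_len] = -100  # -100 is ignored by the loss; only the response counts

loss = model(**full, labels=labels).loss  # next-token loss over response tokens only
loss.backward()
optimizer.step()
```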