top | item 46126750

jdoliner | 2 months ago

I've seen a rumor going around that OpenAI hasn't had a successful pre-training run since mid-2024. This seemed insane to me, but if you give ChatGPT 5.1 a query about current events and instruct it not to use the internet, it will tell you its knowledge cutoff is June 2024. Not sure if maybe that's just the smaller model or what, but I don't think it's a good sign to get that from any frontier model today; that's 18 months ago.

alecco|2 months ago

SemiAnalysis said it last week and AFAIK it wasn't denied.

https://newsletter.semianalysis.com/p/tpuv7-google-takes-a-s...

RossBencina|2 months ago

The SemiAnalysis article that you linked to stated:

"OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024, highlighting the significant technical hurdle that Google’s TPU fleet has managed to overcome."

Given the overall quality of the article, that is an uncharacteristically convoluted sentence. At the risk of stating the obvious, "that was broadly deployed" (or not) is contingent on many factors, most of which are not of the GPU vs. TPU technical variety.

binkHN|2 months ago

This is a really great breakdown. With TPUs seemingly more efficient and costing less overall, how does this play for Nvidia? What's to stop them from entering the TPU race with their $5 trillion valuation?

CamperBob2|2 months ago

That is... actually a seriously meaty article from a blog I'd never heard of. Thanks for the pointer.

mvkel|2 months ago

It's not a rumor; it's confirmed by OpenAI. All "models" since 4o are actually just optimizations in prompting and a new routing engine. The actual -model- you are using with 5.1 is 4. Nothing has been pre-trained from scratch since 4o.

Their own press releases confirm this: they call GPT-5 their best new "AI system", not a new model.

https://openai.com/index/introducing-gpt-5/

krackers|2 months ago

I can believe this; DeepSeek V3.2 shows that you can get close to GPT-5 performance with a GPT-4-level base model just with sufficient post-training.

Davidzheng|2 months ago

I don't think that counts as confirmation. We know 4.5 was a new base model. I find it very, very unlikely that the base model of 4 (or 4o) is in GPT-5. Also, 4o is a different base model from 4, right? It's multimodal, etc. Pretty sure people have leaked sizes and such, and I don't think it matches up.

staticman2|2 months ago

"New AI system" doesn't preclude new models. I thought when GPT-5 launched and users hated it, the speculation was that GPT-5 was a cost-cutting model and the routing engine was routing to smaller, specialized, dumber models that cost less at inference?

It certainly was much dumber than 4o on Perplexity when I tried it.

m3kw9|2 months ago

Well then 5.x is pretty impressive

Forgeties79|2 months ago

Maybe this is just armchair BS on my part, but it seems to me that the proliferation of AI spam and the general carpet-bombing of low-effort SEO fodder would make a lot of info online from the last few years totally worthless.

Hardly a hot take. People have theorized about the ouroboros effect for years now. But I do wonder if that’s part of the problem

p1necone|2 months ago

Every so often I try out a GPT model for coding again, and manage to get tricked by the very sparse conversation style into thinking it's great for a couple of days (when it says nothing, then finishes producing code with an 'I did x, y and z', with no stupid 'you're absolutely right' sucking up, and it works, it feels very good).

But I always realize it's just smoke and mirrors: the actual quality of the code, the failure modes and so on are just so much worse than Claude and Gemini.

kshacker|2 months ago

I am a novice programmer -- I have programmed for 35+ years now, but I build and lose the skills moving between coder, manager, and sales -- multiple times. Fresh IC since last week again :) I have coded starting with Fortran, RPG, and COBOL, and I have also coded Java and Scala. I know modern architecture but haven't done enough grunt work to make it work or to debug (and fix) a complex problem. Needless to say, sometimes my eyes glaze over the code.

And I write some code for my personal enjoyment. I gave it to Claude 6-8 months back for improvement; it gave me a massive change log that was quite risky, so I abandoned it.

I tried this again with Gemini last week. I was more prepared and asked it to improve class by class, and for whatever reason I got better answers -- changed code, with explanations -- and when I asked it to split the refactor into smaller steps, it did so. It was a joy working on this over the Thanksgiving holidays. It could break the changes into small pieces, talk through them as I evolved concepts learned previously, took my feedback and prioritization, and also gave me a nuanced explanation of the business objectives I was trying to achieve.

This is not to downplay Claude; that is just the narration of the sequence of events. So while it may or may not work well for experienced programmers, it is such a helpful tool for people who know the domain or the concepts (or both) and struggle with the details, since the tool can iron out a lot of details for you.

My goal now is to have another project for winter holidays and then think through 4-6 hour AI assisted refactors over the weekends. Do note that this is a project of personal interest so not spending weekends for the big man.

tartoran|2 months ago

I'm starting with Claude at work but have had an okay experience with OpenAI so far. For clearly delimited tasks it does produce working code more often than not, and I've seen some improvement on their side compared to, say, last year. For something more complex and not clearly defined in advance, yes, it does produce plausible garbage and goes off the rails a lot. I was migrating a project and asked ChatGPT to analyze the original code base and produce a migration plan. The result seemed good and encouraging, because I didn't know much about that project at the time. But I ended up taking a different route, and when I finished the migration (with bits of help from ChatGPT) I looked at the original migration plan out of curiosity, since I had become more familiar with the project by then. The migration plan was an absolutely useless and senseless hallucination.

herpdyderp|2 months ago

On the contrary, I cannot use the top Gemini and Claude models because their outputs are so out of place and hard to integrate with my code bases. The GPT-5 models integrate with my code base's existing patterns seamlessly.

findjashua|2 months ago

Not my experience at all; 5.1 Codex has been the best by far.

stevedonovan|2 months ago

I've been getting great results from Codex. Can be a bit slow, but gets there. Writes good Rust, powers through integration test generation.

So (again) we are just sharing anecdata

sharyphil|2 months ago

You're absolutely right!

Somehow it doesn't get on my nerves (unlike Gemini with "Of course").

jpalomaki|2 months ago

Can you give a concrete example of a programming task GPT fails to solve?

Interested, because I've been getting pretty good results on different tasks using Codex.

logicchains|2 months ago

I find that for difficult math and design questions, GPT-5 tends to produce better answers than Claude and Gemini.

bsder|2 months ago

At this point you are now forced to use the "AI"s as code search tools--and it annoys me to no end.

The problem is that the "AI"s can cough up code examples based upon proprietary codebases that you, as an individual, have no access to. That creates a significant quality differential between coders who only use publicly available search (Google, Github, etc.) vs those who use "AI" systems.

CheeseFromLidl|2 months ago

Same experience here. The more commonly known the stuff it regurgitates is, the fewer errors. But if you venture into RF electronics or embedded land, beware of it turning into a master of bs.

Which makes sense for something that isn't AI but an LLM.

xnx|2 months ago

OpenAI is in the "don't look behind the curtain" stage with both their technology and finances.

nickff|2 months ago

I recall reading that Google had similar 'delay' issues when crawling the web in 2000 and early 2001, but they managed to survive. That said, OpenAI seems much less differentiated (now) than Google was back then, so this may be a much riskier situation.

echelon|2 months ago

Google didn't raise at a $500 billion valuation.

The 25x revenue multiple wouldn't be so bad if they weren't burning so much cash on R&D and if they actually had a moat.

Google caught up quick, the Chinese are spinning up open source models left and right, and the world really just isn't ready to adopt AI everywhere yet. We're in the premature/awkward phase.

They're just too early, and the AGI is just too far away.

Doesn't look like their "advertising" idea to increase revenue is working, either.

redbluered|2 months ago

The differentiation should be open source, nonprofit, and ethical.

As a shady for-profit, there is none. That's the problem with this particular fraud.

savrajsingh|2 months ago

Yes, the story was that Google hadn't rebuilt their index for something like 8 months, if I recall correctly.

mikepurvis|2 months ago

I noticed this recently when I asked it whether I should play Indiana Jones on my PS5 or PC with a 9070 XT. It assumed I had made a typo until I clarified, then it went off to the internet and came back telling me what a sick rig I have.

impulser_|2 months ago

OpenAI is the only SOTA model provider whose knowledge cutoff isn't in the current year. That's why it performs badly when writing code for any new libraries, or for libraries that have had significant updates, like Svelte.

rvnx|2 months ago

State Of The Art is maybe a bit exaggerated. It's more like an early model that never really adapted, and only got watered down (smaller network, outdated information, and you cannot see thought/reasoning).

Also their models get dumber and dumber over time.

amluto|2 months ago

I asked ChatGPT 5.1 to help me solve a silly installation issue with the codex command line tool (I’m not an npm user and the recommended installation method is some kludge using npm), and ChatGPT told me, with a straight face, that codex was discontinued and that I must have meant the “openai” command.

Coneylake|2 months ago

"with a straight face"

nextworddev|2 months ago

Don’t forget SemiAnalysis’s founder Dylan Patel is supposedly roommates with Anthropic’s RL tech lead Sholto..

nickysielicki|2 months ago

The fundamental problem with bubbles like this is that you get people who are able to take advantage of the Gell-Mann amnesia effect, except the details they’re wrong about are so niche that there’s a vanishingly small group of people qualified to call them out on it, while there’s simultaneously so much more attention on what they say, because investors and speculators are so desperate and anxious for new information.

I followed him on Twitter. He said some very interesting things, I thought. Then he started talking about the niche of ML/AI I work near, and he was completely wrong about it. I became enlightened.

searls|2 months ago

Funny, I had it tell me the same thing twice yesterday, and that was _with_ thinking + search enabled on the request (it apparently refused to carry out the search, which it does once in a blue moon).

I hadn't made the connection that the training data is that old, but that would indeed augur poorly.

hn_throwaway_99|2 months ago

Just a minor correction, but I think it's important because some comments here seem to be giving bad information: OpenAI's model site (https://platform.openai.com/docs/models/compare) says that the knowledge cutoff for gpt-5 is Sep 30, 2024, which is later than the June 01, 2024 date of GPT-4.1.

Now, I don't know if this means OpenAI was able to add those 3 months of data to earlier models by tuning, or if it was a "from scratch" pre-training run, but either way it has to be a substantial difference in the models.

mr_00ff00|2 months ago

What is a pre-training run?

nodja|2 months ago

Pre-training is just training; it got the name because most models also have a post-training stage, so people call the first stage pre-training to differentiate.

Pre-training: You train on a vast amount of data, as varied and high-quality as possible. This determines the distribution the model can operate with, so LLMs are usually trained on a curated dataset of the whole internet. The output of pre-training is usually called the base model.

Post-training: You narrow down the task by training on the specific behaviors you want from the model. You can do this in several ways:

- Supervised Finetuning (SFT): Training on a strict high quality dataset of the task you want. For example if you wanted a summarization model, you'd finetune the model on high quality text->summary pairs and the model would be able to summarize much better than the base model.

- Reinforcement Learning (RL): You train a separate reward model that ranks outputs, then use its ratings of the model's outputs as a training signal for the model itself.

- Direct Preference Optimization (DPO): You have pairs of good/bad generations and use them to align the model towards/away from the kinds of responses you want.

Post-training is what makes the models easy to use. The most common form is instruction tuning, which teaches the model to talk in turns, but post-training can be used for anything: e.g. if you want a translation model that always translates a certain way, or a model that knows how to use tools, you'd achieve all that through post-training. Post-training is where most of the secret sauce in current models is nowadays.
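To make the DPO bullet above concrete, here's a toy numeric sketch of the DPO loss in plain Python (the function name and all numbers are illustrative, not from any real training stack):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    logp_w / logp_l     : log-prob of the chosen / rejected response
                          under the model being trained.
    ref_logp_w / ref_logp_l : same log-probs under the frozen reference model.
    """
    # Implicit reward margin: how much more the policy has shifted toward the
    # chosen response (relative to the reference) than toward the rejected one.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log sigmoid(margin): low loss when the margin is large and positive.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Model has shifted toward the chosen answer: loss drops below log(2).
print(dpo_loss(-5.0, -9.0, -6.0, -8.0))   # ~0.598
# No shift at all: loss sits exactly at -log sigmoid(0) = log(2) ~ 0.693.
print(dpo_loss(-5.0, -9.0, -5.0, -9.0))
```

The point is just that the loss is -log sigmoid of a scaled preference margin: it sits at log 2 when the model and the reference agree, and falls as the model learns to prefer the chosen response over the rejected one.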

abixb|2 months ago

The first step in building a large language model. That's when the model is initialized and trained on a huge dataset to learn patterns and whatnot. The "P" in "GPT" stands for "pre-trained."

bckr|2 months ago

That’s where they take their big pile of data and train the model to do next-token-prediction.
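As a toy illustration of next-token prediction (a bigram count table stands in for the neural net here, and the corpus is made up):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent next token seen in training, if any."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

corpus = [
    "the model predicts the next token",
    "the next token is predicted",
]
model = train_bigram(corpus)
print(predict_next(model, "next"))  # "token"
```

A real LLM does the same job with a transformer over billions of documents, predicting a probability distribution over the whole vocabulary instead of a single count-based guess.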

fovc|2 months ago

Łukasz Kaiser basically confirmed it in a podcast:

https://youtu.be/3K-R4yVjJfU?si=JdVyYOlxUbEcvEEo&t=2624

> Q: Are the releases aligned with pre-training efforts?

> A: There used to be a time not that long ago, maybe half a year, distant past, where the models would align with RL runs or pretraining runs ... now the naming is by capability. GPT5 is a capable model; 5.1 is a more capable model

jimbohn|2 months ago

I wonder if the failures to pretrain are the result of our understanding of neural networks being more akin to alchemy rather than chemistry

kristianp|2 months ago

I doubt it's that important that their dataset of current events is up to date. At this stage, I believe private and synthetic data comprises a large fraction of pretraining. Web search substitutes for current event pretraining.

f311a|2 months ago

I tried OpenAI models for coding in Go, but they constantly say "your syntax is not correct, let me rewrite your whole file without `any`". `any` was introduced in 2022. It takes some time to adopt in codebases, but they should not be doing stuff like that at the end of 2025.

manmal|2 months ago

That would explain why it’s so bad with new Swift features and more recent ast-grep rules.

mips_avatar|2 months ago

Usually current events get taught through mid-training, so even with old pre-training current events still could be added

throwaway314155|2 months ago

It has no idea what its own knowledge cutoff is.

octoberfranklin|2 months ago

The knowledge cutoff date is usually part of the system prompt.

It helps you get useful answers like "I don't know, that's too recent" when you ask questions like "who won the basketball game last night?"
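As a hypothetical sketch of how that injection typically works (the wording and the helper function are made up, not OpenAI's actual prompt -- the model has no reliable introspective access to its training data, so the date it reports usually comes from text like this):

```python
from datetime import date

def build_system_prompt(cutoff: str, today: date) -> str:
    """Assemble a system message that tells the model its own cutoff."""
    return (
        f"You are a helpful assistant. Knowledge cutoff: {cutoff}. "
        f"Current date: {today.isoformat()}. "
        "For events after the cutoff, say you don't know unless you can browse."
    )

prompt = build_system_prompt("2024-06", date(2025, 12, 1))
print(prompt)
```

This is also why the cutoff a model "reports" in chat can disagree with what's actually in its weights: it's just repeating whatever the serving stack put in front of the conversation.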

simianparrot|2 months ago

Any data after that is contaminated with vast amounts of AI slop. Is anyone training on anything newer..?

bakugo|2 months ago

Both Claude 4.x and Gemini 3 go up to early 2025. Still not that recent, but not as bad as GPT.

mingodad|2 months ago

Interestingly enough, today I found that https://huggingface.co/openai/gpt-oss-20b, when downloaded and run locally, answered this when asked:

====

when have you been built?

I’m a GPT‑4‑based model that OpenAI released on March 14 2023. The underlying training data goes up to early 2023, and the model was trained in the months leading up to that release. If you’re asking about the ChatGPT product that ships the model to users, it went live in November 2022 and has since received updates (GPT‑3.5, GPT‑4, etc.) that keep it current.

====

But when supposedly running the same model from https://huggingface.co/chat/models/openai/gpt-oss-20b:

====

when have you been built?

I’m a language model created by OpenAI. The current generation (GPT‑4) that powers this chat was first released in March 2023 and has been updated and fine‑tuned up through the end of 2024. My training data runs up to the beginning of June 2025, so I’m built on knowledge available up to that point.

====

And that makes me think that although https://huggingface.co/chat claims to be using the models available to the public at https://huggingface.co , that doesn't seem to be true. I raised this question here: https://huggingface.co/ggml-org/gpt-oss-20b-GGUF/discussions... , https://github.com/huggingface/inference-playground/issues/1... and https://github.com/ggml-org/llama.cpp/discussions/15396#disc... .