ericskiff|1 year ago

What we can reasonably assume from statements made by insiders:

They want a 10x improvement from scaling and a 10x improvement from data and algorithmic changes

The sources of public data are essentially tapped

Algorithmic changes will be an unknown to us until they release, but from published research this remains a steady source of improvement

Scaling seems to stall if data is limited

So with all of that taken together, the logical step is to figure out how to turn compute into better data to train on. Enter strawberry / o1, and now o3

They can throw money, time, and compute at thinking about and then generating better training data. If the belief is that N billion new tokens of high quality training data will unlock the leap in capabilities they’re looking for, then it makes sense to delay the training until that dataset is ready

With o3 now public knowledge, imagine how long it’s been churning out new thinking at expert level across every field. OpenAI’s next moat may be the best synthetic training set ever.

At this point I would guess we get 4.5 with a subset of this - some scale improvement, the algorithmic pickups since 4 was trained, and a cleaned and improved core data set but without risking leakage of the superior dataset

When 5 launches, we get to see what a fully scaled version looks like with training data that outstrips average humans in almost every problem space

Then the next o-model gets to start with that as a base and reason? It's likely to be remarkable.

sdwr|1 year ago

Great improvements and all, but they are still no closer (as of 4o regular) to having a system that can be responsible for work. In math problems it forgets which variable represents what; in coding questions it invents library functions.

I was watching a YouTube interview with a "trading floor insider". They said they were really being paid for holding risk. The bank has a position in a market, and it's their ass on the line if it tanks.

ChatGPT (as far as I can tell) is no closer to being accountable or responsible for anything it produces. If they don't solve that (and the problem is probably inherent to the architecture), they are, in some sense, polishing a turd.

nightowl_games|1 year ago

> They said they were really being paid for holding risk.

I think that's a really interesting insight that has application to using 'AI' in jobs across the board.

zifpanachr23|1 year ago

This is underdiscussed. I don't think people understand just how worthless AI is in a ton of fields until it can be held liable and sent to prison.

There are a lot of moral conundrums here that are just not going to work out. It seems like an attempt to offload liability, and pretty much everybody has caught on to that as its main selling point, which is probably the main thing that will keep it from ever being accepted for anything important.

tucnak|1 year ago

> ChatGPT (as far as I can tell) is no closer to being accountable or responsible for anything it produces.

What does it even mean? How do you imagine that? You want OpenAI to take on liability for the kicks of it?

Stevvo|1 year ago

"With o3 now public knowledge, imagine how long it’s been churning out new thinking at expert level across every field."

I highly doubt that. o3 is many orders of magnitude more expensive than paying subject matter experts to create new data. It just doesn't make sense to pay six figures in compute to get o3 to make data a human could make for a few hundred dollars.

bookaway|1 year ago

Yes, I think they had to push this reveal forward because their investors were getting antsy with the lack of visible progress to justify continuing rising valuations. There is no other reason a confident company making continuous rapid progress would feel the need to reveal a product that 99% of companies worldwide couldn't use at the time of the reveal.

That being said, if OpenAI is burning cash at lightspeed and doesn't have to publicly reveal the revenue they receive from certain government entities, it wouldn't come as a surprise if they let the government play with it early on in exchange for some much needed cash to set on fire.

EDIT: The fact that multiple sites seem to be publishing GPT-5 stories similar to this one leads one to conclude that the o3 benchmark story was meant to counter the negativity from this and other similar articles that are just coming out.

mrshadowgoose|1 year ago

Can SMEs deliver that data in a meaningful amount of time? Training data now is worth significantly more than data a year from now.

GolfPopper|1 year ago

> churning out new thinking at expert level across every field

I suspect this is really "churning out text that impresses management".

tshadley|1 year ago

Seems to me o3 prices would be what the consumer pays, not what OpenAI pays. That would mean o3 could be more efficient in-house than paying subject-matter experts.

dartos|1 year ago

That’s an interesting idea. What if OpenAI funded medical research initiatives in exchange for exclusive training rights on the research?

DougN7|1 year ago

Someone needs to dress up Mechanical Turk and repackage it as an AI company...

rtsil|1 year ago

Unless the quality of the human data is extraordinary, it seems from TFA that it's not that easy:

> The process is painfully slow. GPT-4 was trained on an estimated 13 trillion tokens. A thousand people writing 5,000 words a day would take months to produce a billion tokens.

And if the human-generated data were of such quality that three orders of magnitude less of it sufficed, then I'd assume it would be at least as expensive as o3.
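
Running the quoted numbers as a rough sketch in Python (the tokens-per-word ratio is my assumption, a common rule of thumb for English text):

    # Back-of-envelope on the figures quoted above.
    TOKENS_PER_WORD = 1.33  # assumed rule of thumb, not a measured figure

    writers = 1_000
    words_per_day = 5_000
    tokens_per_day = writers * words_per_day * TOKENS_PER_WORD  # ~6.7M/day

    days_per_billion = 1e9 / tokens_per_day
    print(f"~{days_per_billion:.0f} days (~{days_per_billion / 30:.0f} months) per billion tokens")

    gpt4_corpus = 13e12  # the article's estimate: 13 trillion tokens
    years = gpt4_corpus / tokens_per_day / 365
    print(f"~{years:,.0f} years to hand-write a GPT-4-sized corpus")

That lands at roughly five months per billion tokens, consistent with the article, and puts a fully hand-written GPT-4-sized corpus out of reach entirely.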

az226|1 year ago

Only a matter of time. Costs are coming down aggressively, and specialized inference hardware will push them down further.

The cost of coordinating humans is also large. Immediate answers are an advantage/selling point.

nialv7|1 year ago

> OpenAI’s next moat

I don't think oai has any moat at all. If you look around, QwQ from Alibaba is already pushing o1-preview performance. I think oai is only ahead by 3 to 6 months at most.

vasco|1 year ago

If their AGI dreams came true, a 3-month head start might be more than enough. They probably won't, but it's interesting to ponder what the next few hours, days, and weeks would look like for someone wielding AGI.

Like let's say you have a few datacenters of compute at your disposal and the ability to instantiate millions of AGI agents - what do you have them do?

I wonder if the USA already has a secret program for this under national defense. It is interesting that once you do control an actual AGI, you'd want to speed-run a bunch of things. And on the flip side: how do you detect that an adversary already has or is using one, and what do you do in that case?

acyou|1 year ago

That is why being #2 in technical product development can be great. Someone else pays to work out the kinks; you copy what works and improve on it at a fraction of the cost. You see it time and time again.

dartos|1 year ago

I’m curious how, if at all, they plan to get around compounding bias in synthetic data generated by models trained on synthetic data.

ynniv|1 year ago

Everyone's obsessed with new training tokens... The model doesn't need to be more knowledgeable; it just needs to practice more. Ask any student: practice is synthetic data.

nialv7|1 year ago

Synthetic data is fine if you can ground the model somehow. That's why o1/o3's improvements are mostly in reasoning, math, etc.: in those domains you can easily tell whether the data is wrong.
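
A minimal sketch of what that grounding can look like (helper names are hypothetical); the point is that the filter is a hard check against a known answer, not another model's opinion:

    # Keep only synthetic solutions whose final answer can be machine-checked.
    # `generate_solution` is a stand-in for sampling the model.

    def final_answer(solution: str) -> str:
        # Assumed convention: the model ends with a line like "Answer: 42"
        for line in reversed(solution.splitlines()):
            if line.startswith("Answer:"):
                return line.removeprefix("Answer:").strip()
        return ""

    def grounded_samples(problems, generate_solution, n_tries=8):
        kept = []
        for prob in problems:  # each prob: {"question": ..., "answer": ...}
            for _ in range(n_tries):
                solution = generate_solution(prob["question"])
                if final_answer(solution) == prob["answer"]:  # verifiably right
                    kept.append((prob["question"], solution))
                    break
        return kept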

jsheard|1 year ago

> With o3 now public knowledge, imagine how long it’s been churning out new thinking at expert level across every field. OpenAI’s next moat may be the best synthetic training set ever.

Even taking OpenAI and the benchmark authors at their word, they said it consumes at least tens of dollars per task to hit peak performance. How much would it cost to have it produce a meaningfully large training set?
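
For scale, a hedged back-of-envelope (both numbers are guesses; only the order of magnitude matters):

    # Every number below is an assumption, not a published figure.
    cost_per_task = 50        # dollars ("tens of dollars per task")
    tokens_per_task = 2_000   # guess at usable reasoning tokens per task

    target = 1e9              # a billion-token synthetic dataset
    tasks = target / tokens_per_task
    print(f"~${tasks * cost_per_task / 1e6:,.0f}M for {target:.0e} tokens")
    # -> ~$25M per billion tokens under these guesses; a trillion-token
    #    set would be a thousand times that.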

qup|1 year ago

That's the public API price isn't it?

noman-land|1 year ago

I completely don't understand the use of synthetic data. What good is it to train a model basically on itself?

psb217|1 year ago

The value of synthetic data relies on having non-zero signal about which generated data is "better" or "worse". In a sense, this is what reinforcement learning is about. I.e., generate some data, have that data scored by some evaluator, and then feed the data back into the model with higher weight on the better stuff and lower weight on the worse stuff.

The basic loop is: (i) generate synthetic data, (ii) rate synthetic data, (iii) update model to put more probability on better data and less probability on worse data, then go back to (i).
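
A toy sketch of that loop, with every name illustrative rather than a real API (production systems use PPO/RLHF-style objectives rather than a literal weighted fit):

    def improvement_step(model, prompts, score, k=4):
        batch = []
        for p in prompts:
            for _ in range(k):
                c = model.sample(p)                # (i) generate
                batch.append((p, c, score(p, c)))  # (ii) rate
        # (iii) weight the update by the rating: better completions are
        # reinforced, worse ones suppressed.
        model.fit(examples=[(p, c) for p, c, _ in batch],
                  weights=[s for _, _, s in batch])
        return model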

viraptor|1 year ago

This is a good read for some examples: https://arxiv.org/abs/2203.14465

> This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers

But there are a few others. In general, good data is good data. We're definitely learning more about how to produce good synthetic versions of it.
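
Roughly, one STaR round looks like this (function names are hypothetical stand-ins, following the abstract's description):

    def star_round(model, dataset, few_shot_prompt):
        keep = []
        for question, answer in dataset:
            rationale, guess = model.generate(few_shot_prompt + question)
            if guess != answer:
                # "rationalization": retry with the correct answer as a hint
                rationale, guess = model.generate(few_shot_prompt + question,
                                                  hint=answer)
            if guess == answer:
                keep.append((question, rationale, answer))
        # fine-tune only on rationales that led to correct answers, then repeat
        return finetune(model, keep)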

Majromax|1 year ago

> What good is it to train a model basically on itself?

If the model generates data of variable quality, and if there's a good way to distinguish good data from bad data, then training on self-generated data might "bootstrap" a model to better performance.

This is common in reinforcement learning. Famously, AlphaGo Zero (https://en.wikipedia.org/wiki/AlphaGo_Zero) learned exclusively on self-play, without reference to human-played games.

Of course, games have a built-in critic: the better strategy usually wins. It's much harder to judge the answer to a math problem, or decide which essay is more persuasive, or evaluate restaurant recommendations.

dyauspitr|1 year ago

If we get to a point where we have a model that, when fed a real-world stream of data (YouTube, surveillance cameras, forum data, cell phone conversations, etc.), can prune out a good training set for itself, then the LLM is in a feedback loop where it can improve itself. That's AGI for all intents and purposes.

nradov|1 year ago

There is an enormous "iceberg" of untapped non-public data locked behind paywalls or licensing agreements. The next frontier will be spending money and human effort to get access to that data, then transforming it into something useful for training.

mistercheph|1 year ago

Ah yes, the beautiful iceberg of internal documentation, legal paperwork, and meeting notes.

The highest-quality language data that exists is in the public domain.