ericskiff | 1 year ago
They want a 10x improvement from scaling and a 10x improvement from data and algorithmic changes
The sources of public data are essentially tapped
Algorithmic changes will be an unknown to us until they release, but from published research this remains a steady source of improvement
Scaling seems to stall if data is limited
So with all of that taken together, the logical step is to figure out how to turn compute into better data to train on. Enter strawberry / o1, and now o3
They can throw money, time, and compute at thinking about and then generating better training data. If the belief is that N billion new tokens of high quality training data will unlock the leap in capabilities they’re looking for, then it makes sense to delay the training until that dataset is ready
With o3 now public knowledge, imagine how long it’s been churning out new thinking at expert level across every field. OpenAI’s next moat may be the best synthetic training set ever.
At this point I would guess we get 4.5 with a subset of this - some scale improvement, the algorithmic pickups since 4 was trained, and a cleaned and improved core data set but without risking leakage of the superior dataset
When 5 launches, we get to see what a fully scaled version looks like with training data that outstrips average humans in almost every problem space
Then the next o-model gets to start with that as a base and reason. It's likely to be remarkable
sdwr|1 year ago
I was watching a YouTube interview with a "trading floor insider". They said they were really being paid for holding risk. The bank has a position in a market, and it's their ass on the line if it tanks.
ChatGPT (as far as I can tell) is no closer to being accountable or responsible for anything it produces. If they don't solve that (and the problem is probably inherent to the architecture), they are, in some sense, polishing a turd.
nightowl_games|1 year ago
I think that's a really interesting insight that has application to using 'AI' in jobs across the board.
zifpanachr23|1 year ago
There are a lot of moral conundrums that are just not going to work out with this. It seems like an attempt to just offload liability, and it seems like pretty much everybody has caught onto that as its main selling point, and probably the main thing that will keep it from ever being accepted for anything important.
tucnak|1 year ago
What does it even mean? How do you imagine that? You want OpenAI to take on liability for the kicks of it?
Stevvo|1 year ago
I highly doubt that. o3 is many orders of magnitude more expensive than paying subject matter experts to create new data. It just doesn't make sense to pay six figures in compute to get o3 to make data a human could make for a few hundred dollars.
bookaway|1 year ago
That being said, if OpenAI is burning cash at lightspeed and doesn't have to publicly reveal the revenue they receive from certain government entities, it wouldn't come as a surprise if they let the government play with it early on in exchange for some much needed cash to set on fire.
EDIT: The fact that multiple sites seem to be publishing GPT-5 stories similar to this one leads one to conclude that the o3 benchmark story was meant to counter the negativity from this and other similar articles that are just coming out.
GolfPopper|1 year ago
I suspect this is really, "churning out text that impresses management".
rtsil|1 year ago
> The process is painfully slow. GPT-4 was trained on an estimated 13 trillion tokens. A thousand people writing 5,000 words a day would take months to produce a billion tokens.
And if the human-generated data were so qualitatively good that it could be smaller by three orders of magnitude, then I can assume it would be at least as expensive as o3.
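The quoted arithmetic checks out, as a quick back-of-envelope sketch shows (the ~1.3 tokens-per-word ratio is an assumption typical of GPT-style tokenizers, not a figure from the article):

```python
# Back-of-envelope check of the quoted human-writing estimate.
# Assumption: ~1.3 tokens per English word (typical for GPT tokenizers).
writers = 1_000
words_per_day = 5_000
tokens_per_word = 1.3

tokens_per_day = writers * words_per_day * tokens_per_word  # 6.5M tokens/day
days_for_billion = 1_000_000_000 / tokens_per_day
print(f"{days_for_billion:.0f} days")  # ~154 days, i.e. about five months
```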
az226|1 year ago
Cost of coordination is also large. Immediate answers are an advantage/selling point.
nialv7|1 year ago
I don't think oai has any moat at all. If you look around, QwQ from Alibaba is already pushing o1-preview performance. I think oai is only ahead by 3~6 months at most.
vasco|1 year ago
Like let's say you have a few datacenters of compute at your disposal and the ability to instantiate millions of AGI agents - what do you have them do?
I wonder if the USA already has a secret program for this under national defense. But it is interesting that once you do control an actual AGI you'd want to speed-run a bunch of things. In opposition to that, how do you detect that an adversary already has or is using one, and what do you do in that case?
jsheard|1 year ago
Even taking OpenAI and the benchmark authors at their word, they said it consumes at least tens of dollars per task to hit peak performance. How much would it cost to have it produce a meaningfully large training set?
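A rough sketch of why the numbers look daunting (the per-task cost and tokens-per-task figures below are illustrative assumptions, not OpenAI's numbers):

```python
# Rough cost sketch for generating a training set with an expensive model.
# All numbers are assumptions for illustration, not OpenAI figures:
# ~$20 of compute per task, ~1,000 usable tokens produced per task.
cost_per_task = 20.0
tokens_per_task = 1_000

target_tokens = 1_000_000_000  # one billion tokens of synthetic data
tasks_needed = target_tokens / tokens_per_task
total_cost = tasks_needed * cost_per_task
print(f"${total_cost:,.0f}")  # $20,000,000 for a billion tokens

gpt4_scale = 13_000_000_000_000  # GPT-4's estimated 13T training tokens
cost_at_scale = gpt4_scale / tokens_per_task * cost_per_task
print(f"${cost_at_scale:,.0f}")  # $260,000,000,000 at full pre-training scale
```

Under these assumptions, even a modest synthetic corpus costs tens of millions, which is the commenter's point about expert-written data being cheaper per token.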
psb217|1 year ago
The basic loop is: (i) generate synthetic data, (ii) rate synthetic data, (iii) update model to put more probability on better data and less probability on worse data, then go back to (i).
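That loop can be sketched in toy form. Everything here is a hypothetical stand-in: a real system would replace `generate`, `rate`, and `update` with model sampling, reward scoring, and fine-tuning.

```python
import random

# Toy sketch of the loop: (i) generate, (ii) rate, (iii) update toward
# better-rated data, then repeat. The "model" is just a scalar bias.
random.seed(0)

def generate(model, n):
    """(i) Sample n synthetic examples from the current model."""
    return [model["bias"] + random.random() for _ in range(n)]

def rate(sample):
    """(ii) Score a sample; here, higher values count as better data."""
    return sample

def update(model, rated):
    """(iii) Nudge the model toward its better-rated outputs."""
    ratings = sorted(r for _, r in rated)
    median = ratings[len(ratings) // 2]
    good = [s for s, r in rated if r >= median]
    model["bias"] += 0.1 * (sum(good) / len(good) - model["bias"])
    return model

model = {"bias": 0.0}
for _ in range(20):  # then go back to (i)
    samples = generate(model, 32)
    rated = [(s, rate(s)) for s in samples]
    model = update(model, rated)
print(model["bias"])  # drifts upward as the loop favors higher-rated data
```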
viraptor|1 year ago
> This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers
But there are a few others. In general, good data is good data. We're definitely learning more about how to produce good synthetic versions.
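The quoted STaR loop can be sketched as runnable toy code. The arithmetic task and the fake off-by-one "model" are illustrative stand-ins, not the paper's setup; only the loop structure (generate, rationalize on failure, fine-tune on correct rationales, repeat) follows the quote.

```python
# Toy sketch of the STaR loop quoted above: generate a rationale, retry
# with the answer as a hint when wrong, keep only rationales that led to
# correct answers, then "fine-tune" on them.

def ask_model(model, a, b, hint=None):
    """Return (rationale, answer). The fake model is off by `bias`
    unless given the correct answer as a hint (rationalization step)."""
    answer = (a + b + model["bias"]) if hint is None else hint
    rationale = f"{a} plus {b} equals {answer}"
    return rationale, answer

def fine_tune(model, examples):
    """Stand-in fine-tune: training on correct rationales removes the
    systematic error."""
    if examples:
        model = {"bias": 0}
    return model

def star_iteration(model, problems):
    keep = []
    for a, b in problems:
        gold = a + b
        rationale, answer = ask_model(model, a, b)   # try directly first
        if answer != gold:                           # wrong? retry with hint
            rationale, answer = ask_model(model, a, b, hint=gold)
        if answer == gold:                           # keep only successes
            keep.append((a, b, rationale))
    return fine_tune(model, keep)                    # repeat from here

model = {"bias": 1}  # systematically off by one
model = star_iteration(model, [(2, 3), (10, 7), (4, 4)])
print(model["bias"])  # 0: the loop corrected the systematic error
```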
Majromax|1 year ago
If the model generates data of variable quality, and if there's a good way to distinguish good data from bad data, then training on self-generated data might "bootstrap" a model to better performance.
This is common in reinforcement learning. Famously, AlphaGo Zero (https://en.wikipedia.org/wiki/AlphaGo_Zero) learned exclusively on self-play, without reference to human-played games.
Of course, games have a built-in critic: the better strategy usually wins. It's much harder to judge the answer to a math problem, or decide which essay is more persuasive, or evaluate restaurant recommendations.
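The "built-in critic" point also applies to any domain where outputs are mechanically checkable, like math with known answers. A minimal sketch (the divisor task and noisy "model" are illustrative stand-ins):

```python
import random

# Sketch of the "built-in critic" idea: when self-generated data can be
# verified mechanically (a game's winner, a checkable math answer), bad
# samples can be filtered out before training on them.

def propose_divisor(n):
    """A noisy 'model' proposing candidate divisors of n."""
    return random.randint(1, n)

def critic(n, d):
    """Built-in critic: divisibility is mechanically checkable."""
    return n % d == 0

n = 360
candidates = [propose_divisor(n) for _ in range(200)]
training_set = [d for d in candidates if critic(n, d)]  # keep verified data
print(len(training_set), "verified samples kept")
```

For essays or restaurant recommendations there is no such checker, which is why those domains would need a learned judge instead.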
mistercheph|1 year ago
the highest quality language data that exists is in the public domain