Why would it cast any doubt? If you can use o1 output to build a better R1, then use R1 output to build a better X1... then a better X2... XN, that just shows a method to create better systems for a fraction of the cost from where we stand. If it was that obvious, OpenAI should have done it themselves. But the disruptors did it. In hindsight it might sound obvious, but that is true for all innovations. It is all good stuff.
Imnimo|1 year ago
(with the caveat that all we have right now are accusations that DeepSeek made use of OpenAI data - it might just as well turn out that DeepSeek really did work independently, and you really could have gotten o1-like performance with much less compute)
deepGem|1 year ago
In this study, we demonstrate that reasoning capabilities can be significantly improved through large-scale reinforcement learning (RL), even without using supervised fine-tuning (SFT) as a cold start. Furthermore, performance can be further enhanced with the inclusion of a small amount of cold-start data
Is this cold-start data what OpenAI is claiming is their output? If so, what's the big deal?
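For intuition on the "large-scale RL without SFT" part of that quote, here's a toy analogue I cooked up (everything in it is invented for illustration, nothing like the actual R1 pipeline or scale): a softmax policy over three candidate answers improves purely from a verifiable 0/1 reward, with no labeled demonstrations at all.

```python
import math
import random

random.seed(3)

# Toy analogue of "RL without SFT": a softmax policy over 3 candidate
# answers learns from a verifiable reward alone (1 if correct, else 0),
# with no labeled demonstrations. All values here are invented.
CORRECT = 2
logits = [0.0, 0.0, 0.0]

def policy():
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

lr = 0.5
for _ in range(500):
    probs = policy()
    a = sample(probs)
    reward = 1.0 if a == CORRECT else 0.0
    # REINFORCE: grad of log pi(a) w.r.t. the logits is one_hot(a) - probs
    for i in range(3):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr * reward * grad

print(f"P(correct) after reward-only training: {policy()[CORRECT]:.2f}")
```

The point is just that a reward signal alone can shape behavior; "cold-start data" in the paper is a small SFT step layered on top of this kind of loop.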
manquer|1 year ago
It is no better for OpenAI in this scenario either: any competitor can easily copy their expensive training without spending the same, i.e. there is a second-mover advantage and no economic incentive to be the first mover.
To put it another way, the $500 billion Stargate investment will be worth just $5 billion once the models become available for consumption, because that is all it will take to replicate the same outcomes with new techniques, even if the cold start needed o1 output for RL.
vkou|1 year ago
Let's just assume that the cost of training can be externalized to other people for free.
hmottestad|1 year ago
The big question really is: are we doing it wrong? Could we have created o1 for a fraction of the price? Will o4 cost less to train than o1 did?
The second question naturally follows: if we create a smarter LLM, can we use it to create another LLM that is even smarter?
It would have been fantastic if DeepSeek could have come out with an o3 competitor before o3 even became publicly available. That way we would have known for sure that we're doing it wrong, because then either we could have used o1 to train a better AI, or we could have just trained in a smarter and cheaper way.
cherry_tree|1 year ago
Whether or not you could have, you can now.
rockemsockem|1 year ago
All of this should have been clear anyway from the start, but that's the Internet for you.
joe_the_user|1 year ago
Hmm, I think the narrative of the rise of LLMs is that once the output of humans has been distilled by the model, the human isn't necessary.
As far as I know, DeepSeek adds only a little to the transformer model, while o1/o3 added a special "reasoning component" - if DeepSeek is as good as o1/o3, even taking data from it, then it seems the reasoning component isn't needed.
aprilthird2021|1 year ago
I did not think this, nor did I think this was what others assumed. The narrative, I thought, was that there is little point in paying OpenAI for LLM usage when a similar or better version can be made and used for a fraction of the cost (whether it's built on the back of existing LLM research doesn't factor in).
hmmm-i-wonder|1 year ago
But HOW they are necessary is the change. They went from building blocks to stepping stones. From a business standpoint that's very damaging to OAI and other players.
patcon|1 year ago
And is this related to the lottery ticket hypothesis?
https://arxiv.org/pdf/1803.03635.pdf
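To spell out what that paper claims, here's a deliberately tiny sketch of the lottery-ticket recipe (train dense, prune by magnitude, rewind to the original init, retrain only the surviving weights) - a made-up linear-regression micro-example, not the paper's actual procedure or scale:

```python
import random

random.seed(2)

# Toy lottery-ticket run: 10 weights, but only features 0 and 1 matter.
TRUE_W = [3.0, -2.0] + [0.0] * 8

def make_sample():
    x = [random.gauss(0, 1) for _ in range(10)]
    y = sum(wi * xi for wi, xi in zip(TRUE_W, x))
    return x, y

data = [make_sample() for _ in range(200)]

def train(mask, w_init, steps=500, lr=0.05):
    """Gradient descent on squared error; pruned weights stay frozen at 0."""
    w = [wi * mi for wi, mi in zip(w_init, mask)]
    for _ in range(steps):
        grad = [0.0] * 10
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i in range(10):
                grad[i] += err * x[i]
        w = [(wi - lr * gi / len(data)) * mi
             for wi, gi, mi in zip(w, grad, mask)]
    return w

init = [random.gauss(0, 0.1) for _ in range(10)]

dense = train([1.0] * 10, init)                             # 1. train dense
keep = sorted(range(10), key=lambda i: -abs(dense[i]))[:2]  # 2. prune
mask = [1.0 if i in keep else 0.0 for i in range(10)]
sparse = train(mask, init)                                  # 3. rewind + retrain

print(f"winning ticket: features {sorted(keep)}, "
      f"w0={sparse[0]:.2f}, w1={sparse[1]:.2f}")
```

The "ticket" is the sparse subnetwork plus its original initialization; the hypothesis is that it trains to full accuracy on its own.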
herodoturtle|1 year ago
I have a question (disclaimer: reinforcement learning noob here):
Is there a risk of broken telephone with this?
Kinda like repeatedly compressing an already compressed image eventually leads to a fuzzy blur.
If that is the case then I’m curious how this is monitored and / or mitigated.
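The compression analogy can be made concrete with a toy simulation (my own made-up micro-example, not how any lab monitors this): each "generation" fits a Gaussian to the previous generation's samples and then trains only on draws from that fit, so estimation error compounds the way re-encoding artifacts do.

```python
import random
import statistics

random.seed(0)

def next_generation(samples, n=100):
    """Fit a Gaussian to the previous generation's output,
    then draw the next 'training set' from the fitted model."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return [random.gauss(mu, sigma) for _ in range(n)]

data = [random.gauss(0.0, 1.0) for _ in range(100)]  # gen 0: real data
stdevs = [statistics.stdev(data)]
for _ in range(30):
    data = next_generation(data)
    stdevs.append(statistics.stdev(data))

# Each generation re-estimates the distribution from a finite sample,
# so the estimation error compounds and the spread drifts away from
# the true value of 1.0.
print(f"gen 0 stdev: {stdevs[0]:.2f} -> gen 30 stdev: {stdevs[-1]:.2f}")
```

The usual mitigation in the literature is to keep mixing in fresh real (or verified) data rather than training purely on model output.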
RHSman2|1 year ago
That is where artificial intelligence is going: copying things from other things. Will there be an AI eureka moment where it deviates and understands where and why it is wrong?
dontreact|1 year ago
It seems like, if they did in fact distill, then what we have found is that you can create a worse copy of the model for ~$5M in compute by training on its outputs.
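As a toy illustration of what "training on its outputs" means (an invented micro-example, nothing like the real models or costs): a "student" fits the "teacher's" soft outputs and recovers its behavior without ever seeing the teacher's original training data.

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# "Teacher": a fixed model we pretend was expensive to train.
TEACHER_W, TEACHER_B = 2.0, -0.5
def teacher(x):
    return sigmoid(TEACHER_W * x + TEACHER_B)

# "Distillation": query the teacher on unlabeled inputs and fit a
# student to its soft outputs with plain gradient descent.
xs = [random.uniform(-3, 3) for _ in range(200)]
soft = [teacher(x) for x in xs]

w = b = 0.0
for _ in range(5000):
    gw = gb = 0.0
    for x, t in zip(xs, soft):
        err = sigmoid(w * x + b) - t   # cross-entropy gradient term
        gw += err * x
        gb += err
    w -= 0.5 * gw / len(xs)
    b -= 0.5 * gb / len(xs)

print(f"student w={w:.2f}, b={b:.2f}")  # close to the teacher's 2.0, -0.5
```

Here the student can match the teacher almost exactly because it has the same capacity; with a smaller student you'd expect the "worse copy" effect.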
unreal37|1 year ago
Everyone is standing on the shoulders of giants.
dartos|1 year ago
Better benchmark scores can be cooked
lenerdenator|1 year ago
But if you leave someone in the tech industry of SV/SF long enough, they'll start to get high on their own supply and think they're entitled to insane amounts of value, so...
wgjordan|1 year ago