top | item 42848945

(no title)

qqtt | 1 year ago

Well you have to keep in mind that Nvidia has a 3 trillion dollar valuation. That kind of heavy valuation comes with heavy expectations about future growth. Some of those assumptions about future Nvidia growth are their ability to maintain their heavy growth rates, for very far into the future.

Training is a huge component of Nvidia's projected growth. Inference is actually much more competitive, but training is almost exclusively Nvidia's domain. If Deepseek's claims are true, that would represent a 10x reduction in cost for training for similar models (6 million for r1 vs 60 million for something like o1).

It is absolutely not the case in ML that "there is nothing bad about more resources". There is something very bad - cost. And another bad thing - depreciation. And finally, another bad thing - the fact that new chips and approaches are coming out all the time, so if you are on older hardware you might be missing out. Training complex models for cheaper will allow companies to potentially re-allocate away from hardware into software (ie, hiring more engineering to build more models, instead of less engineers and more hardware to build less models).

Finally, there is a giant elephant in the room that it is very unclear if throwing more resources at LLM training will net better results. There are diminishing returns in terms of return on investment in training, especially with LLM-style use cases. It is actually very non-obvious right now how pouring more compute specifically at training will result in better LLMs.

discuss

order

msoad|1 year ago

My layman view is that more compute (more reasoning) will not solve harder problems. I'm using those models every day and when problem hits a certain complexity it will fail, no matter how much it "reasons"

johnfn|1 year ago

I think this is fairly easily debunked by o1, which is basically just 4o in a thinking for loop, and performs better on difficult tasks. Not a LOT better, mind you, but better enough to be measurable.

jes5199|1 year ago

I had a similar intuition for a long time, but I’ve watched the threshold of “certain complexity” move, and I’m no longer convinced that I know when it’s going to stop