If training and inference just got 40x more efficient, but OpenAI and co. still have the same compute resources, once they’ve baked in all the DeepSeek improvements, we’re about to find out very quickly whether 40x the compute delivers 40x the performance / output quality, or if output quality has ceased to be compute-bound.
> If training and inference just got 40x more efficient
Did training and inference just get 40x more efficient, or just training? They trained a model with impressive outputs on a limited number of GPUs, but DeepSeek is still a big model that requires a lot of resources to run. Moreover, which costs more, training a model once or using it for inference across a hundred million people multiple times a day for a year? It was always the second one, and doing the training cheaper makes it even more so.
But this implies that we could use those same resources to train even bigger models, right? Except that you then have the same problem. You have a bigger model, and maybe it's better, but if inference cost scales roughly linearly with model size and the model is now 40x bigger, you need that much more compute for inference.
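The training-versus-inference point can be sketched with the usual scaling heuristics (~6·N·D FLOPs to train an N-parameter model on D tokens, ~2·N FLOPs per generated token). The model size, token counts, and usage figures below are illustrative assumptions, not anyone's actual numbers:

```python
# Back-of-envelope: training compute vs. one year of serving compute.
# Heuristics: ~6 FLOPs per parameter per training token,
#             ~2 FLOPs per parameter per generated token.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute."""
    return 6 * params * tokens

def inference_flops(params: float, tokens_served: float) -> float:
    """Approximate total inference compute."""
    return 2 * params * tokens_served

PARAMS = 70e9          # assume a 70B-parameter dense model
TRAIN_TOKENS = 15e12   # assume ~15T training tokens

train = training_flops(PARAMS, TRAIN_TOKENS)

# Assume 100M users * 3 chats/day * ~500 generated tokens * 365 days
served = 100e6 * 3 * 500 * 365
infer = inference_flops(PARAMS, served)

print(f"training:              {train:.2e} FLOPs")
print(f"one year of inference: {infer:.2e} FLOPs")
print(f"inference/training ratio: {infer / train:.1f}x")
```

Even under these modest usage assumptions, a single year of inference already exceeds the one-time training cost, which is the parent's point: cheaper training shifts the cost center toward serving even further.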
In the long run (which in the AI world is probably ~1 year) this is very good for Nvidia, very good for the hyperscalers, and very good for anyone building AI applications.
The only thing it's not good for is the idea that OpenAI and/or Anthropic will eventually become profitable companies with market caps that exceed Apple's by orders of magnitude. Oh no, anyway.
Yes, but I think most of the rout is caused by the fact that there really isn't anything protecting AI from being disrupted by a new player. These models are fairly simple technology compared to some of the other things tech companies build, which means OpenAI really doesn't have much ability to protect its market-leader status.
I don't really understand why the stock market has decided this affects nvidia's stock price though.
> If training and inference just got 40x more efficient
The jury is still out on how much improvement DeepSeek made in terms of training and inference compute efficiency, but personally I think 10x is probably the actual improvement that was made.
But in business/engineering/manufacturing/etc., if you have 10x more efficiency, you're basically going to obliterate the competition.
> output quality has ceased to be compute-bound
You've raised an interesting conjecture, and it seems very likely to be the case.
I know it's not even a full two years since ChatGPT-4 was released, but it seems to be taking OpenAI a very long time to release ChatGPT-5. Is it because they're taking their own sweet time to release the software, not unlike GIMP, or because they genuinely cannot justify the jump from 4 to 5? This stagnation, however, has allowed others to catch up. Now, based on DeepSeek's claims, anyone can have their own ChatGPT-4 under their desk with Nvidia's Project Digits mini PCs [1]. For running DeepSeek, four mini PC units will be more than enough at 4 PFLOPS, costing only USD 12K. Let's say the average subscriber pays OpenAI USD 10 per month; for a 1000-person organization that's USD 10K per month, so the investment pays for itself in little over a month, and no data ever leaves the organization since it's a private cloud!
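The payback arithmetic here is easy to sanity-check (the per-unit price, seat count, and subscription fee are all the comment's assumptions):

```python
# Hardware payback sketch, using the comment's assumed numbers.
units = 4
unit_price = 3_000                    # assumed ~$3K per Digits mini PC
hardware_cost = units * unit_price    # $12K up front

seats = 1_000
monthly_fee = 10                      # assumed $10/user/month subscription
monthly_saving = seats * monthly_fee  # $10K/month no longer paid out

payback_months = hardware_cost / monthly_saving
print(f"hardware: ${hardware_cost:,}, saving: ${monthly_saving:,}/month")
print(f"payback: {payback_months:.1f} months")  # -> 1.2 months
```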
For training a similar system to ChatGPT-4, based on DeepSeek's claims, a few million USD is more than enough. Meanwhile, OpenAI, SoftBank and Oracle just announced a USD 500 billion joint venture to push AI forward with the newly announced Stargate AI project [2],[3], but that's roughly 10,000x the money. The elephant-in-the-room question is: can they even get a 10x quality improvement over the existing ChatGPT-4? I seriously doubt it.
[1] NVIDIA Puts Grace Blackwell on Every Desk and at Every AI Developer’s Fingertips: https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...

[2] Trump unveils $500bn Stargate AI project between OpenAI, Oracle and SoftBank: https://www.theguardian.com/us-news/2025/jan/21/trump-ai-joi...

[3] Announcing The Stargate Project: https://openai.com/index/announcing-the-stargate-project/
NVIDIA sells shovels to the gold rush. One miner (Liang Wenfeng), who has previously purchased at least 10,000 A100 shovels... has a "side project" where they figured out how to dig really well with a shovel and shared their secrets.
The gold rush, whether real or a bubble, is still there! NVIDIA will still sell every shovel they can manufacture, as soon as it is available in inventory.
Fortune 100 companies will still want the biggest toolshed to invent the next paradigm or to be the first to get to AGI.
Jevons paradox would imply that there's good reason to think that demand for shovels will increase. AI doesn't seem to be one of those things where society as a whole will say, "we have enough of that; we don't need any more".
(Many individual people are already saying that, but they aren't the people buying the GPUs for this in the first place. Steam engines weren't universally popular either when they were introduced to society.)
What you are missing is that it turns out the gold isn’t actually gold. It’s bronze.
So at first, the shovelers were willing to spend thousands of dollars for a single shovel because they were expecting to get much more valuable gold out the other end.
But now that it’s only bronze, they can’t spend that much money on their tools anymore to make their venture profitable. A lot of shovelers are gonna drop out of the race. And the ones that remain will not be willing to spend as much.
The fact that there isn’t that much money to be made in AI anymore means that whatever percentage of money would have gone to NVIDIA from the total money to be made in AI will now shrink dramatically.
Nvidia's market cap is based on extreme margins and absurd growth for 10 years.

If either of those knobs gets turned down a little, there can be a MASSIVE hit to the valuation, which is what happened.
The gold rush is over because pre-trained models don't improve as much anymore. The application layer has massive gains in cost-to-value performance. We also gain more trust from the consumer as models don't hallucinate as much. This is what DeepSeek R1 has shown us. As Ilya Sutskever said, pre-training is now over.
We now have very expensive Nvidia shovels that use a lot of power but do very little to improve the models.
The thing with a gold rush is you often end up selling shovels after the gold has run out, but no one knows that until hindsight. There will probably be a couple of scares that the gold has run out first, too. And again, the difference is only visible in hindsight.
Can anyone comment on why Wenfeng shared his secret sauce? Other than publicity, there seem to be only downsides for him, as now everyone else with larger compute will just copy and improve?
Yeah but NVIDIA's amazing digging technique that could only be accomplished with NVIDIA shovels is now irrelevant. Meaning there are more people selling shovels for the gold rush
90% of the comments in this thread make it clear that knowing about technology does not in any way qualify someone to think correctly about markets and equity valuations.
I find it interesting because the DeepSeek stuff, while very cool, doesn't seem to invalidate the idea that more compute would still translate to even _higher_ capabilities.
It's amazing what they did with a limited budget, but instead of the takeaway being "we don't need that much compute to achieve X", it could also be, "These new results show that we can achieve even 1000*X with our currently planned compute buildout"
But perhaps the idea is more like: "We already have more AI capabilities than we know how to integrate into the economy for the time being" and if that's the hypothesis, then the availability of something this cheap would change the equation somewhat and possibly justify investing less money in more compute.
The biggest discussion I've been having on this is the implications of DeepSeek for, say, the ROI on an H100. Will a sudden spike in available GPUs and a reduction in demand (from more efficient GPU usage) dramatically shock the cost per hour to rent a GPU? This, I think, is the critical value for measuring the investment case for Blackwell now.
The price for an H100 per hour has gone from a peak of $8.42 to about $1.80.
An H100 consumes 700W; let's say $0.10 per kWh.
An H100 costs around $30,000.
Given DeepSeek, can the price drop further now that a much larger supply of capable GPUs (MI300X, H200s, H800s, etc.) has effectively been unlocked?
Now that LLMs have effectively become a commodity with a significant price floor, is the rental price already below what is profitable for the card?
Given that the new Blackwell is $70,000, are there sufficient applications to let customers get an ROI on the new card?
I'm curious about this because I think I'm currently ignorant of the types of applications businesses can use to outweigh the costs. I predict the cost per hour of a GPU will drop to where it isn't such a no-brainer investment compared to before, especially if it's now possible to unlock potential from much older platforms running at lower electricity rates.
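A rough break-even under the numbers quoted above (spot rental price, power draw, electricity cost, card price), ignoring cooling, networking, hosting overhead, and any utilization below 100%:

```python
# H100 rental break-even sketch, using the figures from the comment.
card_cost = 30_000        # USD, H100 purchase price
rental_rate = 1.80        # USD per GPU-hour (down from a peak of ~$8.42)
power_draw_kw = 0.7       # 700 W
power_price = 0.10        # USD per kWh

margin_per_hour = rental_rate - power_draw_kw * power_price  # ~$1.73/hr
hours_to_break_even = card_cost / margin_per_hour
years = hours_to_break_even / (24 * 365)

print(f"margin: ${margin_per_hour:.2f}/hr")
print(f"break-even: {hours_to_break_even:,.0f} hours "
      f"(~{years:.1f} years at 100% utilization)")
```

At $1.80/hr that's roughly a two-year payback at perfect utilization, before overhead, which is why further price drops would put older cards under water.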
The part of this that doesn’t jibe with me is the fact that they also released this incredibly detailed technical report on their architecture and training strategy. The paper is well-written and has a lot of specifics. Exactly the opposite of what you would do if you had truly made an advancement of world-altering magnitude. All this says to me is that the models themselves have very little intrinsic value / are highly fungible. The true value lies in the software interfaces to the models, and the ability to make it easy to plug your data into the models.
My guess is the consumer market will ultimately be won by 2-3 players that make the best app / interface and leverage some kind of network effect, and the enterprise market will just be captured by the people who have the enterprise data, i.e., MSFT, AMZN, GOOG. Depending on just how impactful AI can be for consumers, this could upend Apple if a full mobile hardware+OS redesign is able to create a step change in the seamlessness of the UI. That seems to me to be the biggest unknown now: how will hardware and devices adapt?
NVDA will still do quite well because as others have noted, if it’s cheaper to train, the balance will just shift toward deploying more edge devices for inference, which is necessary to realize the value built up in the bubble anyway. Some day the compute will become more fungible but the momentum behind the nvidia ecosystem is way too strong right now.
Nvidia has gotten lucky repeatedly. The GPUs were great for PC gaming and they were the top dog. The crypto boom was such an unexpected win for them partly because Intel killed off their competition by acquiring it. Then the AI boom is also a direct result of Intel killing off their competition but the acquisition is too far removed to credit it to that event.
Unlike the crypto boom, though, two factors make me think Nvidia's AI windfall is bound to go away quickly.
First, unlike crypto there is no mathematical lower bound on the computation, and if you look at technology's history, we can tell the models are going to get better/smaller/faster over time, reducing our reliance on the GPU.
Second, crypto was fringe, but AI is fundamental to every software stack and every company. There is way too much money in this to just let Nvidia take it all. One way or another, the reliance on it will be reduced.
Not sure what the fuss is. I tried Deepseek earlier today for the first time and it was even worse than o1 when it came to reasoning skills and following my requests for how I wanted to engage with it.
o1 at least gives it to me straight. When I ask it to engage in more back and forth before assuming what I'm after, it tends to follow through. Deepseek seemed immediately eager to (very slowly) feed me a bunch of made up information thinking that's what I wanted.
I feel as though a lot of people get hung up on these sorts of "micro benchmarks", whereas getting practical work done is severely under-tested. I'm not a fan of OpenAI at all, but I don't have the spare compute to run anything locally, so o1 suffices for now.
Still don't see how this is anything but a win for Nvidia though.
Traders are saying that not doing multi-token prediction, not using Sharpe-ratio-adjusted rewards, using reward models, and not compressing KV cache tokens by >90% were supposed to be worth hundreds of billions of dollars of future expected revenue flow, at least according to other traders.
I say to the traders: you should have just stuck to reading arxiv, TPOT, and jhana twitter for the past 2 years, rather than listening to other traders, if you were trying to understand the utter spread of low hanging fruit that just hasn’t been picked up yet!
Valuations of private unicorns like OpenAI and Anthropic must be in free fall. DeepSeek spends $6 million on old H800 hardware to develop an open-source model that overtakes ChatGPT.
AI gets better, but profit margins sink with strong competition.
Chinese AI startup DeepSeek overtakes ChatGPT on Apple App Store: https://news.ycombinator.com/item?id=42839656

Edit: Nvidia now -15% in Frankfurt.
> DeepSeek spends $6 million in old H800 hardware to develop open source model that overtakes ChatGPT.
DeepSeek claims that's what they spent. They're under a trade embargo, and if they had access to any more than that it would have been obtained illegally.
They might be telling the truth, but let's wait until someone else replicates it before we fully accept it.
I believe that NVIDIA is overvalued, but if DeepSeek really is as great as has been said, then it'll be even greater when scaled up to OpenAI sizes, and when you get more out, you have more reason to pay. So if this pans out, it should lead to more demand for GPUs: basically Jevons paradox.
This is not exactly right: they said they spent $6M on training V3; there aren't numbers out there for the training of R1. My feeling is it will be cheaper than o1, but it's hard to tell how much cheaper. I'd guess that overall DeepSeek spent way less than OpenAI to release the model, because I suspect the R&D part was cheaper too, but we don't have the numbers yet. Either way, we can assume DeepSeek and Alibaba will try to get the most out of their current GPUs.
The bigger correction will be in tech stocks that are overly exposed to datacenter investments made to accommodate ever-rising AI demand. MSFT, AMZN, META: they are all exposed.
Consider that the Chinese might be misrepresenting their costs. A newsletter implied that they might do so to undermine the justification for the sanctions.
Agree that the AI bubble should pop though and the earlier, the better.
That's arguable, though. I mean, it's much cheaper and reasonably competitive, which is almost the same thing, but IMHO DeepSeek seems to get stuck in random loops and hallucinates more frequently than o1.
IMO this is less about DeepSeek and more that Nvidia is essentially a bubble/meme stock that is divorced from the reality of finance and business. People/institutions who bought on nothing but hype are now panic selling. DeepSeek provided the spark, but that's all that was needed, just like how a vague rumor is enough to cause bank runs.
So the Chinese graciously gift a paper and model which describes methods that radically increase the efficiency of hardware which will allow US AI firms to create much better models due to having significantly more AI hardware and people are bearish on US AI now?
ASML plunge indicates a hysterical/irrational component to the response, right? They aren’t going anywhere. If it turns out training is easier than expected, they make the devices that make the devices that do inference too…
If the field is going to produce anything useful, cheap training gets us there faster.
Jeez chill... it’s just back to where it was 4 months ago and even after the drop it is still up 100% compared to this time last year! And it’s all fake inflated money.
This unprecedented growth simply couldn’t continue forever.
If you look at the total volume of shares traded, this would be somewhere in the range of the 200th highest.
If you look at the total monetary value of those shares traded, this would be in the top 5, all of which have happened in the past 5 years. #1 is probably Tesla on Dec 18 2020 (right before it joined the S&P500). It lost ~6% that day.
Don’t get me wrong, this is definitely a big day. Just not “lose your mind” big. It’s clear that most shareholders just sat things out.
I really don't understand why the market thinks Nvidia is losing its value.
If DeepSeek reduces the required computational resources, we can pour more computational resources in to improve things further. There's nothing bad about more resources.
Curious thought: could those large price movements have something to do with the fact that DeepSeek is financed by a hedge fund (rather than the more typical VC)? It is unclear how DS will make money from its current strategy of sharing much of the secret sauce that went into training as well as releasing the results under permissive licenses. But if the play was "short major tech stocks and then release surprising results in a way that maximally undermines their current growth story", then it could make a lot more sense.
What I’d like to know is..
If a good model can be trained with far fewer GPUs using a breakthrough technique, can the breakthrough technique be used by OpenAI, MSFT et al., who have loads of GPUs, to train a model that is orders of magnitude better than their state of the art today?
We’ve been getting the impression that the limiting factor was the number of GPUs right? If so, this reduces that bottleneck and frees them up to do even better right?
So, what are investors thinking to warrant this? If it is "DeepSeek means you don't need the compute", that is definitely wrong. Making a more efficient X almost always leads to more of X being sold/used, not less. In the long term, does anyone really believe we'll need less compute rather than more?
I think the market believes that high end compute is not needed anymore so the stuff in datacenters suddenly just became 10x over-provisioned and it will take a while to fill up that capacity. Additionally, things like the mac and AMD unified memory architectures and consumer GPUs are all now suddenly able to run SOTA models locally. So a triple whammy. The competition just caught up, demand is about to drop in the short term for any datacenter compute and the market for exotic, high margin, GPUs might have just evaporated. At least that is what I think the market is thinking. I personally believe this is a short term correction since the long term demand is still there and we will keep wanting more big compute for a long time.