
The impact of competition and DeepSeek on Nvidia

655 points | eigenvalue | 1 year ago | youtubetranscriptoptimizer.com | reply

479 comments

[+] pjdesno|1 year ago|reply
The description of DeepSeek reminds me of my experience in networking in the late 80s - early 90s.

Back then a really big motivator for Asynchronous Transfer Mode (ATM) and fiber-to-the-home was the promise of video on demand, which was a huge market in comparison to the Internet of the day. Just about all the work in this area ignored the potential of advanced video coding algorithms, and assumed that broadcast TV-quality video would require about 50x more bandwidth than today's SD Netflix videos, and 6x more than 4K.
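
To put rough numbers on those ratios (assuming Netflix SD at ~3 Mbps and 4K at ~25 Mbps; both figures are approximations):

    sd_mbps, uhd_mbps = 3, 25
    print(sd_mbps * 50)   # 150 Mbps -- "50x more than SD Netflix"
    print(uhd_mbps * 6)   # 150 Mbps -- "6x more than 4K"

Both ratios point at the same ~150 Mbps planning figure, i.e. roughly uncompressed-grade SD video, which is what those designs assumed.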

What made video on the Internet possible wasn't a faster Internet, although the 10-20x increase every decade certainly helped - it was smarter algorithms that used orders of magnitude less bandwidth. In the case of AI, GPUs keep getting faster, but it's going to take a hell of a long time to achieve a 10x improvement in performance per cm^2 of silicon. Vastly improved training/inference algorithms may or may not be possible (DeepSeek seems to indicate the answer is "may") but there's no physical limit preventing them from being discovered, and the disruption when someone invents a new algorithm can be nearly immediate.

[+] AlanYx|1 year ago|reply
Another aspect that reinforces your point is that the ATM push (and subsequent downfall) was not just bandwidth-motivated but also motivated by a belief that ATM's QoS guarantees were necessary. But it turned out that software improvements, notably MPLS to handle QoS, were all that was needed.
[+] accra4rx|1 year ago|reply
Love those analogies. This is one of the main reasons I love Hacker News / Reddit. Honest, golden experiences.
[+] vFunct|1 year ago|reply
I worked on a network that used a protocol very similar to ATM (actually it was the first Iridium satellite network). An internet based on ATM would have been amazing. You’re basically guaranteeing a virtual switched circuit, instead of the packets we have today. The horror of packet switching is all the buffering it needs, since it doesn’t guarantee circuits.

Bandwidth is one thing, but the real benefit is that ATM also guaranteed minimal latencies. You could now shave off another 20-100ms of latency for your FaceTime calls, which is subtle but game changing. Just instant-on high def video communications, as if it were on closed circuits to the next room.

For the same reasons, the AI analogy could benefit from both huge processing power and stronger algorithms.

[+] aurareturn|1 year ago|reply
Doesn’t your point about video compression tech support Nvidia’s bull case?

Better video compression led to an explosion in video consumption on the Internet, leading to much more revenue for companies like Comcast, Google, T-Mobile, Verizon, etc.

More efficient LLMs lead to much more AI usage. Nvidia, TSMC, etc will benefit.

[+] TheCondor|1 year ago|reply
It seems even more stark. The current and projected energy costs for AI are staggering. At the same time, I think it has been MS that has been publishing papers on smaller LLMs (so-called small language models) that are more targeted and still achieve a fairly high "accuracy rate."

Didn't TSMC say that SamA came for a visit and said they needed $7T in investment to keep up with pending demand?

This stuff is all super cool and fun to play with, and I'm not a naysayer, but it almost feels like the current models are "bubble sort", and who knows how things will look when their "quicksort" gets invented.
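
To make the analogy concrete, here's what a jump from an O(n^2) "bubble sort" era to an O(n log n) "quicksort" era buys (toy numbers, nothing AI-specific):

    import math
    # Algorithmic wins dwarf hardware wins as n grows:
    for n in (10**3, 10**6, 10**9):
        print(f"n={n:.0e}: ~{n / math.log2(n):,.0f}x fewer operations")

At a billion items that's a ~33-million-fold reduction, far beyond anything a faster chip can deliver.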

[+] TMWNN|1 year ago|reply
>but there's no physical limit preventing them from being discovered, and the disruption when someone invents a new algorithm can be nearly immediate.

The rise of the net is the Jevons paradox fulfilled. The orders-of-magnitude drop in bandwidth needed per cat video drove an even larger rise in overall demand for said videos. Even during the dotcom bubble's collapse, bandwidth use kept going up.

Even if there is a near-term bear case for NVDA (dotcom bubble/bust), history indicates a bull case for the sector overall and related investments such as utilities (the entire history of the tech sector from 1995 to today).

[+] lokar|1 year ago|reply
Another example: people like to cite how the people who really made money in the CA gold rush were selling picks and shovels.

That only lasted so long. Then it was heavy machinery (hydraulics, excavators, etc.).

[+] tuna74|1 year ago|reply
I always like the "look" of high-bit-rate MPEG-2 video. Download HD Japanese TV content from 2005-2010 and it still looks really good.
[+] paulddraper|1 year ago|reply
I love algorithms as much as the next guy, but not really.

DCT was developed in 1972 and has a compression ratio of 100:1.

H.264 compresses 2000:1.

And standard resolution (480p) is ~1/30th the resolution of 4K.

---

I.e., standard resolution with DCT is smaller than 4K with H.264.

Even high definition (720p) with DCT is only twice the bandwidth of 4K H.264.

Modern compression has allowed us to add a bunch more pixels, but it was hardly a requirement for internet video.
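
A quick sanity check of that arithmetic (pixel counts approximate, compression ratios as claimed above):

    uhd = 3840 * 2160
    sd, hd = uhd / 30, uhd / 9          # ~480p and ~720p, per the ratios above
    dct, h264 = 100, 2000               # claimed compression ratios
    print((sd / dct) / (uhd / h264))    # ~0.67 -> SD+DCT is smaller than 4K+H.264
    print((hd / dct) / (uhd / h264))    # ~2.2  -> 720p+DCT is ~2x 4K+H.264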

[+] breadwinner|1 year ago|reply
Great article but it seems to have a fatal flaw.

As pointed out in the article, Nvidia has several advantages including:

   - Better Linux drivers than AMD
   - CUDA
   - PyTorch is optimized for Nvidia
   - High-speed interconnect
Each of the advantages is under attack:

   - George Hotz is making better drivers for AMD
   - MLX, Triton, JAX: higher-level abstractions that compile down to CUDA
   - Cerebras and Groq solve the interconnect problem
The article concludes that NVIDIA faces an unprecedented convergence of competitive threats. The flaw in the analysis is that these threats are not unified. Any serious competitor must address ALL of Nvidia's advantages. Instead, Nvidia is being attacked by multiple disconnected competitors, each attacking only one Nvidia advantage at a time. Even if each of those attacks is individually successful, Nvidia will remain the only company that has ALL of the advantages.
[+] toisanji|1 year ago|reply
I want the NVIDIA monopoly to end, but there is still no real competition.

* George Hotz has basically given up on AMD: https://x.com/__tinygrad__/status/1770151484363354195

* Groq can't produce more hardware past their "demo". It seems like they haven't grown capacity in the years since they announced it, and they have switched to a complete SaaS model and don't even sell hardware anymore.

* I don't know enough about MLX, Triton, and JAX to comment.

[+] epolanski|1 year ago|reply
> Any serious competitor must address ALL of Nvidia's advantages.

Not really. The article focuses on Nvidia being valued so highly by stock markets; he's not saying that Nvidia is destined to lose its advantage in the space in the short term.

In any case, I also think the likes of MSFT/AMZN/etc. will eventually be able to reduce their capex spending by working on a well-integrated stack of their own.

[+] dralley|1 year ago|reply
>So how is this possible? Well, the main reasons have to do with software— better drivers that "just work" on Linux and which are highly battle-tested and reliable (unlike AMD, which is notorious for the low quality and instability of their Linux drivers)

This does not match my experience from the past ~6 years of using AMD graphics on Linux. Maybe things are different with AI/compute, I've never messed with that, but in terms of normal consumer stuff, the experience of using AMD is vastly superior to trying to deal with Nvidia's out-of-tree drivers.

[+] Herring|1 year ago|reply
He's setting up a case for shorting the stock, i.e., that growth or margins will drop a little from any of these (often well-funded) threats. The accuracy of the article is a function of the current valuation.
[+] csomar|1 year ago|reply
> - Better Linux drivers than AMD

Unless something radically changed in the last couple of years, I am not sure where you got this from? (I am specifically talking about GPUs for ordinary desktop use rather than training/inference.)

[+] litigator|1 year ago|reply
Check out Anthonix on Twitter. He's already done what George Hotz is trying to do, and he did it months ago. He's moved on from the RX 7900 XTX to the MI300X and is setting some records. He had to write the majority of the code himself but kept the parts of ROCm he deemed fit. He is always stirring George up when he has his AMD tantrums. Seriously though, how bad are AMD's engineers if one person in their free time can make a custom stack that outperforms ROCm?
[+] aorloff|1 year ago|reply
The unification of the flaws is the scarcity of H100s

He says this and talks about it in The Fallout section: even at BigCos with megabucks, the teams are starved for time on the Nvidia chips, and if these innovations work, other teams will adopt them; then, boom, Nvidia's moat gets truncated, which doesn't look good at such lofty multiples.

[+] isatty|1 year ago|reply
Sorry, I don’t know who George Hotz is, but why isn’t AMD making better drivers for AMD?
[+] slightwinder|1 year ago|reply
> - Better Linux drivers than AMD

In which way? As a user who switched from an AMD GPU to an Nvidia GPU, I can only report a steady stream of problems with NVIDIA's proprietary driver, and none with AMD. Is this maybe about the open-source drivers, or about usage for AI?

[+] latchkey|1 year ago|reply
George is writing software to directly talk to consumer AMD hardware, so that he can sell more Tinyboxes. He won't be doing that for enterprise.

Cerebras and Groq need to solve the memory problem. They can't scale without adding 10x the hardware.
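
A rough illustration of the scale of that memory problem (assumed figure: ~230 MB of on-chip SRAM per Groq LPU; a 70B-parameter model at 8 bits per weight):

    chips = 70e9 * 1 / 230e6        # params * bytes/param / SRAM per chip
    print(f"~{chips:.0f} chips")    # ~304 chips just to hold the weights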

[+] thousand_nights|1 year ago|reply
> George Hotz is making better drivers for AMD

lol

[+] willvarfar|1 year ago|reply
A new entrant with an order-of-magnitude advantage in e.g. cost, availability, or exportability can succeed even with poor drivers, no CUDA, etc. It's only when you cost nearly as much as Nvidia that the tooling costs become relevant.
[+] queuebert|1 year ago|reply
Don't forget they bought Mellanox and have their own HBA and switch business.
[+] grajaganDev|1 year ago|reply
There is not enough water (to cool data centers) to justify NVDA's current valuation.

The same is true of electricity - neither nuclear power nor fusion will be online anytime soon.

[+] yapyap|1 year ago|reply
Geohot still at it?

goat.

[+] fairity|1 year ago|reply
DeepSeek just further reinforces the idea that there is a first-mover disadvantage in developing AI models.

When someone can replicate your model for 5% of the cost in 2 years, I can only see 2 rational decisions:

1) Start focusing on cost efficiency today to reduce the advantage of the second mover (i.e. trade growth for profitability)

2) Figure out how to build a real competitive moat through one or more of the following: economies of scale, network effects, regulatory capture

On the second point, it seems to me like the only realistic strategy for companies like OpenAI is to turn themselves into a platform that benefits from direct network effects. Whether that's actually feasible is another question.

[+] aurareturn|1 year ago|reply
This is wrong. First-mover advantage is strong. This is why OpenAI is much bigger than Mistral, despite what you said.

First-mover advantage acquired subscribers and keeps them.

No one really cares if you matched GPT-4o one year later. OpenAI has had a full year to optimize the model, build tools around it, and use it to generate better data for their next-generation foundation model.

[+] Mistletoe|1 year ago|reply
I feel like AI tech just reverse-scales and reverse-flywheels, unlike the walls and moats of today's tech giants, and I think that is wonderful. OpenAI has really never made sense from a financial standpoint, and that is healthier for humans. There's no network effect because there's no social aspect to AI chatbots. I can hop to DeepSeek from Google Gemini or OpenAI with ease because I don't have to have friends there and/or convince them to move. AI is going to be a race to the bottom that keeps prices low to zero. In fact, I don't know how they are going to monetize it at all.
[+] tw1984|1 year ago|reply
> DeepSeek just further reinforces the idea that there is a first-move disadvantage in developing AI models.

You are assuming that what DeepSeek achieved can be reasonably easily replicated by other companies. Then the question is: with all the big techs and tons of startups in China and the US involved, how come none of those companies succeeded?

DeepSeek is unique.

[+] boringg|1 year ago|reply
You're making some big assumptions projecting into the future: one, that DeepSeek takes market position; two, that the information they have released about training usage, spend, etc. is honest.

There's a lot more still to unpack, and I don't expect this to stay solely in the tech realm. It seems too politically sensitive.

[+] UncleOxidant|1 year ago|reply
Even if DeepSeek has figured out how to do more (or at least as much) with less, doesn't the Jevons Paradox come into play? GPU sales would actually increase because even smaller companies would get the idea that they can compete in a space that only 6 months ago we assumed would be the realm of the large mega tech companies (the Metas, Googles, OpenAIs) since the small players couldn't afford to compete. Now that story is in question since DeepSeek only has ~200 employees and claims to be able to train a competitive model for about 20X less than the big boys spend.
[+] colinnordin|1 year ago|reply
Great article.

>Now, you still want to train the best model you can by cleverly leveraging as much compute as you can and as many trillion tokens of high quality training data as possible, but that's just the beginning of the story in this new world; now, you could easily use incredibly huge amounts of compute just to do inference from these models at a very high level of confidence or when trying to solve extremely tough problems that require "genius level" reasoning to avoid all the potential pitfalls that would lead a regular LLM astray.

I think this is the most interesting part. We always knew a huge fraction of the compute would be on inference rather than training, but it feels like the newest developments are pushing this even further towards inference.

Combine that with the fact that you can run the full R1 (680B) distributed on 3 consumer computers [1].

If most of NVIDIA's moat is in being able to efficiently interconnect thousands of GPUs, what happens when that is only important to a small fraction of the overall AI compute?

[1]: https://x.com/awnihannun/status/1883276535643455790

[+] simonw|1 year ago|reply
This is excellent writing.

Even if you have no interest at all in stock market shorting strategies there is plenty of meaty technical content in here, including some of the clearest summaries I've seen anywhere of the interesting ideas from the DeepSeek v3 and R1 papers.

[+] andrewgross|1 year ago|reply
> The beauty of the MOE model approach is that you can decompose the big model into a collection of smaller models that each know different, non-overlapping (at least fully) pieces of knowledge.

I was under the impression that this was not how MoE models work. They are not a collection of independent models, but rather a way of routing to a subset of active parameters at each layer. There is no "expert" that is loaded or unloaded per question. All of the weights are loaded in VRAM; it's just a matter of which ones are actually loaded into the registers for calculation. As far as I could tell from the DeepSeek v3/v2 papers, their MoE approach follows this rather than being an explicit collection of experts. If that's the case, there's no VRAM saving to be had using an MoE, nor any ability to extract an expert's weights to run locally (aside from distillation or similar).

If there is someone more versed on the construction of MoE architectures I would love some help understanding what I missed here.
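
For reference, here is a minimal sketch of top-k routing as I understand it (illustrative PyTorch, not DeepSeek's actual code; all names are made up). Note that every expert's weights are allocated up front:

    import torch
    import torch.nn as nn

    class TinyMoE(nn.Module):
        def __init__(self, d_model=64, n_experts=8, top_k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            # All experts stay resident in memory; nothing is loaded
            # or unloaded per question.
            self.experts = nn.ModuleList(
                nn.Linear(d_model, d_model) for _ in range(n_experts)
            )
            self.top_k = top_k

        def forward(self, x):                        # x: (tokens, d_model)
            scores = self.router(x).softmax(dim=-1)
            weights, idx = scores.topk(self.top_k, dim=-1)
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e            # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

    moe = TinyMoE()
    print(moe(torch.randn(5, 64)).shape)  # (5, 64); only 2 of 8 experts ran per token

So the saving is in compute per token, not in VRAM, which matches my reading of the papers.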

[+] j7ake|1 year ago|reply
This was an amazing summary of the current ML landscape.

I think the title does the article an injustice, or maybe it's too long for people to read through and appreciate (e.g., the DeepSeek material could be an article in itself).

Whatever; those with longer attention spans will benefit from this read.

Thanks for summarising this!

[+] lxgr|1 year ago|reply
Man, do I love myself a deep, well-researched long-form contrarian analysis published as a tangent of an already niche blog on a Sunday evening! The old web isn't dead yet :)
[+] liuliu|1 year ago|reply
This is a humble and informed article (compared to others written by financial analysts over the past few days). But it still has the flaw of overestimating the efficiency of deploying a 687B MoE model on commodity hardware (for local use; cloud providers do efficient batching, and that is different): you cannot do it on any single Apple machine (you need to hook up at least two M2 Ultras). You can barely deploy it on a desktop computer, simply because non-registered DDR5 tops out at 64 GiB per stick (so you are safe with 512 GiB of RAM).

Now coming to PCIe bandwidth: 37B parameters activated per token means exactly that; each token's activation requires a new set of 37B weights, so you need to transfer ~18 GiB per token into VRAM (assuming a 4-bit quant). PCIe 5 (5090) has 64 GB/s of transfer bandwidth, so your upper bound is about 4 tok/s on a well-balanced purpose-built PC (with custom software). For programming tasks, which usually require ~3000 tokens of thinking, we are looking at ~12 minutes per interaction.
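
The same arithmetic as a script (assumptions as stated above: 37B active parameters per token, 4-bit quantization, ~64 GB/s of PCIe 5.0 bandwidth; exact rounding comes out slightly worse than the ~4 tok/s figure):

    bytes_per_token = 37e9 * 0.5          # 4-bit quant -> 0.5 bytes/param
    print(bytes_per_token / 2**30)        # ~17 GiB of weights per token
    tok_per_s = 64e9 / bytes_per_token    # PCIe-bound upper limit
    print(tok_per_s)                      # ~3.5 tok/s
    print(3000 / tok_per_s / 60)          # ~14 min for a ~3000-token answer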
[+] hn_throwaway_99|1 year ago|reply
I'm curious if someone more informed than me can comment on this part:

> Besides things like the rise of humanoid robots, which I suspect is going to take most people by surprise when they are rapidly able to perform a huge number of tasks that currently require an unskilled (or even skilled) human worker (e.g., doing laundry ...

I've always said that the real test for humanoid AI is folding laundry, because it's an incredibly difficult problem. And I'm not talking about giving a machine clothing piece-by-piece flattened so it just has to fold, I'm talking about saying to a robot "There's a dryer full of clothes. Go fold it into separate piles (e.g. underwear, tops, bottoms) and don't mix the husband's clothes with the wife's". That is, something most humans in the developed world have to do a couple times a week.

I've been following some of the big advances in humanoid robot AI, but the above task still seems miles away given current tech. So is the author's quote just more unsubstantiated hype that I'm constantly bombarded with in the AI space, or have there been advancements recently in robot AI that I'm unaware of?

[+] brandonpelfrey|1 year ago|reply
Great article. I still feel like very few people are viewing the DeepSeek effects in the right light. If we are 10x more efficient, it's not that we use 1/10th the resources we did before; we expand to 10x the usage we had before. All technology products have moved in this direction. Where there is capacity, we will use it. This argument would not work if we were close to AGI or something and didn't need more, but I don't think we're actually close to that at all.
[+] skizm|1 year ago|reply
I'm wondering if there's a (probably illegal) strategy in the making here:

    - Wait till NVDA rebounds in price.
    - Create an OpenAI "competitor" that is powered by Llama or a similar open weights model.
    - Obscure the fact that the company runs on this open tech and make it seem like you've developed your own models, but don't outright lie.
    - Release an app and whitepaper (whitepaper looks and sounds technical, but is incredibly light on details, you only need to fool some new-grad stock analysts).
    - Pay some shady click farms to get your app to the top of Apple's charts (you only need it to be there for like 24 hours tops).
    - Collect profits from your NVDA short positions.
[+] snowmaker|1 year ago|reply
This is an excellent article, basically a patio11 / Matt Levine-level breakdown of what's happening with the GPU market.
[+] naiv|1 year ago|reply
I owned several adult-content companies in the past. Incredibly high margins, and then along came Pornhub, and we could barely survive after it, as we did not adapt.

With DeepSeek, this is now the 'Pornhub of AI' moment. Adapt or die.

[+] typeofhuman|1 year ago|reply
I'm rooting for DeepSeek (or any competitor) against OpenAI because I don't like Sam Altman. I'm confident in admitting it.
[+] pavelstoev|1 year ago|reply
English economist William Stanley Jevons vs the author of the article.

Will NVIDIA be in trouble because of DSR1? Interpreting the Jevons effect: if LLMs are "steam engines" and DSR1 brings a 90% efficiency improvement for the same performance, more of them will be deployed. And this is before considering the increase due to <think> tokens.

More NVIDIA GPUs will be sold to support growing use cases of more efficient LLMs.

[+] chvid|1 year ago|reply
For sure, NVIDIA is priced for perfection, perhaps more than any other company of similar market value.

I think two threats are the biggest:

First, Apple, TSMC's largest customer. They are already making their own GPUs for their data centers. If they were to sell these to others, they would be a major competitor.

You would have the same GPU stack on your phone, laptop, PC, and data center. There's already big developer mind share. It's also useful in a world where LLMs run (in part) on the end user's local machine (like Apple Intelligence).

Second is China: Huawei, DeepSeek, etc.

Yes, there will be no GPUs from Huawei in the US this decade. And the Chinese won't win in one big, massive battle. Rather, it is going to be death by a thousand cuts.

Just as happened with the Huawei Mate 60: it is only sold in China, but today Apple is losing business big time in China.

In the same manner, OpenAI and Microsoft will have their business hurt by DeepSeek even if DeepSeek were completely banned in the West.

Likely we will see news of Chinese AI accelerators this year, and I wouldn't be surprised if we soon saw Chinese hyperscalers offering cheaper GPU cloud compute than the West, thanks to a combination of cheaper energy, lower labor costs, and sheer scale.

Lastly, AMD is no threat to NVIDIA, as they are far behind and follow the same path with little in the way of differentiation.

[+] mgraczyk|1 year ago|reply
The beginning of the article was good, but the analysis of DeepSeek and what it means for Nvidia is confused and clearly out of the loop.

  * People have been training models at <fp32 precision for many years; I did this in 2021, and it was already easy in all the major libraries (see the sketch below).
  * GPU FLOPs are used for many things besides training the final released model.
  * Demand for AI is capacity-limited, so it's possible and likely that increasing AI per FLOP would not substantially reduce the price of GPUs.
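
For instance, a minimal mixed-precision training step in PyTorch (made-up model and data; torch.autocast and GradScaler are the standard APIs for this):

    import torch

    model = torch.nn.Linear(512, 512).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()   # rescales losses to avoid fp16 underflow

    x = torch.randn(32, 512, device="cuda")
    y = torch.randn(32, 512, device="cuda")

    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()          # backward pass on the scaled loss
    scaler.step(opt)                       # unscales grads, then steps
    scaler.update()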