colinnordin | 1 year ago

Great article.

>Now, you still want to train the best model you can by cleverly leveraging as much compute as you can and as many trillion tokens of high quality training data as possible, but that's just the beginning of the story in this new world; now, you could easily use incredibly huge amounts of compute just to do inference from these models at a very high level of confidence or when trying to solve extremely tough problems that require "genius level" reasoning to avoid all the potential pitfalls that would lead a regular LLM astray.

I think this is the most interesting part. We always knew a huge fraction of the compute would be on inference rather than training, but it feels like the newest developments are pushing this even further towards inference.

Combine that with the fact that you can run the full R1 (680B) distributed on 3 consumer computers [1].
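Back-of-envelope (assuming the roughly 4-bit quantization those local demos use, and taking the 680B figure at face value), the memory math does work out to about three 192 GB machines. Rough, illustrative numbers only:

    params = 680e9             # parameter count as quoted in this thread
    bytes_per_param = 0.5      # 4-bit quantization
    weights_gb = params * bytes_per_param / 1e9    # ~340 GB of weights
    ram_per_machine_gb = 192   # max RAM config of an M2 Ultra Mac Studio
    print(weights_gb / ram_per_machine_gb)         # ~1.8, so 3 machines leaves
                                                   # headroom for KV cache etc.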

If most of NVIDIA's moat is in being able to efficiently interconnect thousands of GPUs, what happens when that is only important to a small fraction of the overall AI compute?

[1]: https://x.com/awnihannun/status/1883276535643455790

tomrod|1 year ago

Conversely, how much larger can you scale if frontier models only currently need 3 consumer computers?

Imagine having 300. Could you build even better models? Is DeepSeek the right team to deliver that, or can OpenAI, Meta, HF, etc. adapt?

Going to be an interesting few months on the market. I think OpenAI lost a LOT in the board fiasco. I am bullish on HF. I anticipate Meta will lose folks to brain drain in response to management equivocation around company values. I don't put much stock in Google's or Microsoft's AI capabilities; they are the new IBMs and are no longer innovating except at obvious margins.

stormfather|1 year ago

Google is quietly catching up fast with Gemini. They're also pursuing next-gen architectures like Titans. But most importantly, the frontier of AI capabilities is shifting towards using RL at inference (thinking) time to perform tasks. Who has more data than Google there? They have a gargantuan database of queries paired with subsequent web navigation, actions, follow-up queries, etc. Nobody can recreate this; Bing failed to get enough market share. Also, when you think of RL talent, which company comes to mind? I think Google has everyone checkmated already.

onlyrealcuzzo|1 year ago

If you watch this video, it explains well what the major difference is between DeepSeek and existing LLMs: https://www.youtube.com/watch?v=DCqqCLlsIBU

It seems like there is MUCH to gain by migrating to this approach - and switching should theoretically cost far less than the rewards it would reap.

I expect all the major players are already working full-steam to incorporate this into their stacks as quickly as possible.

IMO, this seems incredibly bad for Nvidia, and incredibly good for everyone else.

I don't think this seems particularly bad for ChatGPT. They've built a strong brand. This should just help them reduce - by far - one of their largest expenses.

They'll have a slight disadvantage relative to, say, Google - who can much more easily switch from GPU to CPU. ChatGPT could have some growing pains there. Google would not.

danaris|1 year ago

This assumes no (or very small) diminishing returns effect.

I don't pretend to know much about the minutiae of LLM training, but it wouldn't surprise me at all if throwing massively more GPUs at this particular training paradigm only produces marginal increases in output quality.

simpaticoder|1 year ago

>Imagine having 300.

Would it not be useful to have multiple independent AIs observing and interacting to build a model of the world? I'm thinking of something roughly like the "counselors" in the Civilization games, giving defense/economic/cultural advice, but generalized over any goal-oriented scenario (and including one to take the "user" role). A group of AIs with specific roles interacting with each other seems like a good area to explore, especially now given the downward scalability of LLMs.
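A minimal sketch of that idea, with hypothetical role names and a placeholder ask_llm call standing in for whatever local model you'd actually run:

    # Toy "council" loop: several role-conditioned agents comment on a goal,
    # then a final call synthesizes their advice for the user.
    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your local or hosted model here")

    ROLES = {
        "defense": "You advise on risks and security.",
        "economy": "You advise on cost and resource allocation.",
        "culture": "You advise on user experience and adoption.",
    }

    def council(goal: str) -> str:
        advice = {
            role: ask_llm(f"{persona}\nGoal: {goal}\nGive brief advice.")
            for role, persona in ROLES.items()
        }
        summary_prompt = "Synthesize this advice for the user:\n" + "\n".join(
            f"{role}: {text}" for role, text in advice.items()
        )
        return ask_llm(summary_prompt)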

tw1984|1 year ago

> If most of NVIDIA's moat is in being able to efficiently interconnect thousands of GPUs

Nah. Its moat is CUDA and the millions of devs using CUDA, aka the ecosystem.

mupuff1234|1 year ago

But if it's not combined with super-high-end chips with massive margins, that moat is not worth anywhere close to 3T USD.

ReptileMan|1 year ago

And then some Chinese startup creates an amazing compiler that takes CUDA and moves it to X (AMD, Intel, ASIC) and we are back at square one.

So far it seems that the best investment is in RAM producers. Unlike compute, the RAM requirements seem to be stubborn.

a_wild_dandan|1 year ago

Running a 680-billion-parameter frontier model on a few Macs (at 13 tok/s!) is nuts. That's two years after ChatGPT was released. That rate of progress just blows my mind.

qingcharles|1 year ago

And those are M2 Ultras. The M4 Ultra is about to drop in the next few weeks/months, and I'm guessing it might have higher RAM configs, so you can probably run the same 680B on two of those beasts.

Higher-performing chips, with one less interconnect hop, are going to give you significantly higher t/s.
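As a toy model (the millisecond figures below are invented purely for illustration, not measured): with single-stream decoding, each token passes through every shard in sequence plus one network hop per machine boundary, so both faster shards and fewer hops cut per-token latency:

    # Illustrative only - made-up timings to show the shape of the trade-off.
    def tok_per_sec(n_machines, compute_ms_per_shard, hop_ms):
        latency_ms = n_machines * compute_ms_per_shard + (n_machines - 1) * hop_ms
        return 1000 / latency_ms

    print(tok_per_sec(3, 22, 5))   # ~13 tok/s: three slower shards, two hops
    print(tok_per_sec(2, 30, 5))   # ~15 tok/s: two faster shards, one hop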

arresin|1 year ago

The link has all the params, but running at 4-bit quant.

qingcharles|1 year ago

4-bit quant is generally kinda low, right?

I wonder how badly this quant affects DeepSeek's output?
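For intuition, here's a toy blockwise 4-bit scheme (absmax per group of 32 weights - not DeepSeek's or MLX's actual method), just to show where the precision loss comes from:

    import numpy as np

    def quantize_4bit(w, group_size=32):
        # Each group of 32 weights shares one fp32 scale; values are rounded
        # to 4-bit signed integers in [-8, 7].
        groups = w.reshape(-1, group_size)
        scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
        q = np.clip(np.round(groups / scale), -8, 7)
        return q, scale

    def dequantize(q, scale):
        return (q * scale).ravel()

    w = np.random.randn(1 << 16).astype(np.float32)   # toy weight tensor
    q, s = quantize_4bit(w)
    err = np.abs(dequantize(q, s) - w)
    print("mean abs rounding error:", err.mean())     # small per weight, but it
                                                      # compounds across layers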

neuronic|1 year ago

> NVIDIA's moat

Offtopic, but your comment finally pushed me over the edge to semantic satiation [1] regarding the word "moat". It is incredible how this word turned up a short while ago and now it seems to be a key ingredient of every second comment.

[1] https://en.wikipedia.org/wiki/Semantic_satiation

mikestew|1 year ago

> It is incredible how this word turned up a short while ago…

I’m sure if I looked, I could find quotes from Warren Buffett (the recognized originator of the term) going back a few decades. But your point stands.

ljw1004|1 year ago

I'm struggling to understand how a moat can have a CRACK in it.