
Deepseek: The quiet giant leading China’s AI race

488 points | sunny-beast | 1 year ago | chinatalk.media

446 comments

[+] lomkju|1 year ago|reply
I feel the GPU restrictions created an environment for Chinese Devs to be more innovative and do more with less.

Kudos to the deepseek team!

[+] nsoonhui|1 year ago|reply
I find that the gushing around deepseek is fascinating to watch.

To me there are a few structural and fundamental reasons why DeepSeek can never outperform other models by a wide margin. On par, maybe, as we reach diminishing returns on our investment in the models, but not winning by a wide margin.

1. The US trade war with China, which will place DeepSeek's compute availability at a disadvantage, eventually, if we ever get to that.

2. Chinese censorship, which limits DeepSeek's data ingestion and output to some degree.

3. Most importantly, DeepSeek is open source, which means the other models are free to copy whatever secret sauce it has, e.g. whatever architecture that purportedly uses less compute can easily be copied.

I've been using Gemini, ChatGPT, DeepSeek and Claude on a regular basis. DeepSeek is neither better nor worse than the others. But this says more about my own limited usage of LLMs than about the usefulness of the models.

I want to know exactly what makes everyone think that DeepSeek totally owns the LLM space? Am I missing anything?

PS: I am a Malaysian Chinese, so I am certainly not "a westerner who is jealous and fearful of the rise of China"

[+] jdietrich|1 year ago|reply
I don't think it's necessarily about DeepSeek, but about the wider competitive picture. There are two tacit assumptions being made about LLMs - that having a SOTA model is a substantial competitive advantage, and that the demand for compute will continue to grow rapidly.

DeepSeek's phenomenal success in reducing training and inference cost points to the possibility of a very different future. If it's the case that SOTA or near-SOTA performance is commoditised and progress in efficiency outpaces progress in capability, then the roadmap looks radically different. If DeepSeek don't have a competitive advantage, then no-one has a competitive advantage. Having a DC full of H200s or a proprietary model with a trillion parameters might not count for anything, in which case we're looking at a very different set of winners and losers. Application specific fine-tuning and product-market fit might matter much more than brute force compute.

[+] antirez|1 year ago|reply
1. The Chinese internal market is huge, and if they develop models that are better than Western models, not using them will be a disadvantage for us, not them. I can also see many European countries (including my country, Italy) buying Chinese AI regardless of US regulations.

2. The West has its own issues with data limits and extreme alignment that makes models dumber. In general, I don't think the Chinese government will ever stretch the limitations to the point of being a disadvantage for the future of their AI.

3. The CEO replied to this exact question in the interview: replicating is hard and takes time. I'll add that while they are currently in their "open" moment, the knowledge they are accumulating will make them able to lead the future, whatever it will be.

Also, I don't believe that in the long run the Nvidia chip shortage is going to damage Chinese AI too much. Sure, in the short term it's a big issue for them, but there is nothing inherently impossible to replicate in Nvidia chips: if the chip ban continues, they will have a very strong incentive to join forces and replicate the same technology internally, ASAP.

This in turn may cause serious issues for the biggest tech stock in the US market.

[+] tossandthrow|1 year ago|reply
> ... why deepseek can never outperform ...

This reads more like a "Western supremacist" post.

1. Only until China produces more compute than the West.

2. You don't have to ask ChatGPT/Claude many questions before realizing the grave censorship these are under. DeepSeek has access to roughly the same corpus of data as its Western counterparts.

3. It is naive to think they only develop open source, or that they won't stop open sourcing if closing up gives them an advantage.

[+] viraptor|1 year ago|reply
> The US trade war with China, which will place DeepSeek's compute availability at a disadvantage

Will it? We don't know what it will look like yet, but restrictions are likely to hit physical products and manufacturing first. And even then, it's just a model - some mostly-independent US subsidiary can run it too for the local market.

> Chinese censorship, which limits DeepSeek's data ingestion

Deepseek has been improving through training, architecture, and features. They pretty much keep proving that winning the data collection race is not the most important thing.

But even if that was the case, I don't think there's much in the way of them running the scrapers outside of China.

> Most importantly, DeepSeek is open source,

OpenAI relies on burning cash on huge, expensive models. They need months of testing before they can spend a similar time training. Whatever secret sauce is revealed, OpenAI is going to be a minimum of half a year behind in using it (the May GPT-4o model contained information only up to October of the previous year). And that's assuming it's not incompatible with their current approach.

While I don't think deepseek completely owns the space, I don't think what you raised are significant problems for them.

[+] logicchains|1 year ago|reply
>I want to know exactly what makes everyone think that deepseek totally owns the LLM space?

It achieved performance competitive with the competition at literally a tenth of the production (training) cost. That's an incredible achievement in any industry, especially given how small their team is relative to competitors'. Their API is 20-50x cheaper than the competitors', and not because they're burning cash by charging below cost, but because their architecture is just that much more efficient.

They achieved all this in spite of sanctions limiting their access to top-tier GPUs, and the gap between Chinese domestic GPUs and Nvidia is getting smaller and smaller, so in the future the GPU disadvantage will matter less and less.

[+] HarHarVeryFunny|1 year ago|reply
> The US trade war with China, which will place DeepSeek's compute availability at a disadvantage

I doubt it'll make much difference. Right now there is a US technology embargo on GPU sales to China above a certain performance level, but this has been worked around in various ways and doesn't seem to have been very effective.

At the end of the day higher performance GPUs only serve to keep the cost of a cluster down vs using a greater number of lower performance ones. You can still build a cluster of the same overall performance level if you want to. Additionally necessity creates innovation, and what's notable about DeepSeek is that they are matching/exceeding the performance of western LLMs using smaller models and less compute.

[+] suraci|1 year ago|reply
deepseek doesn't need to outperform other models, it just needs to be cheap, or rather, efficient

the cost of deepseek (if it's true) will disrupt the logic of the current AI industry

The current AI industry is built on a financing bubble, where investors hand over money blindly without demanding that companies profit from AI. There is a consensus about AI: more money = more GPU training time = a more 'leading' model. It has become a situation where investors are effectively buying GPU training time rather than stocks/shares of a profitable business

deepseek will disrupt this value flow.

> Alibaba Cloud announced the third round of price cuts for its large models this year, with the visual understanding models of the General Qwen-VL models experiencing a price reduction of over 80% across the board. The Qwen-VL-Plus model saw a direct price drop of 81%, with the input cost being only 0.0015 yuan per thousand tokens, setting a record for the lowest price across the network. The higher-performance Qwen-VL-Max model was reduced to 0.003 yuan per thousand tokens, with a significant decrease of 85%. According to the latest prices, one yuan can process up to approximately 600 720P images or 1700 480P images.
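As a back-of-envelope check on the prices quoted above (a sketch, not official figures: the quote doesn't say which model the 600-image claim refers to, so the implied tokens-per-image number is computed for both):

```python
# Back-of-envelope check of the quoted Qwen-VL price cuts.
# Prices are yuan per 1,000 tokens, taken from the quote above; the
# implied tokens-per-image figures are inferred here, not official numbers.
PRICES = {"Qwen-VL-Plus": 0.0015, "Qwen-VL-Max": 0.003}

for model, price in PRICES.items():
    tokens_per_yuan = 1_000 / price
    # "one yuan can process up to approximately 600 720P images"
    implied_tokens_per_720p = tokens_per_yuan / 600
    print(f"{model}: {tokens_per_yuan:,.0f} tokens per yuan, "
          f"implying ~{implied_tokens_per_720p:,.0f} tokens per 720P image "
          f"if the 600-image figure refers to this model")
```

At the Max price, one yuan buys roughly 333,000 tokens, so the 600-image figure implies a 720P image costs on the order of 550 tokens; at the Plus price the implied figure is about twice that.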

[+] evanjrowley|1 year ago|reply
One advantage China has that you haven't mentioned is a higher degree of mandatory surveillance over a larger population [0]. Even if they never reach or surpass the West in AI compute power, there is greater potential for China to have more training data in the long term to produce higher quality models. Chinese laws require data types and algorithms to be reported to the CCP government [1], which, combined with authoritarian policies, gives the CCP far greater leverage in AI development strategy compared to any other entity. From this perspective, growth in Chinese AI capability is not only a threat to US national interests, but also to the Chinese public itself.

Side note: this reminds me of a rant by Luke Smith about Joseph Schumpeter's economic views [2].

[0] https://theconversation.com/digital-surveillance-is-omnipres...

[1] https://carnegieendowment.org/posts/2022/12/what-chinas-algo...

[2] https://www.youtube.com/watch?v=SYUgTzT79ww

[+] csomar|1 year ago|reply
You are comparing apples to oranges. Claude is better, sure, and I'd probably use it over DeepSeek, but DeepSeek is an open model. For me, this makes DeepSeek quite superior (not from a benchmark/output perspective) to all the other closed models.
[+] chvid|1 year ago|reply
As I understand it, DeepSeek has the best open source model at the moment, by a fair margin. That disproves the idea that a Chinese company cannot outperform Western offerings due to censorship and compute constraints.

Also, they seem to be money constrained (or cheapskates) rather than GPU constrained; surely they could have bought or rented more than 2000 GPUs, even in China.

[+] iepathos|1 year ago|reply
I find the open source argument pretty weak. Linux is open source but is used in production far more than Windows, macOS, or any other operating system, and very arguably outperforms them. Being open source does not mean proprietary alternatives pick up all the benefits. Because it is free and easily moddable, open source appeals to many of the best engineers, who can drive innovation further than proprietary alternatives can. Proprietary alternatives don't necessarily have the resources or desire to adapt innovations from open source tech into their own solutions.
[+] littlestymaar|1 year ago|reply
> 3. Most importantly, DeepSeek is open source, which means the other models are free to copy whatever secret sauce it has, e.g. whatever architecture that purportedly uses less compute can easily be copied.

For at least a year now, the secret sauce of every lab has been its ability to craft good artificial datasets on which to train its models (as scraping the whole web isn't good enough), and nobody publishes their artificial datasets or their methodology for building them.

[+] eunos|1 year ago|reply
> 1. The US trade war with China, which will place DeepSeek's compute availability at a disadvantage, eventually, if we ever get to that.

Chinese chips will come soon. I heard Huawei Ascend chips are already handling part of DeepSeek's inference.

> 2. Chinese censorship, which limits DeepSeek's data ingestion and output to some degree.

There are things that DeepSeek doesn't censor but Claude does. After Yoon Suk Yeol's self-coup, I asked Claude to imagine the possibility of martial law in the US; Claude refused to answer.

> 3. Most importantly, DeepSeek is open source, which means the other models are free to copy whatever secret sauce it has, e.g. whatever architecture that purportedly uses less compute can easily be copied.

The idea is that DeepSeek (among others) prevents or checks OpenAI/Anthropic from perpetually juicing extra-big margins from the AI space. The current valuations of NVDA and downstream AI companies are justified by the future huge margins from "AGI". Without that, the prices crash.

Side note: prior to V3, DeepSeek was a bit unusable due to low token generation speed.

[+] msp26|1 year ago|reply
Western LLM censorship affects me far more than Chinese LLM censorship.
[+] bufferoverflow|1 year ago|reply
DeepSeek was trained for a fraction of the cost compared to OpenAI/Anthropic models. If they were given comparable resources, I imagine their model would outperform everything on the market by a wide margin.
[+] culi|1 year ago|reply
> there are a few structural and fundamental reasons why deepseek can never outperform other models by a wide margin

Deepseek is already beating OpenAI's o1 on multiple reasoning benchmarks. I would call their MATH result a "wide margin"

https://api-docs.deepseek.com/news/news1120

[+] n144q|1 year ago|reply
"to some degree"

If you are a history researcher or a political analyst, maybe. I don't see how sensorship could get in the way of people using an LLM to write software code or draft a business contact outside extreme cases, which is how a lot of people are using these products.

[+] caycep|1 year ago|reply
As a usage question: what do you use Gemini/ChatGPT/DeepSeek/Claude for? Most of the use cases I've seen basically boil down to a "more talkative Google / Google Translate".
[+] Onavo|1 year ago|reply
> Chinese censorship, which limits DeepSeek's data ingestion and output to some degree.

We just call it alignment research instead. Same pig, different shade of lipstick.

[+] nimbius|1 year ago|reply
1. China already has a domestic 3nm process and a competitive video card industry that openly and actively seeks independence from sanctions. Huawei is evidence that sanctions are not as effective as foreign policy leaders may think.

2. Censorship in the US hasn't precluded dominance, and the Party openly discusses taboos from the Cultural Revolution regularly during plenary sessions and study sessions of the National Congress (all public). Output censorship isn't the same as input censorship.

3. Red Hat's LLM and AI efforts are all open source as well. Open source is directly compatible with the Party's "socialism with Chinese characteristics."

[+] manquer|1 year ago|reply
I don't see real justification for a ban in the first place.

There are different kinds of censorship in both governance models, and there is no AI regulation anywhere in the world, including in the US: everyone from law enforcement to private organizations is allowed to use these tools however they wish, in any application area.

Corporate censorship is real and quite heavy in the US, starting with how copyright is enforced through the flawed DMCA process, custom automated systems with no penalties for abusers (as with YouTube), Section 230, and various censorship bills ostensibly meant to protect children.

On top of that, organizations will self-censor for fear of regulation (losing Section 230 immunity, for example) or of being dropped by partners that are oligopolies (Visa/Mastercard, for example).

There are no real democratic or human rights considerations here; it is just anti-competitive behavior. In a functioning WTO with teeth, it would be a winnable dispute.

For anyone thinking this is an unfair comparison or whataboutism, or that the censorship is not problematic: the number of questions any of the major American models will refuse to answer should tell you otherwise.

[+] yellow_lead|1 year ago|reply
> Liang Wenfeng: We believe that as the economy develops, China should gradually become a contributor instead of freeriding. In the past 30+ years of the IT wave, we basically didn’t participate in real technological innovation. We’re used to Moore’s Law falling out of the sky, lying at home waiting 18 months for better hardware and software to emerge. That’s how the Scaling Law is being treated.
[+] mentalgear|1 year ago|reply
Impressive to think how DeepSeek achieved rough parity with o1 and Claude using less than a tenth of the resources. Better algorithms and approaches are what's needed for the next step of ML.
[+] kjellsbells|1 year ago|reply
If you tell the world that eggs are awesome while denying other countries access to eggs, they discover ways to use fewer eggs and eventually realize they don't need eggs at all. Then you are stuck making Denny's breakfasts while the rest of the world has moved on to fine dining.

China has incredibly strong incentives to do the pure research needed to break the current GPU-or-else lock. I hope, for science's sake, we don't end up gunning down each other's mathematicians on the streets of Vienna, the way things went for certain nuclear physicists.

[+] qwertox|1 year ago|reply
It remains to be seen how stable a totalitarian government can be. China has the benefit of full control over its people and therefore gets to decide what is important and what is not, and currently people are OK with handing that control over to the government. But it's also a very fragile state, one that can only be maintained through full repression.
[+] djaouen|1 year ago|reply
> If you tell the world that eggs are awesome while denying other countries access to eggs, they discover ways to use less eggs

You are confusing cause with effect. What actually happened: Nixon opened up US trade with China and, ever since, China has been stealing trade secrets to undermine and overthrow American interests. Limiting their access to eggs was literally us trying to prevent them from stealing all our shit!

[+] wiradikusuma|1 year ago|reply
I hope the competition among AI companies will continue to be healthy. Meaning they will keep sharing their techniques and papers, and we, as a whole, will be better off.
[+] inSenCite|1 year ago|reply
"Before Deepseek, CEO Liang Wenfeng’s main venture was High-Flyer (幻方), a top 4 Chinese quantitative hedge fund last valued at $8 billion"

Seems wild that a top 4 quant hedge fund is only $8B?

[+] csomar|1 year ago|reply
I think that's the value of the fund, not its AUM. BlackRock has $11 trillion of AUM but only $39B of equity.
[+] sebmellen|1 year ago|reply
Chinese stocks are nowhere near American prices.
[+] fallmonkey|1 year ago|reply
Strangely, DeepSeek has always been a prominent name in the open source LLM community since last year, with their repos and papers: https://github.com/deepseek-ai. Nothing about it is really quiet, except that they probably burn 1% of the marketing money of other Chinese LLM players.
[+] emporas|1 year ago|reply
Not personally surprised that a MoE model performs so well.

I used Mixtral a lot for coding in Rust, and it had qualities no other model had except GPT-3.5 and later Claude Sonnet. The funny thing is Mixtral was based on Llama 2, which was not trained on code that much.

DeepSeek V3: 671B parameters in total with 37B activated sounds very good, even though it's impossible to run locally.

A question, if anyone happens to know: for each query, does it activate just that many parameters, 37B, and no more?

[+] coolspot|1 year ago|reply
It activates only 37B per token, but you don't know which experts ahead of time, so you have to store all 671B in (V)RAM.
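The split between memory and compute described above can be sketched with a toy mixture-of-experts router (illustrative only; DeepSeek-V3's actual routing, expert counts, and shared experts differ from this minimal top-k scheme):

```python
import numpy as np

def moe_layer(x, experts, gate_w, k=2):
    """Toy mixture-of-experts layer: each token is routed to its top-k experts.

    Every expert's weights must be resident in memory, but only k experts
    run per token -- which is how a model can store all its parameters
    while paying the compute cost of only the activated fraction.
    """
    scores = x @ gate_w                          # (n_tokens, n_experts) router logits
    topk = np.argsort(scores, axis=-1)[:, -k:]   # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        logits = scores[t, topk[t]]
        gates = np.exp(logits - logits.max())
        gates /= gates.sum()                     # softmax over the selected experts
        for g, e in zip(gates, topk[t]):
            out[t] += g * experts[e](x[t])       # weighted sum of expert outputs
    return out

# Tiny demo: 4 experts (each a linear map), only 2 active per token.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
weights = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda v, W=W: v @ W for W in weights]
x = rng.normal(size=(3, d))
gate_w = rng.normal(size=(d, n_experts))
print(moe_layer(x, experts, gate_w).shape)
```

All four expert weight matrices must exist in memory before any token is processed, even though each token only touches two of them, which is the VRAM-vs-compute asymmetry the comment describes.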
[+] int_19h|1 year ago|reply
Mistral LMs are not LLaMA derivatives.
[+] dumbmrblah|1 year ago|reply
Part of the reason their API is so cheap is that they explicitly state they are going to train on your API data. OpenAI and Claude say they won't if you use their APIs (if you use ChatGPT, that's a different story). There are no free lunches.
[+] eldenring|1 year ago|reply
This comment is misleading. There is a "free lunch" here, in the sense that serving this model at scale is far cheaper than serving worse open source models.

Yes, they are probably more willing to go down in price because of this, but the architecture is open, and they are charging similarly to a 30B-50B dense model, which is about how many active params DeepSeek-V3 has.
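The pricing logic here can be made concrete with a rough rule of thumb (a sketch, not DeepSeek's published serving costs: the ~2 FLOPs per active parameter per token estimate, the fp8 weight size, and the 40B dense comparison model are all assumptions):

```python
# Per-token forward-pass compute scales with *active* parameters
# (~2 FLOPs per active parameter is a common rule of thumb), while
# memory scales with *total* parameters.
DENSE_40B_PARAMS = 40e9   # hypothetical 40B dense model for comparison
V3_TOTAL_PARAMS = 671e9   # DeepSeek-V3 parameters stored (the memory bill)
V3_ACTIVE_PARAMS = 37e9   # parameters run per token (the compute bill)

compute_ratio = (2 * V3_ACTIVE_PARAMS) / (2 * DENSE_40B_PARAMS)
print(f"V3 per-token compute is ~{compute_ratio:.2f}x a 40B dense model's")

# Memory is the catch: at 1 byte per parameter (fp8), the weights alone
# need ~671 GB, so serving is cheap in FLOPs but heavy in (V)RAM.
print(f"fp8 weight memory: ~{V3_TOTAL_PARAMS / 1e9:.0f} GB")
```

Under these assumptions the per-token compute really does look like a 30B-50B dense model, so charging at that level need not mean selling below cost.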

[+] orbital-decay|1 year ago|reply
This reminds me of PixArt-α. It's a diffusion model for image generation, that demonstrated that it's possible to train a SotA model on a ridiculously tiny budget ($28k).
[+] murtio|1 year ago|reply
I'm starting to believe that these articles are commissioned. I asked their public model questions related to branding and marketing, instructing it to come up with a brand identity based on the app's functionality. It kept talking to itself for more than 5 minutes in Chinese, then finished with a very bad answer!
[+] suraci|1 year ago|reply
I'm wondering what impact this will have on NVDA
[+] sroussey|1 year ago|reply
I’m surprised there is no word of combining old school symbolic AI with the new ML derived versions we enjoy today.
[+] blackoil|1 year ago|reply
It is funny how a site that otherwise stays away from politics turns into Reddit as soon as China is mentioned.
[+] throwaway290|1 year ago|reply
Maybe because it is a country using technology to attack the US. It has caused deaths of US citizens. And this has been going on for 10+ years.

I have the opposite question: why is that not brought up every time China is mentioned?

https://www.naccho.org/blog/articles/cyber-attack-on-u-s-hos...

https://www.theregister.com/2024/12/30/att_verizon_confirm_s...

https://www.politico.com/news/2022/12/28/cyberattacks-u-s-ho...

Yes, these days more of it is Russia and the DPRK (the peace-loving, prosperous country, according to ByteDance's AI), but let's see where they would get the tech if they were otherwise banned from it.

[+] tw1984|1 year ago|reply
It is good news for all software devs and AI researchers: we are taking the fruits of AI back from the silicon mongers!
[+] anshumankmr|1 year ago|reply
Sadly, not much from India on this front, save for maybe Sarvam AI.
[+] timtom123|1 year ago|reply
So much spam around this model. LocalLLaMA is stuffed with spam posts, and even Hacker News is getting spammed. Who has actually run this model and verified its performance? Does anyone know of a decent review from a trustworthy source?
[+] kopirgan|1 year ago|reply
Interesting (mis)use of the word "catfish".

Not how we normally understand it.

[+] robblbobbl|1 year ago|reply
Good that there is already an EU competitor available.