On DeepSeek and export controls

rcarmo|1 year ago

I like a lot of what Dario writes, but in this case I just can't follow the reasoning. Everything I've picked up about how DeepSeek did what they did (including going a level lower than CUDA to better take advantage of the limited hardware[1], and the balance of techniques used[2]) points to some very smart Chinese engineers having out-smarted US ones (to put it in terms that matter to US folk, because I'm European and I ordinarily wouldn't care):

    1 - https://stratechery.com/2025/deepseek-faq/
    2 - https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture

This post comes across as overly defensive of the US export controls and leaning on the authoritarian regime angle far too much to feel like it isn't just a way to shore up interest in US-based AI companies and widen the moat (or just make sure someone else shores it up politically while they catch up technically).

Anyway, this will always be a deep pocket race. But I wish it wasn't so much about brute-forcing GPUs and wasting power to (as yet) uncertain outcomes as far as model capabilities are concerned, and to me what DeepSeek achieved was to point out that ingenuity and better techniques are what both OpenAI and Anthropic ought to be pursuing instead of burning cash.

kristjansson|1 year ago

To be sure, DeepSeek did great work, and this is a bit aside from TFA. But the PTX thing is a bit of a meme? What do we think torch.compile and triton and llvm's nvptx backend are doing under the hood? The warp-specialization thing quoted in [1] cites a _2014_ paper[2] out of Stanford ...

[2]: https://dl.acm.org/doi/10.1145/2555243.2555258

highfrequency|1 year ago

He is basically saying, with his inside knowledge of Anthropic's current capabilities: "they did for $5m what we could probably do for $10m or $15m if we launched the training run today without any new optimizations." So on the one hand that's very impressive, both because the cost effectiveness is significantly higher and because even replicating SoTA outside of OpenAI/Anthropic is very difficult. On the other hand, it's not too surprising that a company that needs to economize on compute will find ways to do so; neither Anthropic nor OpenAI would consider it worthwhile to have their best researchers prioritize cutting down on training costs or compute requirements; they have near infinite capital and are focused on breakthroughs in making their best models as good as possible. I don't think it's accurate to say that Deepseek "outsmarted US engineers"; the US labs had a very different objective function than Deepseek, which pushed much harder on the engineering optimizations for better cost performance.

Everyone seems to rag on OpenAI/Anthropic for spending so much money and take it as a symptom of capitalist waste, but this reality seems great to me - massive amounts of money is basically being funneled from VCs toward progress in machine learning. Once expensive breakthroughs are made, it is only a matter of a few years until people make the engineering optimizations to make those breakthroughs cheap.

Just want to emphasize the progress in cost that Dario highlights: fixed AI capabilities are becoming 4x cheaper every year. That is absolutely insane. US GDP growth averaged 3-4% over the last 250 years and look how far that has taken us. Moore's Law averaged ~40% annual growth in transistor density and look how far that has taken us in just 65 years. 4x growth in AI capability/cost per year is absolutely insane.
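To make the comparison concrete, here is a rough sketch of the compounding arithmetic. It takes the three rates quoted above at face value; the 3.5% GDP figure is an assumed midpoint of the stated 3-4% range, and the function names are made up for illustration.

```python
import math

# Annual growth multipliers taken from the comment above. The GDP figure
# uses 3.5% as an assumed midpoint of the stated 3-4% range.
GDP_MULTIPLIER = 1.035
MOORE_MULTIPLIER = 1.40
AI_COST_MULTIPLIER = 4.0


def compound_factor(annual_multiplier, years):
    """Total growth factor after compounding for `years` years."""
    return annual_multiplier ** years


def years_to_match(target_factor, annual_multiplier):
    """Years of compounding at `annual_multiplier` needed to reach `target_factor`."""
    return math.log(target_factor) / math.log(annual_multiplier)


gdp_total = compound_factor(GDP_MULTIPLIER, 250)
moore_total = compound_factor(MOORE_MULTIPLIER, 65)

# How long would a sustained 4x/year rate need to produce the same total factor?
years_vs_gdp = years_to_match(gdp_total, AI_COST_MULTIPLIER)
years_vs_moore = years_to_match(moore_total, AI_COST_MULTIPLIER)

print(f"GDP, 250 years at 3.5%/yr: {gdp_total:,.0f}x total")
print(f"Moore's Law, 65 years at 40%/yr: {moore_total:,.0f}x total")
print(f"Years at 4x/yr to match GDP total: {years_vs_gdp:.1f}")
print(f"Years at 4x/yr to match Moore total: {years_vs_moore:.1f}")
```

Under these assumptions, a 4x/year rate would cover 250 years of GDP-style growth in a bit over six years, if it were sustained, which is of course the big open question.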

Kostchei|1 year ago

yeh. I see Dario saying "let's protect the US more" for no reason other than bias and "of course they improved over time" which feels like a mighty strong strain of copium. Very disappointing for a leader of an organization I respected. Assuming he speaks for Anthropic, and it seems he does, past tense.

snake_doc|1 year ago

Without taking a position on unipolar vs. multi-polar:

Dario makes an astounding implicit assumption:

- China originating labs cannot acquire chips providing 80-90% similar utility without the US within the next 2-3 years.

I'll make an observation, re: DeepSeek's incentives that drove them to create the innovations from the V2 and V3 papers.

DeepSeek, compared to American AI labs, are much more compute constrained, but in a unique way. Their chips are more memory bandwidth constrained (depending on type anywhere from 50% to 80% less bandwidth).

Therefore, each dollar/hour of investment towards memory optimization is worth MORE to DeepSeek than to American labs.

In the V2 and V3 papers, they've demonstrated exactly that with these memory optimization techniques.

1. MLA -> reduces KV cache by nearly 80% compared to GQA. By the way, this was published in V2 in May 2024.

2. FP8 matmul (while still accumulating in FP32 gradients) without losing significant quality.

3. DualPipe scheduling and reworking of Hopper SM's allocation on communication vs. computation -> DeepSeek's V3 paper has 2 full pages of hardware suggestions for "hardware designers" (read NVIDIA)
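As a back-of-the-envelope illustration of point 1, the sketch below compares per-token KV cache size for GQA (which caches a key and a value vector per KV head) against MLA (which caches one compressed latent plus a small positional key). All dimensions are assumptions chosen for illustration, not figures from this thread, and the resulting percentage depends heavily on which GQA baseline is assumed.

```python
def gqa_kv_elements(n_kv_heads, head_dim):
    """Cached elements per token per layer: one key and one value vector per KV head."""
    return 2 * n_kv_heads * head_dim


def mla_kv_elements(latent_dim, rope_dim):
    """Cached elements per token per layer: a shared compressed latent plus a positional key."""
    return latent_dim + rope_dim


def cache_bytes(elements_per_token_layer, n_layers, seq_len, bytes_per_element=2):
    """Total cache size for one sequence, assuming 2-byte (16-bit) elements by default."""
    return elements_per_token_layer * n_layers * seq_len * bytes_per_element


# Hypothetical configuration, for illustration only.
HEAD_DIM = 128
N_KV_HEADS = 8      # assumed GQA baseline
LATENT_DIM = 512    # assumed compressed KV latent width
ROPE_DIM = 64       # assumed decoupled positional key width
N_LAYERS = 60
SEQ_LEN = 32768

gqa_elems = gqa_kv_elements(N_KV_HEADS, HEAD_DIM)
mla_elems = mla_kv_elements(LATENT_DIM, ROPE_DIM)
reduction = 1 - mla_elems / gqa_elems

gqa_bytes = cache_bytes(gqa_elems, N_LAYERS, SEQ_LEN)
mla_bytes = cache_bytes(mla_elems, N_LAYERS, SEQ_LEN)

print(f"GQA: {gqa_elems} elements/token/layer, {gqa_bytes / 2**30:.2f} GiB per sequence")
print(f"MLA: {mla_elems} elements/token/layer, {mla_bytes / 2**30:.2f} GiB per sequence")
print(f"Reduction: {reduction:.1%}")
```

With these assumed numbers the reduction comes out around 72%; a GQA baseline with more KV heads would push it higher, so the exact figure should be read against whatever baseline is being compared.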

Export controls in a global market create different incentives for the parties involved. The resulting incentives will change, and agents (using it as a traditional economics term) will change their capital allocation strategy.

Palmik|1 year ago

> All of this is to say that DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese.

Says the CEO whose product [1] costs 15-50x more. (This is not just DeepSeek's API, but also 3p providers hosting the same model)

> DeepSeek does not "do for $6M what cost US AI companies billions". I can only speak for Anthropic, but Claude 3.5 Sonnet is a mid-sized model that cost a few $10M's to train

Ok, that's still at least 3-10x cost reduction (assuming "a few $10M" lowerbound of $20M). And for a model that he later implies is 2x larger than Sonnet. So that's 6-10x efficiency improvement. Nice!

> Since DeepSeek-V3 is worse than those US frontier models — let’s say by ~2x on the scaling curve.

What curve? Does he mean the simplistic performance / model params curve? That does not take into account that DeepSeek v3 is a MoE (can't compare MoE and dense param # in a naive way), nor the other architecture changes (KV compression, etc.).

Also, if Sonnet 3.5 is 2x smaller, then why is inference 15-50x more expensive than DeepSeek v3's? Does Anthropic not have good GPU engineers? Are they just running at insanely high margins? As a consumer I don't care how big your model is behind the scenes. I care about API costs or inference efficiency when hosting the model myself.

[1] Product that is mostly comparable and in some ways quite ahead.

kalkin|1 year ago

Where does he imply that it's 2x larger than Sonnet?

suraci|1 year ago

> Thus, in this world, the US and its allies might take a commanding and long-lasting lead on the global stage.

This is just like Captain America giving a pre-battle speech to the Avengers—it's so inspiring! Hail Hydra!

_t9ow|1 year ago

> the US and its allies

What allies? Is he aware that his current President is alienating and antagonising almost every single country out there that can historically be considered an ally?

rcarmo|1 year ago

I just had to upvote this for the tongue in cheek call to arms, even though I am really sad that the US is turning out the way it is today...

Synaesthesia|1 year ago

Democratic USA must prevail against Communist China by having better models.

edit: (Please read this in a sarcastic voice, I think it's a crazy idea!)

epoch_100|1 year ago

> DeepSeek does not "do for $6M what cost US AI companies billions". I can only speak for Anthropic, but Claude 3.5 Sonnet is a mid-sized model that cost a few $10M's to train (I won't give an exact number). Also, 3.5 Sonnet was not trained in any way that involved a larger or more expensive model (contrary to some rumors).

Wow!

jddj|1 year ago

The narrative is running away in every direction.

There are popular infographics floating around today which explain to the layman that deepseek invented MoE.

gadtfly|1 year ago

^ This is publicly new information, and the 2nd part especially contradicts consequential rumours that had been all-but-cemented in closely-following outsiders' understanding of Sonnet and Anthropic. Completely aside from anything else in this article.

Palmik|1 year ago

Anthropic is, according to themselves, using RLAIF... which is basically using LLM as a judge / reward model. So maybe he means that the models they use for RLAIF are not (much?) more expensive than Sonnet 3.5 (e.g. previous Sonnet or Haiku 3 :)).

Leary|1 year ago

Dario certainly had no qualms about working with authoritarian regimes when he worked for Baidu!

Also, if the US seeks a permanent advantage over China by 2027 with self-replicating AI, so much so that it can dictate whatever terms militarily, wouldn't that just force China to send a few cruise missiles to TSMC before that happens?

dtquad|1 year ago

>Dario certainly had no qualms about working with authoritarian regimes when he worked for Baidu

Maybe it's worth listening to a Western AI researcher who worked for a Chinese company and came to the conclusion "China must not beat us to AGI/ASI".

beardyw|1 year ago

> force China to send a few cruise missiles to TSMC before that happens?

Why on earth would they do that? China considers Taiwan to be rightfully theirs and would consider TSMC to be a valuable asset.

daft_pink|1 year ago

I just want to ask if when we talk about export controls on AI chips, is this going to create export controls on general consumer goods in the near future.

Given Moore’s law and the efficiency that will clearly come from optimizing chips for AI and competition increasing the amount of VRAM on these devices to run models locally.

Is creating an export regime today going to mean that in 3 or 4 years general smart phones and high end laptops are all going to be subject to export controls?

Keep in mind that at one time computers that filled an entire building were less powerful than my Apple Watch.

itishappy|1 year ago

I don't expect so. Export controls are aimed at chips needed for training, not inference.

The H20 with 96GB of memory is currently available in China. We're a ways off from restricting consumer devices.

wongarsu|1 year ago

That would be a great way to give Non-American companies an advantage in the market. Imagine if Apple has to constantly worry about complying with US export controls while Samsung, a Korean company, can just ship whatever they want with zero additional paperwork as long as they avoid US suppliers. Multiply that over all consumer goods that might include AI.

pavl-|1 year ago

Unless the author makes a compelling case about why AI breaks the MAD status-quo between nuclear powers, I will assume that their appeals to NATSEC are an attempt to artificially create moat for their company.

highfrequency|1 year ago

I also find it counterintuitively reassuring that we have already had the power to blow up the planet for decades. But at least in theory - GPT10 level AI could help develop missile defense systems that actually work at scale, which would eliminate the MAD status quo, no?

flashman|1 year ago

I imagine that argument is a pretty easy sell. Of course governments will want to create moats that protect their incumbents.

superq|1 year ago

AI is potentially so much deeper than MAD status quo.

world2vec|1 year ago

At least they're all coming out of the woodwork and starting to share more details about their own training runs, costs, efficiency rates and so on. Interesting to see how an open-weights model could force their hand like that.

highfrequency|1 year ago

The technical summary is excellent. But I worry that prominent US voices couching AI progress in Cold War style war rhetoric is likely to be... self-fulfilling. Surely some of the people paying closest attention to these articles are Chinese politicians and military leaders - it would definitely seem alarming to me if the Deepseek CEO kept writing things like: "whoever reaches this threshold of AI intelligence first will accelerate into world military domination - we must beat the US."

There's a reason why Oppenheimer & co. weren't non-stop publishing internationally accessible op-eds during WWII about how important it is that Germany not develop a scaled uranium fission bomb before the US.

rcarmo|1 year ago

If you read enough history books (and have a passable grasp of German, and tolerate the weird lead types used in the press for those days) you will be able to find _a lot_ of opinion pieces from our recent history that play along the same lines, but for earlier industries (like automotive).

People _really_ ought to study more of our past.

jonathanstrange|1 year ago

It's shortsighted to believe that export controls are relevant. China will be able to manufacture as good or better chips than the US in the not too distant future anyway.

suraci|1 year ago

This is a choice between the lesser of two evils, or what could be called "drinking poison to quench thirst." Export controls might lead China to develop its own advanced chips, but without them, China will certainly gain powerful AI capabilities

astrange|1 year ago

That is impossible. There's several layers deep of sole suppliers for them.

wood_spirit|1 year ago

China only has to attack Taiwan to deny the US of TSMC’s production capacity.

Prediction: they will do this as soon as they see sufficient domestic chip production.

The attempts to onshore fabs to the USA are too little too late.

And it doesn’t matter if Trump defends Taiwan and makes China back down - the fabs will have been bombed to bits.

As it is, Trump would probably just do sanctions and tariffs, which - like Russia - China will expect to weather.

simplyluke|1 year ago

From my perspective it’s very convenient that all the new information and competition supports his existing priors that:

1) we need to do various forms of regulation to entrench US closed source market leaders, which happens to increase his company’s value

2) the best way towards improvement in models is not efficiency but continuing to burn ever increasing piles of money, which happens to increase his company’s value

scilro|1 year ago

>Even if the US and China were at parity in AI systems, it seems likely that China could direct more talent, capital, and focus to military applications of the technology. Combined with its large industrial base and military-strategic advantages, this could help China take a commanding lead on the global stage, not just for AI but for everything.

China spends 1.5% of its GDP on its military. The US spends 3.5%. I get that the two countries are engaged in competition for dominance, but why is China the bigger threat here?

Also, there are a great deal of groups and institutions out there that are pushing for more diplomacy, more cooperation, and a ratcheting down of tensions. If Dario is going to get political anyway, why not go that route?

yucatansunshine|1 year ago

It's also funny that there's a scary implication "China could do XYZ and focus on military applications" when Anthropic and Palantir teamed up last year to offer Claude models to US Defense orgs

https://investors.palantir.com/news-details/2024/Anthropic-a...

wood_spirit|1 year ago

> Export controls serve a vital purpose: keeping democratic nations at the forefront of AI development.

As an outsider looking in, I’m not thinking of the USA as a particularly democratic country any more. And I didn’t think its export controls were about democracy, but rather a trade weapon.

wood_spirit|1 year ago

Stepping back, it feels like the techies working on AI have been celebrating the expense of it all: techies have been wanting bigger and bigger clusters as some kind of bragging rights, and that suits the suits, who see it as a moat to protect their crazy stock valuations. The money was being made not by the miners but by Nvidia, selling shovels they could name the price of.

Necessity is the mother of invention. Deepseek is demonstrating that if the incentives are aligned to want efficiency then good engineers - and there are plenty of them in every country - can make things fly.

codekilla|1 year ago

The author seems to be of the opinion that the creators of DeepSeek will either be unable to, or will not see the value of, optimizing the 'second stage' RL component of the 'new' (post pre-training RL) way of training frontier foundation models. Every competent programmer in China is now looking for low-level PTX optimizations for EVERY SINGLE STAGE of the pipeline. They will now likely not publish any of it.

astrange|1 year ago

There isn't a "the pipeline". You'd have to work at DeepSeek for your low-level work to affect it.

flashman|1 year ago

"Making AI that is smarter than almost all humans at almost all things will require millions of chips, tens of billions of dollars (at least), and is most likely to happen in 2026-2027."

What an incredible claim to just slip in two-thirds of the way through! Where have I heard it before?

1965, H. A. Simon: "Machines will be capable, within twenty years, of doing any work a man can do."

curious_cat_163|1 year ago

> All of this is to say that DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese.

Excellent analysis and I largely agree. The part where he loses me here is the certainty in the claim around fundamental changes to the economics...

This is open weights with a (largely) open recipe. Arguably, Meta has already been at it but now there are two labs producing frontier models at this level. A wishful analysis could imagine that there _might_ be more and that _could_ result in a qualitative shift in how the frontier model research is being done and hence affect the economics but Dario seems to have largely ignored this aspect.

Not sure why?

alexlur|1 year ago

Note: Author is the CEO of Anthropic.

AlanYx|1 year ago

This piece ignores recent news that DeepSeek is doing inference on Huawei’s Ascend 910C GPUs.

The simplified export control analysis Amodei gives here gets a lot more complicated when increasing export controls potentially ends up spurring additional R&D in Nvidia competitors.

rcarmo|1 year ago

There are lots of people working on ASICs for inference. There are already people using FPGAs for some vertically integrated AI stacks, so it's only a matter of time until we start discussing literally hard-coding models (or flashing them) to hardware to make them go faster.

horsawlarway|1 year ago

I find this article to be exceptionally self-serving.

zerotolerance|1 year ago

AI is a race to the bottom, and a race to over-spend. Seeing this as competition will bankrupt the players, make enablers rich, and end up killing a lot of people when the need to bolster confidence in its capabilities overrules basic common sense.

zb3|1 year ago

> In the end, AI companies in the US and other democracies must have better models than those in China if we want to prevail.

How about everyone has access to the best model because it's Open Source and openly developed?

unknown|1 year ago

[deleted]

orbital-decay|1 year ago

That post is really dishonest, which is expected from Anthropic's CEO talking about the competitors. Dario conveniently omits that R1's approach was validated months before o1-preview came out, and DeepSeek releases the weights and architecture unlike Anthropic and OAI which produce black box models in the name of "safety". Then he tries to diminish their innovation, dismissing it as a normal step they're also capable of, saying v3 and R1 are competing with models that are 9-10 months old. Which is not really true.

And then there goes the usual techno-feudal interjection:

>If China can't get millions of chips, we'll (at least temporarily) live in a unipolar world, where only the US and its allies have these models. It's unclear whether the unipolar world will last, but there's at least the possibility that, because AI systems can eventually help make even smarter AI systems, a temporary lead could be parlayed into a durable advantage

So Dario envisions a unipolar world where a black box corporation (his own one, of course) controls a perpetually self-improving machine god and distributes scraps to the masses that are too unwashed and dangerous to control it themselves, while at the same time preventing other countries from breaking out and making their own self-improving machine god because they're evil. A boot stomping on your face forever, regardless of who you are.

That's coming from an "ethical, helpful, and honest" guy.

>To be clear, the goal here is not to deny China or any other authoritarian country the immense benefits in science, medicine, quality of life, etc. that come from very powerful AI systems. Everyone should be able to benefit from AI. The goal is to prevent them from gaining military dominance.

And of course he immediately tries to deny that thought.

buyucu|1 year ago

Anthropic CEO is just mad that there is a new player in the business. DeepSeek models are not just better, they are also open weight.

I'm sure Anthropic's valuation got hit badly by the DeepSeek saga, and the CEO is lashing out in panic.

mola|1 year ago

Maybe if the US weren't led by a bully who cares not about democratic ideals but only about the rule of power, it'd be easier to support the notion of the US as the leader of the free world. But just take a look at how he handled the Denmark-Greenland issue: pure bullying. In the Trumpian world, the US has no allies, only vassals.

quantum_state|1 year ago

Someone might just be the Trojan horse … encouraging the current path from within …

resters|1 year ago

He’s lost my Claude pro subscription and my respect with this post

throwaway_32u10|1 year ago

I'm kind of tired of the entire narrative that "US good, everyone else bad", as if only US deserves to hold powerful technology because it will be used for "the greater good", rather than employ it in military applications, as if Anthropic didn't partner with Palantir.

And it amazes me how many people can't seem to see past the "US is good, everyone else is bad" smoke and mirrors.

dang|1 year ago

I'm sure most of us agree with you, but independently of that, please don't take HN threads on generic tangents, and certainly not generic nationalistic flamewar tangents. The problem is that it's tediously repetitive and inevitably turns nasty.

"Eschew flamebait. Avoid generic tangents."

parsimo2010|1 year ago

First- the author of this blog post is the CEO of Anthropic, an American AI company. Of course they are going to argue for export controls if it can hurt their competition. They get to benefit from anticompetitive practices with none of the legal risk! So it's not even about keeping the "bad guys" from having things, it's really about making more profit. Also, you can typically assume that anything a CEO publishes is in a pursuit to raise stock prices, it has nothing to do with morality. If CEOs were moral people most of them wouldn't have become CEOs (there are probably a few CEOs that are truly good people, but I'll bet there are more "wolf in sheep's clothing" CEOs that use an image of morality to improve their company's reputation).

Second- aside from the specifics of this post, you don't have to believe that the US is good to see the logic of export controls. There are a number of countries that are openly hostile to the US (Iran and North Korea), and also some that are semi-privately working against the US (like China). These countries will take any advantage that they can- they will steal IP and use the US's development resources to catch up to the current state of the art at a reduced cost. So you don't have to put this in terms of good vs. bad, just think of it in terms of things that benefit the US and things that don't benefit the US. Whether you think people are evil is irrelevant, it is very logical for the US to put export controls on things that it doesn't want to give away for free. The US wants to preserve its advantage and make it as costly and difficult for these other countries to catch up.

BizarreByte|1 year ago

[flagged]

k1m|1 year ago

Absolutely. Chinese companies shouldn't have chips because their government has "committed human rights violations, has behaved aggressively on the world stage". And the US government hasn't?

CamperBob2|1 year ago

He's unlikely to be writing from a genuinely ideological point of view. He's playing a zero-sum game, in that every chip that goes to China is one that Anthropic and/or its cloud provider doesn't get its hands on.

So it makes sense to him to argue for export controls using whatever rhetorical flexes and flourishes he can come up with.

dark_glass|1 year ago

I live here, so I want it to be the best.

Al-Khwarizmi|1 year ago

[deleted]

StefanBatory|1 year ago

I mean, in the end, USA is still better than the alternative.

baggiponte|1 year ago

he's coping so hard.

tuyguntn|1 year ago

I call this capitalism when it's convenient for us, socialism when we are struggling.

I hope one day, China starts blocking advanced tech from US, for the good of the whole world

Kostchei|1 year ago

I hope they learn from the terrible job of stewardship of tech the west has done and don't repeat our mistakes.

game_the0ry|1 year ago

CEO of an AI company asks for regulation against a country that is out-competing him.

Are we still capitalists?

I think the answer is - no.

Verlyn139|1 year ago

another American Nationalist that sees competition as a "threat" to their monopoly, how Great!!

Verlyn139|1 year ago

[deleted]

breakitmakeit|1 year ago

[deleted]

173 comments