(no title)
ogrisel | 1 year ago
The fact that DeepSeek-R1 is so much better than DeepSeek-V3 at various important tasks shows that chain-of-thought / thinking-before-answering models are better. But they are also more compute-intensive at inference time than their non-thinking instruction-tuned counterparts.
So even if the DeepSeek-V3 pretraining + GRPO chain-of-thought post-training procedure was cheaper than anticipated to reach o1-grade performance, inference is still costly, even with a distilled model.
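The inference-cost point can be made concrete with a back-of-the-envelope sketch. All numbers below are illustrative assumptions (token counts and the per-token price are made up, not measured from DeepSeek's API); the point is only that the chain-of-thought tokens dominate the cost of an answer.

```python
# Back-of-the-envelope comparison of per-answer output-token cost
# between a non-thinking model and a thinking (CoT) model.
# All constants are hypothetical, chosen only for illustration.
PRICE_PER_1K_OUTPUT_TOKENS = 0.002  # assumed $/1K output tokens

def answer_cost(output_tokens: int) -> float:
    """Dollar cost of generating `output_tokens` tokens for one answer."""
    return output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

plain = answer_cost(300)            # assume ~300 tokens for a direct answer
thinking = answer_cost(4000 + 300)  # assume ~4000 extra chain-of-thought tokens

print(f"non-thinking: ${plain:.4f} per answer")
print(f"thinking:     ${thinking:.4f} per answer")
print(f"ratio:        {thinking / plain:.1f}x")
```

Under these assumed numbers the thinking model costs over an order of magnitude more per answer, which is why even a distilled R1-style model stays expensive to serve.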
bildung | 1 year ago