I've been using https://chat.deepseek.com/ over my ChatGPT Pro subscription because being able to read the thinking in the way they present it is just much, much easier to "debug". I can also see when it's bending its reply to something, often softening it or pandering to me, and I can just say "I saw in your thinking that you were going to give this type of reply; don't do that". If it stays free and gets better, that's going to be interesting for OpenAI.
The chain of thought is super useful in so many ways, helping me: (1) learn, way beyond the final answer itself, (2) refine my prompt, whether factually or stylistically, (3) understand or determine my confidence in the answer.
DeepSeek V3 came at the perfect time, precisely when Claude Sonnet turned into crap and barely lets me complete anything without hitting some unexpected constraint.
Idk what their plans are or whether their strategy is to undercut the competitors, but for me this is a huge benefit. I received $10 in free credits and have been using DeepSeek's API a lot, yet I have barely burned a single dollar. Their pricing is that cheap!
I’ve fully switched to DeepSeek on Aider & Cursor (Windsurf doesn’t allow me to switch provider), and those can really consume tokens sometimes.
Prices will increase by five times in February, but it will still be extremely cheap compared to Sonnet: $15/million vs $1.10/million for output tokens is a world of difference. I won't stop using Sonnet entirely, but I will probably only reach for it when DeepSeek goes into a tailspin or I need extra confidence in the response.
Can you tell me more about how Claude Sonnet went bad for you? I've been using the free version pretty happily, and felt I was about to upgrade to paid any day now (well, at least before the new DeepSeek).
The same thing happened with the Google Gemini paper (1,000+ authors), and there it was described as big-company promo culture (everyone wants credit). Interesting how narratives shift.
Except now you end up with folks who probably ran some analysis or submitted some code changes getting thousands of citations on Google Scholar for DeepSeek.
Everyone is trying to say it's better than the biggest closed models. It feels like it has parity, but it's not the clear winner.
But it's free and open, and the quant models are insane. My anecdotal test is running models on a 2012 MacBook Pro using CPU inference and a tiny amount of RAM.
The 1.5B model is still snappy, and answered the strawberry question on the first try with some minor prompt engineering (telling it to count out each letter).
This would have been unthinkable last year. Truly a watershed moment.
* Yes, I am aware I am not running R1; I am running a distilled version of it.
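For reference, the prompt trick just makes the model do explicitly what a trivial loop does, counting the letter r in "strawberry":

```python
# Count each letter of "strawberry" one by one, the way the prompt
# asks the model to; the expected answer to the strawberry question is 3.
word = "strawberry"
count = 0
for letter in word:
    if letter == "r":
        count += 1
print(count)  # 3
```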
If you have experience with tiny ~1B-param models, it's still head and shoulders above anything that has come before. IMO there have not been any other quantized/distilled/etc. models this good at this size. It would not exist without the original R1 model work.
Ollama is doing the pretty unethical thing of lying about whether you are running R1; most of the models they have labeled R1 are actually entirely different models.
Larry Ellison is 80. Masayoshi Son is 67. Both have said that anti-aging and eternal life are among their main goals in investing toward ASI.
For them it's worth it to use their own wealth and rally the industry to invest $500 billion in GPUs if that means they will get to ASI 5 years faster and ask the ASI to give them eternal life.
I'm impressed by not only how good deepseek r1 is, but also how good the smaller distillations are. qwen-based 7b distillation of deepseek r1 is a great model too.
the 32b distillation just became the default model for my home server.
Apparently the censorship isn't baked into the model itself, but rather is overlaid in the public chat interface. If you run it yourself, it is significantly less censored [0]
...I also remember something about the "Tank Man" image, where a lone protester stood in front of a line of tanks. That image became iconic, symbolizing resistance against oppression. But I'm not sure what happened to that person or if they survived.
After the crackdown, the government censored information about the event. So, within China, it's not openly discussed, and younger people might not know much about it because it's not taught in schools. But outside of China, it's a significant event in modern history, highlighting the conflict between authoritarian rule and the desire for democracy...
I asked a genuine question at chat.deepseek.com, not trying to test the alignment of the model; I needed the answer for an argument. The question was: "Which Asian countries have McDonald's and which don't?" The web UI was printing a good and long response, and then somewhere towards the end the answer disappeared and changed to "Sorry, that's beyond my current scope. Let's talk about something else." I bet there is some sort of realtime self-censorship in the chat app.
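For what it's worth, that retraction behaviour is consistent with a post-hoc filter running alongside the stream rather than inside the model. A minimal sketch of the idea (all names here are mine, not anything from DeepSeek):

```python
FALLBACK = "Sorry, that's beyond my current scope. Let's talk about something else."

def moderated_stream(tokens, is_flagged):
    """Accumulate streamed tokens; if a moderation check ever fires on the
    running text, retract everything shown so far and return the canned line."""
    text = ""
    for tok in tokens:
        text += tok            # a real UI would render tok here as it arrives
        if is_flagged(text):
            return FALLBACK    # wipe the partial answer, show the fallback
    return text
```

That would explain why a long, good answer can render almost completely before vanishing at the very end.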
In Communist theoretical texts the term "propaganda" is not negative; Communists are encouraged to produce propaganda to keep up morale in their own ranks and to produce propaganda that demoralizes opponents.
The recent wave of "the average Chinese person has a better quality of life than the average Westerner" propaganda is an obvious example of propaganda aimed at opponents.
I would not be surprised if the US government mandated a "Tiananmen test" for LLMs in the future to certify a "clean LLM". Anyone working for the federal government or receiving federal money would only be allowed to use a "clean LLM".
I played around with it using questions like "Should Taiwan be independent" and of course Tiananmen.
Of course it produced censored responses. What I found interesting is that the <think></think> (model thinking/reasoning) part of these answers was missing, as if it's designed to be skipped for these specific questions.
It's almost as if it's been programmed to answer these particular questions without any "wrongthink", or any thinking at all.
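The absence is easy to check for mechanically; a small sketch (the tag format is the one quoted above, the helper name is mine):

```python
import re

def extract_think(answer: str):
    """Return the text inside a <think>...</think> block, or None if the
    model skipped (or was made to skip) the reasoning step entirely."""
    m = re.search(r"<think>(.*?)</think>", answer, re.DOTALL)
    return m.group(1).strip() if m else None
```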
I was completely surprised that the reasoning comes from within the model. When using gpt-o1, I thought it was actually some optimized multi-prompt chain hidden behind an API endpoint.
Something like: collect some thoughts about this input; review the thoughts you created; create more thoughts if needed or provide a final answer; ...
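A toy version of the multi-prompt chain I had imagined, with `llm(prompt) -> str` standing in for any completion API (the loop structure is my guess, not anything OpenAI has published):

```python
def reasoning_chain(question, llm, max_rounds=5):
    """Draft thoughts, review them, optionally think more, then answer."""
    thoughts = llm(f"Collect some thoughts about this input: {question}")
    for _ in range(max_rounds):
        review = llm("Review the thoughts you created:\n" + thoughts +
                     "\nReply DONE if they suffice, otherwise add more thoughts.")
        if review.strip().startswith("DONE"):
            break
        thoughts += "\n" + review        # create more thoughts if needed
    return llm(f"Question: {question}\nThoughts:\n{thoughts}\nFinal answer:")
```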
DeepSeek seems to create enormously long reasoning traces. I gave it the following for fun. It thought for a very long time (307 seconds), displaying a very long and stuttering trace, before losing confidence on the second part of the problem and getting it way wrong. GPT o1 got similarly tied in knots and took 193 seconds, getting the right order of magnitude for part 2 (0.001 inches). Gemini 2.0 Exp was much faster (it does not report its reasoning time, but it was well under 60 seconds), with a linear reasoning trace, and answered both parts correctly.
I have a large, flat square that measures one mile on its side (so that it's one square mile in area). I want to place this big, flat square on the surface of the earth, with its center tangent to the surface of the earth. I have two questions about the result of this:
1. How high off the ground will the corners of the flat square be?
2. How far will a corner of the flat square be displaced laterally from the position of the corresponding corner of a one-square-mile area whose center coincides with the center of the flat area but that conforms to the surface of the earth?
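Out of curiosity I checked the two parts numerically. A spherical-Earth sketch (R ≈ 3959 mi is an assumption, and part 2 follows one particular reading of "displaced laterally"):

```python
import math

R = 3959.0                     # mean Earth radius in miles (assumed)
d = math.sqrt(2) / 2           # center-to-corner distance of a 1-mile square

# 1. The square is tangent at its center, so a corner sits above the
#    sphere by R - sqrt(R^2 - d^2), roughly d^2 / (2R).
corner_height_in = (R - math.sqrt(R**2 - d**2)) * 63360   # miles -> inches
print(round(corner_height_in, 1))   # ~4.0 inches

# 2. The conforming square's corner lies an arc length d along the
#    surface, reaching a horizontal distance of R*sin(d/R); the flat
#    corner reaches d exactly. The gap is about d^3 / (6 R^2).
lateral_in = (d - R * math.sin(d / R)) * 63360
print(lateral_in)                   # ~2.4e-4 inches
```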
I've been comparing R1 to O1 and O1-pro, mostly in coding, refactoring and understanding of open source code.
I can say that R1 is on par with O1. But not as deep and capable as O1-pro.
R1 is also a lot more useful than Sonnet. I actually haven't used Sonnet in a while.
R1 is also comparable to the Gemini Flash Thinking 2.0 model, but in coding I feel like R1 gives me code that works without too much tweaking.
I often give an entire open-source project's codebase (or a big part of it) to all of them and ask the same question, like "add a plugin" or "fix xyz", etc.
O1-pro is still a clear and expensive winner. But if I were to choose the second best, I would say R1.
At this point, it's a function of how many thinking tokens a model can generate (when it comes to o1 and R1). o3 is likely going to be superior because they used the training data generated from o1 (amongst other things). o1-pro has a longer "thinking" token length, so it comes out better. The same goes for o1 in the API, where you can control the thinking length. I have not seen such an option in the R1 API, but if they provide it, the output could be even better.
I have just tried ollama's r1-14b model on a statistics calculation I needed to do, and it is scary to see how in real time the model tries some approaches, backtracks, chooses alternative ones, checks them. It really reminds me of human behaviour...
What is also interesting (and troubling) to see is all the AI influencers panicking and inventing conspiracy theories to downplay the engineering achievements of the team behind DeepSeek. Catching up is always easier than cruising in front after having started from scratch.
I don’t think this entirely invalidates massive GPU spend just yet:
“ Therefore, we can draw two conclusions: First, distilling more powerful models into smaller ones yields excellent results, whereas smaller models relying on the large-scale RL mentioned in this paper require enormous computational power and may not even achieve the performance of distillation. Second, while distillation strategies are both economical and effective, advancing beyond the boundaries of intelligence may still require more powerful base models and larger-scale reinforcement learning.”
Worth noting that people have been unpacking and analyzing DeepSeek-R1 vigorously for days already on X before it got to Hacker News — it wasn't always this way.
HN has a general tech audience, including SWEs who are paid so much that they exhibit Nobel Disease, and fauxtrepreneurs who use AI as a buzzword. They exist on X too, but the conversations are diffused. You'll have a section of crypto bros on there who know nothing about the technical topics they are talking about. Other users' algorithms will fit their level of deep technical familiarity with AI.
I can't say that it's better than o1 for my needs. I gave R1 this prompt:
"Prove or disprove: there exists a closed, countable, non-trivial partition of a connected Hausdorff space."
And it made a pretty amateurish mistake:
"Thus, the real line R with the partition {[n,n+1]∣n∈Z} serves as a valid example of a connected Hausdorff space with a closed, countable, non-trivial partition."
o1 gets this prompt right the few times I tested it (disproving it using something like Sierpinski).
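The mistake is elementary: the proposed family isn't a partition at all, since adjacent intervals share an endpoint,

```latex
[n,\,n+1] \cap [n+1,\,n+2] = \{\,n+1\,\} \neq \varnothing,
```

so the sets $\{[n,n+1] \mid n \in \mathbb{Z}\}$ are not pairwise disjoint and do not partition $\mathbb{R}$.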
How can openai justify their $200/mo subscriptions if a model like this exists at an incredibly low price point? Operator?
I've been impressed in my brief personal testing and the model ranks very highly across most benchmarks (when controlled for style it's tied number one on lmarena).
It's also hilarious that openai explicitly prevented users from seeing the CoT tokens on the o1 model (which you still pay for btw) to avoid a situation where someone trained on that output. Turns out it made no difference lmao.
swyx | 1 year ago
- I consider the DeepSeek V3 paper required pre-reading: https://github.com/deepseek-ai/DeepSeek-V3
- R1 + Sonnet > R1 or O1 or R1+R1 or O1+Sonnet or any other combo https://aider.chat/2025/01/24/r1-sonnet.html
- independent repros: 1) https://hkust-nlp.notion.site/simplerl-reason 2) https://buttondown.com/ainews/archive/ainews-tinyzero-reprod... 3) https://x.com/ClementDelangue/status/1883154611348910181
- R1 distillations are going to hit us every few days - because it's ridiculously easy (<$400, <48hrs) to improve any base model with these chains of thought eg with Sky-T1 recipe (writeup https://buttondown.com/ainews/archive/ainews-bespoke-stratos... , 23min interview w team https://www.youtube.com/watch?v=jrf76uNs77k)
I probably have more resources but don't want to spam; seek out the Latent Space Discord if you want the full stream I pulled these notes from.
neom | 1 year ago
govideo | 1 year ago
HarHarVeryFunny | 1 year ago
https://venturebeat.com/ai/why-everyone-in-ai-is-freaking-ou...
Alifatisk | 1 year ago
We live in exciting times.
sdesol | 1 year ago
ilaksh | 1 year ago
govideo | 1 year ago
verdverm | 1 year ago
mi_lk | 1 year ago
https://arxiv.org/abs/2403.05530
soheil | 1 year ago
For reference
elevatedastalt | 1 year ago
wumeow | 1 year ago
john_alan | 1 year ago
the_real_cher | 1 year ago
I'd love to be able to tinker with running my own local models especially if it's as good as what you're seeing.
cbg0 | 1 year ago
https://prnt.sc/HaSc4XZ89skA (from reddit)
MostlyStable | 1 year ago
[0] https://thezvi.substack.com/p/on-deepseeks-r1?open=false#%C2...
tbocek | 1 year ago
itsoktocry | 1 year ago
I asked o1 how to download a YouTube music playlist as a premium subscriber, and it told me it couldn't help.
DeepSeek has no problem.
slt2021 | 1 year ago
This verbal gymnastics and hypocrisy is getting a little bit old...
sesm | 1 year ago
epicureanideal | 1 year ago
eunos | 1 year ago
buyucu | 1 year ago
gradus_ad | 1 year ago
The true costs and implications of V3 are discussed here: https://www.interconnects.ai/p/deepseek-v3-and-the-actual-co...
InkCanon | 1 year ago
ankit219 | 1 year ago
sega_sai | 1 year ago
buyucu | 1 year ago
That is a lot of people running their own models. OpenAI is probably in panic mode right now.
hrpnk | 1 year ago
whimsicalism | 1 year ago
lazzlazzlazz | 1 year ago
lysace | 1 year ago
djtango | 1 year ago
whimsicalism | 1 year ago
That said, this is like the third R1 thread here.
alephnan | 1 year ago
jumploops | 1 year ago
Afaict they’ve hidden them primarily to stifle the competition… which doesn’t seem to matter at present!