
jjordan | 8 months ago

I don't know why everyone keeps echoing this, my experience with Deepseek-R1, from a coding perspective at least, has been underwhelming at best. Much better experience with GPT 4.1 (and even better with Claude, but that's a different price category).


am17an|8 months ago

I'm not arguing about which model is better for your use case. I'm saying that, in general, it's as "powerful" as GPT 4.1 in a lot of benchmarks, and you can peek under the hood, even make it better for your use case.

seunosewa|8 months ago

Do you mean V3? V3 is 4.1 level or above.

Zambyte|8 months ago

In my experience, all reasoning models feel (vibely) worse at structured output like code versus comparable non-reasoning models, but far better at knowledge-based answering.

hnfong|8 months ago

A lot of software (e.g. ollama) has confusingly named Deepseek's distills/finetunes of other base models "DeepSeek-R1" as well. See e.g. https://www.threads.com/@si.fong/post/DKSdUOHzaBB

I wonder whether you're actually running the proper DeepSeek-R1 model, or one of those lesser finetunes.

jorvi|8 months ago

This is everyone with every model.

People sang the praises of Google's Gemini 2.5 models from the rooftops, but in many tasks, for me, they can't even beat Deepseek V3.

CamperBob2|8 months ago

What would be an example of 2.5 Pro failing against R1 (which is what you'd actually want to compare it to)?

iJohnDoe|8 months ago

I got the impression that o3-mini and o3-mini-high were meant for coding? GPT 4.1 was meant for creative writing, not coding?

conradev|8 months ago

It’s good at a lot of things:

  GPT‑4.1 scores 54.6% on SWE-bench Verified, improving by 21.4%abs over GPT‑4o and 26.6%abs over GPT‑4.5—making it a leading model for coding.
https://openai.com/index/gpt-4-1/