item 43844587

(no title)

ilrwbwrkhv | 10 months ago

Yup, in my private evals I have repeatedly found that DeepSeek has the best models for everything, and yet in a lot of these public ones someone else always seems to be on top. I don't know why.


__alexs | 10 months ago

Publishing them might help you find out.

refulgentis | 10 months ago

^ This.

If I had to hazard a guess, as a poor soul doomed to maintain several closed- and open-source models acting agentically: I think you are hyper-focused on chat trivia use cases. (DeepSeek has a very, very hard time with tool calling, and they say as much themselves in their API docs.)
