benreesman|1 year ago

I think we should have substantially more confidence in the claims of people who A) haven’t been caught misleading us yet, B) have published extensive code and weights for their absolutely cutting-edge stuff, and C) aren’t attached to a bunch of other bad behavior (e.g. DDoS crawlers) that we know about.

If there’s news of DeepSeek behaving badly and I missed it, then I take that back, but AFAIK they are at or near the top of the rankings on being good actors.

lopuhin|1 year ago

Why is this doubtful? Did you spot anything suspicious in their paper? They make the weights and a lot of the training details open as well, which leaves much less room for making stuff up; e.g. you could check the claimed training compute against the active weight size (which they can't fake, since they released the weights) and the FP8 training they used.

m3kw9|1 year ago

There is a rumor that the open-source model differs from the hosted DeepSeek, so it needs more investigation. A bad actor would be someone piping OpenAI models from behind a server.

ioulaum|1 year ago

It's not actually a dense 600B+ model; it's a mixture of experts. The individual expert models are pretty small and thus don't require as much training to reach a decent point.

It's similar to Mixtral having gotten good performance while not having anywhere near OpenAI-class money / compute.
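The compute sanity check described upthread can be sketched numerically. A minimal sketch, assuming the common ~6·N·D FLOPs-per-token rule of thumb and the publicly reported DeepSeek-V3 figures (37B active parameters, 14.8T training tokens, ~2.788M H800 GPU-hours for pre-training); the 6ND rule is only an approximation, not an exact accounting:

```python
def training_flops(active_params: float, tokens: float) -> float:
    """Rough pre-training compute via the common ~6*N*D FLOPs rule of thumb,
    where N is the number of parameters active per token and D is the token count."""
    return 6.0 * active_params * tokens

# Publicly reported DeepSeek-V3 figures (taken as given for this sketch):
ACTIVE_PARAMS = 37e9   # 37B parameters active per token (671B total, MoE)
TOKENS = 14.8e12       # 14.8T training tokens
GPU_HOURS = 2.788e6    # ~2.788M H800 GPU-hours claimed for pre-training

total = training_flops(ACTIVE_PARAMS, TOKENS)   # ~3.3e24 FLOPs
per_gpu = total / (GPU_HOURS * 3600.0)          # implied sustained FLOP/s per GPU
print(f"total: {total:.3e} FLOPs, sustained: {per_gpu / 1e12:.0f} TFLOPS per GPU")
```

The implied sustained throughput per GPU comes out to a few hundred TFLOPS, a plausible fraction of an H800's FP8 peak, so under these assumptions the claimed budget is at least internally consistent — exactly the kind of check the open weights make possible.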
ur-whale|1 year ago
Is this described in the paper or was this inferred from the model itself?
Just curious, especially if the latter.
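On the "inferred from the model itself" option: with open weights, the mixture-of-experts structure is visible directly in the checkpoint's tensor names and shapes, so it doesn't have to be taken on faith from the paper. A minimal sketch over a toy safetensors-style shape index — the tensor names, shapes, and top-k value here are made up for illustration, not DeepSeek's actual layout:

```python
from collections import defaultdict
from math import prod

# Toy stand-in for a checkpoint's tensor-name -> shape map (hypothetical names;
# real checkpoints ship a similar index, e.g. model.safetensors.index.json).
shapes = {
    "layers.0.attn.q_proj.weight":        (4096, 4096),
    "layers.0.mlp.experts.0.w1.weight":   (11008, 4096),
    "layers.0.mlp.experts.1.w1.weight":   (11008, 4096),
    "layers.0.mlp.experts.2.w1.weight":   (11008, 4096),
    "layers.0.mlp.experts.3.w1.weight":   (11008, 4096),
    "layers.0.mlp.gate.weight":           (4, 4096),  # router over 4 experts
}

# Split parameter counts into expert vs. shared (always-active) buckets.
totals = defaultdict(int)
for name, shape in shapes.items():
    bucket = "expert" if ".experts." in name else "shared"
    totals[bucket] += prod(shape)

experts_per_layer = len({name.split(".experts.")[1].split(".")[0]
                         for name in shapes if ".experts." in name})
top_k = 2  # routed experts per token -- hypothetical; read from the config in practice

# Active parameters per token: shared weights plus top_k experts' worth.
active = totals["shared"] + (totals["expert"] // experts_per_layer) * top_k
print(experts_per_layer, totals["expert"], active)
```

The same walk over a real index file recovers expert count, expert size, and (together with the router config) the active parameter count — which is how total-vs-active claims can be verified from the release itself.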
unknown|1 year ago
[deleted]
rbcjvuvy6|1 year ago
[deleted]