top | item 42803750

Nyr | 1 year ago

This article assumes that they are being truthful and indeed had access to limited hardware resources, which is doubtful to say the least.

benreesman | 1 year ago

I think we should have substantially more confidence in the claims of people who A) haven’t been caught misleading us yet, B) have published extensive code and weights for their absolutely cutting-edge stuff, and C) aren’t attached to a bunch of other bad behavior (e.g. DDoS crawlers) that we know about.

If there’s news of DeepSeek behaving badly and I missed it, then I take that back, but AFAIK they are at or near the top of the rankings on being good actors.

lopuhin | 1 year ago

Why is this doubtful? Did you spot anything suspicious in their paper? They also open the weights and a lot of the training details, which leaves much less room for making things up — e.g. you can sanity-check the claimed training compute from the active weight size (which they can't fake, since they released the weights) and the fp8 training they used.
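The sanity check described above can be sketched with the common ≈6·N·D FLOPs rule of thumb. The active-parameter count, token count, and GPU-hour figure below are as reported in DeepSeek's V3 technical report; the H800 fp8 peak throughput is an assumption, so the resulting utilization figure is only a plausibility check, not a verified number.

```python
# Rough plausibility check: does DeepSeek-V3's reported GPU-hour
# budget match the compute implied by its active parameter count?
# Model/token/GPU-hour figures are from the DeepSeek-V3 report;
# the H800 peak throughput below is an assumption.

active_params = 37e9    # active (routed) parameters per token
tokens = 14.8e12        # reported pretraining tokens
# ~6 FLOPs per parameter per token (fwd + bwd) rule of thumb:
train_flops = 6 * active_params * tokens

gpu_hours = 2.788e6     # reported H800 GPU hours
h800_fp8_peak = 1.98e15 # assumed ~2 PFLOPS fp8 peak per H800
utilization = train_flops / (gpu_hours * 3600 * h800_fp8_peak)

print(f"implied training FLOPs ≈ {train_flops:.2e}")
print(f"implied hardware utilization ≈ {utilization:.0%}")
```

If the implied utilization lands in a realistic range (well below 100%, but not absurdly low), the reported budget is at least self-consistent — which is the point being made: released weights constrain what can be faked.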

m3kw9 | 1 year ago

There are rumors that the open-source model differs from the hosted DeepSeek, so it needs more investigation. A bad actor would be someone piping OpenAI models from behind their server.

ioulaum | 1 year ago

It's not actually a dense 600B+ model — it's a mixture of experts, so only a fraction of the parameters are active for any given token. The individual experts are pretty small and thus don't require as much compute to reach a decent point.

It's similar to how Mixtral got good performance without anywhere near OpenAI-class money / compute.
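The point above — per-token compute tracks active parameters, not total — can be sketched with a top-k routed MoE's feed-forward layers. The dimensions below are illustrative round numbers in the ballpark of a Mixtral-style 8-expert, top-2 model, not any real model's exact configuration.

```python
# Illustrative: why an MoE's per-token compute tracks active
# (routed) parameters rather than total parameters.
# Layer sizes are made-up round numbers, not a real model's config.

def moe_ffn_params(n_layers, d_model, d_ff, n_experts, top_k):
    """Total vs per-token-active FFN parameters for a top-k routed MoE."""
    ffn = 2 * d_model * d_ff            # up- and down-projection per expert
    total = n_layers * n_experts * ffn  # all experts stored in memory
    active = n_layers * top_k * ffn     # only top_k experts run per token
    return total, active

total, active = moe_ffn_params(n_layers=32, d_model=4096, d_ff=14336,
                               n_experts=8, top_k=2)
print(f"total FFN params:  {total/1e9:.1f}B")
print(f"active FFN params: {active/1e9:.1f}B ({active/total:.0%} of total)")
```

With 8 experts and top-2 routing, each token only exercises a quarter of the FFN weights — which is why a "600B+" MoE can train with the compute budget of a much smaller dense model.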

ur-whale | 1 year ago

> It's not actually a 600B+ model. It's a mixture of experts.

Is this described in the paper, or was it inferred from the model itself?

Just curious, especially if the latter.