The "distilled+quantized versions" are not the same model at all, they are existing models (Llama and Qwen) finetuned on outputs from the actual R1 model, and are not really comparable to the real thing.
A distilled version running on another model architecture does not count as using "DeepSeek". It counts as running a Llama:7B model fine-tuned on DeepSeek.
Pretty sure this is just layman vs academic expert usage of the word conflicting.
raxxor|1 year ago
Sure, you could say that only running the 600+b model is running "the real thing"...
lovich|1 year ago
For everyone who doesn’t build LLMs themselves, “running a Llama:7B model fine-tuned on DeepSeek” _is_ using DeepSeek, mostly on account of all the tools and files being named DeepSeek, and the tutorials aimed at casual users all being titled with equivalents of “How to use DeepSeek locally”.
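The local tooling reinforces this: Ollama, for example, publishes the distills under the deepseek-r1 name, so the tag a casual user types says "DeepSeek" even though the underlying model is a Qwen or Llama distill. A rough sketch, assuming a local Ollama daemon on the default port with the deepseek-r1:7b tag already pulled (endpoint and field names from memory; older versions expect "name" instead of "model"):

    import json
    import urllib.request

    # Ask the local Ollama daemon (default port 11434, assumed) what
    # the "deepseek-r1:7b" tag actually is under the hood.
    req = urllib.request.Request(
        "http://localhost:11434/api/show",
        data=json.dumps({"model": "deepseek-r1:7b"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        info = json.load(resp)

    # The user-facing name says "deepseek-r1", but the reported details
    # describe the distill's underlying family (Qwen2 for the 7b tag).
    print(info.get("details", {}))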