May 28th update to the original DeepSeek R1. Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
No sign of what source material it was trained on though right? So open weight rather than reproducible from source.
I remember there's a project, "Open R1", that last I checked was working on gathering its own list of training material. It looks active, but I'm not sure how far along they've gotten: https://github.com/huggingface/open-r1
Benchmarks seem like a fool's errand at this point: models get overly tuned to specific, already-published tests rather than being made to generalize.
Hugging Face has a leaderboard, and it seems dominated by models that are fine-tunings of various common open-source models, yet don't seem to be broadly used: https://huggingface.co/open-llm-leaderboard
There's a table here showing some "Overall" and "Median" scores, but no context on what exactly was tested. It appears to be in the same ballpark as the latest models, with some cost advantages but the downside of being just as slow as the original R1 (likely lots of thinking tokens). https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
Much preferable to what OpenAI always did and Anthropic recently started doing: just write some complicated narrative about how scary the new model is and how it tried to escape and deceive and hack the mainframe while telling the alignment operators bedtime stories.
Out of sheer curiosity: What’s required for the average Joe to use this, even at a glacial pace, in terms of hardware? Or is it even possible without using smart person magic to append enchanted numbers and make it smaller for us masses?
You can run the 4-bit quantized version of it on an M3 Ultra with 512GB of RAM. That's quite expensive, though. Another alternative is a fast CPU with 500GB of DDR5 RAM; that is also not cheap, and slower than the M3 Ultra. Or you buy multiple Nvidia cards to reach ~500GB of VRAM; that is probably the most expensive option, but also the fastest.
About 768GB of DDR5 RAM on a dual-socket server board with 12-channel memory, plus an extra 16GB-or-better GPU for prompt processing. It's a few grand just to run this thing at 8-10 tokens/s.
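A back-of-the-envelope sketch of why those numbers come out where they do. The DDR5-4800 speed and 4-bit quantization are my assumptions, not stated above; the parameter counts come from the thread:

```python
# Rough sizing math for running the 671B-parameter MoE model from system RAM.
# Assumptions (mine): ~4-bit quantized weights (0.5 bytes/parameter),
# 12 channels of DDR5-4800.

TOTAL_PARAMS = 671e9      # total parameters (from the thread)
ACTIVE_PARAMS = 37e9      # parameters touched per token (MoE)
BYTES_PER_PARAM = 0.5     # 4-bit quantization

total_weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
active_weights_gb = ACTIVE_PARAMS * BYTES_PER_PARAM / 1e9

# 12-channel DDR5-4800: 4.8 GT/s * 8 bytes per transfer, per channel
bandwidth_gbs = 12 * 4.8 * 8

# Upper bound: each generated token must stream the active weights from RAM once
tokens_per_sec_bound = bandwidth_gbs / active_weights_gb

print(f"weights in RAM: ~{total_weights_gb:.0f} GB")        # ~336 GB
print(f"read per token: ~{active_weights_gb:.1f} GB")       # ~18.5 GB
print(f"bandwidth:      ~{bandwidth_gbs:.0f} GB/s")         # ~461 GB/s
print(f"ceiling:        ~{tokens_per_sec_bound:.0f} tok/s") # ~25 tok/s
```

The observed 8-10 tokens/s is a plausible fraction of that ~25 tok/s theoretical ceiling once KV-cache traffic, dual-socket NUMA effects, and compute overhead are accounted for.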
Practically, smaller quantized versions of R1 can be run on a pretty typical MacBook Pro setup. Quantized versions are definitely less capable, but they will absolutely run.
Truthfully, it's just not worth it. You either run these things so slowly that you're wasting your time, or you have to buy four or five figures' worth of hardware that will sit mostly unused.
As mentioned, you can run this on a server board with 768+ GB of memory in CPU mode. The average Joe is going to be running quantized ~30B (not 600B+) models on a $300/$400/$900 8/12/16GB GPU.
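Quick math on why ~30B quantized models line up with those GPU tiers (the 4-bit quantization is my assumption, and this counts only the weights):

```python
# VRAM needed for the weights of a quantized dense model, ignoring the KV
# cache and runtime overhead (which add a few more GB on top).
def weights_gb(params_billion: float, bits: int) -> float:
    return params_billion * 1e9 * bits / 8 / 1e9

for params in (8, 14, 30):
    print(f"{params}B @ 4-bit: ~{weights_gb(params, 4):.1f} GB")
# 8B @ 4-bit: ~4.0 GB
# 14B @ 4-bit: ~7.0 GB
# 30B @ 4-bit: ~15.0 GB
```

So a 4-bit ~30B model just squeezes into a 16GB card, while the 8GB and 12GB tiers are comfortable for the smaller sizes.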
You can pay Amazon to do it for you at about a penny per 10 thousand tokens.
There are a couple of guides for setting it up "manually" on EC2 instances so you're not paying the Bedrock [0] per-token prices; here's one [1] that uses four g6e.48xlarge instances (192 vCPUs, 1536GB RAM, 8x L40S Tensor Core GPUs with 48GB of memory per GPU).
[0] https://aws.amazon.com/bedrock/deepseek/
[1] https://community.aws/content/2w2T9a1HOICvNCVKVRyVXUxuKff/de...
A quick Google tells me that a g6e.48xlarge is something like 22k USD per month?
Sorry, I'm being cheeky here, but realistically, unless you want to shell out $10k for the equivalent of a Mac Studio with 512GB of RAM, you're best off using other services or a small distilled model based on this one.
If speed is truly not an issue, you can run DeepSeek on pretty much any PC with a large enough swap file, at a speed of about one token every 10 minutes assuming a plain old HDD.
Something more reasonable would be a used server CPU with as many memory channels as possible and DDR4 RAM, for less than $2000.
But before spending big, it might be a good idea to rent a server to get a feel for it.
What use cases are people using local LLMs for? Have you created any practical tools that actually increase your efficiency? I've been experimenting a bit, but I find it hard to get inspiration for useful applications.
I have a signal tracer that evaluates unusual trading volumes. Given those signals, my local agent receives news items through an API to assess what's happening. This helps me tremendously. If I did this through a remote app, I'd have to spend several dollars per day, so instead I run it on existing hardware.
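A minimal sketch of that kind of pipeline, assuming a local OpenAI-compatible server (e.g. llama.cpp's llama-server) listening on localhost. The endpoint, model name, and signal format are illustrative assumptions, not the commenter's actual setup:

```python
import json
import urllib.request

# Assumed llama.cpp-style OpenAI-compatible server running locally
LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"

def build_assessment_prompt(ticker: str, volume_ratio: float, headlines: list[str]) -> str:
    """Turn an unusual-volume signal plus recent headlines into a prompt."""
    lines = "\n".join(f"- {h}" for h in headlines)
    return (
        f"{ticker} is trading at {volume_ratio:.1f}x its average volume.\n"
        f"Recent headlines:\n{lines}\n"
        "In two sentences, assess the most likely cause and whether it looks "
        "news-driven or flow-driven."
    )

def assess(ticker: str, volume_ratio: float, headlines: list[str]) -> str:
    """Send the prompt to the local model and return its assessment."""
    payload = {
        "model": "local",  # placeholder; the server decides which weights it serves
        "messages": [{"role": "user",
                      "content": build_assessment_prompt(ticker, volume_ratio, headlines)}],
    }
    req = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires a running local server):
# print(assess("ACME", 7.3, ["ACME announces surprise CEO departure"]))
```

Since the model runs on hardware you already own, each assessment is effectively free, which is the whole appeal versus a metered remote API.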
I do a lot of data cleaning as part of my job, and I've found that small models could be very useful for that, particularly in the face of somewhat messy data.
You can, for instance, use them to extract information such as postal codes from strings, or to translate and standardize country names written in various languages (e.g. Spanish, Italian, and French to English), etc.
I'm sure people will have more advanced use cases, but I've found them useful for that.
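A small sketch of the country-name case. The one-word-answer prompt and the reply-cleaning step are my own illustrative choices; the actual call to a local model is omitted, since any local chat endpoint would do:

```python
# Using a small local model for data cleaning: standardizing country names.

def country_prompt(raw: str) -> str:
    """Constrain the model to answer with just the standardized name."""
    return (
        "Reply with only the standard English name of this country, "
        f"nothing else: {raw!r}"
    )

def clean_reply(reply: str) -> str:
    """Small models often pad answers; strip whitespace, quotes, trailing periods."""
    return reply.strip().rstrip(".").strip('"\'').strip()

# e.g. a raw model reply ' "Germany".\n' cleans to:
print(clean_reply(' "Germany".\n'))  # Germany
```

The post-processing step matters in practice: on messy batch jobs, a small model's formatting quirks are often the main source of errors, not its actual knowledge of country names.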
Use the magic incantation -ot ".ffn_.*_exps.=CPU" to offload the MoE expert layers to RAM, allowing the non-MoE layers to fit in under 24GB of VRAM at 16K context! The rest sits in RAM and on disk.
Not much to go off of here. I think the latest R1 release should be exciting: 685B parameters, but no model card. Release notes? Changes? Context window? The original R1 has impressive output but really burns tokens to get there. Can't wait to learn more!
I think it’s cool to see this kind of international participation in fierce tech competition. It’s exciting. It’s what I think capitalism should be.
This whole “building moats” and buying competitors fascination in the US has gotten boring, obvious and dull. The world benefits when companies struggle to be the best.
There is a small community of people who do indeed run this locally, typically on CPU/RAM (lots and lots of RAM), insofar as that's cheaper than GPU(s).
That's about $16-24 per hour - depending on the number of tokens you're slinging in that period, it may be much cheaper than paying OpenAI for similar functionality.
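Rough break-even math between renting that hardware and paying per token, using $20/hr as a midpoint of the estimate above and the "penny per 10 thousand tokens" figure mentioned upthread; both numbers are approximations:

```python
instance_cost_per_hour = 20.0      # midpoint of the $16-24/hr estimate above
price_per_token = 0.01 / 10_000    # "a penny per 10 thousand tokens"

# Tokens per hour you must sustain before the rented instance beats per-token pricing
breakeven_tokens_per_hour = instance_cost_per_hour / price_per_token
breakeven_tokens_per_sec = breakeven_tokens_per_hour / 3600

print(f"{breakeven_tokens_per_hour:,.0f} tokens/hour")          # 20,000,000 tokens/hour
print(f"~{breakeven_tokens_per_sec:,.0f} tokens/sec sustained") # ~5,556 tokens/sec
```

In other words, whether renting wins depends entirely on sustained utilization: against that per-token rate you'd need to keep millions of tokens per hour flowing, though against pricier APIs the break-even point drops sharply.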
jacob019|9 months ago
https://openrouter.ai/deepseek/deepseek-r1-0528/providers
Fully open-source model.
fragmede|9 months ago
https://www.downloadableisnotopensource.org/
swyx|9 months ago
No idea why they can't just wait a bit to coordinate stuff. Bit messy in the news cycle.
danielhanchen|9 months ago
I'm working on the new one!
hu3|9 months ago
There's already a 685B parameter DeepSeek V3 for free there.
https://openrouter.ai/deepseek/deepseek-chat-v3-0324:free
z2|9 months ago
Software: client of choice to https://openrouter.ai/deepseek/deepseek-r1-0528
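For the "client of choice" part, a minimal standard-library sketch against OpenRouter's OpenAI-compatible chat-completions endpoint. The payload shape follows the usual OpenAI convention, and you'd supply your own API key:

```python
import json
import os
import urllib.request

def r1_payload(prompt: str) -> dict:
    """Chat-completions payload for the model in the link above."""
    return {
        "model": "deepseek/deepseek-r1-0528",
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_r1(prompt: str) -> str:
    """POST the prompt to OpenRouter and return the model's reply."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(r1_payload(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires OPENROUTER_API_KEY to be set in the environment):
# print(ask_r1("Why is the sky blue?"))
```

Because the endpoint is OpenAI-compatible, most existing chat clients and SDKs work unchanged by just pointing them at the OpenRouter base URL.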
whynotmaybe|9 months ago
With an average of 3.6 tokens/sec, answers usually take 150-200 seconds.
danielhanchen|9 months ago
74% smaller: from 713GB down to 185GB.
cesarvarela|9 months ago
edit: most providers are offering a quantized version...
porphyra|9 months ago
[1] https://console.groq.com/docs/models