
Deepseek R1-0528

451 points | error404x | 9 months ago | huggingface.co

250 comments


jacob019|9 months ago

Well that didn't take long, available from 7 providers through openrouter.

https://openrouter.ai/deepseek/deepseek-r1-0528/providers

May 28th update to the original DeepSeek R1. Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.

Fully open-source model.

jazzyjackson|9 months ago

No sign of what source material it was trained on though right? So open weight rather than reproducible from source.

I remember there's a project "Open R1" that last I checked was working on gathering their own list of training material, looks active but not sure how far along they've gotten:

https://github.com/huggingface/open-r1

JKCalhoun|9 months ago

Is there a downloadable model? (Not familiar with openrouter and not seeing the model on ollama.)

aldanor|9 months ago

Open weights.

acheong08|9 months ago

No information to be found about it. Hopefully we get benchmarks soon. Reminds me of the days when Mistral would just tweet a torrent magnet link

chvid|9 months ago

Benchmarks seem like a fool's errand at this point; models are being overly tuned to specific already-published tests rather than made to generalize.

Hugging Face has a leaderboard, and it seems dominated by fine-tunes of various common open-source models that don't seem to be broadly used:

https://huggingface.co/open-llm-leaderboard

z2|9 months ago

There's a table here showing some "Overall" and "Median" scores, but no context on what exactly was tested. It appears to be in the same ballpark as the latest models, with some cost advantages but the downside of being just as slow as the original R1 (likely lots of thinking tokens). https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....

swyx|9 months ago

i think usually deepseek posts a paper after a model release about a day later.

no idea why they can't just wait a bit to coordinate stuff. Bit messy in the news cycle.

aibrother|9 months ago

getting a similar vibe yeah. given how adjacent they are, wouldn't be surprised if this was an intentional nod from DeepSeek

willchen|9 months ago

I love how Deepseek just casually drops new updates (that deliver big improvements) without fanfare.

doctoboggan|9 months ago

Honest question, how do you know this is a big improvement? Are there any benchmarks anywhere?

therein|9 months ago

Much preferable to what OpenAI always did and Anthropic recently started doing: write some complicated narrative about how scary the new model is, how it tried to escape and deceive and hack the mainframe while telling the alignment operators bedtime stories.

modeless|9 months ago

I like it too, but some benchmark numbers would be nice at least.

ilaksh|9 months ago

I think they did make an announcement on WeChat.

hd4|9 months ago

On the day Nvidia report earnings too. Pretty sure it's just a coincidence, bro.

dyauspitr|9 months ago

What big improvements?

esafak|9 months ago

Anyone got benchmarks?

transcriptase|9 months ago

Out of sheer curiosity: What’s required for the average Joe to use this, even at a glacial pace, in terms of hardware? Or is it even possible without using smart person magic to append enchanted numbers and make it smaller for us masses?

terhechte|9 months ago

You can run the 4-bit quantized version of it on an M3 Ultra with 512GB of RAM. That's quite expensive, though. Another alternative is a fast CPU with 500GB of DDR5 RAM; that, of course, is also not cheap, and slower than the M3 Ultra. Or you buy multiple Nvidia cards to reach ~500GB of VRAM. That is probably the most expensive option, but also the fastest.
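A quick back-of-envelope check of why ~500GB is the magic number here (a rough sketch assuming 4 bits per weight, ignoring KV cache and runtime overhead):

```python
params = 671e9          # total parameters (MoE, all experts)
bits_per_weight = 4     # 4-bit quantization
weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 1e9:.0f} GB")  # ~336 GB for the weights alone
```

At ~336GB of weights, a 512GB machine leaves headroom for the KV cache and the OS, which is why the 4-bit quant is the usual target for this class of hardware.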

behohippy|9 months ago

About 768 gigs of ddr5 RAM in a dual socket server board with 12 channel memory and an extra 16 gig or better GPU for prompt processing. It's a few grand just to run this thing at 8-10 tokens/s

mechagodzilla|9 months ago

I have a $2k used dual-socket Xeon with 768GB of DDR4; it runs at about 1.5 tokens/sec for the 4-bit quantized version.
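The token rates quoted in this subthread line up with a simple memory-bandwidth bound: each token requires streaming the ~37B active parameters from RAM. A rough sketch (the effective bandwidth figures are my own ballpark assumptions; real throughput depends on quantization kernels, NUMA, and prompt processing):

```python
active_params = 37e9       # parameters touched per token (MoE routing)
bytes_per_param = 0.5      # 4-bit quantization
bytes_per_token = active_params * bytes_per_param  # ~18.5 GB read per token

# Assumed effective (not peak) memory bandwidths, GB/s:
for name, bw_gbs in [("dual-socket DDR4, effective", 30),
                     ("12-channel DDR5, effective", 180),
                     ("M3 Ultra unified memory", 800)]:
    print(f"{name}: ~{bw_gbs * 1e9 / bytes_per_token:.1f} tokens/s")
```

The DDR4 estimate lands right around the 1.5 tokens/s reported above, and the DDR5 figure around the 8-10 tokens/s quoted for the 12-channel build; the M3 Ultra number is a theoretical ceiling.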

SkyPuncher|9 months ago

Practically, smaller, quantized versions of R1 can be run on a pretty typical MacBook Pro setup. Quantized versions are definitely less performant, but they will absolutely run.

Truthfully, it's just not worth it. You either run these things so slowly that you're wasting your time, or you buy four or five figures of hardware that will sit mostly unused.

hadlock|9 months ago

As mentioned, you can run this on a server board with 768+ GB of memory in CPU mode. The average Joe is going to be running quantized 30B (not 600B+) models on a $300/$400/$900 8/12/16GB GPU.

jacob019|9 months ago

I'm sure it will be on OpenRouter within the next day or so. Not really practical to run a 685B param model at home.

jazzyjackson|9 months ago

You can pay Amazon to do it for you at about a penny per 10 thousand tokens.

There are a couple of guides for setting it up "manually" on EC2 instances so you're not paying the Bedrock per-token prices; here's one [1] that calls for four g6e.48xlarge instances (192 vCPUs, 1536GB RAM, 8x L40S Tensor Core GPUs with 48GB of memory each).

Quick google tells me that g6e.48xlarge is something like 22k USD per month?

[0] https://aws.amazon.com/bedrock/deepseek/

[1] https://community.aws/content/2w2T9a1HOICvNCVKVRyVXUxuKff/de...

z2|9 months ago

Hardware: any computer from the last 20 or so years.

Software: client of choice to https://openrouter.ai/deepseek/deepseek-r1-0528

Sorry, I'm being cheeky here, but realistically, unless you want to shell out $10k for the equivalent of a Mac Studio with 512GB of RAM, you are best off using other services or a small distilled model based on this one.
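For completeness, "client of choice" can be as little as a few lines, since OpenRouter exposes an OpenAI-compatible chat completions endpoint. A minimal sketch (assumes an `OPENROUTER_API_KEY` environment variable; the model slug is the one linked in this thread):

```python
import json
import os
import urllib.request

# OpenAI-compatible chat completion request against OpenRouter.
payload = {
    "model": "deepseek/deepseek-r1-0528",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}

def ask(payload: dict) -> str:
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# print(ask(payload))  # needs a network connection and an API key
```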

threeducks|9 months ago

> even at a glacial pace

If speed is truly not an issue, you can run Deepseek on pretty much any PC with a large enough swap file, at a speed of about one token every 10 minutes assuming a plain old HDD.

Something more reasonable would be a used server CPU with as many memory channels as possible and DDR4 ram for less than $2000.

But before spending big, it might be a good idea to rent a server to get a feel for it.

whynotmaybe|9 months ago

I'm using GPT4All with DeepSeek-R1-Distill-Qwen-7B (which is not R1-0528) on a Ryzen 5 3600 with 32GB of RAM.

With an average of 3.6 tokens/sec, answers usually take 150-200 seconds.

karencarits|9 months ago

What use cases are people using local LLMs for? Have you created any practical tools that actually increase your efficiency? I've been experimenting a bit but find it hard to get inspiration for useful applications

jsemrau|9 months ago

I have a signal tracer that evaluates unusual trading volumes. Given those signals, my local agent pulls news items through an API to assess what's happening. This helps me tremendously. If I did this through a remote service, I'd spend several dollars per day, so I run it on existing hardware instead.

codedokode|9 months ago

Anyone who does not want to leak their data? I am actually surprised that people are ok with trusting their secrets to a random foreign company.

itsmevictor|9 months ago

I do a lot of data cleaning as part of my job, and I've found that small models could be very useful for that, particularly in the face of somewhat messy data.

You can for instance use them to extract some information such as postal codes from strings, or to translate and standardize country names written in various languages (e.g. Spanish, Italian and French to English), etc.

I'm sure people will have more advanced use cases, but I've found them useful for that.
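A sketch of that kind of extraction/standardization task against a local model. This assumes an Ollama-style server on localhost and a small instruct model; the model name and prompt wording are illustrative, not from the parent comment:

```python
import json
import urllib.request

def build_prompt(raw: str) -> str:
    """Prompt a small model to normalize a messy country name to English."""
    return ("Return only the standard English name of this country, "
            f"nothing else: {raw}")

def standardize_country(raw: str, model: str = "qwen2.5:7b") -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # default Ollama endpoint
        data=json.dumps({"model": model,
                         "prompt": build_prompt(raw),
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"].strip()

# standardize_country("Allemagne")  # -> "Germany", if the model cooperates
```

Wrapped in a loop over a messy column, this is the sort of cleanup that would otherwise take a pile of hand-written mapping tables.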

lvturner|9 months ago

Also worth it for the speed of AI autocomplete in coding tools, the round trip to my graphics card is much faster than going out over the network.

sudomarcma|9 months ago

Any companies with any type of sensitive data will love to have anything to do with LLM done locally.

bcoates|9 months ago

I use the local LLM-based autocomplete built into PyCharm and I'm pretty happy with it

danielhanchen|9 months ago

For those interested, I made some 1 bit dynamic quants at https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF

74% smaller: from 713GB down to 185GB.

Use the magic incantation -ot ".ffn_.*_exps.=CPU" to offload the MoE layers to RAM, allowing the non-MoE layers to fit in under 24GB of VRAM at 16K context! The rest sits in RAM & disk.
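For anyone puzzled by the incantation: `-ot` is llama.cpp's override-tensor flag, and the regex routes the MoE expert weights (tensors named like `blk.N.ffn_*_exps.weight`) to CPU RAM while the dense layers stay on the GPU. A sketch (the model filename and context size are placeholders):

```shell
# Hypothetical llama.cpp invocation; adjust the model path to your download.
# ./llama-cli -m DeepSeek-R1-0528-GGUF.gguf -c 16384 -ngl 99 \
#     -ot ".ffn_.*_exps.=CPU"

# The override pattern matches the MoE expert tensor names:
echo "blk.7.ffn_down_exps.weight" | grep -cE ".ffn_.*_exps."   # prints 1
```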

jacob019|9 months ago

Not much to go off of here. I think the latest R1 release should be exciting. 685B parameters. No model card. Release notes? Changes? Context window? The original R1 has impressive output but really burns tokens to get there. Can't wait to learn more!

deepsquirrelnet|9 months ago

I think it’s cool to see this kind of international participation in fierce tech competition. It’s exciting. It’s what I think capitalism should be.

This whole “building moats” and buying competitors fascination in the US has gotten boring, obvious and dull. The world benefits when companies struggle to be the best.

mjcohen|9 months ago

Deepseek seems to be one of the few LLMs that run on an iPod Touch because of the older version of iOS.

cropcirclbureau|9 months ago

Hey! You! You can't just say that and not explain. Come back.

AJAlabs|9 months ago

671B parameters! Well, it doesn't look like I'll be running that locally.

amy_petrik|9 months ago

There is a small community of people who do indeed run this locally, typically on CPU/RAM (lots and lots of RAM), insofar as that's cheaper than GPUs.

htrp|9 months ago

You're gonna need at least 8 H100 80GBs for this...

overfeed|9 months ago

That's about $16-24 per hour - depending on the number of tokens you're slinging in that period, it may be much cheaper than paying OpenAI for similar functionality.
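A rough breakeven sketch for that tradeoff (the rental rate and API price are ballpark assumptions of mine, not figures from the thread):

```python
gpus = 8
gpu_cost_per_hour = 2.5       # assumed H100 rental rate, USD/hour
cluster_cost_per_hour = gpus * gpu_cost_per_hour        # $20/hour

api_price_per_mtok = 8.0      # assumed $/1M tokens for a hosted o1-class API
breakeven = cluster_cost_per_hour / api_price_per_mtok * 1e6
print(f"break even at ~{breakeven / 1e6:.1f}M tokens/hour")
```

In other words, the rented cluster only wins if you keep it saturated; for bursty workloads, per-token pricing usually comes out ahead.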

cesarvarela|9 months ago

About half the price of o4-mini-high for not much worse performance, interesting.

edit: most providers are offering a quantized version...

canergly|9 months ago

I want to see it on Groq ASAP!

porphyra|9 months ago

Groq doesn't even have any true DeepSeek models; I thought they only had `deepseek-r1-distill-llama-70b`, which was distilled onto Llama 70B [1].

[1] https://console.groq.com/docs/models