mechagodzilla|2 months ago
I've been running the 'frontier' open-weight LLMs (mainly DeepSeek R1/V3) at home, and I find they're best for asynchronous interactions: give it a prompt and come back in 30-45 minutes to read the response. I run them on a dual-socket 36-core Xeon with 768GB of RAM, which typically gets 1-2 tokens/sec. Great for research questions or coding prompts, not great for text auto-complete while programming.
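A quick back-of-the-envelope in Python of what those speeds mean for turnaround time; the 3,000-token response length is an assumed figure for a long R1-style reply, not something stated above:

    # Back-of-the-envelope: how long one reply takes at local CPU speeds.
    # response_tokens is an assumption; 1-2 tok/s is from the comment above.
    response_tokens = 3000          # assumed length of a long reasoning reply
    for tok_per_sec in (1.0, 2.0):  # measured range on the dual-socket Xeon
        minutes = response_tokens / tok_per_sec / 60
        print(f"{tok_per_sec} tok/s -> {minutes:.0f} min")
    # ~25-50 min per reply, consistent with the 30-45 minute wait above.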
Workaccount2|2 months ago
A less paranoid and much more economically efficient approach would be to just lease a server and run the models on that.
dimava|2 months ago
And you can only generate like $20 of tokens a month. Cloud tokens made on TPUs will always be cheaper and way faster than anything you can make at home.
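A rough sanity check on that figure, sketched in Python; the per-million-token prices are assumptions for illustration, not quotes from any provider:

    # Value of a month of nonstop local generation at 1-2 tok/s, priced
    # at assumed cloud API rates (price_per_m = $ per 1M output tokens).
    seconds_per_month = 60 * 60 * 24 * 30
    for tok_per_sec in (1.0, 2.0):
        tokens = tok_per_sec * seconds_per_month   # ~2.6M-5.2M tokens/month
        for price_per_m in (2.0, 8.0):             # assumed $/1M tokens
            print(f"{tok_per_sec} tok/s at ${price_per_m}/M "
                  f"-> ${tokens / 1e6 * price_per_m:.0f}/month")
    # Roughly $5-$40/month of equivalent API output, even running 24/7,
    # which is about the "$20 of tokens a month" claimed above.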
oceanplexian|2 months ago
None of them will keep your data truly private and offline.