(no title)
btbuildem|18 days ago
I guess that's debatable. I regularly run out of quota on my Claude Max subscription. When that happens, I can more or less get by with my modest setup (2x RTX 3090) and quantized Qwen3.
And this does not even account for privacy and availability. I'm in Canada, and as the US is slowly consumed by its spiral of self-destruction, I fully expect at some point a digital iron curtain will go up. I think it's prudent to have alternatives, especially with these paradigm-shattering tools.
jsheard|18 days ago
That's like ten normal computers' worth of power for the GPUs alone.
bigyabai|18 days ago
Maybe if the "computer" in question is a smartphone? Remember that the M3 Ultra is a 300W+ chip that won't beat one of those 3090s in compute or raster efficiency.
dymk|18 days ago
kataklasm|18 days ago
wongarsu|18 days ago
But if you have to factor in hardware costs, self-hosting doesn't seem attractive. Every model I can self-host I can also find on OpenRouter and instantly get a provider offering great prices. With most of the cost being in the GPUs themselves, it just makes more sense to have others do it with better batching and GPU utilization.
zozbot234|18 days ago
int_19h|17 days ago
sheepscreek|17 days ago
I hope not too many of us end up doing this and cause Google to add limits! My hope is that Google sees the benefit in this and goes all in: continues to let people decide which Google-hosted model to use, including their own.
mythz|18 days ago
I've got a lite GLM sub at $72/yr, which would take ~138 years to burn through the $10K M3 Ultra sticker price. Even GLM's highest-cost Max tier (20x lite) at $720/yr would buy you ~14 years.
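For anyone who wants to plug in their own numbers, here is a minimal sketch of that break-even arithmetic. The function name `years_to_break_even` is made up for illustration, and the model is deliberately naive: it assumes flat subscription prices and ignores electricity, resale value, and hardware depreciation.

```python
def years_to_break_even(hardware_price: float, subscription_per_year: float) -> float:
    """Years of subscription fees needed to equal a one-off hardware price.

    Naive model (assumption): flat prices, no electricity, no resale value.
    """
    if subscription_per_year <= 0:
        raise ValueError("subscription_per_year must be positive")
    return hardware_price / subscription_per_year


# Figures quoted in the comment above.
M3_ULTRA_PRICE = 10_000    # USD, sticker price
GLM_LITE_PER_YEAR = 72     # USD per year
GLM_MAX_PER_YEAR = 720     # USD per year

lite_years = years_to_break_even(M3_ULTRA_PRICE, GLM_LITE_PER_YEAR)
max_years = years_to_break_even(M3_ULTRA_PRICE, GLM_MAX_PER_YEAR)

print(f"lite: {lite_years:.1f} years, max: {max_years:.1f} years")
# prints "lite: 138.9 years, max: 13.9 years"
```

Adding a yearly power bill to the hardware side would only push the break-even point further out.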
ljosifov|18 days ago
wongarsu|18 days ago
DeathArrow|18 days ago
Even if you quantize the hell out of the models to fit in memory, they will be very slow.
oceanplexian|18 days ago
Buy a couple of real GPUs, do tensor parallelism and concurrent batch requests with vLLM, and it becomes extremely cost competitive to run your own hardware.
retr0rocket|18 days ago
[deleted]
Aurornis|18 days ago
When talking about a fallback from Claude plans, the correct financial comparison would be the same model hosted on OpenRouter.
You could buy a lot of tokens for the price of a pair of 3090s and a machine to run them.
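A rough way to put a number on "a lot of tokens" is sketched below. The helper `tokens_for_budget` is hypothetical, and both inputs in the usage line are placeholders picked only to show the calculation, not real hardware or OpenRouter prices; substitute your own build cost and the per-million-token rate for the model you would actually run.

```python
def tokens_for_budget(budget_usd: float, usd_per_million_tokens: float) -> int:
    """Number of tokens a fixed budget buys at a flat per-million-token price."""
    if usd_per_million_tokens <= 0:
        raise ValueError("usd_per_million_tokens must be positive")
    return int(budget_usd / usd_per_million_tokens * 1_000_000)


# Placeholder inputs (assumptions, not quoted prices):
# a 2000 USD build budget and 0.50 USD per million tokens.
example_tokens = tokens_for_budget(2000, 0.50)
print(f"{example_tokens:,} tokens")
```

This treats input and output tokens as one blended rate, which hosted providers usually price separately.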
bigyabai|18 days ago
That's a subjective opinion, to which the answer is "no you can't" for many people.
visarga|18 days ago
tw1984|18 days ago
You can't be a happy Uber driver making more money over the next 24 months by having a fancy car fitted with the best FSD in town when every car in your town has the same FSD.
benterix|18 days ago
Could you elaborate? I fail to grasp the implication here.
dymk|18 days ago
7thpower|18 days ago
Doesn’t mean you shouldn’t do it though.
flaviolivolsi|18 days ago
Aurornis|18 days ago
They can do a lot of simple tasks in common frameworks well. Doing anything beyond basic work will just burn tokens for hours while you review and reject code.
btbuildem|18 days ago
unknown|18 days ago
[deleted]