mythz | 18 days ago
I've got a GLM Lite sub at $72/yr, which would take ~138 years to burn through the $10K M3 Ultra sticker price. Even GLM's highest-cost Max tier (20x Lite) at $720/yr would buy you ~14 years.
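The break-even arithmetic above is straightforward, using the prices as quoted in the comment:

```python
# Break-even: how long the subscription must run to match the
# hardware sticker price (figures quoted in the comment above).
hardware_cost = 10_000  # M3 Ultra sticker price, USD
lite_annual = 72        # GLM Lite subscription, USD/yr
max_annual = 720        # GLM Max tier (20x Lite), USD/yr

years_lite = hardware_cost / lite_annual  # ~138.9 years
years_max = hardware_cost / max_annual    # ~13.9 years
print(f"Lite: {years_lite:.1f} yrs, Max: {years_max:.1f} yrs")
```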
DeathArrow | 18 days ago
Even if you quantize the hell out of the models to fit in the memory, they will be very slow.
oceanplexian | 18 days ago
Buy a couple real GPUs and do tensor parallelism and concurrent batch requests with vllm and it becomes extremely cost competitive to run your own hardware.
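As a rough sketch of the setup described, vLLM exposes tensor parallelism and concurrent batching directly through its serving CLI. The model ID and GPU count here are illustrative assumptions, not details from the thread:

```shell
# Hypothetical vLLM launch: shard one model across 4 GPUs
# (tensor parallelism) and allow batched concurrent requests.
vllm serve zai-org/GLM-4.5 \
  --tensor-parallel-size 4 \
  --max-num-seqs 64
```

With continuous batching, many concurrent requests share the same weights in GPU memory, which is where self-hosted hardware amortizes its cost.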
mythz | 18 days ago
No one's running these large models on a Mac Mini.
> Of course if you buy some overpriced Apple hardware it’s going to take years to break even.
Great, where can I find cheaper hardware that can run GLM 5's 745B or Kimi K2.5 1T models? Currently it requires 2x M3 Ultras (1TB VRAM) to run Kimi K2.5 at 24 tok/s [1]. What are the better-value alternatives?
[1] https://x.com/alexocheema/status/2016404573917683754