This is currently negative expected value over the lifetime of any hardware you can buy today at a reasonable price, which is basically a monster Mac (or several), at least until Apple folds and raises prices due to RAM shortages.
$2000 will get you 30-50 tokens/s at perfectly usable quantization levels (Q4-Q5) with any of the top 5 open-weight MoE models. That's not half bad and will only get better!
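For anyone who wants to sanity-check the expected-value argument, here is a rough back-of-the-envelope sketch. Only the $2000 price and the 30-50 tokens/s range come from the comment above; the 4 hours/day of generation and the 3-year lifetime are made-up assumptions, and electricity, resale value, and prompt-processing time are ignored. It computes the hosted price per million output tokens at which the hardware would just pay for itself.

```python
def lifetime_tokens(tokens_per_s: float, hours_per_day: float, years: float) -> float:
    """Total tokens generated over the hardware's lifetime."""
    seconds_per_day = hours_per_day * 3600
    return tokens_per_s * seconds_per_day * 365 * years


def break_even_usd_per_mtok(hardware_usd: float, tokens_per_s: float,
                            hours_per_day: float, years: float) -> float:
    """Hosted price (USD per million tokens) at which buying hardware breaks even.

    Ignores electricity and resale value (simplifying assumptions).
    """
    millions_of_tokens = lifetime_tokens(tokens_per_s, hours_per_day, years) / 1e6
    return hardware_usd / millions_of_tokens


if __name__ == "__main__":
    # 30 and 50 tokens/s are the bounds quoted in the thread; 40 is the midpoint.
    # hours_per_day=4 and years=3 are hypothetical usage assumptions.
    for tps in (30, 40, 50):
        price = break_even_usd_per_mtok(2000, tps, hours_per_day=4, years=3)
        print(f"{tps} tok/s: break-even at ${price:.2f} per million tokens")
```

If the hosted price you would actually pay is below the break-even figure, local hardware loses on cost alone under these assumptions; heavier daily usage pushes the break-even price down.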
alwillis|1 month ago
An article about the best open-weight models, including Qwen and Kimi K2 [3].
[1]: https://openrouter.ai/models
[2]: https://huggingface.co
[3]: https://simonwillison.net/2025/Jul/30/
baq|1 month ago
master_crab|1 month ago
Maybe in 3-5 years this will work on consumer hardware at speed, but not in the immediate term.
vntok|1 month ago