top | item 46491570

(no title)

jckahn | 1 month ago

Alternatively, just use a local model with zero restrictions.

discuss

alwillis|1 month ago

The next best thing is to use the leading open source/open weights models for free or for pennies on OpenRouter [1] or Huggingface [2].

An article about the best open weight models, including Qwen and Kimi K2 [3].

[1]: https://openrouter.ai/models

[2]: https://huggingface.co

[3]: https://simonwillison.net/2025/Jul/30/

baq|1 month ago

This is currently negative expected value over the lifetime of any hardware you can buy today at a reasonable price, which is basically a monster Mac - or several - until Apple folds and rises the price due to RAM shortages.

master_crab|1 month ago

This requires hardware in the tens of thousands of dollars (if we want the tokens spit out at a reasonable pace).

Maybe in 3-5 years this will work on consumer hardware at speed, but not in the immediate term.

vntok|1 month ago

$2000 will get you 30~50 tokens/s on perfectly usable quantization levels (Q4-Q5), taken from any one among the top 5 best open weights MoE models. That's not half bad and will only get better!