I've got the Unsloth q4_K_XL 35B running in llama.cpp on an i9/64 GB/4090 machine, doing double-digit tokens per second with a 90k+ token context window available. The model is completely in VRAM.
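For anyone wanting to reproduce a setup like this, the llama-server invocation looks roughly like the sketch below; the GGUF filename is a placeholder, and exact flags vary a bit between llama.cpp builds:

    # Offload every layer to the GPU and ask for a ~90k context.
    # If the KV cache doesn't fit in 24 GB at that length, quantizing it
    # with -ctk q8_0 -ctv q8_0 (plus flash attention) buys headroom.
    llama-server -m ./model-35B-Q4_K_XL.gguf \
      -ngl 99 \
      -c 92160 \
      --port 8080

llama-server exposes an OpenAI-compatible API under /v1, so opencode or any similar client can point straight at it.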
It is slow but usable via opencode on an MBP M3 Max with 48 GB. So I guess hosted is still the better option for most people.
chvid|19 hours ago
The local models are considerably better relative to the hosted ones than they were six months ago. Benchmark-maxing or not, stuff is definitely happening in this area.