fc417fc802 | 20 hours ago
You can spill to RAM, in which case you at least want enough for a single active expert, but really that's going to tank performance. If you're only "a bit" short of fitting the full model, the difference might not be all that large.
These things are memory-bandwidth limited, so if you compare RAM, VRAM, and PCIe bandwidth, what I wrote above should make sense.
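For a rough feel of the bandwidth argument, here is a back-of-the-envelope sketch. It assumes token generation is purely bandwidth bound, that every token reads all active weights once, and that the spilled part is read over whichever slower path applies (system RAM or PCIe, depending on the runtime). The function name and all the numbers are made up for illustration; plug in your own hardware specs.

```python
def tokens_per_second(active_gb, spilled_fraction, vram_gbps, spill_gbps):
    """Bandwidth-bound estimate of decode speed.

    active_gb:        weights read per token, in GB (assumed value)
    spilled_fraction: share of those weights living outside VRAM (0.0 to 1.0)
    vram_gbps:        VRAM bandwidth in GB/s (assumed value)
    spill_gbps:       bandwidth of the slower path in GB/s (assumed value)
    """
    in_vram_gb = active_gb * (1.0 - spilled_fraction)
    spilled_gb = active_gb * spilled_fraction
    # Time per token is the sum of the time to read each portion.
    seconds_per_token = in_vram_gb / vram_gbps + spilled_gb / spill_gbps
    return 1.0 / seconds_per_token


# Hypothetical hardware: 10 GB read per token, 1000 GB/s VRAM, 50 GB/s spill path.
for fraction in (0.0, 0.05, 0.25, 1.0):
    rate = tokens_per_second(10.0, fraction, 1000.0, 50.0)
    print(f"spilled {fraction:.0%}: about {rate:.1f} tokens/s")
```

This ignores compute, caching, and overlap between transfers, so treat the output as an order-of-magnitude guide only.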
Also you should just ask your friendly local LLM these sorts of questions.