top | item 44653988


sourcecodeplz | 7 months ago

Everyone keeps saying this, but it is not really useful. Without a dedicated GPU and VRAM, you are waiting overnight for a response... The MoE models are great, but they need a dedicated GPU and VRAM to run fast.


jychang|7 months ago

Well, yeah, you're supposed to put in a GPU. It's a MoE model: the common tensors should be on the GPU, which also does prompt processing.

The RAM is for the 400 GB of experts.
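
To make the split concrete, here's a rough sketch of the placement rule described above: anything that looks like an expert tensor goes to system RAM, everything else (the common tensors) goes to the GPU. The tensor names, the `_exps` naming pattern, and the sizes are all made up for illustration (the expert sizes are just picked to add up to the 400 GB figure), and `plan_placement` is a hypothetical helper, not any runtime's actual API.

```python
import re

# Assumption: expert tensors can be recognized by "_exps." in their name.
# This naming rule is illustrative, not taken from any specific model file.
EXPERT_PATTERN = re.compile(r"_exps\.")


def plan_placement(tensors):
    """Assign expert tensors to CPU RAM and all other tensors to the GPU.

    `tensors` is a list of (name, size_in_gb) pairs.
    Returns (placement, totals), both keyed by "gpu" and "cpu".
    """
    placement = {"gpu": [], "cpu": []}
    totals = {"gpu": 0.0, "cpu": 0.0}
    for name, size_gb in tensors:
        device = "cpu" if EXPERT_PATTERN.search(name) else "gpu"
        placement[device].append(name)
        totals[device] += size_gb
    return placement, totals


# Made-up sizes in GB, chosen only so the experts sum to 400.
example_tensors = [
    ("token_embd.weight", 2.0),
    ("blk.0.attn_q.weight", 0.5),
    ("blk.0.ffn_gate_exps.weight", 130.0),
    ("blk.0.ffn_up_exps.weight", 130.0),
    ("blk.0.ffn_down_exps.weight", 140.0),
]

placement, totals = plan_placement(example_tensors)
print("GPU (common tensors):", placement["gpu"], totals["gpu"], "GB")
print("RAM (experts):", placement["cpu"], totals["cpu"], "GB")
```

The point is just that the GPU side stays small enough for a consumer card while the bulk sits in RAM; real numbers depend on the model and quantization.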