top | item 44653666

(no title)

sourcecodeplz | 7 months ago

With RAM you would need at least 500gb to load it but some 100-200gb more for context too. Pair it with a 24gb GPU and the speed will be 10t/s, at least, I estimate.

discuss

order

danielhanchen|7 months ago

Oh yes for the FP8, you will need 500GB ish. 4bit around 250GB - offloading MoE experts / layers to RAM will definitely help - as you mentioned a 24GB card should be enough!