top | item 42899424

A dense 70B model will be slower, since DeepSeek is an MoE that only activates 37B parameters per token. That's what makes CPU inference remotely feasible here.
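
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes decode speed is bound by memory bandwidth, so every active weight is read once per generated token. The 200 GB/s bandwidth and the 4-bit quantization are made-up illustration values, not measurements, and the function name is just for this example.

```python
def decode_tokens_per_sec(active_params_b, bytes_per_param, mem_bandwidth_gbs):
    """Rough upper bound on decode speed when memory bandwidth is the bottleneck.

    active_params_b: parameters read per token, in billions
    bytes_per_param: storage per weight (0.5 is roughly 4-bit quantization)
    mem_bandwidth_gbs: sustained memory bandwidth in GB/s
    """
    gb_read_per_token = active_params_b * bytes_per_param
    return mem_bandwidth_gbs / gb_read_per_token


# Assumed values for illustration only.
BANDWIDTH_GBS = 200.0   # hypothetical server-class CPU memory bandwidth
BYTES_PER_PARAM = 0.5   # assumed 4-bit quantized weights

moe = decode_tokens_per_sec(37, BYTES_PER_PARAM, BANDWIDTH_GBS)    # MoE, 37B active
dense = decode_tokens_per_sec(70, BYTES_PER_PARAM, BANDWIDTH_GBS)  # dense 70B

print(f"MoE (37B active): ~{moe:.1f} tok/s upper bound")
print(f"Dense 70B:        ~{dense:.1f} tok/s upper bound")
print(f"Dense is ~{moe / dense:.2f}x slower per token")
```

Whatever bandwidth you plug in, the ratio stays at 70/37, so the dense model comes out a bit under 2x slower per token under these assumptions. Real throughput will be lower than either bound, and the MoE still needs all of its weights resident in RAM even though only a fraction is read per token.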
