top | item 44534330

martin_ | 7 months ago

How do you run a 1T-param model at low cost?

maven29|7 months ago

32B active parameters with a single shared expert.

JustFinishedBSG|7 months ago

This doesn’t change the VRAM usage, only the compute requirements.
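To put rough numbers on that distinction, here is a back-of-envelope sketch. It assumes the figures from the thread (1T total parameters, 32B active per token) and the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass. The function names are made up for illustration, and KV cache, activations, and runtime overhead are ignored.

```python
# Back-of-envelope sketch: weight memory depends on TOTAL parameters,
# per-token compute depends on ACTIVE parameters.
# Assumptions (not from any specific model card): ~2 FLOPs per active
# parameter per token; KV cache and activation memory are ignored.

TOTAL_PARAMS = 10**12        # 1T parameters, all of which must be stored
ACTIVE_PARAMS = 32 * 10**9   # 32B parameters actually used per token


def weight_memory_gb(total_params, bytes_per_param):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


def forward_flops_per_token(active_params):
    """Approximate forward-pass FLOPs per token (rule of thumb: 2 per param)."""
    return 2 * active_params


if __name__ == "__main__":
    for name, bytes_per_param in [("fp16", 2), ("fp8", 1), ("int4", 0.5)]:
        gb = weight_memory_gb(TOTAL_PARAMS, bytes_per_param)
        print(f"{name}: ~{gb:.0f} GB of weights")

    moe = forward_flops_per_token(ACTIVE_PARAMS)
    dense = forward_flops_per_token(TOTAL_PARAMS)
    print(f"MoE compute per token: ~{moe:.2e} FLOPs")
    print(f"Dense 1T model would need ~{dense / moe:.2f}x more")
```

Under these assumptions the sparsity cuts per-token compute by roughly 31x, but every expert still has to be resident somewhere, so the weights take the same space whether 32B or all 1T parameters are active.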