If you want to do it at home, ik_llama.cpp has some performance optimizations that make it semi-practical to run a model of this size on a server with plenty of memory bandwidth and a GPU or two for offload. You can get 6-10 tok/s with modest workstation hardware. Thinking chews up a lot of tokens, though, so it will be a slog.
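For reference, a rough invocation might look something like this (the model path, thread count, and tensor-override pattern are placeholders, and exact flag names can differ between ik_llama.cpp and mainline llama.cpp builds, so treat it as a sketch rather than a recipe):

    ./llama-server -m /models/big-moe-Q4_K.gguf \
        -c 16384 -t 32 \
        -ngl 99 \
        -ot ".ffn_.*_exps.=CPU"

The idea is to push the attention and shared layers onto the GPU (-ngl) while keeping the large expert tensors in system RAM (the -ot override), which is where the memory bandwidth matters.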