Llama Scout is a 17B x 16 MoE, so 17B active parameters. That makes it fast to run, but the memory requirements are still large. They claim it fits on an H100, so under 80GB. A Mac Studio with 96GB could run this. By "run" I mean inference; Ollama is easy to use for this. 4x Nvidia 3090 cards would also work, but it's not the easiest PC build. The tinybox https://tinygrad.org/#tinybox is $15k and can also do LoRA fine-tuning. You could also use a regular PC with 128GB of RAM, but it would be quite slow.
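The "fits on an H100" claim is easy to sanity-check with weights-only arithmetic. A minimal sketch, assuming a total parameter count of ~109B for Scout (the commonly quoted figure; not stated in this thread) and ignoring KV cache and activation overhead:

```python
# Rough weights-only VRAM estimate for Llama 4 Scout (17B active x 16 experts).
# ASSUMPTION: ~109B total parameters; real usage adds KV cache and overhead.

def weight_gb(total_params_billions: float, bits_per_weight: int) -> float:
    """GB needed just to hold the weights at a given quantization width."""
    return total_params_billions * bits_per_weight / 8

for bits, label in [(16, "fp16/bf16"), (8, "int8"), (4, "int4")]:
    print(f"{label}: ~{weight_gb(109, bits):.1f} GB")
```

At 4-bit that lands around 55 GB of weights, which is consistent with "under 80GB" on one H100 or a 96GB Mac Studio, while fp16 clearly is not.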
mrajcok|11 months ago
Will have Llama 4 Maverick running in 4-bit quantization (which typically results in only minor quality degradation) once llama.cpp support is merged.
Total hardware cost well under $50,000.
The 2T Behemoth model is tougher, but enough Blackwell 6000 Pro cards (16) should be able to run it for under $200k.
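The Behemoth numbers also check out on paper. A quick sketch, assuming 96 GB per RTX PRO 6000 Blackwell card and counting weights only (no KV cache or parallelism overhead):

```python
# Does a 2T-parameter model at 4-bit fit across 16 x 96 GB cards?
# ASSUMPTION: weights only; real deployments need headroom for KV cache etc.

params = 2e12                        # 2 trillion parameters
weights_gb = params * 4 / 8 / 1e9    # 4-bit weights -> 1000 GB
pool_gb = 16 * 96                    # 16 cards x 96 GB -> 1536 GB

print(weights_gb, pool_gb, weights_gb <= pool_gb)
```

That leaves roughly 500 GB of slack across the pool, so 16 cards is plausible, though tensor-parallel overhead and context length eat into it.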
briandw|11 months ago
latchkey|11 months ago
That said, I'm obviously biased but you're probably better off renting it.
hhh|11 months ago