growdark | 11 months ago

Would it be realistic to buy and self-host the hardware to run, for example, the latest Llama 4 models, assuming a budget of less than $500,000?

mrajcok | 11 months ago

Yes - I'm able to run Llama 3.1 405B on 3x A6000 + 3x 4090.

I'll have Llama 4 Maverick running in 4-bit quantization (which typically causes only minor quality degradation) once llama.cpp support is merged.
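
Once that lands, the client side is tiny. A rough sketch via the llama-cpp-python bindings (the GGUF filename and context size are placeholders, not a tested setup):

    # Sketch: load a 4-bit (Q4_K_M) GGUF and run one chat completion.
    # Assumes llama-cpp-python built with GPU support and a quantized model file on disk.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./llama-4-maverick-Q4_K_M.gguf",  # hypothetical filename
        n_gpu_layers=-1,  # offload every layer to the GPUs
        n_ctx=8192,       # context window; tune to available VRAM
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize the Llama 4 lineup."}]
    )
    print(out["choices"][0]["message"]["content"])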

Total hardware cost well under $50,000.

The 2T-parameter Behemoth model is tougher, but enough RTX 6000 Pro Blackwell cards (16 of them) should be able to run it for under $200k.
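
The napkin math behind those numbers, as a sketch (weights only, ignoring KV cache and runtime overhead; the Maverick and Behemoth totals are the publicly quoted parameter counts):

    # Weights-only VRAM estimate at 4-bit quantization: ~0.5 bytes per parameter.
    def q4_weights_gb(params_billions: float) -> float:
        return params_billions * 0.5  # GB of weights, no KV cache or overhead

    print(q4_weights_gb(405))   # Llama 3.1 405B -> ~203 GB (3x A6000 + 3x 4090 = 216 GB)
    print(q4_weights_gb(400))   # Maverick, ~400B total params -> ~200 GB
    print(q4_weights_gb(2000))  # Behemoth, ~2T params -> ~1000 GB vs 16x 96 GB = 1536 GB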

briandw | 11 months ago

Llama 4 Scout is a 17B x 16-expert MoE, so only 17B parameters are active per token. That makes it faster to run, but the memory requirements are still large. They claim it fits on a single H100, so under 80GB. A Mac Studio with 96GB could run this; by "run" I mean inference, and Ollama is easy to use for that. 4x 3090 NVIDIA cards would also work, but it's not the easiest PC build. The tinybox (https://tinygrad.org/#tinybox) is $15k and supports LoRA fine-tuning. You could also use a regular PC with 128GB of RAM, but it would be quite slow.
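
If you go the Ollama route, the client side is about this small (a sketch only; the model tag is my guess at what the Scout build gets called):

    # Sketch: inference against a local Ollama server via the official Python client.
    # `pip install ollama`; the model tag below is an assumption, not a confirmed name.
    import ollama

    response = ollama.chat(
        model="llama4:scout",
        messages=[{"role": "user", "content": "What fits in 80GB of VRAM?"}],
    )
    print(response["message"]["content"])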

latchkey | 11 months ago

A box of eight AMD MI300X GPUs (1.5TB of combined memory) is much less than $500k, and AMD made sure vLLM had day-zero support.
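
For reference, offline inference through vLLM is only a few lines (a sketch; the model id and tensor_parallel_size are placeholders, not a verified MI300X recipe):

    # Sketch: offline inference with vLLM, sharding the model across 8 GPUs.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-4-Maverick-17B-128E-Instruct",  # hypothetical HF repo id
        tensor_parallel_size=8,
    )
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate(["Why rent instead of buy for LLM inference?"], params)
    print(outputs[0].outputs[0].text)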

That said, I'm obviously biased but you're probably better off renting it.

hhh | 11 months ago

You can do it with regular GPUs for less.