norwalkbear | 1 year ago

Isn't Llama-3-70b so good that the Reddit llama crowd is saying people should buy hardware to run it?

Llama-3-8b was garbage for me, but damn, 70b is good enough.

reaperman | 1 year ago

The unquantized Llama 70B requires 142GB of VRAM. Some of the quantized versions are quite decent, but quality tends to fall off when they're overquantized below around 26.5GB of VRAM (~3 bits per weight).
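
Back-of-the-envelope, the weights alone take parameters × bits-per-weight ÷ 8 bytes. A minimal Python sketch of that arithmetic (weights only; KV cache and activations add overhead on top, and Llama-3-70B is rounded to 70e9 parameters):

    params = 70e9  # Llama-3-70B, rounded

    # VRAM needed just to hold the weights at each precision
    for bits in (16, 8, 4, 3):
        print(f"{bits:>2}-bit: ~{params * bits / 8 / 1e9:.0f} GB")

    # 16-bit: ~140 GB -> two 80GB A100s
    #  8-bit:  ~70 GB
    #  4-bit:  ~35 GB -> fits across two 24GB 3090s
    #  3-bit:  ~26 GB -> about where quality starts falling off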

So at minimum you’d be looking at dual 3090s with NVLink for about $4,000 or so. Or, for the highest-performing unquantized model, you’d be spending about $40,000 for two A100s.

norwalkbear | 1 year ago

So an M-series MacBook is a decent buy?

Manabu-eo | 1 year ago

No need for NVLink just for inference, not even with tensor parallelism. And you can get used 3090s much cheaper than that.
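
For example, with vLLM you can shard an already-quantized checkpoint across two GPUs over plain PCIe (a sketch, not a tested recipe; the AWQ model name here is an assumption):

    from vllm import LLM, SamplingParams

    # Tensor-parallel inference across 2 GPUs; works over PCIe,
    # NVLink only adds inter-GPU bandwidth.
    llm = LLM(
        model="some-org/llama-3-70b-instruct-awq",  # hypothetical 4-bit AWQ repo
        tensor_parallel_size=2,
    )

    out = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=64))
    print(out[0].outputs[0].text)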