norwalkbear | 1 year ago

Isn't Llama-3-70b so good that the Reddit llama crowd is saying people should buy hardware to run it?

Llama-3-8b was garbage for me, but damn, 70b is good enough.

reaperman | 1 year ago

The unquantized Llama 70B requires 142GB of VRAM. Some of the quantized versions are quite decent, but quality tends to fall off when they're overquantized below around 26.5GB of VRAM (~3 bits per weight).
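
Back-of-the-envelope, the weights alone take parameters × bits-per-weight ÷ 8 bytes. A minimal Python sketch of that arithmetic (weights only; KV cache and activations add overhead on top, and Llama-3-70B is rounded to 70e9 parameters):

    params = 70e9  # Llama-3-70B, rounded

    # VRAM needed just to hold the weights at each precision
    for bits in (16, 8, 4, 3):
        print(f"{bits:>2}-bit: ~{params * bits / 8 / 1e9:.0f} GB")

    # 16-bit: ~140 GB -> two 80GB A100s
    #  8-bit:  ~70 GB
    #  4-bit:  ~35 GB -> fits across two 24GB 3090s
    #  3-bit:  ~26 GB -> about where quality starts falling off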

So at minimum you’d be looking at dual 3090s with NVLink for about $4,000 or so. Or, for the highest-performing unquantized model, you’d be spending about $40,000 for two A100s.

norwalkbear | 1 year ago

So an M-series MacBook is a decent buy?

Manabu-eo | 1 year ago

No need for NVLink just for inference, not even with tensor parallelism. And you can get used 3090s much cheaper than that.
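
For example, with vLLM you can shard an already-quantized checkpoint across two GPUs over plain PCIe (a sketch, not a tested recipe; the AWQ model name here is an assumption):

    from vllm import LLM, SamplingParams

    # Tensor-parallel inference across 2 GPUs; works over PCIe,
    # NVLink only adds inter-GPU bandwidth.
    llm = LLM(
        model="some-org/llama-3-70b-instruct-awq",  # hypothetical 4-bit AWQ repo
        tensor_parallel_size=2,
    )

    out = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=64))
    print(out[0].outputs[0].text)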