
ericlewis | 1 year ago

The higher the precision, the better. Use whatever fits within your memory constraints.
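To put rough numbers on the memory side of that tradeoff (parameter counts and formats below are just illustrative):

    # Rough weight footprint: params * bytes per weight.
    # Model sizes are illustrative round numbers, not exact counts.
    FORMATS = {"fp32": 4.0, "fp16": 2.0, "fp8": 1.0, "fp4": 0.5}  # bytes/weight

    for params in (1e9, 3e9, 7e9):
        line = f"{params / 1e9:.0f}B params:"
        for name, bytes_per in FORMATS.items():
            line += f"  {name}={params * bytes_per / 2**30:.1f} GiB"
        print(line)

So a 3B model at fp32 is ~11 GiB of weights alone, while the same model at fp4 is ~1.4 GiB, before counting activations and KV cache.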

jasonjmcghee | 1 year ago

With serious diminishing returns, though. At inference time there's no reason to use fp64, and you should probably use fp8 or less; the accuracy loss is far smaller than you'd expect. AFAIK Llama 3.2 3B at fp4 will outperform Llama 3.2 1B at fp32 in both accuracy and speed, despite the 1B model using 8x the bits per weight.
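For a feel of why the loss is small, here's a minimal sketch of symmetric per-tensor quantization in NumPy. It's a toy stand-in: real low-bit formats (e.g. llama.cpp's block-wise quants) scale small groups of weights separately, which does even better than this.

    import numpy as np

    def quantize_symmetric(w, bits):
        # Map fp32 weights to signed ints in [-(2^(bits-1)-1), 2^(bits-1)-1],
        # then dequantize; the round trip is the quantization error.
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        q = np.clip(np.round(w / scale), -qmax, qmax)
        return q * scale

    rng = np.random.default_rng(0)
    # A weight matrix at a typical init scale; shapes/scales are illustrative.
    w = rng.normal(0, 0.02, size=(4096, 4096)).astype(np.float32)

    for bits in (8, 4):
        w_hat = quantize_symmetric(w, bits)
        rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
        print(f"{bits}-bit: relative weight error ~ {rel_err:.3%}")

Weight reconstruction error isn't the same thing as downstream accuracy, but it's the intuition pump: even a 4-bit round trip keeps most of the signal, and per-group scaling plus quantization-aware tricks close much of the remaining gap.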