(no title)
extheat | 1 year ago
So at 32-bit full precision, 70B params * (32 / 8) bytes ~= 280 GB
fp16: 70 * (16 / 8) ~= 140 GB
8-bit: 70 * (8 / 8) ~= 70 GB
4-bit: 70 * (4 / 8) ~= 35 GB
However, in things like llama.cpp the quants are sometimes mixed (some of the weights are Q5, some Q4, etc.), so you usually want to take the higher number as your estimate.
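The arithmetic above is just parameter count times bytes per weight; a quick sketch of that estimate (weights only, ignoring KV cache and runtime overhead; the function name is my own):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough weight-only memory estimate in GB: params * bytes per weight.

    Ignores KV cache, activations, and runtime overhead, so actual
    memory use will be somewhat higher.
    """
    return params_billion * (bits_per_weight / 8)

# Reproduce the numbers for a 70B model at each precision
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(70, bits):.0f} GB")
```

For mixed quants like Q4_K_M you can plug in the *average* bits per weight reported by the quantizer, but taking the higher bit width gives a safer upper bound.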
moffkalast | 1 year ago