I don't see any explanation for why they trained 8B instead of 7B.
I thought that if you have a 16GB GPU you can fit a 14GB (7B × 16 bits) model into it, but how does it fit if the model is exactly 16GB?
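As a rough illustration of that arithmetic (a minimal sketch of weight memory only; the weight_gb helper is made up here, and activations, KV cache, and runtime overhead are ignored):

    def weight_gb(params_billions: float, bits_per_weight: float) -> float:
        # Weight memory only: parameters * bits per weight / 8 bytes, with GB = 1e9 bytes.
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    print(weight_gb(7, 16))   # 14.0 -> a 7B model in fp16 is ~14 GB of weights
    print(weight_gb(8, 16))   # 16.0 -> an 8B model in fp16 is ~16 GB of weights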
The bigger size probably comes from the larger vocabulary in the tokenizer. But most people run this model quantized, to 8 bits at the least, and it still holds up reasonably well down to 3-4 bpw.
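The same back-of-the-envelope math at the quantization levels mentioned above (again just a sketch; real quantized checkpoints are a bit larger because of quantization scales and metadata):

    # Approximate weight size of an 8B-parameter model at common bits per weight.
    params = 8e9
    for bpw in (16, 8, 4, 3):
        gb = params * bpw / 8 / 1e9
        print(f"{bpw:>2} bpw: ~{gb:.0f} GB")   # 16 bpw ~16 GB, 8 ~8 GB, 4 ~4 GB, 3 ~3 GB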
How does that affect anything? It still uses 16-bit floats in the model, doesn't it?