item 40078262

rileyphone | 1 year ago
The bigger size is probably from the bigger vocabulary in the tokenizer. But most people are running this model quantized at least to 8 bits, and still reasonably down to 3-4 bpw.
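For context, a rough back-of-envelope sketch of both effects being discussed: a bigger tokenizer vocabulary grows the embedding (and, if untied, output head) matrices, and quantization shrinks weight storage roughly linearly with bits per weight (bpw). All concrete numbers below are hypothetical illustrations, not figures from the thread.

```python
def embedding_params(vocab_size: int, hidden_dim: int, tied: bool = False) -> int:
    """Parameters in the input embedding, plus the output head if untied."""
    return vocab_size * hidden_dim * (1 if tied else 2)

def model_bytes(total_params: float, bpw: float) -> float:
    """Approximate weight storage in bytes at a given bits-per-weight."""
    return total_params * bpw / 8

# Hypothetical example: going from a 32k to a 128k vocabulary at hidden dim 4096
hidden = 4096
extra = embedding_params(128_000, hidden) - embedding_params(32_000, hidden)
print(f"extra embedding/unembedding params: {extra / 1e6:.0f}M")  # → 786M

# Hypothetical 8B-parameter model at different quantization levels
params = 8e9
for bpw in (16, 8, 4):
    print(f"{bpw:>2} bpw: {model_bytes(params, bpw) / 1e9:.1f} GB")
```

The point is that the two effects are independent: the vocabulary adds parameters, while quantization reduces bytes per parameter.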
kristianp | 1 year ago
> The bigger size is probably from the bigger vocabulary in the tokenizer.
How does that affect anything? It still uses 16-bit floats in the model, doesn't it?