(no title)
AzN1337c0d3r | 10 months ago
"BitNet: Scaling 1-bit Transformers for Large Language Models" was actually binary (weights of -1 or 1), but then in the follow-up paper they started using 1.58-bit weights (https://arxiv.org/pdf/2402.17764):
"The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"
This seems to be the first source of the confounding of "1-bit LLM" and ternary weights that I could find: "In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}."
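The "1.58" is the information content of a ternary symbol, log2(3) ≈ 1.585 bits. Here is a minimal NumPy sketch of the absmean ternary quantization the b1.58 paper describes; the function name ternary_quantize and the epsilon guard are my own, not taken from the paper's code:

    import math
    import numpy as np

    # A ternary weight carries log2(3) ≈ 1.585 bits, hence "b1.58".
    print(math.log2(3))  # 1.5849625007211562

    def ternary_quantize(w):
        # Scale by the mean absolute value, then round and clip to
        # {-1, 0, 1} (the absmean scheme described in the paper).
        gamma = np.abs(w).mean() + 1e-8  # epsilon avoids divide-by-zero
        return np.clip(np.round(w / gamma), -1, 1)

    w = np.random.randn(4, 4)
    print(ternary_quantize(w))  # every entry is -1.0, 0.0, or 1.0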
LeonB | 10 months ago