
AzN1337c0d3r | 10 months ago

The original BitNet paper (https://arxiv.org/pdf/2310.11453)

  BitNet: Scaling 1-bit Transformers for Large Language Models
was actually binary (weights of -1 or 1),

but then the follow-up paper switched to 1.58-bit weights (https://arxiv.org/pdf/2402.17764)

  The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
This seems to be the earliest source I could find of the conflation of "1-bit LLM" with ternary weights:

  In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}.
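For what it's worth, the "1.58" comes from the information content of a ternary value: a weight drawn from {-1, 0, 1} carries log2(3) bits. A quick sketch in Python:

```python
import math

# Each ternary weight takes one of 3 values: {-1, 0, 1}.
# Its information content is log2(3) bits per weight,
# which is where the "b1.58" name comes from.
bits_per_weight = math.log2(3)
print(f"{bits_per_weight:.2f}")  # → 1.58
```

So it's "1-bit" only in the loose sense that each weight needs well under 2 bits to store.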


LeonB | 10 months ago

It’s “1-bit, for particularly large values of ‘bit’”