top | item 41890352

(no title)

Randor | 1 year ago

The energy claims up to ~70% can be verified. The inference implementation is here:

discuss

kayo_20211030|1 year ago

I'm not an AI person, in any technical sense. The savings being claimed, and I assume verified, are on ARM and x86 chips. The piece doesn't mention swapping mult to add, and a 1-bit LLM is, well, a 1-bit LLM.

Also,

> Additionally, it reduces energy consumption by 55.4% to 70.0%

With humility, I don't know what that means. It seems like some dubious math with percentages.

sroussey|1 year ago

Not every instruction on a CPU or GPU uses the same amount of power. So if you could rewrite your algorithm to use more power efficient instructions (even if you technically use more of them), you can save overall power draw.

That said, time to market has been more important than any cares of efficiency for some time. Now and in the future, there is more of a focus on it as the expenses in equipment and power have really grown.

Randor|1 year ago

> I don't know what that means. It seems like some dubious math with percentages.

I would start by downloading a 1.58 model such as: https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens

Run the non-quantized version of the model on your 3090/4090 gpu and observe the power draw. Then load the 1.58 model and observe the power usage. Sure, the numbers have a wide range because there are many gpu/npu to make the comparison.

littlestymaar|1 year ago

How does the liked article relate to BitNet at all? It's about the “addition is all you need” paper which AFAIK is unrelated.

Randor|1 year ago

Yeah, I get what you're saying but both are challenging the current MatMul methods. The L-Mul paper claims "a power savings of 95%" and that is the thread topic. Bitnet proves that at least 70% is possible by getting rid of MatMul.