item 46214468

I eliminated matrix multiplication from transformers using 1965 Soviet research

5 points | ZaneHam | 2 months ago | zenodo.org

3 comments


ZaneHam | 2 months ago

Author here. I've been collecting historical computing documentation for a few years and came across Brusentsov's balanced-ternary research from Moscow State University (1958-1965). I applied it to modern transformers.

Some interesting results:

- 93.8% energy reduction per inference
- 16x memory compression (7B model: 28GB → 1.75GB)
- Zero floating-point multiplication
- Runs on CPUs; no GPU required
- Architectural epistemic uncertainty (it won't hallucinate what it doesn't know)
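For readers wondering how "zero floating-point multiplication" is possible: with weights restricted to {-1, 0, +1}, a matrix-vector product reduces to additions and subtractions of activations. Here is a minimal NumPy sketch of that idea (my own illustration under those assumptions, not the author's code; `quantize_ternary` and its threshold are hypothetical):

```python
# Sketch: balanced-ternary weights turn matvec into add/subtract only.
import numpy as np

def quantize_ternary(W, threshold=0.05):
    """Map float weights to {-1, 0, +1} by thresholding (illustrative scheme)."""
    T = np.zeros_like(W, dtype=np.int8)
    T[W > threshold] = 1
    T[W < -threshold] = -1
    return T

def ternary_matvec(T, x):
    """Matrix-vector product with no multiplications:
    for each row, add x where the weight is +1 and subtract where it is -1."""
    pos = (T == 1)
    neg = (T == -1)
    return np.array([x[p].sum() - x[n].sum() for p, n in zip(pos, neg)])
```

The claimed 16x compression is at least arithmetically consistent with packing each weight into 2 bits instead of a 32-bit float: 32/2 = 16, and 28GB / 16 = 1.75GB.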

Repo: https://github.com/Zaneham/Ternary_inference

Happy to answer questions :-) Happy holidays and Merry Christmas!

mika6996 | 2 months ago

Did you try this method on an actual model? What do the benchmarks say?