top | item 47020764

AXS-6: Adaptive EXponent Sharing – 6-Bit Training Format

2 points| AkaiNa | 15 days ago |github.com

5 comments

order

p1esk|15 days ago

That’s just good old block floating point, right?

https://en.wikipedia.org/wiki/Block_floating_point

AkaiNa|15 days ago

Yes. But standard block floating point uses a linear grid scaled by a shared exponent. Whereas AXS-6 uses a NormalFloat grid scaled by a shared exponent to maximize information density for bell-curve distributed weights. Essentially a Block Scaled Normalfloat-5.