top | item 41493915

(no title)

nicoty | 1 year ago

Could the compression efficiency you're seeing somehow be related to 3 being the closest natural number to e, the base that gives the optimal radix economy (https://en.wikipedia.org/wiki/Optimal_radix_choice) for storage efficiency?
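A rough sketch of the radix-economy argument behind the question: the asymptotic cost of representing a number in base b is proportional to b / ln(b), which is minimized at b = e. The function name `relative_cost` is just an illustrative choice, not something from the thread.

```python
import math

def relative_cost(base):
    """Asymptotic radix economy of `base`, up to a constant factor: b / ln(b)."""
    return base / math.log(base)

# Compare a few integer bases against the continuous optimum at e.
costs = {base: relative_cost(base) for base in (2, 3, 4, 5)}
best_integer_base = min(costs, key=costs.get)

for base, cost in costs.items():
    print(f"base {base}: {cost:.3f}")
print(f"base e: {relative_cost(math.e):.3f}")
```

Among integer bases, 3 comes out lowest, while 2 and 4 tie; the continuous minimum sits at e.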


areddyyt|1 year ago

We don't achieve peak compression efficiency because more complex weight unpacking mechanisms kill throughput.

To be more explicit, the weight matrix's values belong to the set {-1, 0, 1}. When we use two bits to encode each weight, one of the four possible states goes unused:

10 => 1, 01 => 0, 00 => -1, 11 => ?
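A minimal sketch of that two-bit scheme, using the mapping above and packing four weights per byte. The function names, the high-bits-first layout, and the zero padding of a short final byte are assumptions for illustration, not the actual kernel.

```python
# Mapping taken from the comment above; 0b11 is the unused state.
ENCODE = {1: 0b10, 0: 0b01, -1: 0b00}
DECODE = {code: weight for weight, code in ENCODE.items()}

def pack_2bit(weights):
    """Pack ternary weights four per byte, first weight in the high bits."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        chunk = weights[i:i + 4]
        byte = 0
        for w in chunk:
            byte = (byte << 2) | ENCODE[w]
        # Left-align a short final chunk so every byte has the same layout.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)

def unpack_2bit(data, count):
    """Recover `count` weights; unpacking is only shifts and masks."""
    weights = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            if len(weights) == count:
                return weights
            weights.append(DECODE[(byte >> shift) & 0b11])
    return weights

sample = [1, 0, -1, 1, 0]
packed = pack_2bit(sample)
restored = unpack_2bit(packed, len(sample))
```

Because `count` is passed explicitly, the padding bits in the last byte are never decoded as weights.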

I think choosing the optimal radix economy will matter more on custom silicon, where we can implement circuitry and instructions to rapidly decompress weights or work with the compressed weights directly.
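To illustrate the trade-off, here is a hedged sketch of a denser software packing: since 3^5 = 243 fits in a byte, five ternary weights can share one byte (1.6 bits per weight, against a theoretical floor of log2(3), about 1.585). This is not the scheme described in the thread, and the function names are hypothetical; the point is that unpacking needs repeated division and modulo rather than shifts and masks.

```python
import math

def pack_base3(weights):
    """Pack ternary weights five per byte as base-3 digits (weight + 1)."""
    out = bytearray()
    for i in range(0, len(weights), 5):
        chunk = weights[i:i + 5]
        value = 0
        # Fold from the last weight so the first weight is the lowest digit.
        for w in reversed(chunk):
            value = value * 3 + (w + 1)
        out.append(value)  # at most 3**5 - 1 = 242, so it fits in a byte
    return bytes(out)

def unpack_base3(data, count):
    """Recover `count` weights using divmod, the costlier unpacking step."""
    weights = []
    for byte in data:
        for _ in range(5):
            if len(weights) == count:
                return weights
            byte, digit = divmod(byte, 3)
            weights.append(digit - 1)
    return weights

bits_per_weight = 8 / 5
theoretical_floor = math.log2(3)

sample = [1, 0, -1, 1, 0, -1, -1]
packed = pack_base3(sample)
restored = unpack_base3(packed, len(sample))
```

On general-purpose hardware the extra arithmetic per weight is the kind of cost that can outweigh the 20% size saving over two bits per weight.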