top | item 41494438

(no title)

areddyyt | 1 year ago

We don't achieve peak compression efficiency because more complex weight unpacking mechanisms kill throughput.

To be more explicit, the weight matrix's values belong to the set of -1, 0, and 1. When using two bits to encode these weights, we are not effectively utilizing one possible state:

10 => 1, 01 => 0, 00 =>-1, 11 => ?

I think selecting the optimal radix economy will have more of a play on custom silicon, where we can implement silicon and instructions to rapidly decompress weights or work with the compressed weights directly.

discuss

No comments yet.