seg_fault | 1 year ago

Actually, you can specify the numeric limits of the mantissa and the exponent as template arguments[0]. So you could do:

      Float<uint16_t, // type of the mantissa (must be able to hold 4095)
            uint8_t,  // type of the exponent
            0,        // lowest possible value of the mantissa
            4095,     // highest possible value of the mantissa
            0,        // lowest possible value of the exponent
            7>        // highest possible value of the exponent
The Float then simulates an unsigned 12-bit mantissa and a 3-bit exponent. Sure, it still occupies the full storage of the two underlying types, not just the 15 bits it needs. But you could create a union with bitfields to shrink that even further.
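
To make the bitfield idea concrete, here is a minimal sketch of packing a 12-bit mantissa and a 3-bit exponent into one 16-bit storage unit. It uses a plain struct with bit-fields rather than a union, and `PackedFloat15` and `pack` are hypothetical names for illustration, not part of the linked library. Bit-field layout is implementation-defined, so the resulting size is typical for mainstream compilers, not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed layout (not part of the linked library):
// 12 bits of mantissa and 3 bits of exponent share one 16-bit unit.
struct PackedFloat15 {
    std::uint16_t mantissa : 12;  // holds 0..4095
    std::uint16_t exponent : 3;   // holds 0..7
};

// Build a packed value; inputs outside the simulated ranges are rejected
// in debug builds instead of being silently truncated.
inline PackedFloat15 pack(std::uint16_t mantissa, std::uint8_t exponent) {
    assert(mantissa <= 4095 && exponent <= 7);
    PackedFloat15 p{};
    p.mantissa = mantissa & 0x0FFF;  // keep the low 12 bits
    p.exponent = exponent & 0x07;    // keep the low 3 bits
    return p;
}
```

On common compilers `sizeof(PackedFloat15)` is 2, but since that depends on the implementation it is worth checking with a `static_assert` on your own target.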

[0] https://github.com/clemensmanert/fas/blob/58f9effbe6c13ab334...

Archit3ch|1 year ago

Can you go in the other direction? A larger exponent and mantissa than a regular float/double?

seg_fault|1 year ago

Sure.

    Float<int64_t, int64_t>
This gives you a signed 64-bit mantissa and a signed 64-bit exponent. Since numeric limits are available for int64_t, Float knows the minimum and maximum values without you having to spell them out.
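
As a rough illustration of how such defaults can work, here is a sketch in which the limit parameters fall back to `std::numeric_limits` when omitted. `SketchFloat` and its member names are hypothetical and only mirror the parameter order described in this thread; the real Float in the linked repository may be implemented differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the library's Float (assumed parameter order:
// mantissa type, exponent type, mantissa min/max, exponent min/max).
template <typename M, typename E,
          M MantissaMin = std::numeric_limits<M>::min(),
          M MantissaMax = std::numeric_limits<M>::max(),
          E ExponentMin = std::numeric_limits<E>::min(),
          E ExponentMax = std::numeric_limits<E>::max()>
struct SketchFloat {
    M mantissa;
    E exponent;

    // Reject values outside the configured ranges (debug builds only).
    SketchFloat(M m, E e) : mantissa(m), exponent(e) {
        assert(m >= MantissaMin && m <= MantissaMax);
        assert(e >= ExponentMin && e <= ExponentMax);
    }

    static constexpr M mantissa_min() { return MantissaMin; }
    static constexpr M mantissa_max() { return MantissaMax; }
    static constexpr E exponent_min() { return ExponentMin; }
    static constexpr E exponent_max() { return ExponentMax; }
};

// Limits omitted: the full int64_t range is picked up automatically.
using WideFloat = SketchFloat<std::int64_t, std::int64_t>;

// Limits given explicitly: a 12-bit mantissa and a 3-bit exponent.
using SmallFloat = SketchFloat<std::uint16_t, std::uint8_t, 0, 4095, 0, 7>;
```

The same pattern extends to a custom big integer type, provided a `std::numeric_limits` specialization (or explicit limits) is supplied for it.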

You could get even bigger ranges for Float by implementing your own big integer type.