piadodjanho | 5 years ago

Ackchyually...

IEEE-754 has a lot of redundant representations. Not where you would expect, though.

Caveat: Those features are invaluable for some niche applications, but not for the average Joe.

To start: every IEEE-754 float has two zero representations, one for positive zero and one for negative zero.
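
A minimal C illustration: the two zeros compare equal, yet they are distinct values with observable behavior.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double pz = +0.0, nz = -0.0;

        printf("+0.0 == -0.0: %d\n", pz == nz);           // 1: they compare equal
        printf("signbit(-0.0): %d\n", signbit(nz) != 0);  // 1: but the sign is there
        printf("1.0 / -0.0 = %f\n", 1.0 / nz);            // -inf: and it affects results
        return 0;
    }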

The special numbers are another source of redundancy. The double format has about 9,007,199,254,740,992 different bit combinations to encode three states that production-ready software shouldn't reach: NaN, +inf and -inf.
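
That count comes from the bit patterns whose exponent field is all ones: 2 signs x 2^52 fraction patterns = 2^53 = 9,007,199,254,740,992, of which only two are the infinities and all the rest decode to NaN. A quick C sketch, building two NaNs with arbitrary payloads:

    #include <stdio.h>
    #include <string.h>
    #include <math.h>
    #include <stdint.h>

    int main(void) {
        // Exponent bits all ones + nonzero fraction = NaN; the 52-bit
        // payload (and the sign) are redundant for almost all uses.
        uint64_t bits_a = 0x7FF0000000000001ULL;
        uint64_t bits_b = 0x7FFDEADBEEFCAFEFULL;  // arbitrary different payload
        double a, b;
        memcpy(&a, &bits_a, sizeof a);
        memcpy(&b, &bits_b, sizeof b);
        printf("isnan(a)=%d isnan(b)=%d\n", isnan(a), isnan(b));  // both nonzero
        return 0;
    }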

Other than the redundancy, the double has many rarely used combinations. For instance, the subnormal representations. Unless you compile with something like -ffast-math (which enables flush-to-zero) or use some exotic compiler, subnormals are enabled by default, but one subnormal operation can take over a hundred cycles to complete, so in practice they are avoided. Therefore, another 9,007,199,254,740,992 effectively wasted combinations.
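
A small C check, assuming default compiler flags (no flush-to-zero):

    #include <stdio.h>
    #include <float.h>
    #include <math.h>

    int main(void) {
        double d = DBL_MIN;    // smallest positive *normal* double, 2^-1022
        double sub = d / 4.0;  // below DBL_MIN: representable only as a subnormal

        printf("%a is %s\n", sub,
               fpclassify(sub) == FP_SUBNORMAL ? "subnormal" : "not subnormal");
        // With flush-to-zero enabled, the division above may yield plain 0.0,
        // giving up gradual underflow for speed.
        return 0;
    }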

If that wasn't bad enough, since the magnitude of the numbers roughly follows a normal distribution (a law named after someone whose name I forget), the most significant bits of the exponent field are very rarely used. The IEEE-754 encoding is suboptimal.

The posit floating-point format addresses all those issues. It uses a tapered encoding for the exponent field.
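
A toy decoder sketch, assuming an 8-bit posit with es = 0 (a deliberately simple configuration; real posit parameters differ), to show how the unary "regime" run tapers the exponent. In hardware, the run-length count below is the same leading-bit count a float normalizer already performs.

    #include <stdio.h>
    #include <stdint.h>

    static void decode_posit8(uint8_t p) {
        if (p == 0x00 || p == 0x80) {           // special cases: zero and NaR
            printf("special (zero or NaR)\n");
            return;
        }
        int sign = p >> 7;
        uint8_t body = sign ? (uint8_t)-p : p;  // negative posits are two's complement

        int regime_bit = (body >> 6) & 1;
        int run = 0, i = 6;
        while (i >= 0 && ((body >> i) & 1) == regime_bit) { run++; i--; }
        i--;  // skip the terminating (inverted) regime bit

        int k = regime_bit ? run - 1 : -run;    // run length -> exponent k
        int frac_bits = i + 1 > 0 ? i + 1 : 0;  // rest is fraction, implicit leading 1
        printf("sign=%d exponent=%d fraction_bits=%d\n", sign, k, frac_bits);
    }

    int main(void) {
        decode_posit8(0x40);  // regime "10"  -> k =  0, value 1.0
        decode_posit8(0x60);  // regime "110" -> k =  1, value 2.0
        decode_posit8(0x20);  // regime "01"  -> k = -1, value 0.5
        return 0;
    }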

thechao | 5 years ago

I'm mixed on Gustafson's posit stuff. For me, the only things I'd change for fp would be (see the sketch further down):

1. -0 now encodes NAN.

2. +inf/-inf are all Fs with sign: 0x7FFFFFFF, 0xFFFFFFFF.

3. 0 is the only denorm.

Which does four good things:

1. Gets rid of the utter insanity which is -0.

2. Gets rid of all the redundant NANs.

3. Makes INF "look like" INF.

4. Gets rid of "hard" mixed denorm/norm math.

And one seriously bad thing:

1. Lose a bunch of underflow values in the denorm range.

However, as to the latter: who the fuck cares! Getting down to that range using anything other than divide-by-two completely trashes the error rate anyways, so why bother?
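
For concreteness, a minimal sketch of what classification could look like under the encoding proposed above (entirely hypothetical 32-bit patterns; the function name is mine):

    #include <stdio.h>
    #include <stdint.h>

    // Proposed scheme: the old -0 slot becomes the single NaN, the
    // all-ones patterns become the infinities, and 0 is the only
    // zero/denormal bit pattern.
    static const char *classify(uint32_t bits) {
        if (bits == 0x00000000u) return "zero";
        if (bits == 0x80000000u) return "NaN (old -0 slot)";
        if (bits == 0x7FFFFFFFu) return "+inf";
        if (bits == 0xFFFFFFFFu) return "-inf";
        return "finite";
    }

    int main(void) {
        uint32_t tests[] = { 0x00000000u, 0x80000000u,
                             0x7FFFFFFFu, 0xFFFFFFFFu, 0x3F800000u };
        for (int i = 0; i < 5; i++)
            printf("0x%08X -> %s\n", tests[i], classify(tests[i]));
        return 0;
    }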

The rest of Gustafson's stuff always sounds like crazy-people talk, to me.

piadodjanho | 5 years ago

He also proposes the use of an opaque register to accumulate into (the quire), in contrast to the transparent float registers (it's a mess; each compiler does what it thinks is best).

When working with numbers that exceed the posit representation, you use the quire to accumulate. At the end of the computation, you convert back to posit to store the result in memory, or you store the quire itself.

In C, it would look something like:

    posit32_t a, b;
    quire_t q;

    q = a;      // widen the posit into the quire (exact)
    q = q + b;  // accumulate in the quire, no intermediate rounding
    a = q;      // round the quire back down to a posit

> The rest of Gustafson's stuff always sounds like crazy-people talk, to me.

I've read all his papers on posits and agree. But I do believe the idea of encoding the exponent with a Golomb-Rice-style code is actually very good and suits most users. The normalization hardware (used in the subtraction operation) can easily be repurposed to decode the exponent and perform the shift.

But the quire logic (fixed-point arithmetic) might use more area than a larger floating-point unit. Maybe it pays off in power usage, though.

cokernel_hacker | 5 years ago

Interesting. One issue is the treatment of 1 / -inf. This would be -0 in traditional IEEE 754 but would now be +0, IIUC.

This would imply that 1 / (1 / -inf) would now be +inf instead of -inf.
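
The round-trip is easy to check in standard C under IEEE-754 semantics:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double z = 1.0 / -INFINITY;             // -0.0 under IEEE-754
        printf("1/-inf = %f (signbit=%d)\n", z, signbit(z) != 0);
        printf("1/(1/-inf) = %f\n", 1.0 / z);   // -inf: the sign round-trips
        // Without -0, the first division would have to give +0, and the
        // second would come back as +inf.
        return 0;
    }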

BeetleB | 5 years ago

> If that wasn't bad enough, since the magnitude of the numbers roughly follows a normal distribution (a law named after someone whose name I forget), the most significant bits of the exponent field are very rarely used. The IEEE-754 encoding is suboptimal.

But isn't that accounted for by the fact that the floating-point number distribution is non-uniform? Half of all floating-point numbers are between -1 and 1.
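
A quick way to see the "half" claim: for positive doubles the bit pattern is monotonic in the value, so the fraction of finite positive doubles below 1.0 is roughly the ratio of the bit patterns of 1.0 and DBL_MAX.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <float.h>

    int main(void) {
        double one = 1.0, max = DBL_MAX;
        uint64_t b1, bm;
        memcpy(&b1, &one, sizeof b1);
        memcpy(&bm, &max, sizeof bm);
        printf("bits(1.0)     = 0x%016llX\n", (unsigned long long)b1);
        printf("bits(DBL_MAX) = 0x%016llX\n", (unsigned long long)bm);
        printf("fraction of positive doubles below 1.0 ~ %.3f\n",
               (double)b1 / (double)bm);   // prints ~0.500
        return 0;
    }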

piadodjanho | 5 years ago

Hm. I don't know.

My reasoning is about how much information can be encoded in the format.

The IEEE-754 double format has 11 bits to encode the exponent and 52 bits to encode the fraction.

Therefore, the scaling factor of a double ranges from 2^-1022 to 2^1023. To give an idea of how large that is, scientists estimate there are about 10^80 atoms in the universe; in base 2, that is a little less than 2^266.
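
A quick sanity check of those numbers in C:

    #include <stdio.h>
    #include <math.h>
    #include <float.h>

    int main(void) {
        printf("DBL_MIN = %a\n", DBL_MIN);   // 0x1p-1022, i.e. 2^-1022
        printf("DBL_MAX = %a\n", DBL_MAX);   // just under 2^1024
        // 10^80 atoms in the observable universe, expressed in base 2:
        printf("log2(10^80) = %.2f\n", 80.0 * log2(10.0));  // ~265.75 < 266
        return 0;
    }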

Most applications don't work with numbers of this magnitude. And the ones that do don't care so much about precision.

Let me know if there is something wrong with my logic.