trailbits | 1 year ago

Even non-physical numbers are problematic as a signal for 'invalid'. I had a customer use -999 as a placeholder for 'invalid' data. Years later somebody built a higher-level data product that averaged and combined that data with other products, without knowing to remove those 'invalid' values first. The resulting values were all within physical limits, but badly wrong. The best solution is to use IEEE NaN (https://en.wikipedia.org/wiki/NaN), so that every downstream result turns into NaN, rather than a plausible-looking number, if you don't explicitly check for it.
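To make the failure mode concrete, here is a minimal sketch in Haskell. The elevation readings and the names `mean`, `meanValid`, `withSentinel` and `withNaN` are invented for illustration and are not from the original data product. One caveat: by default an IEEE 754 quiet NaN does not raise an exception. It propagates through the arithmetic, so the "blow up" shows up as a NaN in the output.

```haskell
-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- A quiet NaN: 0/0 on Doubles yields NaN under IEEE 754.
nan :: Double
nan = 0 / 0

-- Hypothetical elevation readings in metres; the second one is missing.
withSentinel :: [Double]
withSentinel = [1500, -999, 1520, 1480]

withNaN :: [Double]
withNaN = [1500, nan, 1520, 1480]

-- Mean over the valid readings only; the NaN check is explicit.
meanValid :: [Double] -> Double
meanValid = mean . filter (not . isNaN)
```

With these made-up numbers, `mean withSentinel` is 875.25, a believable elevation that is nowhere near the 1500 the valid readings average to. `mean withNaN` is NaN, which is hard to mistake for real data, and `meanValid withNaN` recovers 1500.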


gizmo686 | 1 year ago

NaN is a sentinel value, just as much as 2,147,483,647 is.

The only difference is that NaN is implemented in hardware. However, taking advantage of that requires using the hardware arithmetic that recognizes NaN, which restricts you to floating point numbers, and all the problems that introduces.

If you have good language support and can afford the overhead, you want to replicate that behavior in the type system as some sort of tagged union:

    import Data.Int (Int32)
    data SentinelInt32 = NaN | Value Int32
Or, more likely, using the equivalent of Optional<T> that is part of your language's standard library.
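As a rough sketch of the standard-library route, Haskell's `Maybe` plays the role of Optional<T>. The helper names `addMaybe` and `sumAll` are made up for this example. A missing operand propagates the way NaN does, but the type checker forces callers to handle it.

```haskell
import Control.Applicative (liftA2)
import Data.Int (Int32)

-- Addition that yields Nothing if either operand is missing.
addMaybe :: Maybe Int32 -> Maybe Int32 -> Maybe Int32
addMaybe = liftA2 (+)

-- Sum of a list; Nothing as soon as any element is missing.
sumAll :: [Maybe Int32] -> Maybe Int32
sumAll = fmap sum . sequenceA
```

Here `addMaybe (Just 2) Nothing` is `Nothing`, and `sumAll [Just 1, Just 2]` is `Just 3`. The caller has to pattern match on the result before using it as a plain `Int32`, which is the explicit check that an in-band sentinel lets you forget.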

Of course, this means boxing all of your numbers. You could also do something like:

  type SentinelInt32 = Int32
Then provide alternative arithmetic implementations that check for your sentinel value(s) and propagate them appropriately. This avoids the memory overhead, but still adds in all the conditional overhead.
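A minimal sketch of that unboxed variant follows. Using `minBound` as the sentinel is an assumption made for this example, and `invalid`, `lift2`, `addS` and `mulS` are hypothetical names.

```haskell
import Data.Int (Int32)

type SentinelInt32 = Int32

-- Assumed sentinel for this sketch: the most negative Int32.
invalid :: SentinelInt32
invalid = minBound

-- Wrap a binary operation so the sentinel propagates, NaN-style.
lift2 :: (Int32 -> Int32 -> Int32)
      -> SentinelInt32 -> SentinelInt32 -> SentinelInt32
lift2 f a b
  | a == invalid || b == invalid = invalid  -- the conditional overhead
  | otherwise                    = f a b    -- note: overflow is not guarded

addS, mulS :: SentinelInt32 -> SentinelInt32 -> SentinelInt32
addS = lift2 (+)
mulS = lift2 (*)
```

Because `SentinelInt32` is only a type alias, nothing stops a caller from using plain `(+)` and bypassing the check, and a wrapping overflow can land on the sentinel by accident. A `newtype` would close the first gap without adding memory overhead.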

HdS84 | 1 year ago

999 or 9999 etc. are extremely common in traditional statistics, especially because there is no known good sentinel value. In many cases I wished that they used the maximum value as a sentinel, e.g. take 255 for a short as invalid and make only -244 to +244 normal numbers.

addaon | 1 year ago

As someone else who regularly uses 8.93 bit words for computations, I understand completely.