Demystifying NaN for the Working Programmer

pierrebai|4 years ago

NaN is a cancer. The choice to make NaN == NaN false is just wrong. Every type, every variable can have multiple reasons for being invalid. Yet no other type has ever chosen to make invalid values not equal to themselves.

Pointers can be invalid. They can be invalid for any number of reasons: lack of memory, object not found, etc. No one ever suggests that null should not equal null.

File handles can be invalid. They can be invalid for any number of reasons: file not found, access denied, file server is offline. No one has ever made invalid handles not equal to themselves.

The justification for NaN not being equal to themselves is just bonk.

ynik|4 years ago

In a world without generic programming, NaN not being equal to itself makes a certain amount of sense for some kinds of numeric code. But in a world with reusable generic algorithms the calculation changes -- here equality/ordering relations really must be transitive or weird shit happens. In C++ it's undefined behavior to call `std::sort` or `std::unique` on a list of floats containing NaN.

Most languages nowadays have standard-library functions/types that require well-behaved equality, so why have a builtin type for which equality is not well-behaved?
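
A minimal sketch of one common workaround, assuming Python purely for illustration (the thread names no single language) and a hypothetical helper name `sort_nan_last`: give the generic sort a key that is always comparable, so it never evaluates a comparison involving NaN itself.

```python
import math

def sort_nan_last(values):
    """Sort floats ascending, grouping every NaN at the end.

    Each float maps to a tuple of (is_nan, comparable_value). The tuples
    form a well-behaved ordering even when the input contains NaN.
    """
    return sorted(
        values,
        key=lambda v: (math.isnan(v), 0.0 if math.isnan(v) else v),
    )

data = [3.0, float("nan"), 1.0, float("nan"), 2.0]
result = sort_nan_last(data)
print(result)  # [1.0, 2.0, 3.0, nan, nan]
```

Putting NaN last is an arbitrary choice here; the point is only that the ordering handed to the algorithm is total.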

Lascaille|4 years ago

>The justification for NaN not being equal to themselves is just bonk.

It makes a lot of sense to me. NaN indicates data has been lost. You did something and you stored the result in a number datatype but the result isn't a number. Data was lost. You lost the data and have only 'your answer wasn't a number.'

Comparing NaN with NaN is asking the computer 'we have two buckets that have overflowed, were their contents the same?' The answer is 'we don't know' which means, to err on the side of safety, the answer is 'no.'

No?

Dylan16807|4 years ago

Let's say you make a particular NaN equal to itself.

But then it's sensible for different operations to give you different NaN values.

And you still wouldn't say that 4 < NaN is true, or NaN < 4 is true, would you?

So it's still going to confuse the user. Is just changing equality going to give you a better system overall?
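
To make the point concrete, here is a small sketch in Python (chosen only as an illustration language). The helper `classify` is hypothetical; it shows that a comparison involving NaN needs a fourth outcome besides less, equal, and greater.

```python
nan = float("nan")

# Every ordered comparison involving NaN is false, in both directions.
print(4 < nan, nan < 4, 4 > nan, nan > 4)  # False False False False
print(nan == nan)  # False
print(nan != nan)  # True

def classify(a, b):
    """Return 'less', 'greater', 'equal', or 'unordered' for two floats."""
    if a < b:
        return "less"
    if a > b:
        return "greater"
    if a == b:
        return "equal"
    return "unordered"  # reached only when at least one operand is NaN

print(classify(4.0, nan))  # unordered
```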

Fire-Dragon-DoL|4 years ago

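Note (without disagreeing): in SQL, NULL = NULL is not true either. Comparing NULL with NULL yields NULL (unknown), so you have to test with IS NULL.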
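
A quick way to check the SQL behaviour, sketched with Python's built-in `sqlite3` module. SQLite is an assumption here; other engines follow the same three-valued logic, but this was not taken from the thread.

```python
import sqlite3

# In-memory database, nothing is written to disk.
conn = sqlite3.connect(":memory:")
row = conn.execute(
    "SELECT NULL = NULL, NULL != NULL, NULL IS NULL"
).fetchone()
conn.close()

# Both comparisons yield NULL (None in Python), neither true nor false.
# Only IS NULL gives a definite answer.
print(row)  # (None, None, 1)
```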

jameshart|4 years ago

This article conflates the representational limits of floating point with the concept of NaN in a way that I suspect will lead to more confusion, not less.

Zero/zero doesn’t return NaN because it isn’t representable within floating point - it returns NaN because it is an expression that has no mathematical meaning.

The fact that sqrt(-1) has two valid nonreal answers has nothing to do with why it returns NaN - after all, sqrt(4) has two valid real answers so is also technically not representable by a single floating point value, but that doesn’t typically result in NaN.

NaN is just an error value you get when you ask floating point math a dumb question it can’t usefully answer.

Far more interesting and subtle are the ways in which positive and negative infinity and positive and negative zero let you actually still obtain useful (at least for purposes of things like comparison) results to certain calculations even if they overflow the representable range.
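
A short sketch of that behaviour, assuming Python floats (IEEE 754 doubles on mainstream platforms). The variable names are illustrative only.

```python
import math

big = 1e308
overflowed = big * 10          # exceeds the largest double, becomes +inf

# Comparisons against the overflowed value still give sensible answers.
print(overflowed == math.inf)  # True
print(overflowed > big)        # True
print(-overflowed < -big)      # True

# Dividing by infinity yields a zero that remembers its sign.
tiny = 1.0 / overflowed        # 0.0
neg_tiny = -1.0 / overflowed   # -0.0
print(tiny == neg_tiny)        # True
print(math.copysign(1.0, tiny), math.copysign(1.0, neg_tiny))  # 1.0 -1.0

# Only a question with no meaningful answer produces NaN.
print(math.isnan(overflowed - overflowed))  # True
```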

saagarjha|4 years ago

> The only reliable way to test for NaN is to use a language-dependent built-in function; the expression a === NaN is always false

Well, you test for it by comparing the value against itself and seeing if that returns false.

(The article is also a bit confused about by-value vs. by-reference comparison and about the actual bit pattern of a NaN, which isn’t quite right.)
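
The self-comparison test can be sketched like this, using Python as an illustration. The helper name `is_nan` is hypothetical, and `math.isnan` is used only to cross-check it.

```python
import math

def is_nan(x):
    """True only for NaN, the one float value that is not equal to itself."""
    return x != x

samples = [0.0, -0.0, 1.5, math.inf, -math.inf, float("nan")]
print([is_nan(s) for s in samples])
# [False, False, False, False, False, True]

# The hand-rolled check agrees with the library function on every sample.
print(all(is_nan(s) == math.isnan(s) for s in samples))  # True
```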

ithkuil|4 years ago

Signaling NaNs raise exceptions in some operations. Is comparison one of these?

olliej|4 years ago

I dislike this article, as it tries repeatedly to imply that the use of NaN is somehow a restriction caused by floating point.

No IEEE 754 operation ever produces a NaN result unless the operation has no valid result in the set of real values.

Similarly the behaviour in comparisons: if you want NaN to equal NaN you have to come up with a definition of equality that is also consistent with

    NaN < X

    NaN > X

    NaN == X

The logical result of this is that NaN does not equal itself, and I believe mathematicians agree on that definition. Again not a result of the representation, but a result of the mathematical rules of real values.

I want to be very clear here: floating point math always produces the correct value rounded (according to rounding mode) to the appropriate value in the represented space unless it is fundamentally not possible. The only place where floating point (or indeed any finite representation) produces an incorrectly rounded result is in the transcendental functions, where some values can only be correctly rounded if you compute the exact value, but the exact value is irrational.

People seem hell-bent on complaining about floating point behavior, but it is fundamentally mathematically sound. IEEE 754 also specifies some functions like e^x-1 explicitly to ensure that you get the best possible accuracy for the core arithmetic operations.
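
As an illustration of why a dedicated e^x-1 function matters, here is a sketch using Python's `math.expm1`. The reference value is an assumption of this example: the first two Taylor terms, which are accurate far beyond double precision for an input this small.

```python
import math

x = 1e-10

naive = math.exp(x) - 1.0   # exp(x) is rounded near 1.0, then most digits cancel
accurate = math.expm1(x)    # computes e^x - 1 without the cancellation

# x + x^2/2 differs from the true value by roughly x^3/6, about 1.7e-31.
reference = x + x * x / 2.0

naive_rel_err = abs(naive - reference) / reference
accurate_rel_err = abs(accurate - reference) / reference

print(naive, accurate)
print(naive_rel_err, accurate_rel_err)
```

The naive form loses roughly half of the significant digits at this input, while the dedicated function stays close to full precision.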

dzaima|4 years ago

Greater-than and less-than already make no sense around NaN, you won't get much worse, so I don't get what you're trying to point out with them. This is less a question about mathematical correctness (of which there isn't much around NaN anyway) and more a practical one. Having this annoying NaN that breaks everything if it's in an array to be sorted, or in a set, or a key in a map is just pure awful.
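
A small sketch of the set and map problem, assuming CPython (where containers check object identity before falling back to `==`). The variable names are illustrative.

```python
nan_a = float("nan")
nan_b = float("nan")   # a second, distinct NaN object

# A set is supposed to deduplicate, but NaN never equals another NaN.
seen = {nan_a, nan_b}
print(len(seen))  # 2

# A key stored under one NaN cannot be found with another NaN.
lookup = {nan_a: "stored"}
print(lookup.get(nan_b))  # None

# Looking up the very same object works only because of the identity
# shortcut, not because NaN == NaN.
print(lookup.get(nan_a))  # stored
```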

bryanrasmussen|4 years ago

I did a code assignment for a potential JavaScript-heavy job in 2014. I think isNaN was already part of the language then, because I remember deciding not to use it (but I could be misremembering). At any rate, I did Number(x) !== Number(x) at some point.

In the meeting where they went over the code, the guy who did it said, "We were wondering why you did this?" So I had to explain NaN to him. He really did not know it existed. At any rate, I thought this was a weird thing to know nothing at all about.

pletnes|4 years ago

Related: I’ve met developers who think NaNs are a language or library (notably pandas) feature.

amelius|4 years ago

Imagine doing if(x) ..., where x can be NaN. Shouldn't that throw an exception in most cases? Why are our compilers not doing it that way?

xen0|4 years ago

Should it? It isn't obvious to me at all that throwing an exception in this case is the best behaviour. Throwing an exception when testing a value for 'truthiness' is extremely surprising.

On the other hand, I would strongly discourage 'if(x)' where x is a float that may be NaN purely because the 'correct' behaviour here isn't clear to me.
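
For what it's worth, languages already disagree on this: in Python a NaN is truthy, while in JavaScript it is falsy. Below is a sketch of the explicit alternative in Python; the helper `is_truthy_checked` is hypothetical and simply refuses to answer for NaN.

```python
import math

def is_truthy_checked(x):
    """Truth test for a float that raises instead of guessing for NaN."""
    if math.isnan(x):
        raise ValueError("NaN has no meaningful truth value")
    return x != 0.0

# Python's built-in rule: any float other than zero is truthy, NaN included.
print(bool(float("nan")))  # True

print(is_truthy_checked(0.0))  # False
print(is_truthy_checked(2.5))  # True
```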

ElevenLathe|4 years ago

The compiler presumably can't know in most cases, but the runtime might be able to throw. It depends on the language implementation and the tradeoffs.

mrlonglong|4 years ago

Excellent article, this helped me understand the issues of working with floating-point numbers. I work with them quite a lot when developing business logic, and oftentimes NaN can be a pain. Understanding why helps a lot.

PopePompus|4 years ago

I love NaNs, especially their "infectious" quality. Initializing float variables to NaN before first assignment can make a lot of errors immediately obvious. I wish there were a NaN for integers.
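
A sketch of that technique in Python. The function `mean_speed` and its deliberate bug are made up for illustration.

```python
import math

def mean_speed(distance_m, duration_s):
    """Hypothetical function with a bug: one code path never assigns speed."""
    speed = float("nan")  # sentinel meaning "not computed yet"
    if duration_s > 0:
        speed = distance_m / duration_s
    # Bug: nothing handles duration_s <= 0, so the sentinel survives.
    return speed

# The NaN from the buggy path infects every later calculation.
total = mean_speed(100.0, 20.0) + mean_speed(50.0, 0.0)
print(math.isnan(total))  # True
```

Had the variable been initialized to 0.0 instead, the total would look plausible and the missing branch would go unnoticed.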

colejohnson66|4 years ago

What about a “nullable” double? In C#, you’d use `double?`, Rust would be `Option<f64>`, C++ would be `std::optional<double>`. Then any operation would throw upon an unset value?
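
The closest Python analogue is `Optional[float]` with `None` as the unset value. A sketch with a hypothetical helper `add_one`: arithmetic on the unset value fails immediately instead of propagating.

```python
from typing import Optional

def add_one(x: Optional[float]) -> float:
    """Adds 1.0; raises TypeError at runtime if x was never set (None)."""
    return x + 1.0  # a static type checker would also flag this line

unset: Optional[float] = None

try:
    add_one(unset)
    failed = False
except TypeError as exc:
    failed = True
    print("arithmetic on an unset value failed:", exc)

print(add_one(2.0))  # 3.0
```

The difference from NaN is where the error surfaces: here it is at the first operation, rather than at the end of a long calculation.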

saagarjha|4 years ago

Don’t initialize them and turn on UBSan :)

36 comments