If you're reaching these values then it's extremely likely that either:
(a) You're doing something wrong (usually, not standardizing your data, etc.), which means you're not thinking about the fact that computers have finite numerical precision and adjusting your problem accordingly (e.g., your data has a wide dynamic range).
Or, (b) your problem is pretty ill-conditioned and there's probably no solving it in any useful way unless you crank up the precision.
Almost every problem I've encountered of this form (whenever I've worked on optimizers, or, more generally, solving problems) is of type (a). There aren't many more choices if your problem is of type (b), since you're almost always losing digits somewhere in any implementation of a numerical algorithm.
The mean of two numbers is a particularly easy example to analyze, but I don't think that the implementation is "wrong" since every (even slightly more complicated) numerical algorithm will run into the same problems. I think this is a good post in its content, I just don't agree with the author's final interpretation.
Most of the issues discussed in the article are not from limits in precision, they're from overflows that arise from limits in the exponential range. Finding the average of 8e+307 and 8e+307 is a "low precision" problem, and even naive methods to avoid overflow will not hit limits on precision in this case (e.g. 8e+307/2 + 8e+307/2).
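Concretely, in Python, with the two near-maximum values from the article (the variable names are mine):

```python
import math

a = 8.988465674311579e307   # each is just under half the largest double
b = 8.98846567431158e307

naive = (a + b) / 2         # a + b exceeds the double range, giving inf
safe = a / 2 + b / 2        # divide first so every intermediate stays in range

print(naive)                # inf
print(math.isfinite(safe))  # True
```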
You're right that there are issues with precision and floating point math (the article touches on this a little), but the implementation is definitely wrong if it promises to work, as in the case of Python's statistics library (which was correctly using the fractions module to store numbers exactly, but converted to a fraction in the wrong order).
For problem (b), this is the standard case for inverse problems, which are almost always ill-posed / ill-conditioned. There is now a vast literature on this subject, including many algorithms for solving them robustly.
"...incorrectly posed problems were considered ‘taboo’ for generations of mathematicians, until comparatively recently it became clear that there are a number of quite meaningful problems, the so-called ‘inverse problems,’ which are nearly always unstable with respect to fluctuations of input data."
— H. Allison, Inverse Unstable Problems and Some of Their Applications (1979)
An interesting article describing a useful framework.
In addition to the largest numbers that floating point can handle, a related issue can come up with numbers that are not near the edge.
A good student of floating point numbers will notice that there is a good chance you will get a different result when summing a list of floating point numbers in a different order. This is due to the rounding error that can happen when you add a large number to a much smaller one. If you add up all the small numbers first, their accumulated sum can be large enough not to be rounded away when it is added to the next, larger number on the list. Thus, you should sort an array of floating point numbers and begin the summation with the smallest.
Some careful thought is needed if numbers are important to your computation. While the author of this article makes important points about one aspect of this, they seem to shrug off further study of the referenced 30 page paper at https://hal.archives-ouvertes.fr/file/index/docid/576641/fil....
The authors of that paper, however, seem to misstate one vital point. They note that the mathematical formulas are "unstable", but for the purpose of working with floating point numbers, it isn't the mathematics that is unstable; it is the failure to understand how deeply modern floating point numbers are an approximation. Methods that fail to take this into account will be unstable.
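That order dependence is easy to reproduce; a small Python sketch (the values are mine, chosen to make the effect visible):

```python
data = [1e16] + [1.0] * 100

large_first = 0.0
for x in sorted(data, reverse=True):
    large_first += x    # each lone 1.0 is rounded away against the huge total

small_first = 0.0
for x in sorted(data):
    small_first += x    # the ones accumulate to 100.0 before meeting 1e16

print(small_first - large_first)   # 100.0: the ascending sum kept the ones
```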
A solution more accurate than sorting before adding is to place your numbers into a priority queue and repeatedly add the smallest two numbers and re-insert the result into the queue. This helps handle the case where you have many similarly valued numbers and your running sum becomes large enough relative to your numbers to cause the same rounding errors.
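For non-negative inputs, that strategy is only a few lines with Python's heapq (a sketch; handling mixed signs properly would require ordering by magnitude instead):

```python
import heapq

def heap_sum(xs):
    """Repeatedly replace the two smallest values with their sum."""
    heap = list(xs)
    heapq.heapify(heap)
    if not heap:
        return 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, a + b)
    return heap[0]
```

On `[1e16] + [1.0] * 100` this recovers the hundred ones that a naive left-to-right sum starting from 1e16 would round away.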
It's amazing how many floating point edge cases emerge from mid-range values. Ordering can also help minimize the accumulation of rounding errors for similar reasons to those you pointed out.
I ran into a similar issue doing high accuracy dot-products. The solution was Kahan summation [1], which first adds the incremental value, then checks the result to see how much floating point error was introduced, then rolls that error into the next addition.
I suspect something similar could be done here, even if it would be a few times slower, to get an ulp-accurate mean regardless of the order/size of the individual elements.
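For reference, Kahan summation is only a few lines; a minimal Python version (dividing the result by the element count then gives a compensated mean):

```python
def kahan_sum(xs):
    total = 0.0
    comp = 0.0                    # running compensation for lost low-order bits
    for x in xs:
        y = x - comp              # apply the correction from the previous step
        t = total + y
        comp = (t - total) - y    # the low-order part of y that t dropped
        total = t
    return total
```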
This is the TXR Lisp interactive listener of TXR 214.
Quit with :quit or Ctrl-D on empty line. Ctrl-X ? for cheatsheet.
1> (defun mean (seq) (/ (sum seq) (len seq)))
mean
2> (mean '(8.988465674311579e+307 8.98846567431158e+307))
** out-of-range floating-point result
** during evaluation at expr-1:1 of form (sum seq)
Works for me; error is caught. Breaking news: floating-point has a limited range! Detailed story at 11 ...
Almost any non-trivial calculation that overflows can be massaged into something that avoids overflow. One important motivation behind floating point is to have enough practical range so that you don't run into overflow.
If this is an actual practical problem someone faces, the tool to reach for is wider floating-point, not to mess with the code. A bigger "E" than +307 is a thing.
He shows off a function named "sumsto" which takes a single positive double precision floating point number "x" as an argument.
"sumsto" always returns the same vector of 2046 floating point numbers -- just in an order so that naive left-to-right summation returns "x".
Just changing the order of that vector lets you sum to almost any positive double precision number.
If you want to run the code, I didn't see where "realmax" was defined, but this works:
realmax() = prevfloat(typemax(Float64))
You think that's bad? Theoretical math has "conditionally convergent series" [1]. Sum in order and it can converge to a given value; rearrange the terms and any real number is possible.
I have a hunch there could be a bit of relation between these things.
What about a "reduce" technique? Average the numbers in equal-sized chunks, then average those averages. You could even chunk the chunk averages and repeat the process as many levels down as you want to, and chunks could be as small as 2 each.
I guess this still assumes that the largest number in the original list is less than or equal to the maximum floating point value, but otherwise you stay roughly in the same space of precision as the original data.
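A sketch of that reduce idea with chunks of two, dividing before adding so near-maximum values can't overflow (it assumes the length is a power of two; unequal chunks would need weighting):

```python
def pairwise_mean(xs):
    """Mean by repeatedly averaging adjacent pairs; len(xs) must be a power of two."""
    while len(xs) > 1:
        xs = [xs[i] / 2 + xs[i + 1] / 2 for i in range(0, len(xs), 2)]
    return xs[0]
```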
Even better if you can afford to sort the data and use a high precision type for mu. The algorithm will not be online any more, and for Python, real numbers are usually doubles in practice. But in a constrained system where one has to work with float32, just using float64 for mu helps.
This article is both extremely insightful, and overly pedantic.
If you are not aware of how floating point values work, this is an excellent illustration, and you should spend some time learning about the risks and edge cases associated with them.
At the same time, enabling floating point exceptions on your platform is probably the most pragmatic way of dealing with these issues. Until you are in a domain where this level of effort is a worthwhile investment, over-engineering modules that use floating point is a waste of time.
This is one of the many reasons multiprecision floating point libraries exist.
Most people would probably be better off with the cost of using them than dealing with any of this.
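Python's standard library already ships one such option in `decimal`, whose default context allows exponents far beyond ±308; a sketch, assuming the inputs arrive as floats:

```python
from decimal import Decimal, getcontext

getcontext().prec = 34                     # more digits than a double carries

def decimal_mean(xs):
    total = sum(Decimal(x) for x in xs)    # Decimal(float) converts exactly
    return total / len(xs)

m = decimal_mean([8.988465674311579e307, 8.98846567431158e307])
print(float(m))                            # finite: back within double range
```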
I wrote a Postgres extension to compute the mean of an array of floats. [0] I tried to avoid overflow problems by using an approach that finds the final mean for each element as it goes along. [1] But that is a lot of divisions, so there is a disappointing perf hit. (It's still a lot faster than SQL though. :-) I'd love to know if there is a faster way that is still correct. (I have some implementations for SSE2 and AVX that aren't published yet, but they still do all the divisions; what I really want is a different algorithm.)
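If I read that description right, the per-element update is the incremental-mean recurrence; a Python sketch of the idea (not the extension's actual C code):

```python
def running_mean(xs):
    mu = 0.0
    for n, x in enumerate(xs, start=1):
        mu += (x - mu) / n    # fold each element straight into the mean
    return mu
```

The running value never leaves the range of the data, which is what buys the overflow safety, at the cost of one division per element.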
The reason exponents have so many bits is basically to solve this problem. The exponent/mantissa bits of IEEE 754 double could have been 8 and 55, but I suppose the standard committee decided that some precision should be sacrificed to allow stupid values outside the range of meaningful physical numbers so that you never have to worry about the problem mentioned in this article. So if you pretend that doubles have 8 exponent bits as they probably should have been, then `sum(list) / len(list)` will always work for this set of "sane" values.
And if that isn't enough, you have subnormal numbers as a safety net as well, although using these kills performance on many FPUs.
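That safety net is easy to poke at from Python, where `sys.float_info.min` is the smallest normal double:

```python
import sys

tiny = sys.float_info.min    # 2.2250738585072014e-308, smallest normal double
sub = tiny / 4.0             # subnormal: still nonzero, with reduced precision

print(0.0 < sub < tiny)      # True: gradual underflow instead of a hard zero
```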
For probably the simplest solution to the given problem of summing doubles, there is hardware in every x86 CPU which allows the biggest number during the computation to be 1.18e4932 (it uses 80 bits internally, 15 of them for the exponent, vs. only 11 in a 64-bit double). By just activating it the sum can be done very fast, with no need for any other modification: first sum, then divide, and convert to double at the end.
You should not use SSE for that however, but the old x87 "FPU" instructions.
And if you have to use the Microsoft or Intel C compiler for 64-bit Windows, you have to use assembly to reach that functionality, which is still in the CPU. With the open source compilers it's better: you can write C code that works with both GCC and Clang.
precision = Float::DIG + 1
require 'bigdecimal'
a = BigDecimal(8.98846567431158e+307, precision)
b = BigDecimal(8.988465674311579e+307, precision)
mean = (a + b) / 2
puts "Works" if mean > [a, b].min && mean < [a, b].max
=> Works
(edit: I used 0 first and it works fine but converts to precision 15 instead of 16 (which is the maximum on my computer) for some reason, I may file a bug for this)
Works for small floats too:
smalls = [1.390671161567e-309, 1.390671161567e-309, 1.390671161567e-309].map { BigDecimal(_1, precision) }
mean = smalls.sum / smalls.size
puts "Works" if mean >= smalls.min && mean <= smalls.max
=> Works
One could also use BigDecimal("8.98846567431158e+307") but seems like the numbers are coming as floats.
So what's the correct code? I'll try my guess before reading the 30 page paper (I will add my comment after reading it).
My first guess: normalize it. Find the largest exponent and divide all values by that power of two; this is a power-of-2 operation, so it loses no precision. Then divide all numbers by the length. Perform the same normalization again. Sum the numbers, then revert the exponent that was removed in the first step.
My second guess (backup plan): Use the interval arithmetic. I don't remember the specifics but I remember this is the safest form of floating point operations. But perhaps I should be careful about the performance penalty.
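The first guess can be sketched with `math.frexp` and `math.ldexp`, which expose the binary exponent directly; scaling by a power of two is exact as long as nothing lands in the subnormal range (a sketch; it assumes a non-empty list):

```python
import math

def scaled_mean(xs):
    e = max(math.frexp(x)[1] for x in xs)        # largest binary exponent
    scaled = [math.ldexp(x, -e) for x in xs]     # every |value| is now < 1
    return math.ldexp(sum(scaled) / len(xs), e)  # undo the scaling at the end
```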
Who has actually run into this in practice? Though I'm sure a few people have, I doubt it's more than .001% of programmers or programs. And those that do run into it are probably aware of the challenges.
Calculating (a+b)/2 isn't even entirely trivial for integers - the trivial implementation overflows on large numbers, as one may find out when doing a binary search over a 4GB array.
Also, their overflow avoidance technique fails on integers too, and for basically the same reason (lack of precision).
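Python's own integers don't wrap, so to watch the classic bug you have to emulate 32-bit arithmetic; the usual fix is `lo + (hi - lo) // 2` (a sketch with made-up bounds):

```python
def to_int32(v):
    """Wrap v the way a 32-bit signed integer would."""
    v &= 0xFFFFFFFF
    return v - 2**32 if v >= 2**31 else v

lo, hi = 2_000_000_000, 2_100_000_000
mid_naive = to_int32(lo + hi) // 2    # lo + hi wraps negative: a bogus midpoint
mid_safe = lo + (hi - lo) // 2        # the difference never exceeds the range

print(mid_naive < 0)                  # True
print(mid_safe)                       # 2050000000
```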
Suppose you want to find the mean of (1, 1, 1). If you divide by the length of the list first, you're going to compute sum(1/3, 1/3, 1/3), which with integer division reduces to sum(0, 0, 0). And 0 < min(1, 1, 1), so the result falls below every input.
Almost always they ignore this kind of issue; the best you're likely to get is a mean() function that remembers to sort the input first. Most numbers are in a "human" range far from the limits.
If, when calculating an average, you actually reach overflow in a double before the precision limits have already made the calculation worthless, then some of the numbers in the list are bogusly big anyway.
Data science is squishy to begin with -- those wanting high performance are heading to fixed-point hardware-accelerated solutions where loss-of-precision is a given, so not having a fully accurate answer with high-precision floating-point along many steps to solving the problem doesn't seem like a big deal.
Aren't means recursive? It's just sum/count. So you just split the list into equal-sized sublists, such that the sum of each sublist is well under the overflow cutoff. Calculate the mean of each sublist, and then the mean of those means. (With unequal sublists you'd have to weight by size.)
And you could add the split logic to calculating the mean of the means. Just as insurance.
But by then you are not comparing apples to apples. Those are Java BigFloats, and by that reasoning Java and any other language that has access to bigfloats also would do the correct thing.
LolWolf|7 years ago
rm999|7 years ago
jmole|7 years ago
earlbellinger|7 years ago
alexchamberlain|7 years ago
wglb|7 years ago
tntn|7 years ago
Is this method more accurate than Kahan summation? And if so, is it worth the extra cost of sorting?
ball_of_lint|7 years ago
hodgesrm|7 years ago
snovv_crash|7 years ago
1: https://en.wikipedia.org/wiki/Kahan_summation_algorithm
kazinator|7 years ago
Also, multi-precision floating-point is a thing.
neilv|7 years ago
wglb|7 years ago
celrod|7 years ago
realmax() = prevfloat(typemax(Float64))
StefanKarpinski|7 years ago
joe_the_user|7 years ago
[1] https://en.wikipedia.org/wiki/Conditional_convergence
nestorD|7 years ago
For that you can:
- sort the numbers (clearly not the most cost-effective solution, but easy),
- use a high precision accumulator (which can give you an exact result if you go far enough),
- use a compensated summation (such as Kahan summation, though the state of the art can do much better).
For extreme cases there are algorithms that can give you an exact result, my current favorite being Accurate Floating-Point Summation Part I: Faithful Rounding : https://www.researchgate.net/publication/228664006_Accurate_...
_bxg1|7 years ago
mannykannot|7 years ago
https://en.wikipedia.org/wiki/Pairwise_summation
rwz|7 years ago
svenhof|7 years ago
umdiff|7 years ago
Ragib_Zaman|7 years ago
srean|7 years ago
TAForObvReasons|7 years ago
xurukefi|7 years ago
ken|7 years ago
andrewprock|7 years ago
DannyBee|7 years ago
pjungwir|7 years ago
[0] https://github.com/pjungwir/aggs_for_arrays [1] https://github.com/pjungwir/aggs_for_arrays/blob/master/arra...
vortico|7 years ago
fenwick67|7 years ago
acqq|7 years ago
acqq|7 years ago
lelf|7 years ago
using known stable numeric methods? Kahan-Babuška summation?
aboutruby|7 years ago
guicho271828|7 years ago
source99|7 years ago
praptak|7 years ago
adrianmonk|7 years ago
vortico|7 years ago
tomrod|7 years ago
How do REPLs and databases handle this edge case?
pjc50|7 years ago
rightbyte|7 years ago
nwatson|7 years ago
dsr_|7 years ago
glitchc|7 years ago
lazyjones|7 years ago
Python has Decimal for those who need correctness under all circumstances.
mirimir|7 years ago
So would that work?
ams6110|7 years ago
OK but then the first example is a naive computation of the mean of two enormous numbers.
All this really illustrates is that you need to be aware of the range of your inputs and choose a solution that accommodates it.
wellpast|7 years ago
bjoli|7 years ago
ulam2|7 years ago
thirdreplicator|7 years ago
platz|7 years ago
siilats|7 years ago
rthomas6|7 years ago
Kpourdeilami|7 years ago
The mean is 2.5 if you sum them all and divide by 4.
With that method it is:
mean([1, 2, 3, 4]) = mean([1.5, 3, 4]) = mean([2.25, 4]) = 3.125
CDRdude|7 years ago
anonytrary|7 years ago
X = Mean([a,b,c]) = (a+b+c)/3
Y = YourFunction([a,b,c]) = (((a+b)/2)+c)/2 = (a+b+2c)/4
X does not equal Y. It doesn't matter if the reduce is left or right, btw, it's still wrong.
unknown|7 years ago
[deleted]
tomrod|7 years ago
quickthrower2|7 years ago