
When Double Precision Is Not Enough

147 points | leonry | 4 years ago | adambaskerville.github.io

80 comments

[+] vlmutolo|4 years ago|reply
An interesting tool to help preserve floating point precision is Herbie (https://herbie.uwplse.org/), which takes an expression as input and outputs an optimized version of it that will retain precision.

The example they give on their site is:

    sqrt(x+1) - sqrt(x)  =>  1/(sqrt(x+1) + sqrt(x))
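A quick way to see the payoff of that rewrite (a sketch using Python's standard-library decimal module as a high-precision reference; x = 1e12 is an arbitrary test input):

```python
from decimal import Decimal, getcontext
from math import sqrt

x = 1e12

# Naive form: the two square roots are nearly equal, so the
# subtraction cancels most of the significant digits.
naive = sqrt(x + 1) - sqrt(x)

# Herbie's rewrite: algebraically identical, but no cancellation.
rewritten = 1 / (sqrt(x + 1) + sqrt(x))

# High-precision reference using 50-digit decimal arithmetic.
getcontext().prec = 50
ref = Decimal(x + 1).sqrt() - Decimal(x).sqrt()

err_naive = abs(Decimal(naive) - ref)
err_rewritten = abs(Decimal(rewritten) - ref)
print(err_naive, err_rewritten)  # the rewrite is orders of magnitude closer
```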
[+] aSanchezStern|4 years ago|reply
For those curious, this tool uses arbitrary precision floating-point arithmetic (up to 3000 bits) to sample the input space and measure the error of different implementations of the function, then some clever numerical program synthesis to generate candidates and search for the most accurate implementation.
[+] gruez|4 years ago|reply
I'm glad that there's a tool to automatically do this. When I was in university, this was something that was on tests/assignments.
[+] klysm|4 years ago|reply
This tool is a really cool mix of hard math and soft reality. You need some pretty advanced tools to symbolically manipulate expressions like this, and then it takes the results and throws them at the wall to see what sticks best.
[+] dekhn|4 years ago|reply
does it generate erfc(x) from 1-erf(x)?
[+] jandrewrogers|4 years ago|reply
There are a few edge cases in geospatial computational geometry that require greater than quad precision to correctly compute a double precision result. This fact is sometimes used as a litmus test for implementation correctness: many double precision implementations don't have a conditional escape that switches to arbitrary precision algorithms for these edge cases, which is easy to verify.
[+] jacobolus|4 years ago|reply
“Robustness” of results is a serious problem for any kind of computational geometry, because computational geometry algorithms generally only work when various assumed invariants (based on ideal arithmetic) actually hold, but those can easily be violated by arithmetic with rounding. When invariants are violated, the result can be an infinite loop, a program crash, or a comically incorrect result,

https://www.cs.cmu.edu/~quake/robust.html is some famous work from the mid 1990s showing how to achieve provably correct results in a computationally efficient way (on a CPU).
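The canonical example is the 2-D orientation predicate: which side of the line through a and b does c lie on? A sketch of how rounded arithmetic can disagree with exact rational arithmetic (the specific points below are constructed for illustration; Python's fractions module plays the role of the exact arithmetic):

```python
from fractions import Fraction

def orient_float(ax, ay, bx, by, cx, cy):
    """Sign of the cross product (b - a) x (c - a), in plain doubles."""
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

def orient_exact(ax, ay, bx, by, cx, cy):
    """The same determinant, but in exact rational arithmetic."""
    ax, ay, bx, by, cx, cy = map(Fraction, (ax, ay, bx, by, cx, cy))
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

# Three points that are NOT collinear, but close enough to it that the
# double-precision determinant rounds the difference away entirely:
# the true determinant is (1e8+1)(1e8-1) - 1e8*1e8 = -1, but the first
# product rounds to exactly 1e16 in doubles.
a = (0.0, 0.0)
b = (1e8 + 1, 1e8)
c = (1e8, 1e8 - 1)

print(orient_float(*a, *b, *c))  # 0.0 -> "collinear" (wrong)
print(orient_exact(*a, *b, *c))  # -1  -> strictly clockwise
```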

[+] kevinventullo|4 years ago|reply
Conversely, if you use double-precision floating point arithmetic near “null island”, the lat/lon coordinate system becomes more precise than the Planck length.
[+] nullc|4 years ago|reply
There is a flipside to this: e.g. glibc will switch to an arbitrary precision internal implementation for some operations that would otherwise lose a little precision, and as a result run 1000x slower on some inputs.

Not a good thing when you're attempting to do realtime signal processing.

[+] dekhn|4 years ago|reply
I once wrote a "solar system simulator" in Blender to generate images of solar eclipses from specific locations on Earth. I modelled everything in meters, literally placing the sun at the origin, the earth at 1 AU, and the moon near it. It turns out Blender uses single precision, which can't resolve the earth-moon distance at the scale of the sun-earth distance, so all my renders were wrong. I divided all the sizes by 10 and it worked perfectly.
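The scale mismatch is easy to reproduce without Blender (a sketch using the standard struct module to emulate float32 rounding; the 1 AU figure is approximate):

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest IEEE-754 float32."""
    return struct.unpack('f', struct.pack('f', x))[0]

AU = 1.496e11  # sun-earth distance in metres

# In float32, the spacing between adjacent representable values near
# 1 AU is 2**14 = 16384 metres, so a 1 km offset simply disappears:
print(to_f32(AU + 1000.0) == to_f32(AU))  # True

# In float64 the spacing near 1 AU is ~3e-5 m, so the offset survives:
print((AU + 1000.0) == AU)                # False
```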

As a counter-argument: in my field of molecular dynamics (MD), it was eventually shown that most force field terms can be computed in single precision, with the exception of the virial (https://en.wikipedia.org/wiki/Virial_stress), which has to be computed in DP.

[+] s2mcallis|4 years ago|reply
I'd be interested in hearing if you have more info on this.
[+] vHMtsdf|4 years ago|reply
I remember a class where we were shown a method for exact computer arithmetic with integers, rationals (e.g., 1/2), and algebraic numbers (e.g., sqrt(2)). The only thing it couldn't deal with was transcendental numbers (e.g., pi).

I think it worked by representing the numbers as integer matrices; all operations were then matrix operations. Unfortunately, the matrices grew in dimension whenever new algebraic numbers got involved, so it probably wasn't useful for anything :-).

Anyway, it blew me away at the time; it's one of the few things that stuck with me from school.

[+] aSanchezStern|4 years ago|reply
Yeah, there are a few number systems that can do this kind of thing, ranging from rational implementations where the numerator and denominator are stored, all the way to the computable reals (https://en.wikipedia.org/wiki/Computable_number), which can handle even transcendentals exactly! Fun fact: the built-in calculator on Android actually uses the computable reals (https://dl.acm.org/doi/abs/10.1145/3385412.3386037), so you can do any calculation on it, then ask for any number of digits (keep scrolling the result), and it will always be correct. The only downside is that they're a little too slow to use in a lot of performance-sensitive numerical applications.
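The rational end of that spectrum is already in the Python standard library (a small sketch):

```python
from fractions import Fraction

# Binary floating point cannot represent 0.1 exactly, so repeated
# arithmetic drifts:
print(0.1 + 0.1 + 0.1 == 0.3)                  # False

# Rationals store the numerator and denominator exactly:
third = Fraction(1, 3)
print(third + third + third == 1)              # True
print(Fraction('0.1') * 3 == Fraction(3, 10))  # True
```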
[+] willis936|4 years ago|reply
If anyone ever runs out of precision using IIR filters: convert to second-order sections (SOS). I stumbled upon this trick a few years ago and it's a game changer. No need for FIR.
[+] hcrisp|4 years ago|reply
I found the same. I once computed a high-order (something like order 100) low-pass Butterworth as a test, using SOS, and did not notice any instability in the filtered signal. The key was to derive the SOS coefficients directly via the bilinear transform.
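The essence of why direct-form coefficients fail at high order can be sketched with a polynomial toy model (not real filter code; the clustered roots below stand in for the tightly clustered poles of a low-cutoff Butterworth):

```python
# Expanding a high-order polynomial with clustered roots (the analogue
# of a direct-form transfer function) destroys accuracy near the roots;
# keeping it factored (the analogue of cascaded biquads) does not.
roots = [0.99 + 0.001 * k for k in range(20)]

# Expand prod(x - r) into monomial coefficients, lowest degree first.
coeffs = [1.0]
for r in roots:
    shifted = [0.0] + coeffs                   # x * p(x)
    scaled = [-r * c for c in coeffs] + [0.0]  # -r * p(x)
    coeffs = [s + t for s, t in zip(shifted, scaled)]

def eval_expanded(coeffs, x):
    """Horner evaluation of the expanded (direct-form) polynomial."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def eval_factored(roots, x):
    """Evaluate the same polynomial as a product of factors."""
    acc = 1.0
    for r in roots:
        acc *= (x - r)
    return acc

x0 = roots[0]  # an exact root: the true value is 0
print(eval_factored(roots, x0))   # exactly zero (one factor is exactly 0)
print(eval_expanded(coeffs, x0))  # noticeably nonzero rounding garbage
```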
[+] mochomocha|4 years ago|reply
Yep. The other advantages are that now you can parallelize compute, and you get lower latency.
[+] mizzao|4 years ago|reply
Came here expecting an article about Kahan summation, left having learned about a multiple-precision Python library...
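For anyone who did come for Kahan summation, the standard algorithm is only a few lines (a sketch in Python):

```python
def kahan_sum(xs):
    """Compensated summation: carry the rounding error of each
    addition in c and feed it back into the next term."""
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in xs:
        y = x - c            # subtract the previously lost bits
        t = total + y        # big + small: low-order bits of y are lost
        c = (t - total) - y  # algebraically zero; recovers the lost bits
        total = t
    return total

print(sum([0.1] * 10))        # 0.9999999999999999 (naive accumulation)
print(kahan_sum([0.1] * 10))  # 1.0
```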
[+] dekhn|4 years ago|reply
IIUC most people instead precondition their matrices and stick to double precision rather than going to quad or higher precision.
[+] boulos|4 years ago|reply
Yeah, they also allude to "better ways to do it" both in the main text and the last paragraph:

> The next post will discuss better ways to implement higher precision in numerical calculations.

There are lots of automatic row/column swap techniques, particularly for sparse matrices, that ensure you avoid catastrophic cancellation. You wouldn't do this by default, because most matrices are reasonably conditioned (and figuring out the optimal swaps is NP-complete and so on), but there are lots of algorithms that do well enough when you need them.

[+] magicalhippo|4 years ago|reply
This reminded me of Posits[1] and Unums[2]. I dabbled with my own implementation some years ago, but nothing serious.

Anyone tried them out? I see there's some FPGA implementations and a RISC-V extension idea[3], would be fun to try it on a softcore.

[1]: https://posithub.org/docs/posit_standard.pdf

[2]: http://johngustafson.net/unums.html

[3]: https://www.posithub.org/khub_doc

[+] Dylan16807|4 years ago|reply
Which version?

With current posits, they're a clever way to give more bits to precision near 1 and more bits to exponent near 0 and infinity, but if you're not overflowing or underflowing then it's a marginal improvement at best. Then Unum glues intervals on top, right? But you could do intervals with normal floats too. And you still won't get the correct answer, you'll just know the error exploded. Is the giant accumulator feature able to help here?

For the other versions I tend to get completely lost.

[+] aSanchezStern|4 years ago|reply
Unfortunately, while posits (and unums) are more accurate for some computations, they are still designed around using small numbers of bits to represent numbers, so they have error in computations just like floating point. The argument for posits is mostly that they produce less error on the inputs you care about and more on the ones you don't, but it really depends on your computation; you really have to know what you're doing to use them safely.
[+] frozenport|4 years ago|reply
>> Numerically stable algorithms exist for such problems, but these occasionally fail leaving us to brute force the calculation with higher precision to minimize floating point rounding errors.

Do algorithm authors really implement two code paths?

One that uses a cost function iterative algorithm to solve the matrix (this is what everybody does), and a second algorithm using the brute force approach the author showed (known not to work).

[+] irchans|4 years ago|reply
>> Do algorithm authors really implement two code paths?

I do sometimes. Sometimes I test the result, and if it is not correct, then I use a slower, more numerically stable algorithm. I'm not sure if I have ever written code that repeats the same code with quad precision, but of course that would often work. Sometimes you can compute a correction from the "residual".

>>One that uses a cost function iterative algorithm to solve the matrix (this is what everybody does), and a second algorithm using the brute force approach the author showed (known not to work).

We do often use Jacobi or Gauss-Seidel, or something similar (especially for computing estimates of A^(-1)b). In those cases we never revert back to QR. We do sometimes use Jacobi or GS to improve a result obtained from an ill-conditioned QR, but rarely.

Many years ago, I would have the code raise an exception if the result is bad so that I would get a call. When I got such a call, I would think about how to make the algorithm more stable.
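The residual-correction idea mentioned above is iterative refinement. A minimal sketch, with exact rationals standing in for a higher-precision residual computation (the Hilbert-matrix test problem is just an illustration):

```python
from fractions import Fraction

def solve(A, b):
    """Gaussian elimination with partial pivoting, all in doubles."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

# Ill-conditioned test problem: 8x8 Hilbert matrix, with the right-hand
# side chosen so the exact solution is all ones.
n = 8
A = [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]
b_exact = [sum(Fraction(a) for a in row) for row in A]
b = [float(bi) for bi in b_exact]

x = solve(A, b)

# One refinement step: residual r = b - A x computed EXACTLY (rationals
# stand in for quad precision), then the correction is solved in doubles.
r = [float(bi - sum(Fraction(a) * Fraction(xj) for a, xj in zip(row, x)))
     for row, bi in zip(A, b_exact)]
d = solve(A, r)
x_ref = [xi + di for xi, di in zip(x, d)]

err_before = max(abs(xi - 1.0) for xi in x)
err_after = max(abs(xi - 1.0) for xi in x_ref)
print(err_before, err_after)  # the refined error is dramatically smaller
```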

[+] peter303|4 years ago|reply
There are algorithms taught in numerical analysis classes for re-ordering calculations to maximize precision. Computing in the order you wrote it on paper, or the way you were initially taught, may not be the best.
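A classic instance of such re-ordering is the small root of a quadratic (a sketch; the b = 1e8, c = 1 values are just an illustration):

```python
from math import sqrt

# Small root of x**2 - b*x + c with b = 1e8, c = 1. The true small
# root is very close to 1e-8.
b, c = 1e8, 1.0
disc = sqrt(b * b - 4 * c)

naive = (b - disc) / 2     # "as written": subtracts two nearly equal
                           # numbers, catastrophic cancellation
big_root = (b + disc) / 2  # the well-conditioned large root
stable = c / big_root      # re-ordered via the identity x1 * x2 = c

print(naive)   # wrong in almost every digit
print(stable)  # ~1e-8, accurate to full precision
```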
[+] bluetwo|4 years ago|reply
I'll admit to being lost in some of the math here, but I wonder, perhaps naively, if a language like Julia, which does fractions as fractions, would handle it better.
[+] salty_biscuits|4 years ago|reply
It's really just a floating point precision issue. Even if you represented the input matrix as a rational elementary type the output eigenvalues would be a subset of the reals and calculated as being floating point. You could get around it if you had a special matrix type where you knew you were definitely going to get rational eigenvalues and you could dispatch on that type.
[+] midjji|4 years ago|reply
Thankfully someone figured out a while back that quad precision should be in C++; unfortunately they didn't figure out that the next few dozen precisions should be too... The arbitrary precision stuff is an alternative in theory, and would be in practice if any of the hundreds of half-completed projects to convert the unintelligible, unclear, and unmaintainable C to the (in this specific case) much better C++ interfaces were complete.
[+] ogogmad|4 years ago|reply
Obligatory reference to computable analysis: https://en.wikipedia.org/wiki/Computable_analysis

This shows that you can do exact real arithmetic without worrying about round-off errors. But it's expensive.

If you want something more practical (but dodgy), you can use arbitrary-precision arithmetic instead of exact real arithmetic. To that end, there's the Decimal module in the Python standard lib. There's also the BC programming language which is part of POSIX. If you want to do matrix arithmetic, you can use mpmath for Python like they did in this blog post, or Eigen3 for C++ with an arbitrary-precision type. This is dodgy of course because the arithmetic remains inexact, albeit with smaller rounding errors than double-precision floats.
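A small sketch of the Decimal route (the 50-digit precision is an arbitrary choice):

```python
from decimal import Decimal, getcontext

tiny = Decimal(10) ** -30

# Doubles carry ~16 significant digits, so the tiny term vanishes:
print((1.0 + 1e-30) - 1.0)      # 0.0

# With 50-digit decimal arithmetic the same computation keeps it:
getcontext().prec = 50
print((Decimal(1) + tiny) - 1)  # 1E-30
```

The rounding error hasn't gone away; it has just been pushed down to the 50th digit, which is the "dodgy but practical" trade-off described above.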

[+] a-dub|4 years ago|reply
also worth noting that you can ask the fpu to trap to catch some types of precision loss or fp underflows.
[+] bayindirh|4 years ago|reply
If you're playing around with subnormal numbers [0], most modern processors automatically switch to a microcode-based calculation mode to preserve these numbers. However, these algorithms are very expensive, so this mode isn't desirable unless explicitly needed, and it's generally used as an escape hatch for edge cases.

[0]: https://en.wikipedia.org/wiki/Subnormal_number
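What subnormals buy you is gradual rather than abrupt underflow (a quick sketch in Python):

```python
import sys

smallest_normal = sys.float_info.min  # ~2.2e-308, smallest normal double
smallest_subnormal = 5e-324           # 2**-1074, smallest subnormal double

# Subnormals fill the gap between 0 and the smallest normal number,
# trading precision for gradual underflow:
print(smallest_normal / 2 > 0)        # True: the result is subnormal
print(smallest_subnormal / 2 == 0.0)  # True: now we really underflow
```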

[+] akimball|4 years ago|reply
Rational arithmetic needs better matrix libraries. No loss of precision then.
[+] klodolph|4 years ago|reply
People have got to stop suggesting this. It doesn't make any sense and it's usually a waste of time.

First of all, the example in the article is computing eigenvalues. Let me ask you: how do you expect to calculate eigenvalues using rational numbers? As a refresher, if it's been years since you took a look at linear algebra, the eigenvalues of a matrix are the roots of the matrix's characteristic polynomial, and we remember from high-school algebra that the roots of a polynomial are often not rational numbers, even if the coefficients are rational. Therefore, using rational arithmetic here is a complete non-starter. It's just not even possible, right from the very start.

However, let's suppose that the particular problem we're interested in solving does admit a rational solution. What then? Well, real-world experience tells us that exact, rational answers to linear algebra problems often require an excessive amount of computational resources and memory, even for problems of modest size.

As you perform more computations on rational numbers, the memory required continues to grow.
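That growth is easy to demonstrate with Python's Fraction (the specific quadratic map here is just an illustration):

```python
from fractions import Fraction

x = Fraction(1, 3)
for _ in range(15):
    x = Fraction(7, 2) * x * (1 - x)  # exact: nothing is ever rounded

# Each exact squaring roughly doubles the size of the denominator, so
# after only 15 steps it needs tens of thousands of bits; a double
# would still be 64 bits (and would have rounded at every step).
print(x.denominator.bit_length())
```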

[+] kadoban|4 years ago|reply
True, but then they run arbitrarily slowly, and it's trivial to reach cases where this actually happens.

And for anything without a rational result, you're stuck again.

[+] mistrial9|4 years ago|reply
[+] aSanchezStern|4 years ago|reply
This is actually fewer bits than they're using in the article, but that's not clear because they refer to their numbers by how many decimal digits they hold, not how many bits. The hardware doubles they're starting with are 64 bits wide, representing about 15.95 decimal digits, but then they move to 128- and 256-bit representations. 80-bit math, such as that available on the old x87 floating point units, is I believe about 19 decimal digits.
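The digit counts follow directly from bits x log10(2), applied to the significand (a quick check):

```python
import math

def sig_digits(significand_bits):
    """Decimal digits representable by a binary significand."""
    return significand_bits * math.log10(2)

print(sig_digits(53))   # float64 significand: ~15.95 digits
print(sig_digits(64))   # x87 80-bit extended significand: ~19.3 digits
print(sig_digits(113))  # IEEE binary128 (quad) significand: ~34 digits
```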
[+] liversage|4 years ago|reply
Or 128 bits. FORTRAN has a REAL*16 type (quadruple precision floating point number). I encountered this at university in a physics department that studied non-linear systems, also known as chaos theory. Rounding errors can quickly become a problem in these systems.
[+] ip26|4 years ago|reply
I believe x87 still offers 80 bits in your PC today.