item 2506918

Why floats suck - accurate galaxy-wide coordinates in 128 bits

205 points | peterhajas | 15 years ago | facepunch.com | reply

83 comments

[+] mturk|15 years ago|reply
I am an astrophysicist, and I deal with this issue directly. In my simulations, I do deeply nested Adaptive Mesh Refinement calculations. These simulations run inside an Eulerian grid, where each cell contains a fluid element. These cells are collected into grid patches. When a cell inside a given grid exceeds some refinement criterion, we refine it by inserting 8 cells; the cells at the next level are then collected, minimum bounding boxes are created that cover the flagged cells, and we now compute at an additional 'level' of refinement.

My research concerns the formation of the first stars in the universe. In order to set up the initial conditions of these objects correctly, we have to resolve fluctuations in the background density of the Universe on the scale of kiloparsecs. However, we are also mostly concerned about how the fluid behaves on much smaller scales, at the tens to hundreds to thousands of AU scale, all the way down to the radius of the sun or even lower.

In one of my "stunt" runs, I ran up to 42 levels of refinement inside a box; this means that the dynamic range between the coarsest cells and the finest cells was 2^42. For the most part, however, we only use about 30-33 levels of refinement. In order to attain this resolution, we utilize extended precision positioning. (Future versions of the code will utilize integer positioning with relative offsets for Lagrangian elements like particles.) Inserting this functionality into the code took quite a bit of debugging and testing, in particular because we had to account for differences in how Intel, PGI and GNU passed extended precision values back and forth between Fortran and C.
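The "integer positioning with relative offsets" idea mentioned for future versions can be sketched in a few lines. Everything below (the names, the one-axis split, the move helper) is hypothetical illustration, not the simulation code: the cell index is exact integer arithmetic, and only the small in-cell offset is a float, so precision stays uniform across the box.

```python
from dataclasses import dataclass

LEVEL = 30                 # levels of refinement, as in the example run
CELLS = 1 << LEVEL         # finest-level cells per axis

@dataclass
class ParticleX:
    cell: int              # exact integer cell index along one axis
    frac: float            # offset within the cell, in [0, 1)

    def absolute(self) -> float:
        # For output only; the arithmetic itself stays split.
        return (self.cell + self.frac) / CELLS

def move(p: ParticleX, delta: float) -> ParticleX:
    """Advance by a displacement measured in finest-cell widths."""
    total = p.frac + delta
    carry = int(total // 1.0)          # floor, so negative moves work too
    return ParticleX(p.cell + carry, total - carry)
```

Because `frac` is always small, its float error never grows with distance from the box origin, which is the whole point of the scheme.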

I have uploaded a very simple 2D zoomin movie of one of these simulations, which only had 30 levels of refinement, to Vimeo. It's busy converting as I write this, but it will be found here when it has completed: http://vimeo.com/23176123

[+] lutorm|15 years ago|reply
Hey Matt! ;-)

I was thinking of another instance where this is noticeable: In a certain moving-mesh hydrodynamics code, the author was forced to use integers instead of doubles to perform intersection tests, because the "uneven" precision of the floating points can give the wrong answer to which side of a face a point is on, which leads to errors in the Voronoi tessellation.

[+] ubasu|15 years ago|reply
the dynamic range between the coarsest cells and the finest cells was 2^42

How do you deal with the discretization errors that come up because of this range? Doesn't that overshadow the errors you get from precision issues? I am just curious - not trying to challenge you.

[+] hartror|15 years ago|reply
I chat with PhD students at the Swinburne University Centre for Astrophysics and Supercomputing from time to time, and they have many similar stories to tell. These are pretty sexy problems for a programmer to deal with, enough to have me considering abandoning industry and joining the ranks of the professional student.
[+] gallamine|15 years ago|reply
What did you use to create that visualization? I'm running a Monte Carlo simulation of > 1 billion photons propagating through water, and I'd like to visualize it better.
[+] stcredzero|15 years ago|reply
I am an astrophysicist, and I deal with this issue directly. In my simulations, I do deeply nested Adaptive Mesh Refinement calculations. ...

...In one of my "stunt" runs, I ran up to 42 levels of refinement inside a box; this means that the dynamic range between the coarsest cells and the finest cells was 2^42.

Astrophysicist? Simulations? 42? Is this Slartibartfast?

[+] ChuckMcM|15 years ago|reply
This is an interesting article and worth a read. Then head over to Mike Cowlishaw's rant on decimal [1]. I met Mike at a conference and he's a brilliant guy, prior to seeing his decimal stuff I was using REXX on an Amiga (what Tcl might have been) which was language he invented while at IBM.

He got interested in decimal math early on, not because 'floats' as such suck, but because binary floats really suck. Binary is a number system where 0.1 can't be accurately represented. How often do you use 0.1 in your calculations? He has a couple of examples of 34-digit precision, which is representable in a 128-bit quantity (about 3.7 bits per digit). Straight BCD would of course be 32 digits, minus the digits which represent the exponent.
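Python's stdlib `decimal` module (which follows Cowlishaw's decimal arithmetic specification) makes the 0.1 point easy to see:

```python
from decimal import Decimal

# In binary floating point, 0.1 has no exact representation,
# so three of them don't sum to 0.3.
print(0.1 + 0.1 + 0.1 == 0.3)                 # False

# In decimal arithmetic the same sum is exact.
print(Decimal("0.1") * 3 == Decimal("0.3"))   # True
```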

I wrote some CAD routines once that used decimal arithmetic rather than binary, and was amazed at how much better it worked in terms of edges lining up; when you approximate non-square surfaces you also get much better tessellation.

And of course for history buffs the IBM 1620 used decimal as a native representation for numbers. Now that we've got lots of transistors to play with, perhaps it's a good time to revisit the way we represent numbers.

[1] http://speleotrove.com/decimal/

[+] jules|15 years ago|reply
That seems downright stupid.

"How often do you use 1/3 in your calculations. Base 10 is a number system where 1/3 can't be accurately represented."

If you need exactness, use rationals. Don't just use a number system with more prime divisors. If you don't need exactness, use base 2.
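The rationals point, sketched with Python's stdlib `fractions`:

```python
from fractions import Fraction

# 1/3 is exact as a rational, unrepresentable in base 2 or base 10.
third = Fraction(1, 3)
print(third + third + third == 1)        # True: no rounding anywhere

# Fraction(0.1) recovers the binary float actually stored,
# which is not the rational 1/10.
print(Fraction(0.1) == Fraction(1, 10))  # False
```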

[+] arundelo|15 years ago|reply
I'd like hardware that represents fractional parts in binary-coded base-15 -- no rounding on 1/3 as well as 1/10!
[+] palish|15 years ago|reply
"I wrote some CAD routines once that used decimal arithmetic rather than binary and was amazed at how much better it worked in terms of edges lining up and when you approximate non-square surfaces you get much better tesselation."

That's precisely what I'm working on. What did you use?

P.S. Use 0.125, not 0.100

[+] bhickey|15 years ago|reply
Floats are great, you just need to understand when to use them!

If you want to use them for an application where you're doing frequent multiplications or divisions, I strongly suggest log transforming your domain and replacing your multiplies with adds and divisions with subtracts. This goes a long way to prevent numeric underflow when performing Viterbi parses without resorting to some godawful slow non-native numeric representation.

Haskell implements this as LogFloat (http://hackage.haskell.org/packages/archive/logfloat/0.8.2/d...), and I've cloned it for C++ as LogReal (http://www.bhickey.net/2011/03/logreal/ and http://github.com/bhickey/logreal)
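The transform itself is one line; a minimal sketch in Python (standing in for LogFloat/LogReal, not taken from them):

```python
import math

# Work with log-probabilities: multiplication becomes addition,
# so products of thousands of tiny factors never underflow.
probs_log = [math.log(1e-10)] * 1000

direct = 1.0
for _ in range(1000):
    direct *= 1e-10          # underflows to exactly 0.0

log_total = sum(probs_log)   # fine: just a sum of moderate negatives

print(direct)                # 0.0
print(log_total)             # about -23025.85, still fully usable
```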

[+] wladimir|15 years ago|reply
Well his point is that floats suck for coordinates, because they have a high precision at the origin, then become less and less precise further off. If you want to simulate a galaxy (his example) you want the same precision everywhere, and a uniform non-floating point representation is better.
[+] jacquesgt|15 years ago|reply
Understanding the precision of floats is a small pain, but fixed point math is much more painful.

With fixed point, you have to decide how many fraction bits you're going to have. If you have more than zero fraction bits, multiplies and divides end up looking like this (assuming 16 fraction bits):

  result = (a * b + (1 << 15)) >> 16;
  result = (a / b) << 16;        // rounds to zero if a is close to b
  result = ((a << 4) / b) << 12; // not much range or precision on the result

Of course, if your variables can't all be represented with the same number of fraction bits, you need to keep track of that and adjust your shifts for each calculation. You also need to do shifts when you add two numbers with different numbers of fraction bits. This stuff is all taken care of for you by floating point numbers.

With fixed point, if you mess up a calculation and it overflows, you get completely whacky results (wrap around) which will cause massive bugs. With floating point, you lose a bit or two of accuracy.

There are definitely places where fixed-point is needed. If you're on a microcontroller without an FPU, you'll have to learn your way around floating point. If you absolutely can't deal with the loss of precision you get with big floating point numbers, you'll have to use fixed point and make sure you understand the range of parameters that can go into each calculation and prevent overflow.

If floats work for you, just use them.
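For comparison, that bookkeeping in runnable form; a Q16.16 sketch in Python (helper names are illustrative, not from any particular library):

```python
# Q16.16 fixed point: 16 integer bits, 16 fraction bits.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    return int(round(x * ONE))

def fixed_mul(a: int, b: int) -> int:
    # The raw product carries 32 fraction bits; shift back with rounding.
    return (a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS

def fixed_div(a: int, b: int) -> int:
    # Pre-shift the dividend so the quotient keeps its fraction bits.
    return (a << FRAC_BITS) // b

a, b = to_fixed(1.5), to_fixed(2.25)
print(fixed_mul(a, b) / ONE)   # 3.375
print(fixed_div(a, b) / ONE)   # ~0.6667 (truncated, not rounded)
```

The shifts in `fixed_mul` and `fixed_div` are exactly the manual bookkeeping the comment describes; a floating point unit does the equivalent in hardware.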

[+] beza1e1|15 years ago|reply
As a programmer it's easier to insert a bug with floats than with integers, because IEEE 754 has various subtle stumbling blocks.

For example, remember to think about NaN, and that x != x if x is NaN. And did you know that there are actually two kinds of NaN? Or that 1.0/0.0 == +infinity?
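Those stumbling blocks are easy to poke at from Python; the bit patterns below are the standard IEEE 754 quiet and signaling NaN encodings:

```python
import math
import struct

x = float("nan")
print(x != x)      # True: NaN compares unequal to everything, itself included

# The two kinds of NaN differ in the top mantissa bit of the encoding.
qnan = struct.unpack(">d", bytes.fromhex("7ff8000000000000"))[0]  # quiet NaN
snan = struct.unpack(">d", bytes.fromhex("7ff4000000000000"))[0]  # signaling pattern
print(math.isnan(qnan), math.isnan(snan))   # True True

# IEEE 754 division by zero yields infinity (Python's / raises instead,
# so we show the value directly and its well-defined reciprocal).
print(math.inf == float("inf"))             # True
print(1.0 / math.inf)                       # 0.0
```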

[+] palish|15 years ago|reply
You say "1.0/0.0 == +infinity" like it should be surprising, but it makes perfect intuitive sense to me.

You assert that it's "easier to insert a bug with floats", but your first example (NaN) is one which programmers shouldn't be worrying about. The only case you'd want to check for NaN is in debug code (an assert).

[+] dustingetz|15 years ago|reply
there's a better reason, per TomF's blog[1] linked below by JabavuAdams. --

"any time you do a subtract (or an add, implicitly), consider what happens when the two numbers are very close together, e.g. 1.0000011 - 1.0. The result of this is going to be roughly 0.0000011 of course, but the "roughly" is going to be pretty rough. In general you only get about six-and-a-bit decimal digits of precision from floats (2^23 is 8388608), so the problem is that 1.0000011 isn't very precise - it could be anywhere between 1.0000012 and 1.0000010. So the result of the subtraction is anywhere between 1.2×10^-6 and 1.0×10^-6. That's not very impressive precision, having the second digit be wrong! So you need to refactor your algorithms to fix this.

The most obvious place this happens in games is when you're storing world coordinates in standard float32s, and two objects get a decent way from the origin. The first thing you do in rendering is to subtract the camera's position from each object's position, and then send that all the way down the rendering pipeline. The rest all works fine, because everything is relative to the camera; it's that first subtraction that is the problem. For example, getting only six decimal digits of precision, if you're 10km from the origin (London is easily over 20km across), you'll only get about 1cm accuracy. Which doesn't sound that bad in a static screenshot, but as soon as things start moving, you can easily see this horrible jerkiness and quantisation."

[1] http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BA%...
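The shrinking precision is easy to measure: the gap between adjacent float32 values (one ulp) grows with magnitude. A small Python sketch using `struct` to force 32-bit storage:

```python
import struct

def ulp32(x: float) -> float:
    """Spacing between x and the next float32 above it."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    nxt = struct.unpack(">f", struct.pack(">I", bits + 1))[0]
    return nxt - struct.unpack(">f", struct.pack(">f", x))[0]

print(ulp32(1.0))       # ~1.2e-07: tiny steps near the origin
print(ulp32(10_000.0))  # ~0.001: about a millimetre of quantisation at 10 km
```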

[+] atakan_gurkan|15 years ago|reply
To make various computations, you generally need powers of a given quantity. It seems to me that when you take large powers for intermediate results, the precision in fixed point arithmetic will rapidly drop, no? Furthermore you need to put together various quantities with different scaling factors. IMHO, that would make things very error prone (or you can again sacrifice precision).

I am not aware of any physical problem that actually requires 128 bit floats. I can make up some, but those would be either chaotic (in which case precision does not help), or I would be telling you about a solution that looks like it can be improved. There are of course some problems for which we have no physical insight, then high precision provides a nice safety buffer.

[+] eru|15 years ago|reply
I agree in general.

Precision does help to some degree in chaotic circumstances. For example, when forecasting the weather, you know that it's chaotic. But the chaos takes a while to show itself.

[+] smlacy|15 years ago|reply
Floats are for math. Math includes computing things like 100! (100 factorial), the volume of the observable universe (~3.3×10^80), or any other countably finite value, most of which are far greater than 2^128.

Fixed point is a special purpose encoding that is useful in many places (computer graphics and constrained simulations) when your range and precision are known and fixed. "Math" doesn't follow these rules.

Floats are good for math.

[+] JabavuAdams|15 years ago|reply
You have it exactly backwards. 100 factorial cannot be represented exactly by a float (or double, or long double)

100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

This is about 2^524.765; representing it exactly takes 525 bits, far more than the 24-, 53-, or 64-bit mantissas that 32-, 64-, and 80-bit floats provide.

Note that "floating point" is commonly used to refer to 32-bit or 64-bit or 80-bit representations.

Floats are for games (with caveats), and possibly engineering, while purer math needs arbitrary-precision representations that are too slow for games and many simulations.
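Python's arbitrary-precision integers make the size of the problem concrete:

```python
import math

exact = math.factorial(100)       # exact, arbitrary-precision integer
print(exact.bit_length())         # 525: the bits an exact encoding needs

# The nearest double is finite (100! < 1.8e308) but wrong in the low digits:
approx = float(exact)
print(int(approx) == exact)       # False: only 53 mantissa bits survive
```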

[+] eru|15 years ago|reply
And I thought arbitrary precision integers and rational numbers were for mathematics.
[+] thisrod|15 years ago|reply
In the 40s, floating point numbers were the hot new thing in programming. John von Neumann argued against them, because anyone smart enough to use a computer could track the exponents in their head. Or so I've heard.

This is the classic trade of computer efficiency against human effort. The typical scientific program is run once, so it leaves as much work as possible to the machine. The efficient tradeoff might differ for games and such applications as AutoCAD or Splice.

[+] nikcub|15 years ago|reply
Great post. A question which doesn't have much to do with the main point - don't his calculations result in a plane with 2d co-ordinates rather than three dimensional?

ie. with our galaxy, wouldn't you have to include the thickness to get any co-ord:

(100 000 light years) * (12000 ly) / (2^128) = 3.336×10^-17

[+] guard-of-terra|15 years ago|reply
How can he fit three-dimensional coordinates into a single 128-bit number? Seeing how he divides diameter, I assume he only took care of one axis.

I'm curious how many bits do we need to address a point in an universe with reasonable accuracy.

[+] CapitalistCartr|15 years ago|reply
We'd need three axes to define a coördinate in space, four to include time. Given his estimate of the known Universe as 9.3e10 light years across, giving each axis 2^96 steps would yield an accuracy of about 1.1 centimeters, plenty for space navigation. Multiply that by the number of axes.
[+] qntm|15 years ago|reply
The interesting thing is that, as you get further and further from Earth or the Sun or the centre of the Milky Way, the level of precision which is even useful becomes less and less. What is one light year left or right for a quasar 13,000,000,000 light years away? Floats, then, behave entirely appropriately in this situation.
[+] cmaggard|15 years ago|reply
He was indeed only specifying a single axis, just as you wouldn't use a single float to specify three axes.
[+] rorrr|15 years ago|reply
You specify the accuracy you need. Let's say, 1 millimeter.

(9.3e10 light years) / (0.001 meter) = 8.79 × 10^29

log2(8.79 × 10^29) = 99.471

So you need 100 bits to represent each coordinate at that precision.
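The same arithmetic in a couple of lines (using ~9.46e15 metres per light year):

```python
import math

LY_M = 9.4607e15                   # metres per light year
universe_m = 9.3e10 * LY_M         # diameter of the known universe
steps = universe_m / 0.001         # number of 1 mm cells across it

print(math.log2(steps))            # ~99.5, so 100 bits per axis
```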

[+] lutorm|15 years ago|reply
Interesting, but actually it's no longer the case that fixed point math is faster than floating point. Certainly GPUs are a lot faster at floats than ints.
[+] SomeCallMeTim|15 years ago|reply
Tell that to Android developers who are writing simulation apps that need to work on ARMv6 phones, like the LG Optimus series. No FPU there.
[+] ck2|15 years ago|reply
Sorry to be pedantic, but since the universe is constantly expanding, aren't such coordinates inaccurate only a short while after they are created (unless they are highly localized, which likely defeats their purpose in the first place)?
[+] temptemptemp13|15 years ago|reply
I see 2 possible solutions:

1. The coordinates expand with the universe.

2. The coordinates are static and the universe just expands. Note that this isn't awful; planets and stars are always in motion. When planning intergalactic missions you're always going to have to know where the starting point is and calculate where the ending point will be at the relevant time.

So to say where Earth is you're going to write down a 4D vector, specifying at what time this was measured.

Btw, the article didn't mention 3D space did it?

[+] ubasu|15 years ago|reply
This displays a fundamental lack of understanding of how floating point works. In particular, we are concerned with the accuracy of the floating point representation, which is characterized by the machine precision. Typically, single-precision gives about 7-8 significant digits, while double-precision gives about 15-16.

Furthermore, we don't use 128-bit representations because they are slow for numerical computation: each value needs more trips across a narrower bus.

See http://en.wikipedia.org/wiki/Floating_point

[+] JabavuAdams|15 years ago|reply
No, this is a real problem.

Lack of precision of floating point is something that comes up often in game development, now that levels are larger.

If you have a level that is 1-2km across, and you store all your positions in floats, then you will notice that animations for instance are quantized.

Near the world origin, everything is fine, but far from the world origin individual vertices in the character and props will become quantized to the smallest representable delta. This results in atrocious geometric fizzing and popping.

[+] aidenn0|15 years ago|reply
This article understands how floating point works.

People using floating point often have a fundamental lack of understanding how floating point works. Floating point numbers are a leaky abstraction that exist only for performance optimization. Using floats as a default is an example of premature optimization.