Maybe I am too mathematically inclined, but this was not easy to understand.
The ELI5 explanation of floating point: they give you roughly the same accuracy (in terms of bits) regardless of scale. Whether your number is much below 1, around 1, or much above 1, you can expect the same precision in the leading bits.
This is the key property, but internalizing it is difficult.
One problem I was struggling with: What's the shortest, unambiguous decimal representation of a float?
For example, if you use single precision floats, then you need up to 9 digits of decimal precision to uniquely identify a float. So you would need to use a printf pattern like %.9g to print it. But then 0.1 would be output as 0.100000001, which is ugly. So a common approach is to round to 6 decimal digits: If you use %.6g, you are guaranteed that any decimal input up to 6 significant digits will be printed just like you stored it.
But you would no longer be round-trip safe when the number is the result of a calculation. This matters when you do exact comparisons between floats (e.g. to check whether data has changed).
So one idea I had was to try printing the float with 6 digits, then scanning it and seeing if it resulted in the same binary representation. If not, try using 7 digits, and so on, up to 9 digits. Then I would have the shortest decimal representation of a float.
This is my algorithm:
    int out_length;
    char buffer[32];
    for (int prec = 6; prec <= 9; prec++) {
        out_length = sprintf(buffer, "%.*g", prec, floatValue);
        if (prec == 9) {
            break;  /* 9 significant digits always round-trips for single precision */
        }
        float checked_number;
        sscanf(buffer, "%g", &checked_number);
        if (checked_number == floatValue) {
            break;
        }
    }
I wonder if there is a more efficient way to determine that shortest representation rather than running printf/scanf in a loop?
Your problem is practically important, as it can be considered the "canonical" string representation of a given floating point number (after adding the closeness requirement). There are many efficient algorithms as a result; selected ones include Dragon4, Grisu3, Ryu, and Dragonbox. See also Google's double-conversion library, which implements the first two of them.
Don't worry about negative comments. Yes, this is not the best way, but (if there's no error; I didn't analyze it thoroughly) it's often good enough. If it works, it works.
I once wanted to find a vector for which the Euler rotation (5°, 5°, 0) would result in the same vector. So I just ran a loop of a million iterations or so: take a starting vector, perturb it slightly (add a small random vector), and check whether after rotation it ends up closer to the original than the previous vector did; if not, discard the change, otherwise keep it. The script ran for a couple of seconds in Python, and with the translation vector shrinking as the iteration number grew, I got a perfect result (up to the limited float precision of the vector). 2s would be terribly slow in a library but completely satisfying for my needs :D
My favorite FP Fun Fact is that float comparisons can (almost) use integer comparisons. To determine if a > b, reinterpret a and b as signed ints and just compare those like any old ints. It (almost) works!
The implication is that the next biggest float is (almost) always what you get when you reinterpret its bits as an integer, and add one. For example, start with the zero float: all bits zero. Add one using integer arithmetic. In int-speak it's just one; in float-speak it's a tiny-mantissa denormal. But that's the next float; and `nextafter` is implemented using integer arithmetic.
Learning that floats are ordered according to integer comparisons makes it feel way more natural. But of course there's the usual asterisks: this fails with NaNs, infinities, and negative zero. We get a few nice things, but only a few.
This isn't accurate. It's true for positive numbers, and when comparing a positive to a negative, but false for comparisons between negative numbers. Standard floating point uses a sign-magnitude representation, while signed integers these days use two's complement. On negative numbers, comparisons are reversed between these two encodings. Incrementing a float as if it were an integer will, in ordinary circumstances, get you the next one larger in magnitude, but with the same sign -- i.e., you go up for positives but down for negatives. Whereas with signed integers, you always go up except when there's an overflow into the sign bit.
A more correct version of the statement would be that comparison is the same as on sign-magnitude integers. Of course, this still has the caveats you already mentioned.
For what it's worth, here's the Rust std implementation [1] of the total (as in, makes NaNs comparable) comparison algorithm given by IEEE 754:
    let mut left = self.to_bits() as i32;
    let mut right = other.to_bits() as i32;
    // In case of negatives, flip all the bits except the sign
    // to achieve a similar layout as two's complement integers
    left ^= (((left >> 31) as u32) >> 1) as i32;
    right ^= (((right >> 31) as u32) >> 1) as i32;
    left.cmp(&right)
This came up during my OMSCS Game AI course as an example of the dangers of using floats to represent game-object locations. As you get farther from the origin (or another game element you are referencing from), you lose precision, because the float needs more of the significand to store the larger value.
I love how that became part of the "mythology" of Minecraft as the "Far Lands", where travelling far from the world origin made terrain generation and physics break down, subtly at first, and less so as you kept going.
It's like the paranormal trope of an expedition encountering things being disconcertingly "off" at first, and then eventually the laws of nature start breaking down as well. All because of float precision.
If you have a large set of, let's say, floats in the range between 0 and 1 and you add them up, there is the straightforward way to do it, and there is a way to pair them all up, add the pairs, and repeat that process until you have the final result. This more elaborate scheme gets a far more accurate result than simply adding them up in order. This is huge, because a lot of processing is done with floats where this inherent issue is entirely disregarded (and it has an impact on the final result).
Donald Knuth has all of that covered in one of his "The Art of Computer Programming" books, with estimations about the error introduced, some basic facts like a + (b + c) != (a + b) + c with floats and similar things.
And believe it or not, there have been real-world issues coming out of that corner. I remember Patriot missile systems requiring a restart because they did time accounting with floats, and one part of the software didn't handle the corrections properly, so the missiles drifted more and more off-target the longer the system was running. They had to restart them every 24 hours or so to keep the error within limits until the issue was fixed (and the systems updated). There have also been massive constructions breaking apart due to float issues (like material thicknesses calculated too thin), etc.
Define boundary conditions -- how much precision do you need? Then you can compute the min/max distances. If the "world" needs to be larger, then prepare to divide it into sectors, and have separate global/local coordinates (e.g. No Man's Sky works this way).
Really though, games are theater tech, not science. Double precision will be more than enough for anything but the most exotic use cases.
The most important thing is just to remember not to add very-big and very-small numbers together.
Kerbal Space Program required a lot of clever engineering to cram an entire solar system into the constraints of 32-bit floats. There are articles and videos out there; highly recommended!
I was looking through the comments to see if someone already mentioned it. Great webpage!
However, the one in the OP has an amazing graph that intuitively explains the numeric space partitioning: the vertical axis is logarithmic, and the horizontal axis is linear within each row, but rows are normalized to fit the range between the logarithmic values on the vertical axis. I guess it's obvious once you're comfortable with floats, but it could use some explanation for those still learning.
Far too superficial: it lacks an explanation of non-normal conditions (denormals, zeroes, infinities, sNaNs, and qNaNs) and the complete mapping of values. This isn't properly educational.
"An sNaN (signaling Not-a-Number) is a special floating-point value that is designed to trigger a hardware trap or exception when it is used in an arithmetic operation. This differs from a qNaN (quiet Not-a-Number), which propagates through calculations without causing an immediate exception. Handling sNaNs requires a more deliberate approach that involves enabling floating-point traps and writing a custom trap handler."
> THE .EXPOSED TLD This TLD is attractive and useful to end-users as it better facilitates search, self-expression, information sharing and the provision of legitimate goods and services. Along with the other TLDs in the Donuts family, this TLD will provide Internet users with opportunities for online identities and expression that do not currently exist. In doing so, the TLD will introduce significant consumer choice and competition to the Internet namespace – the very purpose of ICANN’s new TLD program.
> .EXPOSED will be utilized by registrants seeking new avenues for expression on the Internet. There is a deep history of progressivity and societal advancement resulting from the online free expressions of criticism. Individuals and groups will register names in .EXPOSED when interested in editorializing, providing input, revealing new facts or views, interacting with other communities, and publishing commentary.
-- https://icannwiki.org/.exposed
For 64-bit, 9007199254740991 is also known as `Number.MAX_SAFE_INTEGER` in JavaScript. Note that this value is odd; the next integer, ..0992, is still exactly representable on its own, but it doesn't count as safe because it can't be distinguished from the definitely unsafe ..0993, which also rounds to ..0992.
You mean ±9,007,199,254,740,993.0 for 64-bit floats. :-)
For other curious readers, these are one beyond the largest integer values that can be represented accurately. In other words, the next representable value away from zero after ±16,777,216.0 is ±16,777,218.0 in 32 bits -- the value ±16,777,217.0 cannot be represented, and will be rounded somewhat hand-wavingly (usually towards zero).
Precision rounding is one of those things that people often overlook.
It would be nice if the site had an explanation of why the decimal 0.1 has no finite representation in base 2. In short, it's about the factorization of the base: base 2 lacks the factor 5 that is present in base 10. It's analogous to 1/3 or 1/7 not having a finite decimal expansion in base 10.
Good job on showing the hexadecimal version. I recently had a challenging C bug where I was using printf("%a") to show the underlying representation, which uses hexadecimal, and it was a little confusing at first. This site would have been helpful.
I'm glad this exists, except for the fact that IEEE754 is the devil, and posits are better (not assuming hardware support). (Even better than both are bignum rationals, but those are the slowest.)
"FatRat" is a hilarious name for a numeric type (and I mean this in a purely positive way). I've always had an impression of Perl as a language with a bit of a quirky, fun community, and it's nice to see that might have carried over to Raku as well.
Can't wait for someone to buy boolean.exposed and teach me about some esoteric representation of booleans in memory that I'd never considered (either that or it's a very simple page).
Yes, just `printf("%f", ...);` will get you that.
The actual algorithms to do the float->string conversion are quite complicated. Here is a recent pretty good one: https://github.com/ulfjack/ryu
I think there's been an even more recent one that is even more efficient than Ryu but I don't remember the name.
https://www.h-schmidt.net/FloatConverter/IEEE754.html
This one has the extra feature of showing the conversion error, but it doesn't support double precision.
These types of visualizations are super useful.
[1] - https://cidr.xyz
Just learned something. Thanks.
It's sometimes fun to have these kinds of edge cases up your sleeve when testing things.
https://raku.org defaults to Rationals (Rats) and provides FatRat for arbitrary precision
otherwise even relatively normal calculations (e.g. what's the mass of an electron in quectograms) fail