Reciprocal Approximation with 1 Subtraction
99 points | mbitsnbites | 1 year ago
float fast_reciprocal(float x)
{
    unsigned i = *(unsigned *) &x;
    i = 0x7effffffU - i;
    return *(float *) &i;
}
The magic number 0x7effffff accomplishes two things:
1) The exponent is calculated as 253 - e, which effectively negates the exponent and subtracts 1.
2) The mantissa is approximated as a 1st-order polynomial in the interval [1, 2).
Interesting, but perhaps not very useful (as most CPUs have more accurate reciprocal approximations these days).
jcmeyrignac|1 year ago
dahart|1 year ago
One can do better still - 0x7EF311BC is a near optimal value at least for inputs in the range of [0.001 .. 1000].
The simple explanation here is:
The post’s number 0x7EFFFFFF results in an approximation that is always equal to or greater than 1/x. The value 0x7EEEEBB3 is better, but it’s less than 1/x around 2/3rds of the time. My number 0x7EF311BC appears to be as well balanced as you can get, half the time greater and half the time less than 1/x.
To find this number, I have a Jupyter notebook that plots the maximum absolute value of relative error over a range of inputs, for a range of magic constants. Once it’s set up, it’s pretty easy to manually binary search and find the minimum. The plot of max error looks like a big “V”. (Edit: while the plot of mean error looks like a big “U” near the minimum.)
The optimal number does depend on the input range, and using a different range or allowing all finite floats will change where the optimal magic value is. The optimal magic number will also change if you add one or more Newton iterations, like in that github snippet (and also seen in the ‘quake trick’ code).
PPS maybe 0x7EF0F7D0 is a pretty good candidate for minimizing the average relative error…?
dahart|1 year ago
It might be nice to start sharing modern/safe versions of this snippet & the Quake thing. Is using memcpy the only option that is safe in both C and C++? That always felt really awkward to me.
[1] https://tttapa.github.io/Pages/Programming/Cpp/Practices/typ...
LegionMammal978|1 year ago
Apart from memcpy(), the 'allowed' methods include unions in C (writing to one member and reading from another), and bit_cast<T>() and std::start_lifetime_as<T>() in C++.
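For reference, a minimal sketch of two of those options (the union route is valid C but technically UB in C++; memcpy is fine in both, and compilers reduce it to a single register move):

```c
#include <stdint.h>
#include <string.h>

/* C: type punning through a union is well-defined (C99 and later). */
static uint32_t float_bits_union(float x)
{
    union { float f; uint32_t u; } pun = { .f = x };
    return pun.u;
}

/* Portable in both C and C++: memcpy the bytes across. */
static uint32_t float_bits_memcpy(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}
```

In C++20 and later, std::bit_cast<uint32_t>(x) expresses the same thing directly.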
[0] https://godbolt.org/z/dxMMfazoq
Asooka|1 year ago
The big problem, apart from "memcpy is a bit tedious to write", is that it is impossible to guarantee absence of UB in a large program. You will get inlined functions from random headers and suddenly someone somewhere is "dereferencing" a pointer by storing it as a reference, and then your null pointer checks are deleted. Or an integer overflows and your "i>=0" array bounds assert is deleted. I have seen that happen several times and each time the various bits of code all look innocent enough until you sit down and thoroughly read each function and the functions it calls and the functions they call. So we compile with the worst UB turned off (null is a valid address, integers can overflow, pointers can alias, enums can store any integer) and honestly, we cannot see a measurable difference in performance.
UB as it is currently implemented is simply an enormous footgun. It would be a lot more useful if there were a profiler that could find parts of the code, which would benefit from UB optimisations. Then we could focus on making those parts UB-safe, add asserts for the debug/testing builds, and turn on more optimisations for them. I am quite certain nobody can write a fully UB-free program. For example, did you know that multiplying two "unsigned short" variables is UB? Can you guarantee that across all the template instantiations for custom vector types you've written you never multiply unsigned shorts? I'll leave it as an exercise to the reader as to why that is.
tekknolagi|1 year ago
Conscat|1 year ago
mbitsnbites|1 year ago
Some things that could mess with you:
* Floating-point endianness is not the same as integer endianness.
* Floating-point alignment requirements differ from integer alignment requirements.
* The compiler is configured to use something other than 32-bit binary32 IEEE 754 for the type "float".
* The computer does not use two's complement arithmetic for integers.
In practice, these are not real problems.
Earw0rm|1 year ago
You could use intrinsics for the bit casting & so on and it would be well-defined to the extent that those intrinsics are.
(I understand some people consider SIMD intrinsics in general to be UB at language level, in part because they let you switch between floats & ints in a hardware-specific way.)
pkhuong|1 year ago
Y_Y|1 year ago
AlotOfReading|1 year ago
I did this a few days ago in the approximate division thread: https://news.ycombinator.com/item?id=42481612#42489596
In some cases, it can be faster than hardware division.
dahart|1 year ago
magicalhippo|1 year ago
[1]: https://en.wikipedia.org/wiki/Fast_inverse_square_root
fph|1 year ago
More detail on https://en.wikipedia.org/wiki/Fast_inverse_square_root#Alias... .
IEEE 754 floats are a very well-designed binary format, and the fact that these approximations are possible is part of this design; indeed, the first known instance of this trick is for a=1/2 by W. Kahan, the main designer of IEEE 754.
mbitsnbites|1 year ago
If you want full precision, you need to do three Newton-Raphson iterations after the initial approximation. One iteration is: y = y * (2 - x * y).
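A sketch of the combined routine: the update y = y * (2 - x * y) is the standard Newton-Raphson step for the reciprocal, each step roughly doubling the number of correct bits, and the three-iteration count is the parent comment's claim:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

static float fast_reciprocal_refined(float x)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);
    i = 0x7effffffu - i;     /* initial approximation */
    memcpy(&y, &i, sizeof y);
    /* Three Newton-Raphson refinements: y <- y * (2 - x*y). */
    y = y * (2.0f - x * y);
    y = y * (2.0f - x * y);
    y = y * (2.0f - x * y);
    return y;
}
```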
quanto|1 year ago
1. 7e: the first sig bit has to be zero to preserve the sign of the overall number.
2. ffffff: by the Taylor series 1/(1 + X) = 1 - X + ..., negating gives the multiplicative inverse.
Although an IEEE float has a sign bit (so two's complement does not work on the float as a whole), the mantissa and the exponent individually behave like two's complement, for different reasons. The exponent has a biased range due to being chopped in half, whereas the mantissa has a biased range due to its definition and constant offset. The exponent's offset of 1 (7e instead of 7f) is a bit more difficult to see at first; the devil is in the details, but two's complement is the basic intuition.
ncruces|1 year ago
It's in Go, but ports easily to other languages.
See also: https://stackoverflow.com/questions/32042673/optimized-low-a...
mbitsnbites|1 year ago
torusle|1 year ago
That was a thing back in the '90s..
I wonder how hard the performance hit from moving values between integer and float pipeline is nowadays.
Last time I looked into that was the Cortex-A8 (first iPhone era). Doing that kind of trick cost around 26 cycles (back and forth) due to pipeline stalls back then.
stephencanon|1 year ago
ack_complete|1 year ago
mbitsnbites|1 year ago
somat|1 year ago
Because what do you really want? Some sort of exponent math, right? Some variant of the log function should work. The problem is that all the log functions are gated behind the function-call interface, whereas the subtract function is less heavy, being behind the operator interface. Or are they trying to use a floating-point-accelerated log, i.e. a floating-point subtract?
dahart|1 year ago
It’s not about interfaces. The point is that a single subtract is simpler math, and (probably) faster than a divide, and it gets you surprisingly close to the right answer. The reason it works is subtracting two exponents is a subtraction in log space which is equivalent to a divide in linear space.
People don’t use this trick much in practice, if at all, especially these days with GPUs that do reciprocals and square roots in hardware. It’s just an interesting thing to know about that helps solidify our understanding of floating-point numbers and logarithms.
jasomill|1 year ago
[1] D.E. Knuth, The art of computer programming. Vol. 1, third edition, Addison-Wesley, Reading, MA, 1997.
[2] https://www.werkema.com/2021/10/14/my-knuth-check/
mbitsnbites|1 year ago
brudgers|1 year ago
But then again, representing irrational numbers in binary is inherently janky.
kkkqkqkqkqlqlql|1 year ago