item 8434128

Intel Underestimates Error Bounds by 1.3 Quintillion

424 points | ghusbands | 11 years ago | randomascii.wordpress.com

73 comments

[+] luu|11 years ago|reply
This is one of those things that's well known by people who spend a lot of time doing low-level nonsense that really ought to get wider press. You really don't want to use x86 hardware transcendental instructions.

I've seen people try to measure the effectiveness of their approximation by comparing against built-in hardware, but their approximation is probably better than the hardware! If you're not writing your own assembly, you'll be ok on most modern platforms if you just link with "-lm" and use whatever math library happens to be lying around as the default, but it's still possible to get caught by this. On obscure platforms, it's basically a coin flip.

I used to work for another CPU vendor, and when we implemented more accurate sin/cos functions, some benchmarks would fail us for getting the wrong result.

Turns out those benchmarks hardcode a check for a couple results, and that check is based on what Intel has been doing for ages. My recollection is that we had a switch to enable the extra accuracy, but that it wasn't enabled by default because it was too much of a risk to break compatibility with Intel.

If that sounds too risk averse, there's a lot of code out there that depends on your processor precisely matching Intel's behavior on undefined flags and other things that code has no business depending on. It's basically the same thing Raymond Chen is always talking about, but down one level of abstraction. I've seen what happens if you just implement something that matches the "spec" (the Intel manual) and do whatever you want for "undefined" cases. You get a chip that's basically useless because existing software will crash on it.

[+] userbinator|11 years ago|reply
> You really don't want to use x86 hardware transcendental instructions

...unless you're doing size-optimised graphics intros, where tremendous accuracy is not essential and the stack-based nature of x87 makes for some extremely compact code:

http://www.pouet.net/prod.php?which=53816

https://www.youtube.com/watch?v=R35UuntQQF8

I think the main reason this issue hasn't been more significant is that the majority of code using those instructions isn't dependent on extreme accuracy - code that is would likely use its own routines or even arbitrary-precision arithmetic.

[+] JoeAltmaier|11 years ago|reply
The problem is not so much "Intel stuff is old and broken" as it is "Intel wrote the standard and there's no process to move it forward"?
[+] CurtHagenlocher|11 years ago|reply
Of possible interest, I researched a related issue once and here's what I found:

The particular approximation used by the FPU instructions fsin and fcos is valid on the domain [-π/4, π/4]. Inputs outside this domain are reduced by taking the remainder when dividing by π/4 and then using trigonometric identities to fix up values in the other three quadrants (i.e. sin(x) = sin(π/2 – x) for x in [π/4, π/2]).

The Intel FPU is “off” because it uses an approximation of π/4 when performing the reduction. In the mid-90s, some engineers at AMD figured out how to do a more precise reduction and implemented it in the K5 FPU. AMD went back to being bit-for-bit compatible with the x87 behavior in the very next generation of their CPUs, presumably because the change either broke or appeared to break too many applications.
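
That reduction scheme can be sketched in a few lines of Python; sin_with_naive_reduction is an illustrative stand-in for a reduction done against a limited-precision pi, not Intel's actual microcode:

```python
import math

def sin_with_naive_reduction(x):
    """sin(x), reducing the argument against the double approximation of pi,
    analogous to fsin reducing against its own truncated pi."""
    k = round(x / math.pi)
    r = x - k * math.pi          # the error of math.pi vs. true pi leaks in here
    return -math.sin(r) if k % 2 else math.sin(r)

# Near pi the naive reduction cancels to exactly zero...
print(sin_with_naive_reduction(math.pi))   # 0.0 (well, -0.0)
# ...while a correct reduction yields the gap between double(pi) and pi:
print(math.sin(math.pi))                   # about 1.2246e-16
```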

[+] sillysaurus3|11 years ago|reply
It's not possible to implement a more precise fsin() without breaking apps?

One scenario where I can imagine an app breaking is if someone executed fsin(x), pasted the result into their code, and then tested some other float against it using the == operator.

I suppose a more common breakage scenario would be if someone did something like floor(x + fsin(x)), since that'd very slightly change where the floor "triggers" with respect to the input. But how could that cause an app to break? Every scenario I can think of comes back to someone using == to test floats, which everyone knows is a no-no for dynamic results, such as the output of a function like fsin().

I guess a programmer wouldn't really expect the result of "x + fsin(x)" to ever change for a certain value of x, so maybe they executed it with that certain value, stored the result, and did an equality comparison later, which would always work unless the implementation of fsin() changed, and the thought process went something like, "This should be reliable because realistically the implementation of fsin() won't change."

Can anyone think of some obvious way that making fsin() more accurate could cause serious breakage in an app that isn't testing floats with the equality operator?
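
The first scenario is easy to make concrete; sin_v1/sin_v2 below are stand-ins that differ by one ulp, not real library versions:

```python
import math

def sin_v1(x):
    # Stand-in for the old implementation the constant was captured from.
    return math.sin(x)

def sin_v2(x):
    # Stand-in for an implementation that is one ulp "more accurate".
    return math.nextafter(math.sin(x), math.inf)

# Imagine this value was pasted into the source as a literal years ago:
BAKED_IN = sin_v1(1.0)

assert sin_v1(1.0) == BAKED_IN      # passes forever with the old library...
assert sin_v2(1.0) != BAKED_IN      # ...and "breaks" the day sin() improves
```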

[+] CurtHagenlocher|11 years ago|reply
I just noticed that Bruce has moved from Valve to Google... Interesting!
[+] conistonwater|11 years ago|reply
There is a mathematical reason why this is not completely horrible, even though it can catch you off guard. When x is approximately pi, the function x -> sin(x) is ill-conditioned. [0] So the relative error (ybar-y)/y between the computed value ybar = fl(sin(x)) and the true value y = sin(x) is quite large.

However: you can find a number xbar that is very close to x, in the sense that (xbar-x)/x is very small, such that the computed "inaccurate" value ybar = fl(sin(x)) matches sin(xbar) exactly. Thus the procedure is backward-stable, but not forward-stable. For "typical" numerically stable algorithms, this might not be such an issue.
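
Concretely, the relative condition number of sin at x is |x·cos(x)/sin(x)|, which blows up near pi. A quick check (using the platform libm, not fsin):

```python
import math

def rel_condition(x):
    # Relative condition number of sin: |x * f'(x) / f(x)| with f = sin.
    return abs(x * math.cos(x) / math.sin(x))

print(rel_condition(1.0))        # about 0.64: well-conditioned
print(rel_condition(math.pi))    # about 2.6e16: wildly ill-conditioned
```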

Very interesting writeup, anyway. I also found this post by James Gosling [1] describing the same issue and the slower software workaround implemented in the JVM; the post is from 2005. The paper that explains the K5 implementation is "The K5 transcendental functions" by Lynch, Ahmed, Schulte, Callaway, and Tisdale, but it is behind an academic paywall.

[0] http://en.wikipedia.org/wiki/Condition_number

[1] https://blogs.oracle.com/jag/entry/transcendental_meditation

[+] acqq|11 years ago|reply
https://software.intel.com/en-us/forums/topic/289702

"Shane Story (Intel) Sun, 06/27/2010 - 11:44

(...) rigorous error bounds for most of the x87 transcendental functions (f2xm1, fsincos, fyl2x...) were provided when the Pentium processor released. I recall an appendix in the original Pentium manual which showed a collection of ulp (unit in the last place) charts - don't recall the upper bound for all of them but it was something like 1.8 ulps. This would be for double-extended 80 bit fp inputs. Results for the fsin/fcos were for arguments < pi/4 because for the reduction for larger inputs, a 66-bit value of pi (or pi/2) was used."

It's really a documentation bug. Sometimes important details get lost in documentation rewrites.
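
The effect of that 66-bit pi can be sketched with exact rational arithmetic; PI below is just pi hardcoded to 50 digits, and the "2 integer bits plus 64 fraction bits" layout of the truncation is this sketch's assumption:

```python
from fractions import Fraction

# pi to 50 decimal digits, as an exact rational:
PI = Fraction("3.14159265358979323846264338327950288419716939937511")

# pi truncated to 66 significant bits (2 integer + 64 fraction bits):
PI_66 = Fraction(int(PI * 2**64), 2**64)

def reduction_error(x):
    """Distance between the remainder computed with the 66-bit pi and the
    true remainder; it grows linearly with the quotient k."""
    k = round(Fraction(x) / PI)
    return float(abs((Fraction(x) - k * PI_66) - (Fraction(x) - k * PI)))

print(reduction_error(10.0))    # tiny for small arguments
print(reduction_error(1e18))    # dwarfs a double ulp (~2e-16) of the remainder
```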

[+] jhallenworld|11 years ago|reply
Related: here is a very good article (and comment thread) on approximating transcendental functions: http://www.coranac.com/2009/07/sines/

I'm annoyed that one of the links in the comments is broken: there was a page where you entered your function and it produced a Chebyshev polynomial approximating it to a particular precision.

EDIT: well here is another one: http://goddard.net.nz/files/js/chebyshev/cheby.html
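
For the curious, the core of such a tool fits in a page of stdlib Python: sample at the Chebyshev nodes, take the cosine transform, evaluate with Clenshaw's recurrence. A sketch, not the tool behind the dead link:

```python
import math

def cheby_coeffs(f, n, a, b):
    """Chebyshev series coefficients c_0..c_{n-1} of f on [a, b],
    from samples at the n Chebyshev nodes."""
    mid, half = (a + b) / 2, (b - a) / 2
    fv = [f(mid + half * math.cos(math.pi * (k + 0.5) / n)) for k in range(n)]
    return [(1 if j == 0 else 2) / n *
            sum(fv[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
            for j in range(n)]

def cheby_eval(c, x, a, b):
    """Evaluate sum c_j * T_j(t) via Clenshaw's recurrence."""
    t = (2 * x - a - b) / (b - a)            # map [a, b] onto [-1, 1]
    b1 = b2 = 0.0
    for cj in reversed(c[1:]):
        b1, b2 = cj + 2 * t * b1 - b2, b1
    return c[0] + t * b1 - b2

a, b = -math.pi / 4, math.pi / 4
c = cheby_coeffs(math.sin, 10, a, b)
err = max(abs(cheby_eval(c, x, a, b) - math.sin(x))
          for x in (a + (b - a) * i / 1000 for i in range(1001)))
print(err)   # ten terms already land comfortably below 1e-10 on this interval
```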

[+] colanderman|11 years ago|reply
> Note that sin(double(pi)) should not return zero. sin(pi) is, mathematically, zero, but since float/double/long-double can’t represent pi it would actually be incorrect for them to return zero when passed an approximation to pi.

Huh? sin(pi ± epsilon) better damn well return 0 if epsilon is closer to 0 than the minimum representable floating-point value. Otherwise the processor is rounding incorrectly.

(I realize epsilon in this case is likely to be much larger than the minimum representable float, but it's not guaranteed and thus the article's reasoning is flawed.)

[+] StefanKarpinski|11 years ago|reply
The article is correct. double(pi) is exactly 884279719003555/281474976710656, which is a perfectly valid number to apply the sin function to. sin(884279719003555/281474976710656) is a transcendental number close to 1.2246467991473532e-16. In fact, that is the closest 64-bit float value to the true value of sin(884279719003555/281474976710656) and thus the best answer that the sin function can give. There is no way to ask the sin(double x) function what the sine of the transcendental number π is since the argument must be a double and there's no double value for π. Since the sin function is not in the business of guessing which of the uncountably many real numbers you really wanted to take the sine of, it can only give you the best answer for the value you actually passed it.
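
Those numbers are easy to verify from Python (math.sin here is the platform libm, not x87 fsin):

```python
import math
from fractions import Fraction

# double(pi) is exactly this rational number (denominator 2**48):
assert Fraction(math.pi) == Fraction(884279719003555, 281474976710656)
assert 281474976710656 == 2**48

# ...and the correctly rounded sine of that rational is not zero:
print(math.sin(math.pi))   # 1.2246467991473532e-16 with IEEE-754 doubles
```
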
[+] nhaehnle|11 years ago|reply
Actually, epsilon is guaranteed to be much larger than the minimum representable float: double(pi) is 1.57... * 2^1, so double(pi) - pi is going to be on the order of 2^-m, where m is the size of the mantissa. The only way that double(pi) - pi could be much smaller is if pi magically had a lot of 0s in its binary representation (it doesn't).

This caught me off-guard at first as well, but it's really quite logical.
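
A quick check of that estimate; PI_50 is just pi hardcoded to 50 digits, far more than a double comparison needs:

```python
import math
from fractions import Fraction

PI_50 = Fraction("3.14159265358979323846264338327950288419716939937511")

gap = float(PI_50 - Fraction(math.pi))   # pi - double(pi), on the order of 2**-53
print(gap)
# sin(pi - e) = sin(e) ~ e for tiny e, so this gap is exactly what a
# correctly rounded sin(double(pi)) returns:
print(math.sin(math.pi))
```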

[+] conistonwater|11 years ago|reply
No, it is not pi that must get rounded, but sin(pi). So in sin(double(pi)), first double(pi) is computed, which is not exactly pi, then an approximation to sin(double(pi)) is computed, which is not exactly zero. It would be absolutely wrong to round a non-zero number to zero.
[+] preillyme|11 years ago|reply
Wow, when doing argument reduction from the 80-bit long-double format there will be almost complete cancellation? Damn.
[+] leoh|11 years ago|reply
I wonder what a good reference standard for this would be. Incidentally, do TI or HP calculators give good results?
[+] StefanKarpinski|11 years ago|reply
Wolfram Alpha (and Mathematica) are generally completely accurate at the cost of being less efficient. I use those to check most things I want to verify.
[+] nroets|11 years ago|reply
This is a storm in a teacup. "The absolute error ... was fairly constant at about 4e-21, which is quite small. For many purposes this will not matter"

Good programmers know that most floating point calculations introduce small errors and find ways to make their programs cope.

[+] lvh|11 years ago|reply
That's pretty bad for a double that you're about to do a bunch of operations with. It's pretty easy to throw numerical precision away.
[+] Aardwolf|11 years ago|reply
> However this technique relies on printing the value of the double-precision approximation of pi to 33 digits, and up to VC++ 2013 this doesn’t work

How can an IDE prevent printing a number? If all else fails, you can reinterpret_cast it to integer and handle the bits yourself?
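
Both routes work; a sketch of each (CPython's float formatting is correctly rounded, and struct does the reinterpret-cast):

```python
import math
import struct

x = math.pi

# Every double has a finite exact decimal expansion; 33 significant digits
# pins it down well past the 17 needed for round-tripping:
print(f"{x:.33g}")

# Or reinterpret the bits as a 64-bit integer and print those:
bits, = struct.unpack("<Q", struct.pack("<d", x))
print(f"0x{bits:016x}")   # 0x400921fb54442d18 for double(pi)
```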

[+] azernik|11 years ago|reply
First, a nitpick: VC++ is a compiler toolchain (including libraries), not just an IDE.

Also, the next sentence:

    So, you either need *custom printing code* or you need to wait for Dev 14 which has improved printing code.
[+] msie|11 years ago|reply
I'm not worthy, I'm not worthy...

Edit:

The Intel engineer was probably smart, but then there's this guy. And then there are the commenters here. Lesson: there's always someone out there smarter than you.

[+] bsder|11 years ago|reply
The lesson is actually "This was designed 35+ years ago under different constraints."

Memory and ROM were small and expensive, and processors were in-order and slow. Putting the transcendentals in microcode made sense back then.

Now memory is huge; processors are superscalar with speculative execution; vector units exist; and everybody ignores the built-in transcendentals - but Intel can't drop them, for backwards compatibility.

[+] pedrow|11 years ago|reply
Possible solution: revive the Indiana Pi Bill[0] (assuming 3.2 has an exact representation in binary?)

[0]:http://en.wikipedia.org/wiki/Indiana_Pi_Bill

[+] leni536|11 years ago|reply
> assuming 3.2 has an exact representation in binary?

It does not have an exact representation in binary: 3.2 = 16/5, and only fractions with a power-of-two denominator are exactly representable.
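
A one-liner check: the double nearest to 3.2 is a different rational, whose denominator is a power of two:

```python
from fractions import Fraction

# The double literal 3.2 converts to the nearest representable rational:
assert Fraction(3.2) != Fraction(16, 5)
print(Fraction(3.2))   # 3602879701896397/1125899906842624, i.e. over 2**50
```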