top | item 36475781

ColFrancis | 2 years ago

As I understand it, it's more that the mathematical functions aren't specified by the standard.

There is no sqrt instruction. So you start with a Taylor approximation, then maybe do a few rounds of Newton's method to refine the answer. But what degree is the Taylor series? Where did you centre it (it can't be around 0)? How many Newton steps did you do? IEEE 754 might dictate what multiplication looks like, but for sqrt, sin, cos, and tan you need to come up with a method of calculation, and that lives in the standard library, which usually comes from the compiler. You could make your own implementations but...

Floats are not associative: (a+b)+c != a+(b+c). So even something like reordering fundamental operations can cause divergence.

zokier | 2 years ago

> There is no sqrt instruction. So you start with a Taylor approximation, then maybe do a few rounds of Newton's method to refine the answer. But what degree is the Taylor series? Where did you centre it (it can't be around 0)? How many Newton steps did you do? IEEE 754 might dictate what multiplication looks like, but for sqrt, sin, cos, and tan you need to come up with a method of calculation, and that lives in the standard library, which usually comes from the compiler. You could make your own implementations but...

This is just plain wrong. IEEE 754 defines many common mathematical functions, sqrt and trig included, and recommends that they return correctly rounded results, i.e. correct to the last ulp. Most common CPUs have hardware instructions for those, although Intel's implementation is notoriously bad.

Furthermore, there are some high-quality FP libraries out there, like SLEEF and rlibm, and the CORE-MATH project, which aims to improve the standard libraries in use.

cornstalks | 2 years ago

> recommends

That word is really important because it means you can’t rely on it for real determinism.

dundarious | 2 years ago

Your general point stands, but with corrections:

- sqrt actually can be a single instruction on x86_64, for example. Look at how musl implements it: it's just a single-line inline asm statement, `__asm__ ("sqrtsd %1, %0" : "=x"(x) : "x"(x));`: https://github.com/ifduyue/musl/blob/master/src/math/x86_64/... Of course, not every x86_64 implementation must use that instruction, and not every architecture must match Intel's implementation. I've never looked into it, but I wouldn't be surprised if even Intel and AMD have some differences.

- operator precedence is well-defined, and IEEE 754 compliant compilation modes respect the non-associativity. In most popular compilers, you need to pass specific flags to allow the compiler to change the associativity of operations (-ffast-math implies -funsafe-math-optimizations, which implies -fassociative-math, which actually breaks strict adherence to the program text). A somewhat similar issue does arise with the floating-point environment, though, as statements may be reordered, the order of evaluation of function arguments may differ, etc.

The fact that compilers respect the non-associativity of program text is a huge reason why compilers are very limited in how much auto-vectorization they will do for floating point. The classic example is a sum of squares, where the compiler bars itself from even loop unrolling, never mind SIMD with FMADDs. To do any of that, you have to fiddle with compiler options that are often problematic when enabled globally, or `__attribute__((optimize("...")))` or `__pragma(optimize("..."))`, or probably best of all, explicitly vectorize using intrinsics.