On gcc 13, the difference in assembly between the min(max()) version and std::clamp is eliminated when I add the -ffast-math flag. I suspect that the two implementations handle one of the arguments being NaN a bit differently.
You (celegans25) probably know this but here is a PSA that -ffast-math is really -finaccurate-math. The knowledgeable developer will know when to use it (almost never) while the naive user will have bugs.
Why do you say almost never? Don’t let the name scare you; all floating point math is inaccurate. Fast math is only slightly less accurate, I think typically it’s a 1 or maybe 2 LSB difference. At least in CUDA it is, and I think many (most?) people & situations can tolerate 22 bits of mantissa compared to 23, and many (most?) people/situations aren’t paying attention to inf/nan/exception issues at all.
I deal with a lot of floating point professionally day to day, and I use fast math all the time, since the tradeoff for higher performance and the relatively small loss of accuracy are acceptable. Maybe the biggest issue I run into is lack of denorms with CUDA fast-math, and it’s pretty rare for me to care about numbers smaller than 10^-38. Heck, I’d say I can tolerate 8 or 16 bits of mantissa most of the time, and fast-math floats are way more accurate than that. And we know a lot of neural network training these days can tolerate less than 8 bits of mantissa.
If your code ventures into the domain where fast-math matters and you're not a mathematician trying to solve a lyapunov-unstable problem with very tricky numeric methods, then most likely your code is already broken.
Another PSA is that dynamic libraries compiled with fast-math will also introduce inaccuracies in unrelated libraries in the same executable, as they introduce dynamic initialization that globally changes the floating point environment.
Ehh, not so much inaccurate, more of a "floating point numbers are tricky, let's act like they aren't".
Compilers are pretty skittish about changing the order of floating point operations (for good reason) and ffast-math is the thing that lets them transform equations to try and generate faster code.
IE, instead of doing "n / 10" doing "n * 0.1". The issue, of course, being that things like 0.1 can't be perfectly represented with floats but 100 / 10 can be. So now you've introduced a tiny bit of error where it might not have existed.
One of the things that you can do with D and as far as I know Julia is enable specific optimizations locally e.g. allow FMAs here and there, not globally.
fast-math is one of the dumbest things we have as an industry IMO.
gumby|2 years ago
mort96|2 years ago
dahart|2 years ago
I deal with a lot of floating point professionally day to day, and I use fast math all the time, since the tradeoff for higher performance and the relatively small loss of accuracy are acceptable. Maybe the biggest issue I run into is lack of denorms with CUDA fast-math, and it’s pretty rare for me to care about numbers smaller than 10^-38. Heck, I’d say I can tolerate 8 or 16 bits of mantissa most of the time, and fast-math floats are way more accurate than that. And we know a lot of neural network training these days can tolerate less than 8 bits of mantissa.
alexey-salmin|2 years ago
planede|2 years ago
cogman10|2 years ago
Compilers are pretty skittish about changing the order of floating point operations (for good reason) and ffast-math is the thing that lets them transform equations to try and generate faster code.
IE, instead of doing "n / 10" doing "n * 0.1". The issue, of course, being that things like 0.1 can't be perfectly represented with floats but 100 / 10 can be. So now you've introduced a tiny bit of error where it might not have existed.
mhh__|2 years ago
fast-math is one of the dumbest things we have as an industry IMO.
unknown|2 years ago
[deleted]