piadodjanho | 5 years ago
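Demo:

    #include <stdio.h>

    int
    main ()
    {
        volatile float v;
        float acc = 0;
        float den = 1.40129846432e-45;  /* smallest positive subnormal float */
        /* i must start at 0; leaving it uninitialized is undefined behavior */
        for (size_t i = 0; i < (1ul<<33); i++) {
            acc += den;
        }
        v = acc;  /* volatile store keeps the loop from being discarded */
        return 0;
    }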
With -O1:

    $ gcc float.c -o float -O1 && time ./float
    ./float 8.93s user 0.00s system 99% cpu 8.933 total

With -O0:

    $ gcc float.c -o float -O0 && time ./float
    ./float 20.60s user 0.00s system 99% cpu 20.610 total
GuB-42|5 years ago
EDIT: That one is interesting too.
clang (9.0.1) performs about the same without -ffast-math; but with it, it managed to optimize the loop away.
piadodjanho|5 years ago
I've been fighting the compiler to generate a minimal working example for subnormals, but haven't had any success.
Some things need to be taken into account (off the top of my head):
- Rounding. You don't want to get stuck at the same number.
- The FPU has some accumulator registers that are wider than the floating-point registers.
- Using more registers than the architecture has is not trivial because of register renaming and code reordering. The CPU might optimize in a way that the data never leaves those registers.
Trying to make an MWE, I found this code:
It runs in a fraction of a second with -O0:
But takes forever (killed after 5 minutes) with -O1:
I'm using gcc (Arch Linux 9.3.0-1) 9.3.0 on an i7-8700.
I also managed to write code that sometimes runs in 1s, but other times takes 30s. It didn't matter if I recompiled.
Floating point is hard.
sigjuice|5 years ago