(no title)
anematode | 6 days ago
gcc -O3 -march=native -flto -DNDEBUG src/*.c -I include -o dmmsy_bench -lm
./a.out
===========================================================================
n m Dijkstra DMMSY Res DMMSY Opt Spd(Opt)
---------------------------------------------------------------------------
1000 5000 0.0153 0.0171 0.0172 0.89 x
5000 25000 0.0006 0.0001 0.0000 16.26 x
10000 50000 1.0649 1.0729 1.0791 0.99 x
25000 125000 3.1196 3.1592 3.1291 1.00 x
50000 250000 6.9091 7.0232 7.0154 0.98 x
75000 375000 10.7737 10.9603 10.9621 0.98 x
100000 500000 14.9982 14.8546 14.8742 1.01 x
150000 750000 23.3425 23.8628 23.8416 0.98 x
200000 1000000 33.0851 33.8681 34.0710 0.97 x
250000 1250000 43.5518 45.6267 46.4157 0.94 x
350000 1750000 73.1518 75.8528 74.6965 0.98 x
500000 2500000 125.9957 130.0651 129.2500 0.97 x
Retr0id|6 days ago
chipbuster|5 days ago
- An optimization level of -O1 or higher - The -ffinite-math-only flag, or any flag (e.g. -ffast-math) which implies this flag.
The benchmark uses a default value for weights which is #defin-ed as `__builtin_inf()`, and assigns this value in multiple places. This, of course, is concerning, since it gives a very obvious means by which the benchmark might be completely optimized out, though a more careful analysis would be needed to explain why the Dijkstra and (Res) functions don't also get optimized out.
For clang, the equivalent flags are
- An optimization level of Og or higher - The -fno-honor-infinities flag, or any flag (e.g. -ffast-math) which implies this flag.
Notably, while the author enables LTO, and others have compiled with -march=native, neither flag is necessary (for me) to see the huge speedup, which on my machine peaks at over 1.2 million times.
anematode|6 days ago
(I think it's unlikely that it can be straight-up optimized out because it dirties the workspace thread local, and compilers generally don't optimize out writes to thread local variables.)