top | item 47125356

(no title)

It's slop: the numbers are fabricated.

   gcc -O3 -march=native -flto -DNDEBUG src/*.c -I include -o dmmsy_bench -lm

   ./a.out

   ===========================================================================
   n          m          Dijkstra     DMMSY Res    DMMSY Opt    Spd(Opt)
   ---------------------------------------------------------------------------
   1000       5000       0.0153       0.0171       0.0172       0.89    x
   5000       25000      0.0006       0.0001       0.0000       16.26   x
   10000      50000      1.0649       1.0729       1.0791       0.99    x
   25000      125000     3.1196       3.1592       3.1291       1.00    x
   50000      250000     6.9091       7.0232       7.0154       0.98    x
   75000      375000     10.7737      10.9603      10.9621      0.98    x
   100000     500000     14.9982      14.8546      14.8742      1.01    x
   150000     750000     23.3425      23.8628      23.8416      0.98    x
   200000     1000000    33.0851      33.8681      34.0710      0.97    x
   250000     1250000    43.5518      45.6267      46.4157      0.94    x
   350000     1750000    73.1518      75.8528      74.6965      0.98    x
   500000     2500000    125.9957     130.0651     129.2500     0.97    x

discuss

Retr0id|6 days ago

While it is definitely slop, I think the numbers may be "real" but compiler-dependent. The 20000x "speedup" presumably only happens when the compiler detects that it can optimize the whole algorithm into a nop, because it has no observable side effects. (I have not tested this hypothesis)

chipbuster|5 days ago

On my system, under gcc 15.2.1, the two necessary factors to see the claimed speedup are:

- An optimization level of -O1 or higher - The -ffinite-math-only flag, or any flag (e.g. -ffast-math) which implies this flag.

The benchmark uses a default value for weights which is #defin-ed as `__builtin_inf()`, and assigns this value in multiple places. This, of course, is concerning, since it gives a very obvious means by which the benchmark might be completely optimized out, though a more careful analysis would be needed to explain why the Dijkstra and (Res) functions don't also get optimized out.

For clang, the equivalent flags are

- An optimization level of Og or higher - The -fno-honor-infinities flag, or any flag (e.g. -ffast-math) which implies this flag.

Notably, while the author enables LTO, and others have compiled with -march=native, neither flag is necessary (for me) to see the huge speedup, which on my machine peaks at over 1.2 million times.

anematode|6 days ago

Maybe, but I think the OP is submitting this in bad faith (or got utterly bamboozled by the AI). I tried with recent clang and the specified flags, and the behavior is the same.

(I think it's unlikely that it can be straight-up optimized out because it dirties the workspace thread local, and compilers generally don't optimize out writes to thread local variables.)