item 46121036


alberth | 2 months ago

After 25 years of software development, I still wonder whether I'm using the best possible compiler flags.



cogman10|2 months ago

What I've learned is that fewer flags is the best path for any long-lived project.

-O2 is usually all you need. As you update your compiler, the authors will keep tweaking exactly what that general optimization level does, based on what they know today.

That's the thing about these flags: you'll generally set them once at the beginning of a project, while compiler authors reevaluate them far more often than you will.

Also, a trap I've observed is setting flags based on bad benchmarks. This applies more to the JVM than to a C++ compiler, but nevertheless, a system's current state is somewhat random. Fluctuations of 1 to 2% in performance, even for the same app, are normal. A lot of people don't realize that and end up adding flags based on those fluctuations.

Further, how code is currently laid out can affect performance. You may see a speedup not because you tweaked the loop-unrolling parameter, but because your tweak happened to move a hot path somewhere slightly more cache friendly. A later change in the code structure can eliminate that benefit.

tmtvl|2 months ago

I'd say -O2 -march=native -mtune=native is good enough: you get (some) AVX without the -O3 weirdness.

alberth|2 months ago

Doesn't -O2 still exclude any CPU features from the past ~15 years (like AVX)?

If you know the architecture and the oldest CPU model, aren't you better served by adding a bunch more flags?

I wish I could compile my server code to target CPUs released on or after a particular date, like:

  -O2 -cpu-newer-than=2019

vlovich123|2 months ago

At a minimum, you should add flags that let the linker discard unused code and data: -ffunction-sections and -fdata-sections when compiling, and -Wl,--gc-sections when linking.

201984|2 months ago

What's your reason for -O2 over -O3?

johnthescott|2 months ago

40 years later, I still have nightmares of long sessions debugging Lattice C.