anon946 | 1 year ago

IMO, not using any optimization flags with C is somewhat arbitrary, since the compiler writers could just as well have decided that things X, Y, and Z are done by default, and then you'd need to turn them off explicitly.

FWIW, without -O, with -O, and with -O4, I get 2500ms, 1500ms, and 550ms respectively. I didn't bother to look at the .S to see the code improvements. (Of course, I edited the code to output the results; otherwise the compiler just optimized everything away.)

abainbridge|1 year ago

One optimization for the C code is to put "f" suffixes on the floating point constants. For example convert this line:

    t[i] += 0.02 * (float)j;

to:

    t[i] += 0.02f * (float)j;

I believe this helps because 0.02 is a double, so `(float)j` gets promoted to double, the multiply is done in double precision, and the result is converted back to float. That can produce a different answer from just doing float * float, so the compiler has to do the slow version because that's what you asked for.

Adding the -ffast-math switch appears to make no difference. I'm never sure what -ffast-math does exactly.

Minimal case on Godbolt:

https://godbolt.org/z/W18YsnMY5 - without the f

https://godbolt.org/z/oc1s8WKeG - with the f

a1369209993|1 year ago

> I believe this helps because 0.02 is a double and [...] can produce a different answer

In principle, not quite. The real problem, the one the compiler can't avoid, is that 0.02 is not a dyadic rational (it is not representable exactly as some integer over a power of two). So its representation as a double (rounded to 53 significant bits) is a different real number than its representation as a float (rounded to 24 significant bits). (This is the same problem as rounding pi or e to a double/float, but people tend to forget that it applies to every number that isn't a dyadic rational, not just to irrationals.)

If, instead of `0.02f`, you replaced `0.02` with `(double)0.02f` or `0.015625`, the optimization should in theory still apply (although missed-optimization compiler bugs are of course possible).

sllabres|1 year ago

"I'm never sure what -ffast-math does exactly."

Same here whenever I've been away from C for a while. The topic has come up on HN before [3]. Among other things, it can:

* enable more use of SIMD instructions (for example, vectorizing reductions that strict IEEE ordering would forbid)

* alter the behavior regarding NaN (the compiler may assume NaNs never occur, so you can't even reliably check for NaN afterwards with isnan(f))

* alter the associativity of expressions: a+(b+c) might become (a+b)+c, which seems inconspicuous at first, but there are exceptions (for an example see [1] under -fassociative-math)

* flush subnormals to zero (this can affect your program even if it wasn't compiled with this option, when a library you link against was).

The list above is summarized from the nice overview in [1], which links to [2] with this nice text:

"If a sufficiently advanced compiler is indistinguishable from an adversary, then giving the compiler access to -ffast-math is gifting that enemy nukes. That doesn’t mean you can’t use it! You just have to test enough to gain confidence that no bombs go off with your compiler on your system"

[1] https://simonbyrne.github.io/notes/fastmath/

[2] https://discourse.julialang.org/t/when-if-a-b-x-1-a-b-divide...

[3] https://news.ycombinator.com/item?id=29201473 (107)

caranea|1 year ago

Thanks for posting your results!

Since I was already set on writing in-browser particle life, I didn't benchmark C code with different flags.

anon946|1 year ago

Completely reasonable. I'd probably edit your blog post a bit to indicate that.

hyperbrainer|1 year ago

> (Of course, I edited the code to output the results, otherwise, it just optimized out everything.)

O(1) :)

TylerE|1 year ago

You should also test -Os when doing this sort of thing. Sometimes the reduced code size greatly improves cache behavior, and even when it doesn't, it's often outright competitive with at least -O2 anyway (and usually compiles faster too!)
