nazri1 | 7 years ago

I saw the OpenMP pragma and thought to myself, "neat! should be fun to watch the cores work hard at this", so I compiled it, ran it, and smiled at the 400% CPU usage in top.

    $ time ./tinykaboom 
    ./tinykaboom  78.08s user 0.02s system 369% cpu 21.159 total
Then I wondered how it would fare if I ported it to Go, so I hastily did. I expected it to run a bit slower than the C++ version, but surprisingly it ran more than twice as fast:

    $ go build ./tinykaboom.go
    $ time ./tinykaboom 
    ./tinykaboom  34.32s user 0.03s system 368% cpu 9.315 total
https://github.com/holygeek/tinykaboom/blob/master/tinykaboo...

Here's the corresponding perf report:

Go:

    Samples: 103K of event 'cycles:pp', Event count (approx.): 37252033995665
    Overhead  Command     Shared Object      Symbol
      32.17%  tinykaboom  tinykaboom         [.] math.sin
      28.80%  tinykaboom  tinykaboom         [.] main.hash
      11.81%  tinykaboom  tinykaboom         [.] main.rotate
       7.76%  tinykaboom  tinykaboom         [.] math.Min
       5.18%  tinykaboom  tinykaboom         [.] main.lerpFloat64
       4.25%  tinykaboom  tinykaboom         [.] main.noise
       2.59%  tinykaboom  tinykaboom         [.] runtime.mallocgc
       2.59%  tinykaboom  tinykaboom         [.] main.fractal_brownian_motion
       2.58%  tinykaboom  tinykaboom         [.] main.signed_distance
C++:

    Samples: 234K of event 'cycles:pp', Event count (approx.): 86721459552303
    Overhead  Command     Shared Object        Symbol
      67.93%  tinykaboom  libm-2.23.so         [.] __sin_avx
      30.80%  tinykaboom  tinykaboom           [.] _Z5noiseRK3vecILm3EfE
       1.27%  tinykaboom  libm-2.23.so         [.] __floorf_sse41
       0.00%  tinykaboom  tinykaboom           [.] _Z23fractal_brownian_motionRK3vecILm3EfE
       0.00%  tinykaboom  tinykaboom           [.] floorf@plt
If anyone has suggestions on how to make tinykaboom.cpp faster, that would be neat!

namirez|7 years ago

There are a few potential improvements here: 1) Use a lookup table for 'sin' rather than calling 'std::sin'. 2) Tell the compiler which instruction set to target; for example, tell GCC to generate 'skylake' instructions (https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/x86-Options.htm...). 3) Many of the functions could be 'inline constexpr'. 4) Although 'ofs <<' is buffered, it can still be very slow. Build the output in memory and write it to the file with a lower-level function such as 'fwrite'. 5) Use 'std::thread' or 'std::async'. That makes the multithreading more portable and clearer.
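
To make suggestion 1 concrete, here is a minimal sketch of a sine lookup table with linear interpolation. The class name `SinTable` and the table size of 4096 are assumptions chosen for illustration; nothing here is taken from tinykaboom.cpp, and it is not claimed to be faster than libm on any particular machine.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of tinykaboom.cpp): sine via a precomputed
// table with linear interpolation between neighbouring entries.
// Assumes the argument is finite.
class SinTable {
public:
    static constexpr std::size_t kSize = 4096;  // samples per period (assumed)

    SinTable() {
        // One extra entry so that table_[i + 1] is always a valid index.
        for (std::size_t i = 0; i <= kSize; ++i) {
            table_[i] = static_cast<float>(
                std::sin(kTwoPi * static_cast<double>(i) / kSize));
        }
    }

    float operator()(float x) const {
        // Reduce x to a fraction of one full period, in [0, 1].
        float turns = x * static_cast<float>(1.0 / kTwoPi);
        turns -= std::floor(turns);

        float pos = turns * static_cast<float>(kSize);
        std::size_t i = static_cast<std::size_t>(pos);
        if (i >= kSize) i = kSize - 1;  // rounding can push turns up to 1.0
        assert(i + 1 <= kSize);

        // Interpolate linearly between the two nearest samples.
        float frac = pos - static_cast<float>(i);
        return table_[i] + frac * (table_[i + 1] - table_[i]);
    }

private:
    static constexpr double kTwoPi = 6.283185307179586;
    std::array<float, kSize + 1> table_{};
};
```

A caller would construct one `SinTable` up front (for example `SinTable fast_sin;`) and call `fast_sin(x)` where it previously called `std::sin(x)`. The result is only an approximation, so whether the rendered image is unchanged depends on how the sine value is used downstream, and any speedup would need to be measured against the `__sin_avx` path shown in the perf report.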

haldean|7 years ago

What were your compilation flags?

nazri1|7 years ago

I used the default one in CMakeLists.txt (-O3).

I ran the comparison again on another machine, and this time the performance is about the same:

C++:

    $ time ./tinykaboom
    ./tinykaboom  46.72s user 0.01s system 364% cpu 12.804 total
Go:

    $ time ./tinykaboom     
    ./tinykaboom  42.50s user 0.07s system 350% cpu 12.161 total