I wonder, do the Rust guys have any intention of supporting AMD's HSA or Mantle API in any way? I know Mozilla wants to make Rust take full advantage of multi-core systems, and AMD is doing that, too, so I wonder if there can be some synergy there, or if that's outside of their goals for Rust.
Looking at the GitHub repo, it seems you benchmark GCC 4.8.1 vs. Go 1.2rc1. The numbers for Go look promising if one considers that Google's Go implementation does not even have an advanced optimizer yet (in contrast to GCC).
I wonder how the C++ version would do with TBB. Also, I think it would be more interesting to compare more memory-intensive programs; I think C++ would shine even more with all the optimization opportunities there would be.
[+] [-] eliasmacpherson|12 years ago|reply
In this picture - am I right in interpreting it at 2048x2048 and 8 cores, that the optimised and tuned go code is nearly three times faster than the multithreaded optimised and tuned C++ code? How come the C++ is the same for 1C and 8C? Is this picture from one of the previous articles?
EDIT: (No, I'm wrong, the C++ is single threaded!)
It seems it's single threaded, and the last graph isn't showing the current level of C++ single-threaded performance, which is ~36 seconds, with Go at 21s for 8 threads. The final state of play is Go at 18s multithreaded and C++ at 8s multithreaded, and Go at 81s single threaded and C++ at 36s single threaded. I read that from the third-last graph.
It's difficult to understand this ending of the article, as the subtitle is:
"Further optimizations and a multi-threaded C++ version" and "Hurray multi-threading"
I suggest the order of the article be changed around to have a 'recap on single threaded C++' at the start, and then the new figures - so that it concludes in a straightforward manner.
This quote from the article is wrong:
"C++ is not more than twice as fast than an equivalent Go program at this stage."
Correct me if I'm wrong, but in every case that I can see the C++ code will execute twice before the Go code is finished. It is more than twice as fast.
By the same logic this "almost" is also wrong: "From taking 58.15 seconds (single threaded), it has now dropped to a extremely impressive 36.36 seconds (again single threaded), making it almost twice as fast as the optimized Go version."
36s < 81s/2
[+] [-] vanderZwan|12 years ago|reply
I believe that is a simple typo, where 'not' was meant to be a 'now'.
[+] [-] 616c|12 years ago|reply
Someone posted something about Nimrod a while back that is closely related [0], and I think it is interesting that it keeps getting passed over. I am not a language expert, but I have become very curious about the multiple alternatives in the systems programming niche, among others. It has been developed since at least 2008 (with a 0.6 branch in 2008)[1], while the Go language made its first public debut in 2009 (I do not know how stable it was at the time, and I gave up after going through the first pages of the whole commit history to get a better answer).[2] At the very least, they were developed in the same time frame, are nominally similar languages, and address a lot of the same use cases as Rust, I suppose. Yet, unfortunately, few benchmarks include them.
I guess they will always be writing an indie language, but I wish more people would check it out. As I continue to learn, maybe I can contribute code to his project. We will see if I ever get that far.
[0] http://nimrod-code.org/
[1] http://nimrod-code.org/news.html
[2] http://en.wikipedia.org/wiki/Go_programming_language
[+] [-] gillianseed|12 years ago|reply
>and I think it is interesting it keeps getting passed over.
If you think it's being passed over, then you need to try to generate interest in the language, like posting articles about it that you find or, if you find none, writing one.
The same goes for benchmarking; it's unlikely that people who aren't interested in the language will write benchmark versions for it, so it's up to those who are interested to provide them.
Which is what someone did in the Rogue Level Generation Benchmark where I recall Nimrod performed very well.
http://togototo.wordpress.com/2013/08/23/benchmarks-round-tw...
From my own very cursory glance I'm not quite sure why you would compare Nimrod directly to Go, I think it's more aptly compared to something like Rust, with both having optional garbage collector, generics, macros etc.
[+] [-] kid0m4n|12 years ago|reply
[+] [-] tingletech|12 years ago|reply
Nimrod seems rather unfortunately named. I'm not up on Abrahamic mythos, so I didn't get the biblical reference -- to me "Nimrod" sounds like fighting words, an insult.
http://www.urbandictionary.com/define.php?term=nimrod
[+] [-] 616c|12 years ago|reply
Well, well, a downvote? I guess I struck a nerve by asking about alternative languages, the underlying interest of such a blog post. I am not sure why that would be upsetting, given the context.
I guess I will stop asking about the differences between Go and Nimrod.
[+] [-] buster|12 years ago|reply
I was looking at the Rust-dev mailing list before going to sleep :)
https://mail.mozilla.org/pipermail/rust-dev/2013-September/0...
Rust did quite well (given that it's not even production ready i'd say it's impressive): https://mail.mozilla.org/pipermail/rust-dev/2013-September/0...
The only thing they noted was that the Go version "cheated" by precomputing some values, which someone removed to make the algorithms the same... maybe kid0m4n can comment on this :)
[+] [-] kid0m4n|12 years ago|reply
I would argue that it is not cheating "anymore", as both the Go and C++ versions are now equal. In fact, I want to compare how Rust performs in this exact same test with all optimizations applied. Studying those optimizations will be fun in itself.
I have also explained why optimizations are perfectly fine (IMHO) here:
https://github.com/kid0m4n/rays#why-optimize-the-base-algori...
[+] [-] dbaupp|12 years ago|reply
[+] [-] devx|12 years ago|reply
[+] [-] copx|12 years ago|reply
>c++ -std=c++11 -O3 -Wall -pthread -ffast-math -mtune=native -march=native -o cpprays cpprays/main.cpp
Have you tried -O2? -O3 often generates slower code.
>i7 2600
Intel's compiler would probably generate faster code. That's why you can't just say "Go vs C++". You could let Go win this fight by compiling the C++ with Digital Mars. It is also a C++ compiler but it lacks a modern optimizer and the generated code is usually much slower.
[+] [-] bluecalm|12 years ago|reply
-mtune is redundant with -march=native turned on;
-use -Ofast instead of -O3/-ffast-math; it turns on some more options as well (although it can theoretically behave in a non-standard way with float computations, so you need to test this; it was never a problem for me)
-add -flto; it often helps significantly
-it probably won't matter for a simple program, but you may try compiling with: -Ofast -march=native -flto -fprofile-generate; then run the program (it will generate .gcda files) and then recompile with the same options plus -fprofile-use;
EDIT: Some quick tests show that all the options I mentioned help (MinGW, gcc 4.8.1). The original time on one thread was 3.670s, -Ofast and -flto take it to ~3.630s, and adding PGO moves it to 3.46s (there is some variance in all of those) - a whopping 5-6% improvement overall :)
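Spelled out, the compile-run-recompile cycle above looks like this (a sketch: the g++ invocation mirrors the compile line quoted elsewhere in the thread, and the file name is taken from it; redirecting output is my assumption):

```shell
# 1) Instrumented build: -fprofile-generate makes the binary write .gcda files.
g++ -std=c++11 -Ofast -march=native -flto -fprofile-generate -pthread \
    -o cpprays cpprays/main.cpp

# 2) Run a representative workload to collect the profile.
./cpprays > /dev/null

# 3) Rebuild with the same options, now consuming the profile.
g++ -std=c++11 -Ofast -march=native -flto -fprofile-use -pthread \
    -o cpprays cpprays/main.cpp
```

With -flto the heavy lifting happens at link time, so keep the flag set identical on both builds; mixing them tends to make GCC ignore the profile data.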
[+] [-] MTGandP|12 years ago|reply
[+] [-] kid0m4n|12 years ago|reply
[+] [-] nemothekid|12 years ago|reply
[+] [-] nickpresta|12 years ago|reply
[+] [-] kid0m4n|12 years ago|reply
It helps in finding out details of what the compiler (Xg) thinks about the inlining applicability of various funcs (is that even a word?)
[+] [-] shin_lao|12 years ago|reply
[+] [-] shin_lao|12 years ago|reply
-m64 -msse3 -mfpmath=sse?
[+] [-] halayli|12 years ago|reply
int T(vector o,vector d,float& t,vector& n) {}
not sure why he's copying the vectors here.
[+] [-] Sharlin|12 years ago|reply
[+] [-] hamidr|12 years ago|reply