I wonder, do the Rust guys have any intention of supporting AMD's HSA or Mantle API in any way? I know Mozilla wants to make Rust take full advantage of multi-core systems, and AMD is doing that, too, so I wonder if there can be some synergy there, or if that's outside of their goals for Rust.
Looking at the GitHub repo, it seems you benchmark GCC 4.8.1 vs. Go 1.2rc1. The numbers for Go look promising if one considers that Google's Go implementation does not even have an advanced optimizer yet (in contrast to GCC).
I wonder how the C++ version would do with TBB. Also, I think it would be more interesting to compare more memory-intensive programs; I think C++ would shine even more with all the optimization opportunities there would be.
[+] [-] eliasmacpherson|12 years ago|reply
In this picture - am I right in interpreting it at 2048x2048 and 8 cores, that the optimised and tuned go code is nearly three times faster than the multithreaded optimised and tuned C++ code? How come the C++ is the same for 1C and 8C? Is this picture from one of the previous articles?
EDIT: (No, I'm wrong, the C++ is single threaded!)
It seems it's single threaded, and the last graph isn't showing the current level of C++ single-threaded performance, which is ~36 seconds, with Go at 21s for 8 threads. The final state of play is Go at 18s multithreaded and C++ at 8s multithreaded, and Go at 81s single threaded and C++ at 36s single threaded. I read that from the third-last graph.
It's difficult to understand this ending of the article, as the subtitle is:
"Further optimizations and a multi-threaded C++ version" and "Hurray multi-threading"
I suggest the order of the article be changed around to have a 'recap on single threaded C++' at the start, and then the new figures - so that it concludes in a straightforward manner.
This quote from the article is wrong:
"C++ is not more than twice as fast than an equivalent Go program at this stage."
Correct me if I'm wrong, but in every case that I can see the C++ code will execute twice before the Go code is finished. It is more than twice as fast.
By the same logic this "almost" is also wrong: "From taking 58.15 seconds (single threaded), it has now dropped to a extremely impressive 36.36 seconds (again single threaded), making it almost twice as fast as the optimized Go version."
36s < 81s/2
[+] [-] vanderZwan|12 years ago|reply
I believe that is a simple typo, where 'not' was meant to be a 'now'.
[+] [-] 616c|12 years ago|reply
Someone posted something about Nimrod a while back that is closely related [0], and I think it is interesting that it keeps getting passed over. I am not a language expert, but I have become very curious about the multiple alternatives in the systems programming niche, among others. It has been developed since at least 2008 (with a 0.6 branch in 2008)[1], while the Go language made its first public debut in 2009 (I do not know how stable it was at the time, and I gave up after going through the first pages of the whole commit history to get a better answer).[2] At the very least, they were developed in the same time frame, are nominally similar languages, and address a lot of the same use cases as Rust, I suppose. Yet, unfortunately, few benchmarks include them.
I guess they will always be writing an indie language, but I wish more people would check it out. As I continue to learn, maybe I can contribute code to his project. We will see if I ever get that far.
[0] http://nimrod-code.org/
[1] http://nimrod-code.org/news.html
[2] http://en.wikipedia.org/wiki/Go_programming_language
[+] [-] gillianseed|12 years ago|reply
>and I think it is interesting it keeps getting passed over.
If you think it's being passed over, then you need to try to generate interest in the language, like posting articles about it that you find or, if you find none, writing one.
The same goes for benchmarking; it's unlikely that people who aren't interested in the language will write benchmark versions for it, so it's up to those who are interested to provide them.
Which is what someone did in the Rogue Level Generation Benchmark where I recall Nimrod performed very well.
http://togototo.wordpress.com/2013/08/23/benchmarks-round-tw...
From my own very cursory glance I'm not quite sure why you would compare Nimrod directly to Go, I think it's more aptly compared to something like Rust, with both having optional garbage collector, generics, macros etc.
[+] [-] kid0m4n|12 years ago|reply
[+] [-] tingletech|12 years ago|reply
Nimrod seems rather unfortunately named. I'm not up on Abrahamic mythos, so I didn't get the biblical reference -- to me "Nimrod" sounds like fighting words, an insult.
http://www.urbandictionary.com/define.php?term=nimrod
[+] [-] 616c|12 years ago|reply
Well, well, a downvote? I guess I struck a nerve by asking about alternative languages, the underlying interest of such a blog post. I am not sure why that would be upsetting, given the context.
I guess I will stop asking about the differences between Go and Nimrod.
[+] [-] buster|12 years ago|reply
I was looking at the Rust-dev mailing list before going to sleep :)
https://mail.mozilla.org/pipermail/rust-dev/2013-September/0...
Rust did quite well (given that it's not even production ready i'd say it's impressive): https://mail.mozilla.org/pipermail/rust-dev/2013-September/0...
The only thing they noted was that the Go version "cheated" by precomputing some values, which someone removed to make the algorithms the same... maybe kid0m4n can comment on this :)
[+] [-] kid0m4n|12 years ago|reply
I would argue that it is not cheating "anymore", as both the Go and C++ versions are now equal. In fact, I want to compare how Rust performs in this exact same test with all optimizations applied. Studying those optimizations will be fun in itself.
I have also explained why optimizations are perfectly fine (IMHO) here:
https://github.com/kid0m4n/rays#why-optimize-the-base-algori...
[+] [-] dbaupp|12 years ago|reply
[+] [-] devx|12 years ago|reply
[+] [-] copx|12 years ago|reply
>c++ -std=c++11 -O3 -Wall -pthread -ffast-math -mtune=native -march=native -o cpprays cpprays/main.cpp
Have you tried -O2? -O3 often generates slower code.
>i7 2600
Intel's compiler would probably generate faster code. That's why you can't just say "Go vs C++". You could let Go win this fight by compiling the C++ with Digital Mars. It is also a C++ compiler but it lacks a modern optimizer and the generated code is usually much slower.
[+] [-] bluecalm|12 years ago|reply
-mtune is redundant with -march=native turned on;
-use -Ofast instead of -O3/-ffast-math; it turns on some more options as well (although it can theoretically behave in a non-standard way with float computations, so you need to test this; it was never a problem for me)
-add -flto; it often helps significantly
-it probably won't matter for a simple program, but you may try compiling with: -Ofast -march=native -flto -fprofile-generate; then run the program (it will generate .gcda files) and then recompile with the same options plus -fprofile-use;
EDIT: Some quick tests show that all the options I mentioned help (MinGW, gcc 4.8.1). The original time on one thread was 3.670s, -Ofast and -flto take it to ~3.630s, and adding PGO moves it to 3.46s (there is some variance in all of those) - a whopping 5-6% improvement overall :)
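Spelled out, the compile-run-recompile cycle above looks like this (a sketch: the g++ invocation mirrors the compile line quoted elsewhere in the thread, and the file name is taken from it; redirecting output is my assumption):

```shell
# 1) Instrumented build: -fprofile-generate makes the binary write .gcda files.
g++ -std=c++11 -Ofast -march=native -flto -fprofile-generate -pthread \
    -o cpprays cpprays/main.cpp

# 2) Run a representative workload to collect the profile.
./cpprays > /dev/null

# 3) Rebuild with the same options, now consuming the profile.
g++ -std=c++11 -Ofast -march=native -flto -fprofile-use -pthread \
    -o cpprays cpprays/main.cpp
```

With -flto the heavy lifting happens at link time, so keep the flag set identical on both builds; mixing them tends to make GCC ignore the profile data.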
[+] [-] MTGandP|12 years ago|reply
[+] [-] kid0m4n|12 years ago|reply
[+] [-] nemothekid|12 years ago|reply
[+] [-] nickpresta|12 years ago|reply
[+] [-] kid0m4n|12 years ago|reply
It helps in finding out details of what the compiler (Xg) thinks about the inlining applicability of various funcs (is that even a word?)
[+] [-] shin_lao|12 years ago|reply
[+] [-] shin_lao|12 years ago|reply
-m64 -msse3 -mfpmath=sse?
[+] [-] halayli|12 years ago|reply
int T(vector o,vector d,float& t,vector& n) {}
not sure why he's copying the vectors here.
[+] [-] Sharlin|12 years ago|reply
[+] [-] hamidr|12 years ago|reply