In other Rust/benchmarksgame news, I just submitted a simple fix to the Rust program for "reverse-complement" that makes it faster than the fastest C++ program, on my computer. The old version was spending 2/3 of its time just reading the input into memory, because it wasn't allocating a large enough buffer up front.
Java is showing quite impressive numbers! 50% overhead over native C implementations was often cited as a good guess for the ultimate efficiency of JIT code generation back in the Self/HotSpot days.
HotSpot has had great speeds for numeric computation at least since 2005. I was doing financial software in Java in my first job out of college, our CTO was an ex-Sun architect who literally wrote the book on Java, and the speeds we got on numerical computations were basically equivalent to C.
Probably because people are intelligent enough not to compare speeds in a vacuum. When someone denigrates a language as "Java-like," they're really just comparing it anecdotally to the sum of all Java projects they've worked with. Rarely is such a project a single-purpose, optimized pet project.
Note that in many of the other benchmarks, the fastest Java program takes 2x to 4x longer than the fastest C program. Still not bad! But knucleotide shows Java in a better light than most:
There are lots of folk-programming beliefs like this, e.g. about the impact of bounds checking, or that all game consoles except the Xbox use OpenGL.
It seems to me that there are a lot of apples-to-oranges comparisons here. Some implementations use the language's standard-library hash table while others use third-party ones (with different algorithms and data structures across all of them); some use multiple threads while others are single-threaded; etc. As a result, I wouldn't read too much into the rankings you see here.
Sure, but I'm less worried about absolute rank and more interested in whether the language I'm using is on the order of C/C++/Fortran.
And some languages are allowed to use FFI to make their impl faster. There's some rule about this that I don't understand, but oh well. It's all for fun, not serious.
When SIMD goes stable, Rust may dominate that game. I still wish they would use Clang so it was apples to apples with C and C++. Edit: actually, I wish they would add Clang for those languages and keep GCC for comparison. Then I'd want Fortran to add gfortran for the same reason.
"If you're interested in something not shown on the benchmarks game website then please take the program source code and the measurement scripts and publish your own measurements."
Just FYI in case it happens to others: I was confused because the page I get shows Rust coming in fourth. I had to reload it/re-sort the columns a couple times to see the new Rust #4 entry.
> Some language implementations have hash tables built-in; some provide a hash table as part of a collections library; some use a third-party hash table library. (For example, use either khash or CK_HT for C language k-nucleotide programs.) The hash table algorithm implemented is likely to be different in different libraries.
Whatever the motivation for your comments, they did remind me that I'd intended to have background information URLs on other pages (not just the home page).
I'm not a systems programmer, but it makes me very happy to see Rust taking off, with all its potential to, hopefully, replace the most widely used unsafe systems languages.
Quite decent perf from the ML family at around 19 seconds (F# and OCaml). Top of the functional languages at least, and twice as fast as Haskell.
The Rust version uses multiple CPUs via a pool concept (which looks a lot like Python's multiprocessing module, so kudos there). But the C version is single-threaded from what I can tell. So Rust is safe, but needs threads to beat single-threaded C, which isn't that much slower. Hmm...
That's not true. If you look at the comparisons, the CPU time taken by C is actually more than Rust's, and the CPU load looks about even. Note the C version uses:
mbrubeck | 9 years ago
https://github.com/TeXitoi/benchmarksgame-rs/pull/44
I'm also working on some additional changes that make it even faster than the C version (again, on my computer) by improving how it divides work across CPUs:
https://github.com/TeXitoi/benchmarksgame-rs/pull/46
These improved Rust programs have not yet been added to the benchmarksgame site. Previous entries are ranked at:
http://benchmarksgame.alioth.debian.org/u64q/performance.php...
Minor improvements to the Rust programs for "mandelbrot" and "binary-trees" are also awaiting review!
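A hedged sketch of that divide-across-CPUs idea (not the code in the PR, which uses futures_cpupool; this uses only std scoped threads, and the chunking and counting are simplified — it counts single bytes where the real programs count k-mers):

```rust
use std::collections::HashMap;
use std::thread;

// Split the input into chunks, count on one thread per chunk, then merge.
fn parallel_counts(seq: &[u8], chunks: usize) -> HashMap<u8, u64> {
    let chunk_len = (seq.len() + chunks - 1) / chunks; // ceiling division
    let mut merged: HashMap<u8, u64> = HashMap::new();
    thread::scope(|s| {
        let handles: Vec<_> = seq
            .chunks(chunk_len.max(1))
            .map(|chunk| {
                s.spawn(move || {
                    let mut local = HashMap::new();
                    for &b in chunk {
                        *local.entry(b).or_insert(0u64) += 1;
                    }
                    local
                })
            })
            .collect();
        // Merging is cheap: the maps are tiny compared to the input.
        for handle in handles {
            for (key, count) in handle.join().unwrap() {
                *merged.entry(key).or_insert(0) += count;
            }
        }
    });
    merged
}

fn main() {
    let counts = parallel_counts(b"GGTATTTTAATT", 4);
    assert_eq!(counts[&b'T'], 7);
    assert_eq!(counts[&b'A'], 3);
    assert_eq!(counts[&b'G'], 2);
}
```

Because each thread builds its own private map, no locking is needed during counting; only the small merge at the end touches shared state.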
kzrdude | 9 years ago
acqq | 9 years ago
I like Rust for what it does in advancing the state of the art of the languages, but I also like how this example demonstrates how hard it is to avoid "unsafe" constructs and remain competitive.
https://github.com/TeXitoi/benchmarksgame-rs/blob/master/src...
igouy | 9 years ago
[46.03 secs] http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...
and then the hash function was changed to FnvHasher --
[17.10 secs] http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...
and then better use of quad core with futures_cpupool --
[9.44 secs] http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...
and now use of std::collections::HashMap has been replaced with an experimental hash table inspired by Python 3.6's new dict implementation --
[5.30 secs] http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...
afaict comparing #4 to #5 is all about differences between that experimental hash table and std::collections::HashMap --
[9.14 secs] http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...
masklinn | 9 years ago
There was also a small contribution (~6%) of working on bytes rather than strings according to the original PR: https://github.com/TeXitoi/benchmarksgame-rs/pull/39
gpm | 9 years ago
When discussing whether or not to use this, at least one person mentioned that their company was using it in production. I don't think it really counts as experimental.
stcredzero | 9 years ago
People who were trying to castigate Go early on as having "Java-like speeds" were really just showing their ignorance of the state of the art of JIT compilation for managed languages and the JVM. Such outdated folk knowledge of performance in the programming field seems to be a constant over the decades. (Programmers have had such distorted views since the mid '80s at least.) Maybe this kind of knowledge needs to be used in job interview questions for a while? Very soon, people will just memorize such trivia for interviews, but it would serve to squash this form of folk-programming "alternative fact."
nostrademons | 9 years ago
The part where Java really falls down is in memory use & management, which you can see on the binary-tree & mandelbrot benchmarks, where it's roughly 4x slower than C. There are inherent penalties to pointer chasing that you can't get around. While HotSpot is often (amazingly) smart enough to inline & stack-allocate small private structs, typical Java coding style relies on complex object graphs. In C++ or Rust these would all have well-defined object ownership and be contained within a single block of memory, so access is just "add a constant to this pointer, and load". In Java, you often need to trace a graph of pointers 4-5 levels deep, each of which may cause a cache miss.
Rule of thumb while I was at Google was to figure on real-world Java being about 2-3x slower than real-world C++.
matt_wulfeck | 9 years ago
mbrubeck | 9 years ago
http://benchmarksgame.alioth.debian.org/u64q/java.html
(And as always, it's possible that someone can write a much faster Java program than the ones submitted so far.)
pjmlp | 9 years ago
Also, many younger developers believe that C was always fast, and are unaware that early 8- and 16-bit compilers for home computers performed like managed languages: they generated far worse code than hobbyist assembly developers wrote by hand.
cnnsucks | 9 years ago
359% more RAM isn't very impressive. Even less so when one considers the 20+ years of effort spent to achieve it.
You know what impresses me? A 22 month old language besting everything else while guaranteeing no segfaults or NPEs at compile time. That's impressive.
chris_overseas | 9 years ago
88e282102ae2e5b | 9 years ago
I can sell a 2x slowdown to my boss if I can show that that's the only cost of a significantly more elegant and productive language, but at 100x that's a much harder sell.
bluejekyll | 9 years ago
But come on, now Rust can legitimately be called "faster than C" ;)
At least until the Clang C version is added... or maybe it will still be faster.
spoiler | 9 years ago
So, the usual. :)
I stopped taking into account the benchmarks game completely after I saw apples-to-potatoes comparisons a few years back.
igouy | 9 years ago
http://benchmarksgame.alioth.debian.org/u64q/knucleotide-des...
IshKebab | 9 years ago
adrianN | 9 years ago
igouy | 9 years ago
Perhaps sort by cpu (seconds) rather than (elapsed) secs.
http://benchmarksgame.alioth.debian.org/u64q/performance.php...
imaginenore | 9 years ago
galangalalgol | 9 years ago
igouy | 9 years ago
http://benchmarksgame.alioth.debian.org/play.html#languagex
staticassertion | 9 years ago
richard_todd | 9 years ago
crb002 | 9 years ago
kzrdude | 9 years ago
0xFFC | 9 years ago
crb002 | 9 years ago
saurik | 9 years ago
> Please don't implement your own custom "hash table" - it will not be accepted.
> The work is to use the built-in or library hash table implementation to accumulate count values - lookup the count for a key and update the count in the hash table.
The C++ implementation is thereby testing an old version of a non-standard extension of libstdc++ that I had never heard of and which was likely contributed once by IBM and never really looked at again (by either maintainers or users ;P), while the C implementation is testing the specified khash library, which is apparently something a number of people actively contribute to and attempt to optimize, giving it some notoriety.
If I were to do this in C++, and I wasn't allowed to use my hash table, I would almost certainly not be using __gnu_pbds::cc_hash_table<>. If I were to just want to use something "included", I would use the C++11 std::unordered_map<> (note that this code is compiled already as C++11). But we all know the STL is designed for flexibility and predictability, not performance, and the culture of C++ is "that's OK, as if you actually care about performance no off-the-shelf data structure is going to be correct". If I decided I wanted speed, I know I'd want to check out Folly, and I might even end up using khash.
Reading other comments, what happened here is the Rust version is now using some "experimental" hash table based on ongoing work to optimize the Python 3.6 dict implementation. This is just not a useful benchmark. What we are benchmarking is "how maintained is the implementation's built-in hash table and is it tunable for this particular workload".
That's why you should not be surprised to see Java doing so well: the code actually being written here is just some glue... your programming language has to be incompetent to do poorly at this benchmark (especially as many commenters here are using a "within a power of 2" rule of thumb). There are even multiple listings for the Java one, and the one that is faster is using it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap?!?
What we really should be asking here is: why is any language doing "poorly" in this benchmark? It just isn't surprising that Rust is competitive with C/C++, nor is it surprising that Java is also; what is surprising is that Swift, Go, Haskell, and C# are not, and so I bet the issue is something (such as "is allocating memory for a thing which is not required") that can be trivially fixed for each (though by the rules of engagement, it might... or might not :/ as Java "cheated", right? ;P... require a minor fix upstream).
I mean, since the "work" explicitly is not "write a hash table using nothing but primitives from this language", there is no particular reason why Perl and Python (which is using a non-destructive array .replace, which is likely brutal... again: I bet this is almost always a benchmark of ancillary memory allocations) should be doing as poorly as they are: if we all made "optimize for this benchmark" a top priority for a weekend hackathon, I bet we could get every open source language to nail this under 25s. But do we care?
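For what it's worth, the "work" as the rules define it — look up the count for a key, update the count — is only a few lines in any of these languages. A hedged Rust sketch (competitive entries pack nucleotides into integers rather than hashing byte slices):

```rust
use std::collections::HashMap;

// Count every length-k window of `seq`: look up the count for a key and
// update it, which is all the "work" the rules ask for.
fn kmer_counts(seq: &[u8], k: usize) -> HashMap<&[u8], u64> {
    let mut counts = HashMap::new();
    for window in seq.windows(k) {
        *counts.entry(window).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = kmer_counts(b"GGTATTTTAATT", 2);
    assert_eq!(counts[&b"TT"[..]], 4);
    assert_eq!(counts[&b"GG"[..]], 1);
    assert_eq!(counts.values().sum::<u64>(), 11); // 12 bytes -> 11 windows
}
```

With the glue this thin, the rankings mostly reflect the hash table underneath, which is exactly the point being made above.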
igouy | 9 years ago
http://benchmarksgame.alioth.debian.org/play.html#contribute
igouy | 9 years ago
So thanks for that!
nisa | 9 years ago
That's just a library that provides data structures without boxing of int, double, etc. in Java. This eliminates a lot of overhead.
insulanian | 9 years ago
makmanalp | 9 years ago
santaclaus | 9 years ago
sanxiyn | 9 years ago
https://doc.rust-lang.org/std/collections/struct.HashMap.htm...
geodel | 9 years ago
jeffdavis | 9 years ago
masklinn | 9 years ago
kcdev | 9 years ago
vegabook | 9 years ago
Also look how ginormous the binaries are for all the VM languages. Kinda would have thought it would be the opposite what with not needing to link in as much runtime?
snakeanus | 9 years ago
This was my favourite feature of the benchmarks game.
gigatexal | 9 years ago
infogulch | 9 years ago
Etzos | 9 years ago
scotty79 | 9 years ago