That's a big jump between OCaml and Go. I'm not familiar with ray tracing, but skimming the source code it mostly looks like it's doing floating point math; it doesn't look like it's using the runtime (no allocations, no virtual function calls, no scheduling, etc), so I'm surprised that Go is performing relatively poorly.
I wonder if the performance gap is attributable to some overhead in Go's function calls? I know Go passes parameters on the stack instead of via registers... Maybe it's due to passing struct copies instead of references (looks like the C version passes references)? Generally poor code generation?
Anyone else have ideas or care to profile?
EDIT: From my 2015 MBP, Go (version 1.12) is indeed quite a lot slower than C, but only when the C build is optimized (`-O3`):
tmp $ time ./gorb
real 1m15.128s
user 1m9.366s
sys 0m6.754s
tmp $ clang crb.c
tmp $ time ./a.out
real 1m13.041s
user 1m10.284s
sys 0m0.624s
tmp $ gcc crb.c -o crb -std=c11 -O3 -lm -D_XOPEN_SOURCE=600
tmp $ time ./crb
real 0m22.703s
user 0m22.550s
sys 0m0.073s
tmp $ clang crb.c -o crb -std=c11 -O3 -lm -D_XOPEN_SOURCE=600
tmp $ time ./crb
real 0m22.689s
user 0m22.564s
sys 0m0.060s
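To illustrate the struct-copy hypothesis above, here's a minimal sketch (not the benchmark's actual code; the `Vec` type and function names are made up) of the two parameter-passing styles:

```go
package main

import "fmt"

// Vec is a small value struct like the ray tracer's vector type.
type Vec struct{ X, Y, Z float64 }

// addValue receives copies of both operands; every call copies
// 24 bytes per argument plus the 24-byte return value.
func addValue(a, b Vec) Vec {
	return Vec{a.X + b.X, a.Y + b.Y, a.Z + b.Z}
}

// addPtr receives pointers and writes through one, avoiding the
// struct copies (roughly what the C version does).
func addPtr(out, a, b *Vec) {
	out.X = a.X + b.X
	out.Y = a.Y + b.Y
	out.Z = a.Z + b.Z
}

func main() {
	a := Vec{1, 2, 3}
	b := Vec{4, 5, 6}
	v := addValue(a, b)
	var p Vec
	addPtr(&p, &a, &b)
	fmt.Println(v == p) // both styles compute the same sum: true
}
```

Whether the copies actually cost anything depends on how well the compiler inlines these tiny functions, which is exactly what profiling would show.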
Python gets 20% faster if you use `__slots__` on the Vector class which is created and destroyed millions of times. It's still the second-slowest, but it's a nice improvement :P
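For reference, a minimal sketch of what that change looks like (a simplified stand-in, not the benchmark's actual Vector class):

```python
class Vector:
    """Vector with __slots__: instances get no per-instance __dict__,
    so creating and destroying millions of these is cheaper in both
    time and memory."""
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def add(self, other):
        return Vector(self.x + other.x, self.y + other.y, self.z + other.z)

v = Vector(1.0, 2.0, 3.0).add(Vector(4.0, 5.0, 6.0))
print(v.x, v.y, v.z)  # 5.0 7.0 9.0
```

The trade-off is that you can no longer attach arbitrary attributes to instances, which a hot numeric class shouldn't need anyway.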
That means time spent writing the .ppm file is included.
In the implementations I browsed, that is about a million print calls, each of which might flush the output buffer, and whose performance may depend on locale.
To benchmark ray tracing I would, instead, just output the sum of the pixel values, or set the exit code depending on that value.
Even though ray tracing is CPU-intensive, it also wouldn't completely surprise me if some of the implementations in less mature languages spent significant time writing that output, because their programmers haven't gotten around to optimizing such code.
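As a sketch of that idea (not any benchmark implementation's actual code; the dimensions and pixel values are made up), one could build the PPM body in memory with a single write and reduce the image to one checksum for benchmarking:

```python
import io

# Tiny stand-in for a real render: a list of (r, g, b) tuples.
WIDTH, HEIGHT = 4, 3
pixels = [(x % 256, y % 256, (x + y) % 256)
          for y in range(HEIGHT) for x in range(WIDTH)]

# Buffered output: build the whole PPM text in memory, write once,
# instead of one (possibly flushing, locale-sensitive) print per pixel.
buf = io.StringIO()
buf.write(f"P3\n{WIDTH} {HEIGHT}\n255\n")
buf.write("\n".join(f"{r} {g} {b}" for r, g, b in pixels))
ppm = buf.getvalue()

# Benchmark-friendly alternative: a single number instead of a file,
# so I/O cost drops out of the measurement entirely.
checksum = sum(sum(p) for p in pixels)
print(checksum)
```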
If I'm not mistaken, Miguel himself said that Mono was meant for portability (can run on Linux) and not performance. Would be a far better test to use .NET Core as you could still run this test on Linux or any other place where Core runs.
For some workloads Mono seems awfully slow. The compiler I'm maintaining at work takes about twice as long under Mono on Windows, and about four times as long under Mono on Linux, compared to .NET. I'd guess .NET Core would be comparable to or faster than .NET Framework, and similar on both platforms.
The C# implementation looks flawed (it uses reference types for vectors, etc.). Using value types and .NET Core should give a much better result. I'll try to remember to do a PR.
To be fair, the README.md seems to be three years old.
It would actually be quite interesting to see a comparison with all of the languages using more recent builds to see which ones are developing their performance.
Yeah, I've meant to update it and add more language implementations, but haven't really had the time. Might as well do so soon, as almost all compilers/interpreters have new versions that I suspect include many nice optimizations.
You should see a performance boost in the Haskell implementation by compiling with GHC's LLVM backend[0]. Another Haskell ray tracer ran 30% faster than with the native codegen this way[1].
There is a big variation in performance, some of which I find surprising. Do you know what exactly causes some languages to be so slow (e.g., small objects being created and garbage collected frequently)?
Did some testing and found that it boils down to three things: the RNG algorithm used in the standard library, the forced use of double-precision floating point (the case for OCaml and JavaScript), and, like you mentioned, memory management.
EDIT: forgot to mention the obvious things: compiler/interpreter maturity and inherent overhead.
Looking at the Julia implementation, fast math wasn't used. In my experience it's usually worth experimenting with turning it on (also, of course, for the other LLVM-based languages), though I understand that this benchmark tries to keep the program correct at all costs.
Never use `(optimize (safety 0))` in SBCL — it throws safety completely out the window. We're talking C-levels of safety at that point. Buffer overruns, the works. It might buy you 10-20% speed, but it's not worth it. Lisp responsibly, use `(safety 1)`.
(defconstant WIDTH 1280)
People generally name constants in CL with +plus-muffs+. Naming them as uppercase doesn't help because the reader uppercases symbol names by default when it reads. So `(defconstant WIDTH ...)` means you can no longer have a variable named `width` (in the same package).
(defstruct (vec
(:conc-name v-)
(:constructor v-new (x y z))
(:type (vector float)))
x y z)
Using `:type (vector float)` here is trying to make things faster, but failing. The type designator `float` covers all kinds of floats, e.g. both `single-float`s and `double-float`s in SBCL. So all SBCL knows is that the struct contains some kind of float, and it can't really do much with that information. This means all the vector math functions below have to fall back to generic arithmetic, which is extremely slow. SBCL even warns you about this when it's compiling, thanks to the `(optimize (speed 3))` declaration, but I guess they ignored or didn't understand those warnings.
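For comparison, a sketch of what correctly typed slots might look like (assuming `single-float` components; the constructor name and conc-name follow the original code):

```lisp
(defstruct (vec
             (:conc-name v-)
             (:constructor v-new (x y z)))
  (x 0.0 :type single-float)
  (y 0.0 :type single-float)
  (z 0.0 :type single-float))

;; With concrete slot types, SBCL can open-code the float
;; arithmetic instead of dispatching generically at runtime.
(declaim (inline v-add))
(defun v-add (a b)
  (declare (type vec a b)
           (optimize (speed 3)))
  (v-new (+ (v-x a) (v-x b))
         (+ (v-y a) (v-y b))
         (+ (v-z a) (v-z b))))
```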
(defconstant ZERO (v-new 0.0 0.0 0.0))
This will cause problems because if it's ever evaluated more than once it'll try to redefine the constant to a new `vec` instance, which will not be `eql` to the old one. Use `alexandria:define-constant` or just make it a global variable.
All the vector math functions are slow because they have no useful type information to work with:
The `:conc-name ray-` is useless; that's the default conc-name. And again with the `:type vector`… just make it a normal struct. I was going to guess that they were doing it so they could use vector literals to specify the objects, but then why bother defining a BOA constructor here? And the slots are untyped, which, if you're looking for speed, is not doing you any favors.
I took a few minutes over lunch to add some type declarations to the slots and important functions, inlined the math, cleaned up the broken indentation and naming issues:
The old version runs in 5m12s on my laptop, the new version runs in 58s. So if we unscientifically extrapolate that to their 24m time, it puts it somewhere around 5m in their list. This matches what I usually see from SBCL: for numeric-heavy code generic arithmetic is very slow, and some judicious use of type declarations can get you to within ~5-10x of C. Getting more improvements beyond that can require really bonkers stuff that often isn't worth it.
kenhwang | 7 years ago
weberc2 | 7 years ago
z92 | 7 years ago
Last time I checked, many years back, the spec was changing and the runtime did crash. Guess it has come a long way since.
The other one is Lua. My assumption was that it's one of the lightest and fastest languages around. Looks like "fastest" isn't true in some cases.
weberc2 | 7 years ago
EDIT2: I re-modified the Go version (https://gist.github.com/weberc2/2aed4f8d3189d09067d564448367...) to pass references, and that seems to put it on par with C (or I mistranslated, which is also likely):

Shish2k | 7 years ago
krull10 | 7 years ago
Someone | 7 years ago
weberc2 | 7 years ago
piinbinary | 7 years ago
niofis | 7 years ago
kyberias | 7 years ago
mikece | 7 years ago
ygra | 7 years ago
alkonaut | 7 years ago
ilitirit | 7 years ago
Lerc | 7 years ago
Oftentimes compile-to-C output is "It's C, Jim, but not as we know it."
You can write C as if it were an SSA VM or similar intermediate representation that leaves very little work for the first stages of the compiler.
kev6168 | 7 years ago
X6S1x6Okd1st | 7 years ago
I've certainly seen speedups like that on stuff like Project Euler code.
FartyMcFarter | 7 years ago
fnord77 | 7 years ago
Why such an ancient version of Rust? Interesting that it's faster than C, though.
Lerc | 7 years ago
niofis | 7 years ago
mratsim | 7 years ago
JoshuaScript | 7 years ago
[0]https://gitlab.haskell.org/ghc/ghc/wikis/commentary/compiler...
[1]http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-ll...
azhenley | 7 years ago
niofis | 7 years ago
technological | 7 years ago
xiphias2 | 7 years ago
stevelosh | 7 years ago
https://gist.github.com/sjl/005f27274adacd12ea2fc7f0b7200b80...
armitron | 7 years ago
PorterDuff | 7 years ago
...and then add SIMD.
omaranto | 7 years ago
armitron | 7 years ago
Look at Steve Losh's comment here for something a lot better. My own (further) improvements put SBCL performance on the same order as Julia.
iainmerrick | 7 years ago
Good to see a few languages like Nim and Rust actually beating C for raw performance, too.
wlesieutre | 7 years ago
Narishma | 7 years ago
Mac2125 | 7 years ago
once-in-a-while | 7 years ago
[deleted]