tabbyjabby | 13 years ago
At no point in this article are we shown the code of this new parser. We are also told that it is incomplete. So we have a parser which we can't see and which is not finished, but which apparently dominates its C counterpart in performance. This leads me to believe one of two things:
1. The parser isn't complete, and its unimplemented functionality is going to be more expensive in terms of performance than the author anticipated, thus rendering his preliminary results void.
2. The implementation of the C extension he is comparing against is not very well optimized. As said above, I find it very hard to believe that well optimized C is going to be beaten by well optimized JavaScript.
tptacek|13 years ago
Of course, the real problem here is that he's comparing his JS code to a random blob of C code that just happens to be part of the MySQL package. Is the MySQL client library famously performant?
pcwalton|13 years ago
I think the GC issues and codegen are the most difficult, actually. When the GC bites you, you rarely have any other choice than to use free lists, which are always a pain. Furthermore, JS codegen is always complicated by the fact that JS is JIT-compiled, and so many optimizations commonplace in C compilers tend to be skipped in the name of faster compilation. (You would be amazed how many of the common JS benchmarks end up boiling down to compilation speed.)
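The free-list workaround mentioned above usually takes the shape of an object pool. A minimal sketch (the `Pool` name and the packet-shaped objects are invented for illustration, not from any real driver):

```javascript
// A tiny object pool: instead of allocating a fresh object per packet
// (and leaving it for the GC), recycle objects through a free list.
function Pool(create) {
  this.create = create;
  this.free = []; // the free list
}

Pool.prototype.acquire = function () {
  // Reuse a recycled object if one is available, otherwise allocate.
  return this.free.length > 0 ? this.free.pop() : this.create();
};

Pool.prototype.release = function (obj) {
  // Hand the object back instead of letting it become garbage.
  this.free.push(obj);
};

var packetPool = new Pool(function () {
  return { type: 0, length: 0 };
});

var p = packetPool.acquire();
p.type = 3;
packetPool.release(p); // no garbage created for the GC to collect
```

The pain pcwalton alludes to is exactly the manual `release` call: forget it and you leak; call it twice and you get aliased objects, which is the class of bug GC normally spares you from.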
Despite all of this, however, I'm bullish on the future of JS with respect to performance; it may lose on head-to-head competition with C, but modern JS performance is perfectly adequate even for 3D games and the like.
(Also interestingly, Emscripten works around all of these difficulties surprisingly well -- you get low-level control over memory with it; you never GC; you get LLVM's optimizations.)
SoftwareMaven|13 years ago
I am so happy that I've never had to do that in my career.
vidarh|13 years ago
In fact, if the tiny reads are still there in the MySQL C client, that in itself might explain the difference.
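For contrast, Node drivers typically avoid acting on tiny reads by accumulating socket chunks and parsing only once a complete frame is buffered. A minimal sketch of that pattern (the 3-byte little-endian length prefix mirrors the MySQL wire format, but `FrameReader` itself is invented for illustration):

```javascript
// Accumulate incoming chunks; hand a payload to the callback only once
// the complete frame (3-byte LE length prefix + body) has arrived.
function FrameReader(onFrame) {
  this.buf = Buffer.alloc(0);
  this.onFrame = onFrame;
}

FrameReader.prototype.push = function (chunk) {
  this.buf = Buffer.concat([this.buf, chunk]);
  // Parse as many complete frames as the buffer currently holds.
  while (this.buf.length >= 3) {
    var len = this.buf.readUIntLE(0, 3);
    if (this.buf.length < 3 + len) break; // wait for more data
    this.onFrame(this.buf.slice(3, 3 + len));
    this.buf = this.buf.slice(3 + len);
  }
};

var frames = [];
var reader = new FrameReader(function (body) { frames.push(body.toString()); });
// The parser sees one whole frame regardless of how the kernel chunks it:
reader.push(Buffer.from([5, 0, 0])); // length prefix arrives alone
reader.push(Buffer.from("hel"));
reader.push(Buffer.from("lo"));
```

The point is that however small the individual reads are, the parsing work happens once per frame, not once per read.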
specialist|13 years ago
Yes. This presentation really drives that point home.
Building Memory-Efficient Java Applications: Practices and Challenges http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/mem...
(There are plenty of other sources too.)
Our Java study group tackled this paper. To prepare for the discussion, I tried to put my XML document object model implementation on a diet. Super fun. Modest benefits.
> You can work around the GC issues and modern compilers work around the interpretation overhead, but a C program is going to tend to have the easiest time ensuring that its working set fits into cache.
As I understand it (h/t Joe Bowbeer), Doug Lea and others have been doing very cool work micro tuning data structures suitable for use on multicore systems. So it's possible. But definitely not trivial; certainly beyond my abilities.
eric_bullington|13 years ago
Here's a nice look at some of the magic he does in assembler to get LuaJIT at such a high level of performance:
http://article.gmane.org/gmane.comp.lang.lua.general/75426
My point is, there's nothing inherently unbeatable about C. I think that a select group of dynamic languages -- probably including JavaScript -- can and will reach that level of performance, or at least come close to it. They just haven't been around as long.
goggles99|13 years ago
I will tell you how it can be - if the JavaScript engine JITs the JS in a more optimized way than the C compiler compiled the C.
Well, how does that make JS faster than C when JS will always carry more overhead? It cannot and does not - all it means is that one compiler was far better than the other.
A poor C compiler may be outperformed by the best JS one, but that is about the only scenario where JS will outperform C.
Jare|13 years ago
> JavaScript will no longer be the bottleneck. At this point the main cost is turning network data into JavaScript objects. This cost is equal for JavaScript libraries, as well as for C++ addons making V8 calls. Also, when a single client can process 5000 MBit/s utilizing a single CPU, your MySQL server has long exploded
Remember these are Node drivers, so everything has to end up becoming a JS object.
vidarh|13 years ago
> There is innate overhead when using an interpreted language, no matter how advanced the interpreter.
There is an overhead to JIT'ing the code, yes. But it is not a given that, amortised over sufficient data, this overhead cannot be compensated for by making use of knowledge that simply is not available at compile time. E.g. there are several well known methods regularly used in optimising dynamic languages that depend on discovering the actual types used and adapting the code accordingly. And once you are doing this specialisation at runtime anyway, nothing stops you from taking actual input data into account rather than just basic type information, for example.
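The specialisation being described can even be hand-rolled. A toy version of an inline cache (purely illustrative - real engines do this in generated machine code, not with closures):

```javascript
// A toy "inline cache": the first call observes the arguments' actual
// type and installs a specialised fast path; later calls skip the check.
function makeAdder() {
  var fast = null; // specialised implementation, chosen at runtime
  return function add(a, b) {
    if (fast === null) {
      // Specialise based on the types actually seen at runtime.
      fast = (typeof a === "number")
        ? function (x, y) { return x + y; }                    // numeric path
        : function (x, y) { return String(x) + String(y); };   // string path
    }
    return fast(a, b);
  };
}

var addNums = makeAdder();
addNums(1, 2); // first call observes numbers and installs the numeric path
```

A static C compiler has to pick one code path up front; the runtime version gets to pick after seeing what actually flows through.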
> JavaScript is also garbage collected, while C is not, adding an additional level of overhead.
This is not necessarily a benefit for C unless a) your JavaScript program uses enough memory to trigger the garbage collector, which will depend on both the VM and your system, or b) your C program is written in a way that requires no extra effort to keep track of when memory becomes garbage. Programs written for GC'd systems may sometimes require far less logic to keep track of memory, and instead take the cost at collection time, and so may spend less time dealing with memory in situations where collection isn't necessary during the lifetime of the program.
In the general case, there's an extra overhead. In specific cases it is most certainly possible that the garbage collection can be a performance benefit by acting like a safety net that lets you do other stuff (avoiding logic to track ownership) that can speed up your app - this applies regardless of language.
Other than that I agree with you.
jrockway|13 years ago
But to be thorough, one should also compile libmysql's parsing routines with LLVM and turn on runtime JIT compilation, and then write a benchmark that will exercise the code long enough for the JIT to kick in.
JackdawX|13 years ago
"There is innate overhead when using an interpreted language, no matter how advanced the interpreter."
I think a lot of people think this is true because they have an antiquated idea of an interpreter which reads each line as a string, parses the string, then executes the code. Modern js (and jvm bytecode, a similar-but-different scenario) 'interpreters' compile to native machine code, and that code runs directly on your hardware.
The speed overhead of a modern language has a lot less to do with the 'interpreted' nature of the language, and a lot more to do with (a) code structure (overhead of objects, closures, namespace resolution, all that junk), and (b) the maturity of gcc vs other compilers.
Theoretically, a language with a JIT compiler should actually be faster than a precompiled language in any scenario where you are intensely going round and round a loop which has a lot of logic inside it. The more the loop runs, the better the JIT compiler can optimise it based on the current runtime conditions.
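The kind of loop being described looks like this (a contrived sketch; whether the JIT actually wins here depends on the engine, but the structure - a hot loop with stable types and an invariant multiplier it can observe and specialise on - is the favourable case):

```javascript
// A hot numeric loop: after enough iterations a JIT can compile this
// to machine code specialised for "array of small integers, constant
// scale", something a line-by-line interpreter could never do.
function sumScaled(values, scale) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i] * scale;
  }
  return total;
}

var data = new Array(1000);
for (var i = 0; i < data.length; i++) data[i] = i;
var result = sumScaled(data, 2); // hot enough to tempt the JIT
```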
"JavaScript is also garbage collected, while C is not, adding an additional level of overhead."
The overhead of the garbage collector really only concerns memory use and startup speed, not runtime speed. It does barely anything when not collecting. Every time you alloc in C, you must free, and those actions have to happen in proximity to each other. Free takes time. GC allows all those free operations to happen when the program is idle, instead of holding up your program.
The more stuff you create and delete, the faster a GC language should be compared to an equivalent program in a non-GC language.
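The "create and delete a lot" case looks like this in JavaScript (a contrived sketch; `parseRows` is invented for illustration): every temporary below becomes garbage, but no free call sits on the hot path - the collector reclaims them in batches later. Whether that actually wins over C's paired malloc/free depends entirely on the GC and the workload.

```javascript
// Many short-lived temporaries: in C each row would need a paired
// malloc/free inside the loop; here the GC reclaims them in batches,
// off the hot path.
function parseRows(count) {
  var total = 0;
  for (var i = 0; i < count; i++) {
    var row = { id: i, name: "row" + i }; // short-lived temporary
    total += row.id;
    // row becomes unreachable here; no explicit free holds up the loop
  }
  return total;
}

var total = parseRows(1000);
```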