top | item 4624525


tabbyjabby | 13 years ago

I am extremely doubtful that optimized JavaScript is going to be as performant as optimized C. There is innate overhead in using an interpreted language, no matter how advanced the interpreter. JavaScript is also garbage collected, while C is not, adding a further layer of overhead.

At no point in this article are we shown the code of this new parser. We are also told that it is incomplete. So we have a parser which we can't see and which is not finished, but which apparently dominates its C counterpart in performance. This leads me to believe one of two things:

1. The parser isn't complete, and its unimplemented functionality is going to be more expensive in terms of performance than the author anticipated, thus rendering his preliminary results void.

2. The implementation of the C extension he is comparing against is not very well optimized. As said above, I find it very hard to believe that well optimized C is going to be beaten by well optimized JavaScript.


tptacek|13 years ago

It feels likely that the fundamental problem facing JS programs competing with the entire solution space available to C programs --- apart from the nerd who will implement an equivalently fast JIT compiler to run the JS on just to make a point --- is that C code has much better control over the layout of data in memory. You can work around the GC issues and modern compilers work around the interpretation overhead, but a C program is going to tend to have the easiest time ensuring that its working set fits into cache.
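One way to approximate that C-style control in today's JS is typed arrays: packing data into one contiguous buffer instead of an array of heap objects. A hypothetical sketch (names and layout are my own, not from the article):

```javascript
// Sketch: points packed into a single contiguous Float64Array as
// [x0, y0, x1, y1, ...], roughly analogous to a C array of structs.
// An array of {x, y} objects would scatter the same data across the heap.
function sumDistances(coords, n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    const x = coords[2 * i], y = coords[2 * i + 1];
    total += Math.sqrt(x * x + y * y);
  }
  return total;
}

const coords = new Float64Array([3, 4, 6, 8, 0, 0]);
console.log(sumDistances(coords, 3)); // 5 + 10 + 0 = 15
```

The dense layout is what keeps the working set cache-friendly; the arithmetic itself is incidental.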

Of course, the real problem here is that he's comparing his JS code to a random blob of C code that just happens to be part of the MySQL package. Is the MySQL client library famously performant?

pcwalton|13 years ago

C-like control over the layout of data in memory is provided by the Binary Data proposal for ECMAScript: http://wiki.ecmascript.org/doku.php?id=harmony:binary_data&#...

I think the GC issues and codegen are actually the most difficult. When the GC bites you, you rarely have any option other than free lists, which are always a pain. Furthermore, JS codegen is complicated by the fact that JS is compiled just in time, so many optimizations commonplace in C compilers tend to be skipped in the name of faster compilation. (You would be amazed how many of the common JS benchmarks end up boiling down to compilation speed.)
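The free-list pattern mentioned above can be sketched in a few lines; this is a generic illustration (the object shape is hypothetical), not code from any real parser:

```javascript
// Free-list sketch: recycle event objects instead of allocating a fresh
// one per message, so steady-state processing allocates nothing and the
// GC has nothing to collect.
const freeList = [];

function acquireEvent() {
  // Reuse a dead object if one is available, otherwise allocate.
  return freeList.pop() || { type: 0, payload: null };
}

function releaseEvent(ev) {
  ev.type = 0;
  ev.payload = null; // drop references so the pooled object pins no garbage
  freeList.push(ev);
}

const ev = acquireEvent();
ev.type = 1;
releaseEvent(ev);
console.log(acquireEvent() === ev); // the same object comes back: true
```

The pain pcwalton refers to is exactly the release step: you are back to tracking object lifetimes by hand, in a language that was supposed to do it for you.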

Despite all of this, however, I'm bullish on the future of JS with respect to performance; it may lose on head-to-head competition with C, but modern JS performance is perfectly adequate even for 3D games and the like.

(Also interestingly, Emscripten works around all of these difficulties surprisingly well -- you get low-level control over memory with it; you never GC; you get LLVM's optimizations.)

SoftwareMaven|13 years ago

I agree with you. If getting every bit of performance out of your hardware is important to you, C (possibly with inlined assembly) is the way to go.

I am so happy that I've never had to do that in my career.

vidarh|13 years ago

Last time I looked at the MySQL client library (admittedly a few years ago), I was horrified to find it wasted a tremendous amount of time doing tiny read()s instead of reading into larger buffers and splitting things up client side. The former causes ridiculous extra latency due to the extra context switches, so I'd not be the least bit surprised if it is still slow.

In fact, if the tiny reads are still there in the MySQL C client, that in itself might explain the difference.
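The buffered alternative vidarh describes amounts to: one large read, then splitting packets out of the buffer in user space. A hypothetical sketch (the 4-byte big-endian length prefix is an assumed format for illustration; MySQL's actual wire protocol differs):

```javascript
// Split length-prefixed packets out of one big buffer in user space,
// instead of issuing a read() syscall per field and paying a context
// switch each time.
function splitPackets(buf) {
  const packets = [];
  let off = 0;
  while (off + 4 <= buf.length) {
    const len = buf.readUInt32BE(off);      // assumed 4-byte length prefix
    if (off + 4 + len > buf.length) break;  // incomplete packet: wait for more data
    packets.push(buf.slice(off + 4, off + 4 + len));
    off += 4 + len;
  }
  return packets;
}

const buf = Buffer.concat([
  Buffer.from([0, 0, 0, 2]), Buffer.from("hi"),
  Buffer.from([0, 0, 0, 3]), Buffer.from("x"), // truncated: 1 of 3 bytes
]);
console.log(splitPackets(buf).map(p => p.toString())); // [ 'hi' ]
```

Note the incomplete trailing packet is simply left in the buffer for the next read, which is the whole trick: syscalls per network burst, not per field.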

specialist|13 years ago

> C code has much better control over the layout of data in memory.

Yes. This presentation really drives that point home.

Building Memory-Efficient Java Applications: Practices and Challenges http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/mem...

(There are plenty of other sources too.)

Our Java study group tackled this paper. To prepare for the discussion, I tried to put my XML document object model implementation on a diet. Super fun. Modest benefits.

> You can work around the GC issues and modern compilers work around the interpretation overhead, but a C program is going to tend to have the easiest time ensuring that its working set fits into cache.

As I understand it (h/t Joe Bowbeer), Doug Lea and others have been doing very cool work micro-tuning data structures suitable for use on multicore systems. So it's possible, but definitely not trivial; certainly beyond my abilities.

eric_bullington|13 years ago

Is that a blanket statement about C versus any dynamic language, or just JavaScript? Because with LuaJIT2, Mike Pall has already shown that a highly-optimized Lua interpreter can hold its own against optimized C, and sometimes even surpass it.

Here's a nice look at some of the magic he does in assembler to get LuaJIT at such a high level of performance:

http://article.gmane.org/gmane.comp.lang.lua.general/75426

My point is, there's nothing inherently unbeatable about C. I think that a select group of dynamic languages -- probably including JavaScript -- can and will reach that level of performance, or at least come close. They just haven't been around as long.

goggles99|13 years ago

Yeah, well please tell me how. The JavaScript engine is written in C, so how could the language running on top of it be faster?

I will tell you how it can be - if the JavaScript engine JITs the JS in a more optimized way than the C compiler compiled the C.

Well how does that make JS faster than C when JS will always carry more overhead? It cannot and does not - all it means is that the compiler was far better.

A poor C compiler may be outperformed by the best JS one, but that is about the only scenario where JS will outperform C.

Jare|13 years ago

From the article:

> JavaScript will no longer be the bottleneck. At this point the main cost is turning network data into JavaScript objects. This cost is equal for JavaScript libraries, as well as for C++ addons making V8 calls. Also, when a single client can process 5000 MBit/s utilizing a single CPU, your MySQL server has long exploded

Remember these are Node drivers, so everything has to end up becoming a JS object.

vidarh|13 years ago

I'm also doubtful that this will beat C, however...

> There is innate overhead when using an interpreted language, no matter how advanced the interpreter.

There is an overhead in JIT'ing the code, yes. But it is not a given that, amortised over sufficient data, this overhead cannot be compensated for by making use of knowledge that is simply not there at compile time. For example, there are several well-known methods regularly used in optimising dynamic languages that depend on discovering the actual types in use and specialising the code accordingly. And once you're doing that specialisation at runtime anyway, nothing stops you from taking actual input data into account rather than just basic type information.
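The specialisation vidarh describes is roughly what an inline cache does inside the engine; here is a deliberately simplified user-space sketch of the idea (the structure is illustrative, not how any real JIT is written):

```javascript
// Sketch of an inline cache in plain JS: record the type seen at a call
// site, take a specialized fast path while it holds, and keep a guarded
// generic fallback. A real JIT emits machine code for the fast path.
function makeAdd() {
  let observed = null;
  return function add(a, b) {
    const t = typeof a;
    if (observed === null) observed = t; // first call: record the observed type
    if (observed === "number" && t === "number" && typeof b === "number") {
      return a + b; // specialized numeric path
    }
    return a + b;   // generic fallback (e.g. string concatenation)
  };
}

const add = makeAdd();
console.log(add(1, 2));     // 3, via the specialized path
console.log(add("a", "b")); // "ab", via the fallback
```

The C compiler, by contrast, has to emit one fixed version of this code before it has seen a single input.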

> JavaScript is also garbage collected, while C is not, adding an additional level of overhead.

This is not necessarily an advantage for C unless a) your JavaScript program uses enough memory to trigger the garbage collector (which will depend on both the VM and your system), or b) your C program can be written in a way that requires no extra effort to keep track of when memory becomes garbage. Programs written for GC'd systems may require far less logic to keep track of memory, taking the cost at collection time instead, and so may spend less time dealing with memory in situations where no collection is necessary during the lifetime of the program.

In the general case, there's an extra overhead. In specific cases it is most certainly possible that the garbage collection can be a performance benefit by acting like a safety net that lets you do other stuff (avoiding logic to track ownership) that can speed up your app - this applies regardless of language.

Other than that I agree with you.

jrockway|13 years ago

node is compiled to native code at runtime, and the compiler uses runtime type information to generate that code. C (without LLVM's JIT, of course) is compiled to native code before runtime, and as such, cannot take advantage of runtime type information when producing the code. So in this case, node does have a theoretical advantage.

But to be thorough, one should also compile libmysql's parsing routines with LLVM and turn on runtime JIT compilation, and then write a benchmark that will exercise the code long enough for the JIT to kick in.

tptacek|13 years ago

So take the output of the profiler after running the program on a diversity of real inputs and use that to drive the optimizer.

inoop|13 years ago

Modern JavaScript runtimes are not interpreters; V8, for instance, compiles JS straight to native code rather than interpreting it. Also, for the general case, garbage collection has been outperforming manual management for over a decade now.

JackdawX|13 years ago

There are three common fallacies in this post that people continually bring up about modern languages. Thought I'd come on here and write some corrections:

"There is innate overhead when using an interpreted language, no matter how advanced the interpreter."

I think a lot of people believe this because they have an antiquated idea of an interpreter which reads each line as a string, parses it, then executes it. Modern JS engines (and the JVM, a similar-but-different scenario) compile to native machine code, and that code runs directly on your hardware.

The speed overhead of modern languages has a lot less to do with the 'interpreted' nature of the language, and a lot more to do with (a) code structure (the overhead of objects, closures, namespace resolution, all that junk), and (b) the maturity of gcc versus other compilers.

Theoretically, a language with a JIT compiler should actually be faster than a precompiled language in any scenario where you are intensely going round and round a loop with a lot of logic inside it. The more the loop runs, the better the JIT compiler can optimise it based on the current runtime conditions.
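The kind of loop being described looks like this; a sketch, not a benchmark:

```javascript
// A hot loop a JIT can specialize: once the engine observes that `values`
// only ever holds numbers (e.g. a packed double array), it can compile the
// body down to straight machine arithmetic with the checks hoisted out.
// A C compiler bakes one version in ahead of time, with no such feedback.
function sumSquares(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i] * values[i];
  }
  return total;
}

console.log(sumSquares([1, 2, 3])); // 14
```

Whether the JIT'd version actually beats the precompiled one in practice is exactly what's being argued in this thread.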

"JavaScript is also garbage collected, while C is not, adding an additional level of overhead."

The overhead of the garbage collector mostly shows up in memory use and startup, not runtime speed; it does barely anything when not collecting. Every time you alloc in C, you must free, and those actions have to happen in proximity to each other. free() takes time. GC allows all those free operations to happen when the program is idle, instead of holding up your program.

The more stuff you create and delete, the faster a GC language should be compared to an equivalent program in a non-GC language.