Reminds me of this comment by Petka Antonov on native V8 Promises being way slower than Bluebird[1]:
>I'd expect native browser methods to be an order of magnitude faster.
Built-ins need to adhere to ridiculous semantic complexity which only gets worse as more features get added into the language. The spec is ruthless in that it doesn't leave any case as "undefined behavior" - what happens when you use splice on an array that has an indexed getter that calls Object.observe on the array while the splice is looping?
If you implemented your own splice, then you probably wouldn't even think of supporting holed arrays, observable arrays, arrays with funky setters/getters and so on. Your splice would not behave well in these cases but that's ok because you can just document that. Additionally, since you pretty much never need the return value of splice, you can just not return anything instead of allocating a wasted array every time (you could also make this controllable from a parameter if needed).
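A minimal sketch of such a stripped-down splice (the name and behaviour here are illustrative, not from any real library): it removes `count` items starting at `start`, shifts the tail down in place, and returns nothing, so no result array is ever allocated. Dense arrays are assumed; holes, funky getters/setters, and observers are deliberately unsupported.

```javascript
// Hypothetical stripped-down splice: remove `count` items at `start`.
// Assumes a dense array with no exotic getters/setters or holes.
function fastSpliceRemove(arr, start, count) {
  for (var i = start + count; i < arr.length; i++) {
    arr[i - count] = arr[i]; // shift the tail left over the removed span
  }
  arr.length -= count;       // truncate; nothing is returned or allocated
}
```

For example, calling fastSpliceRemove on [1, 2, 3, 4, 5] with start 1 and count 2 leaves the array as [1, 4, 5].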
Already with the above, you could probably reach the perf of "native splice" without even considering the fact that user code is actually optimized and compiled into native code. And when you consider that, you are way past any "native" except when it comes to special snowflakes like charCodeAt, the math functions and such.
Thirdly, built-ins are not magic that avoids doing any work; they are normal code implemented by humans. This is the biggest reason for the perf difference specifically in the promise case: bluebird is extremely carefully optimized and tuned to the V8 optimizing compiler's expectations, whereas the V8 promise implementation[2] looks like it was translated directly from the spec pseudo-code, as in there was no optimization effort at all.
> Additionally, since you pretty much never need the return value of splice, you can just not return anything instead of allocating a wasted array every time
That kind of optimization seems easily done with a builtin as well: have an option to return the array, and only do so if the JavaScript engine indicates that the calling code actually uses the return value.
This itself reminds me of similar performance issues hidden in C++. On x86 with Microsoft's compiler, the float->int cast called a helper library function, 'ftol', which turns out to do loads of work to be spec-compliant. Replacing it with a single not-spec-but-works-for-us x86 conversion instruction was way faster.
So not just JS - it seems language built-ins are often slowed down by bureaucratic spec compliance, and hand-rolling code can help you get a speedup.
Today I was refactoring some JS code that rendered a graph to SVG. It was taking several seconds in some cases, and the developer had done a bunch of micro-optimizations: inlining functions, building caches to avoid nested loops, and so on.
I ran a profile. The code ran for 2.5 seconds, 4 ms of which was in the micro-optimized JS code; the rest was spent updating the DOM, five times over. Needless to say, I threw out all the micro-optimizations, halving the number of lines, and fixed it so the DOM was updated once.
Anyway, the point I'm making is this: you should optimize for readability and robustness, not performance, unless profiling shows it's worth it. I haven't seen a case where pure (non-DOM, non-XHR) JS code needed micro-optimization for performance in half a decade.
Having written a major HTML5 game engine, I've ended up micro-optimizing JS code after small functions really did show up high in profiling measurements. One example: calculating a bounding box from a quad involved code along the lines of Math.min(a, b, c, d) followed by Math.max(a, b, c, d). Replacing that with a tree of ifs to determine both the minimum and maximum at once was faster and moved the bottleneck elsewhere.
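A sketch of that if-tree (the function name is mine, not from the engine's source): four comparisons instead of the six that separate Math.min and Math.max calls would need, and both bounds come back from one pass.

```javascript
// Compute the min and max of four values in one pass with an if-tree,
// instead of calling Math.min(a, b, c, d) and then Math.max(a, b, c, d).
function quadBounds(a, b, c, d) {
  var lo1, hi1, lo2, hi2;
  if (a < b) { lo1 = a; hi1 = b; } else { lo1 = b; hi1 = a; }
  if (c < d) { lo2 = c; hi2 = d; } else { lo2 = d; hi2 = c; }
  return {
    min: lo1 < lo2 ? lo1 : lo2,   // smaller of the two pairwise minima
    max: hi1 > hi2 ? hi1 : hi2    // larger of the two pairwise maxima
  };
}
```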
I have done some optimization using the Safari debugger alongside Chrome's, because I needed to test on an iPad.
Safari counts and profiles ALL function invocations, so if you go looking for hotspots with its profiler, you always get the parts where many functions are called. I consider this harmful.
I stumbled over this; maybe other developers have too.
I think in most cases where you'd worry about JS array performance you should use actual numeric arrays [0] rather than the kitchen sink Array(). Also, I think those function abstractions have a pretty significant overhead?
I used this in a Firefox OS app, inside an implementation of Dijkstra's algorithm and an accompanying binary heap, and while I haven't run any rigorous benchmarks, I can say the runtime felt way better on my test phone when I rewrote the algorithm to use the typed arrays.
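A sketch of what that kind of rewrite can look like (illustrative code, not the app's actual source): a binary min-heap for Dijkstra's algorithm with distances in a Float64Array and node ids in an Int32Array, so the hot loop only touches flat, preallocated numeric storage.

```javascript
// Illustrative sketch: a binary min-heap keyed by distance, with all
// state kept in preallocated typed arrays instead of generic Arrays.
function TypedMinHeap(capacity) {
  this.keys = new Float64Array(capacity); // tentative distances
  this.vals = new Int32Array(capacity);   // node ids
  this.size = 0;
}

TypedMinHeap.prototype.push = function (key, val) {
  var k = this.keys, v = this.vals, i = this.size++;
  k[i] = key;
  v[i] = val;
  while (i > 0) {                 // sift up until the parent is smaller
    var p = (i - 1) >> 1;
    if (k[p] <= k[i]) break;
    var tk = k[p]; k[p] = k[i]; k[i] = tk;
    var tv = v[p]; v[p] = v[i]; v[i] = tv;
    i = p;
  }
};

TypedMinHeap.prototype.pop = function () {
  var k = this.keys, v = this.vals, top = v[0];
  this.size--;
  k[0] = k[this.size];
  v[0] = v[this.size];
  var i = 0;
  for (;;) {                      // sift the moved element back down
    var l = 2 * i + 1, r = l + 1, m = i;
    if (l < this.size && k[l] < k[m]) m = l;
    if (r < this.size && k[r] < k[m]) m = r;
    if (m === i) break;
    var tk = k[m]; k[m] = k[i]; k[i] = tk;
    var tv = v[m]; v[m] = v[i]; v[i] = tv;
    i = m;
  }
  return top;                     // node id with the smallest distance
};
```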
This is very often overlooked but extremely useful for implementations of fast algorithms in JavaScript that should scale to a lot of input data.
Regarding your edit: you're exactly right, of course a for loop will be faster. Sometimes you really do need a function call though, in which case fast forEach and map implementations become more useful.
The next step for fast.js are some sweet.js macros which will make writing for loops a bit nicer, because it's pretty painful to write this every time you want to iterate over an object:
  var keys = Object.keys(obj),
      length = keys.length,
      key, i;
  for (i = 0; i < length; i++) {
    key = keys[i];
    // ...
  }
I'd rather write:
  every key of obj {
    // ...
  }
and have that expanded at compile time.
Additionally there are some cases where you must use inline for loops (such as when slicing arguments objects, see https://github.com/petkaantonov/bluebird/wiki/Optimization-k...) and a function call is not possible. These can also be addressed with sweet.js macros.
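For reference, the pattern that wiki page warns about is leaking the arguments object (e.g. via Array.prototype.slice.call(arguments)); the workaround is an inline copy loop along these lines (the function name is just for illustration):

```javascript
// Copy everything after the first argument into a real array with an
// inline loop, instead of Array.prototype.slice.call(arguments), which
// leaks `arguments` and can deoptimize the whole function in older V8.
function restArgs() {
  var n = arguments.length > 0 ? arguments.length - 1 : 0;
  var out = new Array(n);
  for (var i = 1; i < arguments.length; i++) {
    out[i - 1] = arguments[i];
  }
  return out;
}
```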
Premature optimization is the root of all evil -- Knuth
V8 has excellent profiling tools (exposed in chrome and in nodejs) which should be used first before considering fallbacks. Before seeking a third party library, be sure to check if the function is called many times or is taking a long time.
For example, I found that throwing map out and using a straight loop over the array (avoiding the function calls entirely) can be up to 50% faster than using the usual functional suspects. But that, in the eyes of some people, unnecessarily adds complexity to the code and may not be worth changing.
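A sketch of the kind of rewrite meant here (the doubling operation is just a stand-in): the same result computed via .map with a callback, and via a preallocated straight loop with the work inlined.

```javascript
// Functional version: one callback invocation per element.
function doubleWithMap(xs) {
  return xs.map(function (x) { return x * 2; });
}

// Straight-loop version: preallocated result, no per-element function calls.
function doubleWithLoop(xs) {
  var out = new Array(xs.length);
  for (var i = 0; i < xs.length; i++) {
    out[i] = xs[i] * 2;
  }
  return out;
}
```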
As a person who works on a JS engine, I can say that a lot of the speed-up in this library comes from failing to handle holes correctly - it's surprisingly expensive to do a hole check. There's still room in most engines to optimise it, but those checks are fairly time-consuming :-/
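To make the hole semantics concrete, here is a small demonstration: the spec-compliant built-in forEach performs a presence check and skips holes, while a plain indexed loop (the typical "fast" reimplementation) reads them as undefined.

```javascript
// Spec-compliant traversal: Array.prototype.forEach skips holes.
function visitNative(arr) {
  var seen = [];
  arr.forEach(function (v) { seen.push(v); });
  return seen;
}

// "Fast" traversal: a plain loop reads holes as undefined.
function visitFast(arr) {
  var seen = [];
  for (var i = 0; i < arr.length; i++) {
    seen.push(arr[i]);
  }
  return seen;
}
// For the holed array [1, , 3]: visitNative gives [1, 3],
// while visitFast gives [1, undefined, 3].
```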
You can version the inner loops of those builtins, manually or automatically in the optimizing compiler, depending on the denseness / representation of the array's backing store. Something along these lines: http://gist.io/7050013. A trace compiler could actually give you such versioning for free, automatically.
So I always put this into the not-done-yet category for JS VM related work; it is a very interesting problem to tackle.
This seems like a potential win for JS performance in real-world applications -- an optimization hint to indicate whether an array is "overwhelmingly full of holes" or something less sparse where more optimized versions of the functions can be used.
So looking at this in Firefox, the "fast" version underperforms the native one for forEach, bind, map, reduce, and concat. It outperforms native for indexOf/lastIndexOf.
In Chrome the "fast" version does better than the native one for indexOf, lastIndexOf, forEach, map, and reduce. It does worse on bind and concat.
It's interesting to compare the actual numbers across the browsers too. In Firefox, the numbers I see for forEach are:
"fast": 47,688
native: 123,187
and the Chrome ones are:
"fast": 71,070
native: 20,112
Similarly, for map in Firefox I see:
"fast": 17,532
native: 62,268
and in Chrome:
"fast": 38,286
native: 19,521
Interestingly, these two methods are actually implemented in self-hosted JS in both SpiderMonkey and V8, last I checked, so this is really comparing the performance of three different JS implementations.
It's fairly well-known that Array#forEach is up to an order of magnitude slower than a for-loop, across browsers. The usual reason given is the overhead of invoking a closure from the VM. A JS implementation of forEach ought to be somewhere in the middle.
The speed-up for "concat" is surprising to me. I wonder if it holds for "splice" and if that is true across browsers.
So the forEach magic that is so much faster is... a normal for loop:
  exports.forEach = function fastForEach(subject, fn, thisContext) {
    var length = subject.length,
        i;
    for (i = 0; i < length; i++) {
      fn.call(thisContext, subject[i], i, subject);
    }
  };
I knew that forEach was slower than a normal for loop but I was expecting something more.
I did this with CPython and psyco. It turned out that writing map() in Python was faster than the builtin C version, because the JIT was allowed to do some tricks, like inlining the function.
The slowness of functional methods like .map and .forEach for a time was due to their not being self-hosted. Since then, both V8 and SpiderMonkey self-host them, and bz has posted some numbers below [1].
But perf problems are more numerous still for these functional methods, because compilers in general have trouble inlining closures, especially for very polymorphic callsites like calls to the callback passed in via .map or .forEach. For an account of what's going on in SpiderMonkey, I wrote an explanation about a year ago [2]. Unfortunately, the problems still persist today.
JavaScript. The language where you can reimplement basic functions such as map, each, and reduce (which, by the way, are still not available for objects in 2014) and have them be faster than their native counterparts.
It might be that I don't particularly like the language, but it's kind of frightening that we're building the world on that stuff.
That's not really fair; the author of this implementation is non-compliant with the spec, dropping edge cases in order to achieve these improvements.
A lot of it has to do with crossing boundaries between C++ and the JIT'd code and losing things like inlining. Doesn't really have to do with JS warts.
On one hand, I'm a big proponent of "know your tools". I'll gladly use a fast sort function that falls apart when sorting arrays near UINT32_MAX size if I'm aware of that caveat ahead of time and I use it only in domains where the size of the array is logically limited to something much less than that, for example.
But on the other hand, I write operating system code in C. I need to know that the library functions I call are going to protect me against edge cases so I don't inadvertently introduce security holes or attack vectors.
If I know that some JS I'm interacting with is using fast.js, maybe there will be some input I can craft in order to force the system into a 1% edge case.
The lesson here is probably "don't use this for your shopping cart", but we need to be careful deriding Javascript's builtins for being slow when really they're just being safe.
This is actually unlikely to be a problem. The edge cases ignored here are the same as those ignored by underscore.js, which is obviously a very popular library.
I think by all means use this for a shopping cart - it's namespaced by require (say under 'fast' as in the README) - you are, for all intents and purposes, calling some library which just happens to reimplement JS builtins as its mode of operation.
It may even be _easier_ to determine the behaviour of these functions than builtins, since determining how fast works just means reading its source whereas to determine builtins' behaviour you must read the ECMAScript specification and then hope[1] that the browser actually matches it perfectly!
[1]: or read the JS engine source, but that's a whole lot more work. :P
A while back I was programming for a naive but well-written JS interpreter for set-top box apps, where map was generally being avoided because of performance.
I wrote quite a fast "map" (along with the others) that looked a bit like:
  exports.map = function fastMap(subject, fn, thisContext) {
    var i = subject.length,
        result = new Array(i);
    if (thisContext) {
      while (i--) {
        result[i] = fn.call(thisContext, subject[i], i, subject);
      }
    } else {
      while (i--) {
        result[i] = fn(subject[i], i, subject);
      }
    }
    return result;
  };
I'm not sure if I just used "result = []", but on modern browsers I think that'd be recommended. But yeah, if you're programming for a web browser then using another impl of map is probably going to be a waste of time.
I'm interested in what those edge cases are; saying it works in 99% of cases but providing no caveats makes me think I might be surprised by something if I use it.
I think this shows that standards bodies could implement less native language functionality and let community-made libraries/tools compete for those areas. ES6 goes so far as to implement a native module system, seriously calling into question any effort by the community at large to implement a competing system (e.g. browserify, requirejs).
[1]: https://github.com/angular/angular.js/issues/6697#issuecomme...
[2]: https://github.com/v8/v8/blob/master/src/promise.js
illumen | 11 years ago:
Whereas the builtin ones implemented in C++ do keep all the code in there. This is why even full implementations written in JavaScript can be faster than C++.
throwaway_yy2Di | 11 years ago:
[0]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Type...
(edit): Yeah, the abstraction overhead is ridiculous. Here's the forEach() benchmark again, compared to an explicit for loop (no function calls). (I ran this on Node "v0.11.14-pre", fresh from github.)
grhmc | 11 years ago:
I'm sure that is a statistically valid way to measure performance.
cel1ne | 11 years ago:
  bind - fast    2,565,227 ±2.19%
  bind - lodash  4,923,373 ±1.51%
  bind - native  9,022,396 ±2.12%
On iPad this difference is even larger.
nijiko | 11 years ago:
http://jsperf.com/fast-iter
mraleph | 11 years ago:
Microbenchmarks like that require a lot of care to measure something correctly. Check out my talk from LXJS 2013 for more details: slides at http://mrale.ph/talks/lxjs2013/ and video at https://www.youtube.com/watch?v=65-RbBwZQdU
[1] https://news.ycombinator.com/item?id=7938101 [2] http://rfrn.org/~shu/2013/03/20/two-reasons-functional-style...
phpnode | 11 years ago:
* as long as you make it do less work
thegeomaster | 11 years ago:
There is a great, amusing, borderline sci-fi talk by Gary Bernhardt about the future of JavaScript and traditional languages compiled to JavaScript. My recommendation: https://www.destroyallsoftware.com/talks/the-birth-and-death...
mnemonik | 11 years ago:
This would likely get even faster if you manually specialized a map/forEach/etc. for each caller, but then you might as well just write the loop out by hand: http://rfrn.org/~shu/2013/03/20/two-reasons-functional-style...
franze | 11 years ago:
http://www.youtube.com/watch?v=65-RbBwZQdU (Vyacheslav Egorov's LXJS 2013 talk)
I don't know if he actually said these words, but it was the overall theme of this (very entertaining and very enlightening) talk.