Reminds me of this comment by Petka Antonov on native V8 Promises being way slower than Bluebird[1]:
>I'd expect native browser methods to be an order of magnitude faster.
Built-ins need to adhere to ridiculous semantic complexity which only gets worse as more features get added into the language. The spec is ruthless in that it doesn't leave any case as "undefined behavior" - what happens when you use splice on an array that has an indexed getter that calls Object.observe on the array while the splice is looping?
If you implemented your own splice, then you probably wouldn't even think of supporting holed arrays, observable arrays, arrays with funky setters/getters and so on. Your splice would not behave well in these cases but that's ok because you can just document that. Additionally, since you pretty much never need the return value of splice, you can just not return anything instead of allocating a wasted array every time (you could also make this controllable from a parameter if needed).
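A minimal sketch of such a stripped-down splice (the name and behaviour here are illustrative, not from any real library): it removes `count` items starting at `start`, shifts the tail down in place, and returns nothing, so no result array is ever allocated. Dense arrays are assumed; holes, funky getters/setters, and observers are deliberately unsupported.

```javascript
// Hypothetical stripped-down splice: remove `count` items at `start`.
// Assumes a dense array with no exotic getters/setters or holes.
function fastSpliceRemove(arr, start, count) {
  for (var i = start + count; i < arr.length; i++) {
    arr[i - count] = arr[i]; // shift the tail left over the removed span
  }
  arr.length -= count;       // truncate; nothing is returned or allocated
}
```

For example, calling fastSpliceRemove on [1, 2, 3, 4, 5] with start 1 and count 2 leaves the array as [1, 4, 5].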
Already with the above, you could probably reach the perf of "native splice" without even considering the fact that user code is actually optimized and compiled into native code. And when you consider that, you are way past any "native" except when it comes to special snowflakes like charCodeAt, the math functions and such.
Thirdly, built-ins are not magic that avoids doing any work; they are normal code implemented by humans. This is the biggest reason for the perf difference specifically in the promise case: bluebird is extremely carefully optimized and tuned to the V8 optimizing compiler's expectations, whereas the V8 promise implementation[2] looks like it was translated directly from the spec pseudo-code, as in there was no optimization effort at all.
> Additionally, since you pretty much never need the return value of splice, you can just not return anything instead of allocating a wasted array every time
That kind of optimization seems easily done with a builtin as well: have an option to return the array, and only do so if the JavaScript engine indicates that the calling code actually uses the return value.
This itself reminds me of similar performance issues hidden in C++. On x86 with Microsoft's compiler, the float->int cast called a helper library function, 'ftol', which turns out to do loads of work to be spec-compliant. Replacing it with a single not-spec-but-works-for-us x86 conversion instruction was way faster.
So not just JS - it seems language built-ins are often slowed down by bureaucratic spec compliance, and hand-rolling code can help you get a speedup.
Today I was refactoring some JS code that rendered a graph to SVG. It was taking several seconds in some cases, and the developer had done a bunch of micro-optimizations: inlining functions, building caches to avoid nested loops, and so on.
I ran a profile. The code ran for 2.5 seconds, 4 ms of which was in the micro-optimized JS code; the rest was spent updating the DOM, five times over. Needless to say, I threw out all the micro-optimizations, halving the number of lines, and fixed it so the DOM was updated once.
Anyway, the point I'm making is this: you should optimize for readability and robustness, not performance, unless profiling shows it's worth it. I haven't seen a case where pure (non-DOM, non-XHR) JS code needed micro-optimization for performance in half a decade.
Having written a major HTML5 game engine, I've ended up micro-optimizing JS code after small functions really did show up high in profiling measurements. One example: calculating a bounding box from a quad involved code along the lines of Math.min(a, b, c, d) followed by Math.max(a, b, c, d). Replacing that with a tree of ifs to determine both the minimum and maximum at once was faster and moved the bottleneck elsewhere.
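A sketch of that if-tree (the function name is mine, not from the engine's source): four comparisons instead of the six that separate Math.min and Math.max calls would need, and both bounds come back from one pass.

```javascript
// Compute the min and max of four values in one pass with an if-tree,
// instead of calling Math.min(a, b, c, d) and then Math.max(a, b, c, d).
function quadBounds(a, b, c, d) {
  var lo1, hi1, lo2, hi2;
  if (a < b) { lo1 = a; hi1 = b; } else { lo1 = b; hi1 = a; }
  if (c < d) { lo2 = c; hi2 = d; } else { lo2 = d; hi2 = c; }
  return {
    min: lo1 < lo2 ? lo1 : lo2,   // smaller of the two pairwise minima
    max: hi1 > hi2 ? hi1 : hi2    // larger of the two pairwise maxima
  };
}
```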
I have done some optimization using the Safari debugger alongside Chrome's, because I needed to test on an iPad.
Safari counts and profiles ALL function invocations, so if you go looking for hotspots with its profiler, you always get the parts where many functions are called. I consider this harmful.
I stumbled over this; maybe other developers have too.
I think in most cases where you'd worry about JS array performance you should use actual numeric arrays [0] rather than the kitchen sink Array(). Also, I think those function abstractions have a pretty significant overhead?
I used this in a Firefox OS app, inside an implementation of Dijkstra's algorithm and an accompanying binary heap, and while I haven't run any rigorous benchmarks, I can say the runtime felt way better on my test phone when I rewrote the algorithm to use the typed arrays.
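A sketch of what that kind of rewrite can look like (illustrative code, not the app's actual source): a binary min-heap for Dijkstra's algorithm with distances in a Float64Array and node ids in an Int32Array, so the hot loop only touches flat, preallocated numeric storage.

```javascript
// Illustrative sketch: a binary min-heap keyed by distance, with all
// state kept in preallocated typed arrays instead of generic Arrays.
function TypedMinHeap(capacity) {
  this.keys = new Float64Array(capacity); // tentative distances
  this.vals = new Int32Array(capacity);   // node ids
  this.size = 0;
}

TypedMinHeap.prototype.push = function (key, val) {
  var k = this.keys, v = this.vals, i = this.size++;
  k[i] = key;
  v[i] = val;
  while (i > 0) {                 // sift up until the parent is smaller
    var p = (i - 1) >> 1;
    if (k[p] <= k[i]) break;
    var tk = k[p]; k[p] = k[i]; k[i] = tk;
    var tv = v[p]; v[p] = v[i]; v[i] = tv;
    i = p;
  }
};

TypedMinHeap.prototype.pop = function () {
  var k = this.keys, v = this.vals, top = v[0];
  this.size--;
  k[0] = k[this.size];
  v[0] = v[this.size];
  var i = 0;
  for (;;) {                      // sift the moved element back down
    var l = 2 * i + 1, r = l + 1, m = i;
    if (l < this.size && k[l] < k[m]) m = l;
    if (r < this.size && k[r] < k[m]) m = r;
    if (m === i) break;
    var tk = k[m]; k[m] = k[i]; k[i] = tk;
    var tv = v[m]; v[m] = v[i]; v[i] = tv;
    i = m;
  }
  return top;                     // node id with the smallest distance
};
```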
This is very often overlooked but extremely useful for implementations of fast algorithms in JavaScript that should scale to a lot of input data.
Regarding your edit: you're exactly right, of course a for loop will be faster. Sometimes you really do need a function call though, in which case fast forEach and map implementations become more useful.
The next step for fast.js are some sweet.js macros which will make writing for loops a bit nicer, because it's pretty painful to write this every time you want to iterate over an object:
  var keys = Object.keys(obj),
      length = keys.length,
      key, i;
  for (i = 0; i < length; i++) {
    key = keys[i];
    // ...
  }
I'd rather write:
  every key of obj {
    // ...
  }
and have that expanded at compile time.
Additionally there are some cases where you must use inline for loops (such as when slicing arguments objects, see https://github.com/petkaantonov/bluebird/wiki/Optimization-k...) and a function call is not possible. These can also be addressed with sweet.js macros.
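For reference, the pattern that wiki page warns about is leaking the arguments object (e.g. via Array.prototype.slice.call(arguments)); the workaround is an inline copy loop along these lines (the function name is just for illustration):

```javascript
// Copy everything after the first argument into a real array with an
// inline loop, instead of Array.prototype.slice.call(arguments), which
// leaks `arguments` and can deoptimize the whole function in older V8.
function restArgs() {
  var n = arguments.length > 0 ? arguments.length - 1 : 0;
  var out = new Array(n);
  for (var i = 1; i < arguments.length; i++) {
    out[i - 1] = arguments[i];
  }
  return out;
}
```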
Premature optimization is the root of all evil -- Knuth
V8 has excellent profiling tools (exposed in chrome and in nodejs) which should be used first before considering fallbacks. Before seeking a third party library, be sure to check if the function is called many times or is taking a long time.
For example, I found that throwing map out and using a straight loop over the array (avoiding the function calls entirely) can be up to 50% faster than using the usual functional suspects. But that, in the eyes of some people, unnecessarily adds complexity to the code and may not be worth changing.
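A sketch of the kind of rewrite meant here (the doubling operation is just a stand-in): the same result computed via .map with a callback, and via a preallocated straight loop with the work inlined.

```javascript
// Functional version: one callback invocation per element.
function doubleWithMap(xs) {
  return xs.map(function (x) { return x * 2; });
}

// Straight-loop version: preallocated result, no per-element function calls.
function doubleWithLoop(xs) {
  var out = new Array(xs.length);
  for (var i = 0; i < xs.length; i++) {
    out[i] = xs[i] * 2;
  }
  return out;
}
```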
As a person who works on a JS engine, I can say that a lot of the speed-up in this library comes from failing to handle holes correctly - it's surprisingly expensive to do a hole check. There's still room in most engines to optimise it, but those checks are fairly time-consuming :-/
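To make the hole semantics concrete, here is a small demonstration: the spec-compliant built-in forEach performs a presence check and skips holes, while a plain indexed loop (the typical "fast" reimplementation) reads them as undefined.

```javascript
// Spec-compliant traversal: Array.prototype.forEach skips holes.
function visitNative(arr) {
  var seen = [];
  arr.forEach(function (v) { seen.push(v); });
  return seen;
}

// "Fast" traversal: a plain loop reads holes as undefined.
function visitFast(arr) {
  var seen = [];
  for (var i = 0; i < arr.length; i++) {
    seen.push(arr[i]);
  }
  return seen;
}
// For the holed array [1, , 3]: visitNative gives [1, 3],
// while visitFast gives [1, undefined, 3].
```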
You can version the inner loops of those builtins, manually or automatically in the optimizing compiler, depending on the denseness / representation of the array's backing store. Something along these lines: http://gist.io/7050013. A trace compiler could actually give you such versioning for free, automatically.
So I always put this into the not-done-yet category for JS VM related work; it is a very interesting problem to tackle.
This seems like a potential win for JS performance in real-world applications -- an optimization hint to indicate whether an array is "overwhelmingly full of holes" or something less sparse where more optimized versions of the functions can be used.
So looking at this in Firefox, the "fast" version underperforms the native one for forEach, bind, map, reduce, and concat. It outperforms native for indexOf/lastIndexOf.
In Chrome the "fast" version does better than the native one for indexOf, lastIndexOf, forEach, map, and reduce. It does worse on bind and concat.
It's interesting to compare the actual numbers across the browsers too. In Firefox, the numbers I see for forEach are:
"fast": 47,688
native: 123,187
and the Chrome ones are:
"fast": 71,070
native: 20,112
Similarly, for map in Firefox I see:
"fast": 17,532
native: 62,268
and in Chrome:
"fast": 38,286
native: 19,521
Interestingly, these two methods are actually implemented in self-hosted JS in both SpiderMonkey and V8, last I checked, so this is really comparing the performance of three different JS implementations.
It's fairly well-known that Array#forEach is up to an order of magnitude slower than a for-loop, across browsers. The usual reason given is the overhead of invoking a closure from the VM. A JS implementation of forEach ought to be somewhere in the middle.
The speed-up for "concat" is surprising to me. I wonder if it holds for "splice" and if that is true across browsers.
So the forEach magic that is so much faster is... a normal for loop:
  exports.forEach = function fastForEach(subject, fn, thisContext) {
    var length = subject.length,
        i;
    for (i = 0; i < length; i++) {
      fn.call(thisContext, subject[i], i, subject);
    }
  };
I knew that forEach was slower than a normal for loop but I was expecting something more.
I did this with CPython and psyco. It turned out that writing map() in Python was faster than the builtin C version, because the JIT was allowed to do some tricks, like inlining the function.
The slowness of functional methods like .map and .forEach for a time was due to their not being self-hosted. Since then, both V8 and SpiderMonkey self-host them, and bz has posted some numbers below [1].
But perf problems are more numerous still for these functional methods, because compilers in general have trouble inlining closures, especially for very polymorphic callsites like calls to the callback passed in via .map or .forEach. For an account of what's going on in SpiderMonkey, I wrote an explanation about a year ago [2]. Unfortunately, the problems still persist today.
JavaScript. The language where you can reimplement basic functions such as map, each, and reduce (which, by the way, are still not available for objects in 2014) and have them be faster than their native counterparts.
It might be that I don't particularly like the language, but it's kind of frightening that we're building the world on that stuff.
That's not really fair; the author of this implementation is non-compliant with the spec, dropping edge cases in order to achieve these improvements.
A lot of it has to do with crossing boundaries between C++ and the JIT'd code and losing things like inlining. Doesn't really have to do with JS warts.
On one hand, I'm a big proponent of "know your tools". I'll gladly use a fast sort function that falls apart when sorting arrays near UINT32_MAX size if I'm aware of that caveat ahead of time and I use it only in domains where the size of the array is logically limited to something much less than that, for example.
But on the other hand, I write operating system code in C. I need to know that the library functions I call are going to protect me against edge cases so I don't inadvertently introduce security holes or attack vectors.
If I know that some JS I'm interacting with is using fast.js, maybe there will be some input I can craft in order to force the system into a 1% edge case.
The lesson here is probably "don't use this for your shopping cart", but we need to be careful deriding Javascript's builtins for being slow when really they're just being safe.
This is actually unlikely to be a problem. The edge cases ignored here are the same as those ignored by underscore.js, which is obviously a very popular library.
I think by all means use this for a shopping cart - it's namespaced by require (say under 'fast' as in the README) - you are, for all intents and purposes, calling some library which just happens to reimplement JS builtins as its mode of operation.
It may even be _easier_ to determine the behaviour of these functions than builtins, since determining how fast works just means reading its source whereas to determine builtins' behaviour you must read the ECMAScript specification and then hope[1] that the browser actually matches it perfectly!
[1]: or read the JS engine source, but that's a whole lot more work. :P
A while back I was programming for a naive but well-written JS interpreter for set-top box apps, where map was generally being avoided because of performance.
I wrote quite a fast "map" (along with the others) that looked a bit like:
  exports.map = function fastMap(subject, fn, thisContext) {
    var i = subject.length,
        result = new Array(i);
    if (thisContext) {
      while (i--) {
        result[i] = fn.call(thisContext, subject[i], i, subject);
      }
    } else {
      while (i--) {
        result[i] = fn(subject[i], i, subject);
      }
    }
    return result;
  };
I'm not sure if I just used "result = []", but on modern browsers I think that'd be recommended. But yeah, if you're programming for a web browser then using another impl of map is probably going to be a waste of time.
I'm interested in what those edge cases are; saying it works in 99% of cases but providing no caveats makes me think I might be surprised by something if I use it.
I think this shows that standards bodies could implement less native language functionality and let community-made libraries/tools compete for those areas. ES6 goes so far as to implement a native module system, seriously calling into question any effort by the community at large to implement a competing system (e.g. browserify, requirejs).
[1]: https://github.com/angular/angular.js/issues/6697#issuecomme...
[2]: https://github.com/v8/v8/blob/master/src/promise.js
illumen | 11 years ago:
Whereas the builtin ones implemented in C++ do keep all the code in there. This is why even full implementations written in JavaScript can be faster than C++.
throwaway_yy2Di | 11 years ago:
[0]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Type...
(edit): Yeah, the abstraction overhead is ridiculous. Here's the forEach() benchmark again, compared to an explicit for loop (no function calls). (I ran this on Node "v0.11.14-pre", fresh from github.)
grhmc | 11 years ago:
I'm sure that is a statistically valid way to measure performance.
cel1ne | 11 years ago:
  bind - fast    2,565,227 ±2.19%
  bind - lodash  4,923,373 ±1.51%
  bind - native  9,022,396 ±2.12%
On iPad this difference is even larger.
nijiko | 11 years ago:
http://jsperf.com/fast-iter
mraleph | 11 years ago:
Microbenchmarks like that require a lot of care to measure something correctly. Check out my talk from LXJS 2013 for more details: slides at http://mrale.ph/talks/lxjs2013/ and video at https://www.youtube.com/watch?v=65-RbBwZQdU
[1] https://news.ycombinator.com/item?id=7938101 [2] http://rfrn.org/~shu/2013/03/20/two-reasons-functional-style...
phpnode | 11 years ago:
* as long as you make it do less work
thegeomaster | 11 years ago:
There is a great, amusing, borderline sci-fi talk by Gary Bernhardt about the future of JavaScript and traditional languages compiled to JavaScript. My recommendation: https://www.destroyallsoftware.com/talks/the-birth-and-death...
mnemonik | 11 years ago:
This would likely get even faster if you manually specialized a map/forEach/etc. for each caller, but then you might as well just write the loop out by hand: http://rfrn.org/~shu/2013/03/20/two-reasons-functional-style...
franze | 11 years ago:
http://www.youtube.com/watch?v=65-RbBwZQdU (Vyacheslav Egorov's LXJS 2013 talk)
I don't know if he actually said these words, but it was the overall theme of this (very entertaining and very enlightening) talk.