WikiSort – Fast, stable, O(1) space merge sort algorithm

[+] _delirium|12 years ago|reply

To be pedantic, in-place sorting algorithms (typically?) take O(log n) space, not O(1). They don't copy the data to be sorted, but they do need some temporary space that isn't fixed, but scales (slowly) with the size of the array being sorted. The usual sources of the log-n space growth come from either a stack recursing on things (explicitly or implicitly) and/or the array indices.

This particular implementation (looking at the C version) uses fixed-size 'long ints' for its temporary data storage, which means it only works on arrays up to LONG_MAX elements. If you had larger arrays, your need for temporary data would grow, e.g. you could upgrade all those long ints to long long ints and accommodate arrays up to LLONG_MAX. Of course, logarithmic growth is very slow.

[+] Buge|12 years ago|reply

Here is what CLRS says about this:

The data types in the RAM model are integer and ﬂoating point (for storing real numbers). Although we typically do not concern ourselves with precision in this book, in some applications precision is crucial. We also assume a limit on the size of each word of data. For example, when working with inputs of size n, we typically assume that integers are represented by c lg n bits for some constant c >= 1. We require c >= 1 so that each word can hold the value of n, enabling us to index the individual input elements, and we restrict c to be a constant so that the word size does not grow arbitrarily. (If the word size could grow arbitrarily, we could store huge amounts of data in one word and operate on it all in constant time—clearly an unrealistic scenario.)

I like to think of the memory requirement as the number of integers required, not that the integers have to get larger. If we considered the size of the inters then traditional O(log n) memory algorithms (like the average case of quicksort) would be determined to take O((log n)^2) memory.

[+] tbingmann|12 years ago|reply

In a nutshell: to keep an index into an array of size n requires log_2(n) bits. Thus any algorithm, which keeps even just one index into the input takes Omega(log n) space. Yay for pedantic asymptotics.

[+] detrino|12 years ago|reply

Heap sort is an example of an in-place sorting algorithm that is constant space. Just because this implementation uses some kind of fixed size buffer, doesn't mean it has an upper limit for the amount of data it can sort. It is possible, but probably requires a deeper understanding of the algorithm to know for sure.

[+] sorbits|12 years ago|reply

Big-Oh notation deals with theory, and in theory the cost of a variable is fixed.

Furthermore, an array of size N is normally storing N variables (pointers) not N bits, so calculating storage required in bits relative to the input size (without a unit) is disingenuous.

[+] jmpeax|12 years ago|reply

So, is there such a thing as a useful O(1)-space algorithm? I can't think of a single one.

[+] beagle3|12 years ago|reply

Also, interesting discussion on reddit: http://www.reddit.com/r/programming/comments/20fxv8/making_a...

[+] tbingmann|12 years ago|reply

I made a video of how the algorithm works: http://youtu.be/NjcSyD7p660

To make it I needed a C++ version with iterators, which I though would be faster. But it is still about 20% slower than stable_sort for the default random input test. It probably also stays the same for other inputs.

[+] beagle3|12 years ago|reply

Previous innovation that I remember in sorting was Python's TimSort - it's just MergeSort with a few tweaks, but it's better than any other sort I've met when applied to real world data.

[+] masklinn|12 years ago|reply

> it's just MergeSort with a few tweaks

"a few tweaks" is a bit of an understatement, at a high-level it's a hybrid of insertion and merge sort (it's an insertion sort below 64 elements, and it uses insertion sorts to create sorted sub-sections of 32~64 elements before applying the main merge sort)

[+] taylorbuley|12 years ago|reply

I'm sure people will loathe me for saying this, but I'd really like to see the implemented into JavaScript.

We've got Crossfilter (https://github.com/square/crossfilter/wiki/API-Reference); however, as more data moves client-side with storage APIs like IndexedDB, I see a need for "as efficient as possible"

[+] phillmv|12 years ago|reply

1. Skimming the paper, it only matters if you need an efficient stable sort. If you just need O(1) memory you can stay with heapsort, which at least will have more reference implementations.

This lead me to look up browser sort implementations; http://stackoverflow.com/questions/234683/javascript-array-s... - it seems Moz uses mergesort and Webkit may or may not do something silly for non contiguous arrays.

So, there could be a use for it. For most applications you're about fine as it is.

2. I'm interested in hearing about applications where you're loading millions of array elements in people's browsers.

I was going to be cranky and make rude comments but I can envision people wanting to play with their data without loading it in specialized toolsets/learn R/build a DSL in $lang_of_choice.

[+] hltbra|12 years ago|reply

Crossfilter's link is broken (parenthesis and semicolon were included).

[+] dignati|12 years ago|reply

This has a really nice documentation, should make it easy to implement.

[+] danbruc|12 years ago|reply

Despite the good documentation this algorithm is complex and I bet most implementations will be incorrect.

UPDATE: I could not find the article I had initially in mind but I found this one [1] showing that even prominent implementations of simple algorithms like binary search or quicksort contain bugs more often than one expects and they may even remain unnoticed for decades.

So take this as a warning - if you implement this algorithm you will almost surely fail no matter how smart you are or how many people look at your implementation.

[1] http://googleresearch.blogspot.de/2006/06/extra-extra-read-a...

[+] TheLoneWolfling|12 years ago|reply

Is there a way that better minds than me can see to parallelize this algorithm?

[+] stokedmartin|12 years ago|reply

The merge step[0] can be parallelized -

- take the two sorted arrays A & B (assume both are of size n) and make partitions of size log n in one of them, let's say A

- Now considering there are n/logn processors, assign each partition to a processor. On each partition take the last element (l) and do a binary search to find a cut point in the other array B such that all elements in B are <= l. Cut points of two such partitions in A correspond to a partition in B which can then be sequentially merged by the processor.

Span is O(log n); Work is O(n); so parallelism is O(n/logn) Detailed information here [1]

[0] https://github.com/BonzaiThePenguin/WikiSort/blob/master/Cha...

[1] http://electures.informatik.uni-freiburg.de/portal/download/...

[+] unknown|12 years ago|reply

[deleted]

[+] sesqu|12 years ago|reply

I've been meaning to compare various in-place mergesorts, so this'll definitely make it into my bookmarks.

[+] jokoon|12 years ago|reply

why is a sort function so important ?

I mean what sorts of big data sets are you sorting that much often ?

[+] rguldener|12 years ago|reply

Believe it or not, sorting is one of the most common operations software performs. As a simple example think of how many times somebody queries something like `SELECT (...) FROM huge_table WHERE (...) ORDER BY (...)` Obviously the order by means the data needs to be (at least partially) sorted before it can be returned. To be fair that is a different case algorithmically since DB's are almost never able sort entirely in memory. But there are plenty of other examples where in memory sorting is necessary or provides advantages for later computation steps (eg. ability to cut of elements larger than a certain threshold).

[+] malkia|12 years ago|reply

for 3D graphics, you need to sort objects (meshes) by Z so that you end up with semi-accurate blending (Z-Buffer does not help there).

[+] jsonified|12 years ago|reply

nice to see even in sort() we have innovation!

[+] chrismonsanto|12 years ago|reply

If you're interested in alternative sort algorithms, you might enjoy the self-improving sort [1]. A simplified tl;dr: given inputs drawn from a particular distribution + a training phase, the result is a sort that is optimal for that particular distribution. The complexity is in terms of the entropy of the distribution, and can beat the typical worst case O(n log n) for comparison sorts.

[1]: http://www.cs.princeton.edu/~chazelle/pubs/selfimprove.pdf

[+] bmvakili|12 years ago|reply

I got widely different results there: C - 105.868545% C++ - 80.0518% Java - 61.664313124608775% Probably the optimizations there; can't easily be done in Java one; interesting nonetheless, thanks for post.

[+] pjscott|12 years ago|reply

Those benchmark ratios are not comparable across languages.

The C code compares running time with a very standard mergesort.

The C++ code compares with std::stable_sort.

The Java code compares with a standard mergesort -- very similar to the code in the C version -- but has hard-to-predict JIT warmup effects.

[+] michaelmior|12 years ago|reply

What are your result numbers supposed to mean?

[+] apples2apples|12 years ago|reply

How does it compare to an O(1) space version of quicksort?

[+] astrada|12 years ago|reply

quicksort is not a stable sort (http://en.wikipedia.org/wiki/Stable_sort#Stability)

[+] beagle3|12 years ago|reply

Quicksort is worst case O(n^2) time, unless you incorporate something like Quickselect for your pivot (which no one ever does, because it makes it relatively complicated. Have you ever seen an O(n log n) guaranteed quicksort implemented? I haven't - best I've seen is median-of-3 or median-of-5 pivots - or randomized). Furthermore, I've never seen an O(1) space version of quicksort and I'm not sure one can exist -- see, e.g. http://stackoverflow.com/questions/11455242/is-it-possible-t...

The meaningful comparison would actually be to Heapsort, which is in-place, O(1) space, and NOT stable - though much, much, simpler.

ADDED:

Anyone who uses quicksort should read this gem from Doug McIlroy, which elicits an O(n^2) behaviour from most quicksort implementations: http://www.cs.dartmouth.edu/~doug/mdmspe.pdf -

[+] anonymoushn|12 years ago|reply

There is no O(1) space version of quicksort.

[+] zenciadam|12 years ago|reply

It's the quarterly post from the guy who just reinvented radix sort.

54 comments