To be pedantic, in-place sorting algorithms (typically?) take O(log n) space, not O(1). They don't copy the data to be sorted, but they do need some temporary space that isn't fixed and scales (slowly) with the size of the array being sorted. The usual sources of the log-n space growth are a stack used for recursion (explicit or implicit) and/or the array indices themselves.
This particular implementation (looking at the C version) uses fixed-size 'long ints' for its temporary data storage, which means it only works on arrays up to LONG_MAX elements. If you had larger arrays, your need for temporary data would grow, e.g. you could upgrade all those long ints to long long ints and accommodate arrays up to LLONG_MAX. Of course, logarithmic growth is very slow.
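To see where the stack-driven log-n term comes from, here is a generic sketch (my own illustration, not the linked implementation) of the classic trick that bounds quicksort's stack depth at O(log n): recurse into the smaller partition and iterate on the larger one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Lomuto partition: moves everything smaller than the pivot (the last
// element) to the front, then returns the pivot's final index.
static std::ptrdiff_t partition(std::vector<int>& a,
                                std::ptrdiff_t lo, std::ptrdiff_t hi) {
    int pivot = a[hi];
    std::ptrdiff_t i = lo;
    for (std::ptrdiff_t j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);
    return i;
}

// Recurse only into the smaller half and loop on the larger one, so the
// call stack holds at most O(log n) frames even with bad pivots.
void quicksort(std::vector<int>& a, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    while (lo < hi) {
        std::ptrdiff_t p = partition(a, lo, hi);
        if (p - lo < hi - p) {
            quicksort(a, lo, p - 1);  // smaller half: recurse
            lo = p + 1;               // larger half: loop, no new stack frame
        } else {
            quicksort(a, p + 1, hi);
            hi = p - 1;
        }
    }
}
```

With this shape, each recursive call handles at most half the current range, so the depth is at most log2(n); the indices themselves are the other log-n cost.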
The data types in the RAM model are integer and floating point (for storing real numbers). Although we typically do not concern ourselves with precision in this book, in some applications precision is crucial. We also assume a limit on the size of each word of data. For example, when working with inputs of size n, we typically assume that integers are represented by c lg n bits for some constant c >= 1. We require c >= 1 so that each word can hold the value of n, enabling us to index the individual input elements, and we restrict c to be a constant so that the word size does not grow arbitrarily. (If the word size could grow arbitrarily, we could store huge amounts of data in one word and operate on it all in constant time—clearly an unrealistic scenario.)
I like to think of the memory requirement as the number of integers required, not that the integers have to get larger. If we counted the size of the integers, then traditional O(log n) memory algorithms (like the average case of quicksort) would take O((log n)^2) memory.
In a nutshell: keeping an index into an array of size n requires log_2(n) bits. Thus any algorithm that keeps even just one index into the input takes Omega(log n) space. Yay for pedantic asymptotics.
Heap sort is an example of an in-place sorting algorithm that runs in constant space. Just because this implementation uses some kind of fixed-size buffer doesn't mean the algorithm itself has an upper limit on the amount of data it can sort. It is possible, but it probably requires a deeper understanding of the algorithm to know for sure.
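For comparison, here is what heap sort's constant-space claim looks like with the standard library's heap primitives (a generic sketch, unrelated to the linked code): the heap lives inside the array itself, so the only extra state is a few indices.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Heapsort in two calls: build a max-heap in place, then repeatedly
// move the largest remaining element to the back of the array.
// O(n log n) time, O(1) extra space, not stable.
void heapsort(std::vector<int>& a) {
    std::make_heap(a.begin(), a.end());
    std::sort_heap(a.begin(), a.end());
}
```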
Big-Oh notation deals with theory, and in theory the cost of a variable is fixed.
Furthermore, an array of size N is normally storing N variables (pointers) not N bits, so calculating storage required in bits relative to the input size (without a unit) is disingenuous.
To make it I needed a C++ version with iterators, which I thought would be faster. But it is still about 20% slower than stable_sort for the default random input test. The gap probably stays about the same for other inputs.
The previous innovation I remember in sorting was Python's TimSort. It's just MergeSort with a few tweaks, but it's better than any other sort I've met when applied to real-world data.
"A few tweaks" is a bit of an understatement. At a high level it's a hybrid of insertion sort and merge sort: it falls back to an insertion sort below 64 elements, and it uses insertion sorts to create sorted sub-sections of 32-64 elements before applying the main merge sort.
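A toy sketch of that hybrid structure (my own simplification: real TimSort also detects natural runs, keeps a merge stack with balance invariants, and gallops during merges):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy hybrid sort (NOT real TimSort): insertion-sort fixed-size chunks,
// then merge the sorted chunks pairwise, doubling the width each pass.
void hybrid_sort(std::vector<int>& a) {
    const std::size_t kRun = 32;  // minimum run length, as in TimSort
    std::size_t n = a.size();
    // Step 1: insertion-sort each kRun-sized chunk (fast on tiny ranges).
    for (std::size_t i = 0; i < n; i += kRun) {
        std::size_t end = std::min(i + kRun, n);
        for (std::size_t j = i + 1; j < end; ++j) {
            int key = a[j];
            std::size_t k = j;
            while (k > i && a[k - 1] > key) { a[k] = a[k - 1]; --k; }
            a[k] = key;
        }
    }
    // Step 2: bottom-up merge of the sorted chunks.
    for (std::size_t width = kRun; width < n; width *= 2) {
        for (std::size_t i = 0; i + width < n; i += 2 * width) {
            std::size_t mid = i + width;
            std::size_t end = std::min(i + 2 * width, n);
            std::inplace_merge(a.begin() + i, a.begin() + mid, a.begin() + end);
        }
    }
}
```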
1. Skimming the paper, it only matters if you need an efficient stable sort. If you just need O(1) memory you can stay with heapsort, which at least will have more reference implementations.
So, there could be a use for it. For most applications you're fine as it is.
2. I'm interested in hearing about applications where you're loading millions of array elements in people's browsers.
I was going to be cranky and make rude comments but I can envision people wanting to play with their data without loading it in specialized toolsets/learn R/build a DSL in $lang_of_choice.
Despite the good documentation, this algorithm is complex and I bet most implementations will be incorrect.
UPDATE: I could not find the article I had initially in mind but I found this one [1] showing that even prominent implementations of simple algorithms like binary search or quicksort contain bugs more often than one expects and they may even remain unnoticed for decades.
So take this as a warning - if you implement this algorithm you will almost surely fail no matter how smart you are or how many people look at your implementation.
- Take the two sorted arrays A and B (assume both are of size n) and split one of them, say A, into partitions of size log n.
- Now, assuming there are n / log n processors, assign each partition to a processor. On each partition, take the last element l and do a binary search to find a cut point in the other array B such that all elements of B before it are <= l. The cut points of two adjacent partitions of A delimit a partition of B, and each such pair can then be merged sequentially by its processor.
Span is O(log n); work is O(n); so parallelism is O(n / log n).
Detailed information here [1]
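A sequential simulation of that partitioning scheme (my own sketch; each loop iteration below is the work one of the n / log n processors would do independently):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Merge two sorted arrays by cutting A into blocks of ~log n elements,
// binary-searching B for each block's matching segment, and merging the
// independent block/segment pairs (conceptually, one per processor).
std::vector<int> partitioned_merge(const std::vector<int>& A,
                                   const std::vector<int>& B) {
    std::size_t n = A.size();
    std::size_t block = n > 1 ? (std::size_t)std::log2((double)n) : 1;
    std::vector<int> out;
    out.reserve(A.size() + B.size());
    std::size_t bLo = 0;
    for (std::size_t aLo = 0; aLo < n; aLo += block) {
        std::size_t aHi = std::min(aLo + block, n);
        // Cut point in B: everything before it is <= this block's last element.
        std::size_t bHi =
            std::upper_bound(B.begin() + bLo, B.end(), A[aHi - 1]) - B.begin();
        std::merge(A.begin() + aLo, A.begin() + aHi,
                   B.begin() + bLo, B.begin() + bHi, std::back_inserter(out));
        bLo = bHi;
    }
    // Whatever remains in B is larger than everything in A.
    out.insert(out.end(), B.begin() + bLo, B.end());
    return out;
}
```

Because every A-block's B-segment ends where the next one begins, the pairs are disjoint and could be merged concurrently without coordination.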
Believe it or not, sorting is one of the most common operations software performs. As a simple example, think of how many times somebody runs a query like `SELECT (...) FROM huge_table WHERE (...) ORDER BY (...)`. Obviously the ORDER BY means the data needs to be (at least partially) sorted before it can be returned. To be fair, that is a different case algorithmically, since DBs are almost never able to sort entirely in memory. But there are plenty of other examples where in-memory sorting is necessary or provides advantages for later computation steps (e.g. the ability to cut off elements larger than a certain threshold).
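The threshold/cut-off case is a nice one because it doesn't even need a full sort. A sketch (generic, not tied to any database) using std::partial_sort:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// ORDER BY ... LIMIT k without sorting everything: std::partial_sort
// puts only the k smallest elements in order, in O(n log k) time.
std::vector<int> top_k_sorted(std::vector<int> values, std::size_t k) {
    k = std::min(k, values.size());
    std::partial_sort(values.begin(), values.begin() + k, values.end());
    values.resize(k);  // discard everything beyond the cut-off
    return values;
}
```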
If you're interested in alternative sort algorithms, you might enjoy the self-improving sort [1]. A simplified tl;dr: given inputs drawn from a particular distribution + a training phase, the result is a sort that is optimal for that particular distribution. The complexity is in terms of the entropy of the distribution, and can beat the typical worst case O(n log n) for comparison sorts.
I got widely different results there:
C - 105.868545%
C++ - 80.0518%
Java - 61.664313124608775%
Probably due to optimizations there that can't easily be done in the Java one; interesting nonetheless, thanks for the post.
Quicksort is worst case O(n^2) time, unless you incorporate something like Quickselect with a guaranteed-linear pivot selection (median-of-medians), which no one ever does because it makes things relatively complicated. (Have you ever seen a guaranteed O(n log n) quicksort implemented? I haven't -- the best I've seen is median-of-3 or median-of-5 pivots, or randomized.) Furthermore, I've never seen an O(1) space version of quicksort and I'm not sure one can exist -- see, e.g. http://stackoverflow.com/questions/11455242/is-it-possible-t...
The meaningful comparison would actually be to Heapsort, which is in-place, O(1) space, and NOT stable -- though much, much simpler.
ADDED:
Anyone who uses quicksort should read this gem from Doug McIlroy, which elicits O(n^2) behaviour from most quicksort implementations: http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
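You don't even need McIlroy's adversary to see the quadratic blow-up; already-sorted input does it to a naive first-element pivot. A sketch (my own demo) that counts comparisons:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Demo-only global: counts comparisons made by the sort below.
static std::size_t comparisons = 0;

// Naive quicksort that always picks the first element as pivot. On
// already-sorted input every partition is maximally lopsided, so the
// total comparison count is n(n-1)/2 instead of ~n log n.
void naive_quicksort(std::vector<int>& a, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (lo >= hi) return;
    int pivot = a[lo];
    std::ptrdiff_t i = lo;
    for (std::ptrdiff_t j = lo + 1; j <= hi; ++j) {
        ++comparisons;
        if (a[j] < pivot) std::swap(a[++i], a[j]);
    }
    std::swap(a[lo], a[i]);
    naive_quicksort(a, lo, i - 1);
    naive_quicksort(a, i + 1, hi);
}
```

On 1000 already-sorted elements this performs 1000 * 999 / 2 = 499500 comparisons, exactly the arithmetic-series worst case.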
We've got Crossfilter (https://github.com/square/crossfilter/wiki/API-Reference); however, as more data moves client-side with storage APIs like IndexedDB, I see a need for "as efficient as possible"
This led me to look up browser sort implementations: http://stackoverflow.com/questions/234683/javascript-array-s... -- it seems Moz uses mergesort and WebKit may or may not do something silly for non-contiguous arrays.
[1] http://googleresearch.blogspot.de/2006/06/extra-extra-read-a...
[0] https://github.com/BonzaiThePenguin/WikiSort/blob/master/Cha...
[1] http://electures.informatik.uni-freiburg.de/portal/download/...
I mean, what sorts of big data sets are you sorting that often?
[1]: http://www.cs.princeton.edu/~chazelle/pubs/selfimprove.pdf
The C code compares running time with a very standard mergesort.
The C++ code compares with std::stable_sort.
The Java code compares with a standard mergesort -- very similar to the code in the C version -- but has hard-to-predict JIT warmup effects.
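Stability is the reason mergesort variants are the right baselines here; WikiSort's point is a stable sort in O(1) extra space. A quick generic illustration of what stability means:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Sort (key, label) records by key only and return the labels.
// std::stable_sort guarantees that records with equal keys keep their
// original relative order; std::sort makes no such promise.
std::vector<std::string> labels_by_key(
        std::vector<std::pair<int, std::string>> recs) {
    std::stable_sort(recs.begin(), recs.end(),
                     [](const std::pair<int, std::string>& a,
                        const std::pair<int, std::string>& b) {
                         return a.first < b.first;
                     });
    std::vector<std::string> labels;
    for (const std::pair<int, std::string>& r : recs)
        labels.push_back(r.second);
    return labels;
}
```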