top | item 14667136

(no title)

_hrfd | 8 years ago

Basically all comparison-based sort algorithms we use today stem from two basic algorithms: mergesort (stable sort, from 1945) and quicksort (unstable sort, from 1959).

Mergesort was improved by Tim Peters in 2002 and that became timsort. He invented a way to take advantage of pre-sorted intervals in arrays to speed up sorting. It's basically an additional layer over mergesort with a few other low-level tricks to minimize the amount memcpying.

Quicksort was improved by David Musser in 1997 when he developed introsort. He set a strict worst-case bound of O(n log n) on the algorithm, as well as improved the pivot selection strategy. And people are inventing new ways of pivot selection all the time. E.g. Andrei Alexandrescu has published a new method in 2017[1].

In 2016 Edelkamp and Weiß found a way to eliminate branch mispredictions during the partitioning phase in quicksort/introsort. This is a vast improvement. The same year Orson Peters adopted this technique and developed pattern-defeating quicksort. He also figured out multiple ways to take advantage of partially sorted arrays.

Sorting is a mostly "solved" problem in theory, but as new hardware emerges different aspects of implementations become more or less important (cache, memory, branch prediction) and then we figure out new tricks to take advantage of modern hardware. And finally, multicore became a thing fairly recently so there's a push to explore sorting in yet another direction...

[1] http://erdani.com/research/sea2017.pdf

discuss

xenadu02|8 years ago

It's always good to remember that while Big-O is useful, it isn't the be-all end-all. The canonical example on modern hardware is a linked list. In theory it has many great properties. In reality chasing pointers can be death due to cache misses.

Often a linear search of a "dumb" array can be the fastest way to accomplish something because it is very amenable to pre-fetching (it is obvious to the pre-fetcher what address will be needed next). Even a large array may fit entirely in L2 or L3. For small data structures arrays are almost always a win; in some cases even hashing is slower than a brute-force search of an array!

A good middle ground can be a binary tree with a bit less than an L1's worth of entries in an array stored at each node. The binary tree lets you skip around the array quickly while the CPU can zip through the elements at each node.

It is more important than ever to test your assumptions. Once you've done the Big-O analysis to eliminate exponential algorithms and other basic optimizations you need to analyze the actual on-chip performance, including cache behavior and branch prediction.

flukus|8 years ago

> It's always good to remember that while Big-O is useful, it isn't the be-all end-all. The canonical example on modern hardware is a linked list. In theory it has many great properties. In reality chasing pointers can be death due to cache misses.

My favorite example is adding and ordered list of items into a a simple tree, all you've really done is created a linked list. Big-O doesn't know what your data looks like but you generally should.

lorenzhs|8 years ago

Don't count out sample-sort just yet, it lends itself to parallelisation very well and is blazingly fast. See https://arxiv.org/abs/1705.02257 (to be presented at ESA in September) for an in-place parallel implementation that overcomes the most important downsides of previous sample-sort implementations (linear additional space).

beagle3|8 years ago

There's actually 3:

Quick sort (unstable, n^2 worst case, in place, heapsort (unstable, n log n worst case, in place) and merge sort (stable, n log n worst case, not in place)

There are variants of each that trade one thing for another (in placeness for stability, constants for worst case), but these are the three efficient comparison sort archetypes.

Of these, quicksort and heap sort can do top-k which is often useful; and heapsort alone can do streaming top-k.

xoroshiro|8 years ago

Interesting history!

>Sorting is a mostly "solved" problem in theory, but as new hardware emerges different aspects of implementations become more or less important (cache, memory, branch prediction)

This makes me wonder what other hardware tricks might be used for other popular algorithms such as ones used in graphs. I'm sure shortest path is also one of those algorithms that have been "solved" in theory but have a huge amount of research, but personally, what would be more interesting to hear about is something that isn't quite as easy. Something like linear programming with integer constraints or even something like vehicle routing or scheduling. To anyone studying those areas, is there anything you find particularly interesting?

lorenzhs|8 years ago

There's a vast amount of results for shortest-path search on graphs that look like road networks. This was of course motivated by the application to route planning. If you're looking for a starting point, the literature list at http://i11www.iti.kit.edu/teaching/sommer2017/routenplanung/... is quite comprehensive and grouped by technique. It's a lecture at the group that developed the routing algorithm used by Bing Maps. I work one storey below them, at the group where the algorithm used by Google Maps was developed :)

agumonkey|8 years ago

Thanks for the alexandrescu paper

yoran|8 years ago

Thanks for this answer!