item 43065206

mlochbaum | 1 year ago

I'd forgotten about patience sorting! Not that I was ever exactly familiar with it. JesseSort seems nearly identical, except that it allows insertion to either side of the lists, tries the previous index before doing a binary search, and uses pairwise instead of k-way merge at the end. The expected Θ(√n) list count is the same as well. An obvious advantage of going double-ended is that you get automatic O(n) sorting if the input's an ascending or descending run, instead of just one of those.
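The double-ended idea can be sketched in a few lines. This is an illustrative toy, not the real JesseSort: it scans piles linearly instead of using the previous-index heuristic plus binary search, and it finishes with a k-way rather than pairwise merge. Each pile is a deque kept sorted, so a new element can extend it at either end, which is what makes a fully ascending or descending input collapse into a single pile.

```python
from collections import deque
import heapq

def double_ended_patience_sort(items):
    """Toy double-ended patience-style sort (sketch, not JesseSort itself).

    Each pile is a deque kept in ascending order front-to-back; a new
    element may extend a pile at either end, or start a new pile.
    """
    piles = []
    for x in items:
        for pile in piles:
            if x >= pile[-1]:       # extends the top (ascending end)
                pile.append(x)
                break
            if x <= pile[0]:        # extends the bottom (descending end)
                pile.appendleft(x)
                break
        else:
            piles.append(deque([x]))  # no pile can take x
    # k-way merge of the sorted piles (JesseSort merges pairwise instead)
    return list(heapq.merge(*piles))
```

An ascending run like [1, 2, 3] keeps appending to pile 0's top, a descending run keeps prepending to its bottom, so either case stays one pile and sorts in linear time.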

I think both algorithms can be made stable; if there's a barrier for patience sort it would be that the merge phase pops off the end of the sorted lists, so it's going in the opposite direction from insertion? For JesseSort, an element that's equal to a previous one can only be inserted to the same list, or a more central one (created later). So you need to merge sublists giving priority to the outer ones, and also never append an element to the bottom of a sublist if it's equal to that list's current minimum. JesseSort as implemented in jessesort_c.pyx doesn't look like it does either of these: it merges non-adjacent lists and has a backwards merge by virtue of using < instead of <=, and always inserts to the innermost list possible instead of being strict about the bottom side. Which is understandable, as being strict has a major downside: if you get a string of equal values in the bottom half of the range, you'd be stuck putting each one in a new sublist until you get to the midpoint and can create a new one where these values start going to the top.
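The < vs. <= point can be shown with a tiny two-way merge on tagged keys (a hypothetical helper, just for illustration): on ties, taking from the earlier-created run with <= preserves input order, while strict < lets later elements jump ahead.

```python
def merge(a, b, stable=True):
    """Two-way merge comparing only the first tuple element.

    With stable=True, ties take from `a` (the earlier run) via <=;
    with stable=False, strict < takes from `b` on ties, breaking stability.
    """
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        take_a = a[i][0] <= b[j][0] if stable else a[i][0] < b[j][0]
        if take_a:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

# tag equal keys with their run of origin; 'a' elements arrived first
left  = [(1, 'a'), (2, 'a')]
right = [(1, 'b'), (2, 'b')]
merge(left, right, stable=True)   # ties keep each 'a' before its 'b'
merge(left, right, stable=False)  # strict < puts each 'b' before its 'a'
```

Both calls produce output sorted on the key; only the relative order of equal keys differs, which is exactly the stability question.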

tim-peters | 1 year ago

Ya, I'm just too old for bottomless pits anymore ;-) Patience sorting is still worth study, for its elegance, simple code, and real-world practicality in the context of solving the longest increasing subsequence problem. As a full-blown sorting algorithm, it only excels at reverse-ordered and "low diversity" inputs, but that's in return for very simple, uniform, and easily understood code.
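For the LIS connection, the standard patience-sorting trick really is short. A sketch of the usual formulation: place each element on the leftmost pile whose top can take it (binary search over pile tops), and the number of piles at the end equals the length of a longest strictly increasing subsequence.

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of a longest strictly increasing subsequence via patience
    sorting: each element goes on the leftmost eligible pile, found by
    binary search over the pile tops (which stay sorted)."""
    tops = []  # top card of each pile, in ascending order
    for x in seq:
        i = bisect_left(tops, x)
        if i == len(tops):
            tops.append(x)   # no pile can take x: start a new one
        else:
            tops[i] = x      # place x on pile i, lowering its top
    return len(tops)
```

The whole thing is O(n log n), and recovering an actual subsequence (not just its length) only needs backpointers from each card to the top of the pile to its left.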

Of course any of these could be changed to use the powersort merge strategy, if desired. It's elegant and principled. But it also exploits that timsort and powersort keep the _number_ of runs simultaneously active at no more than log2(N). Needing to keep track of potentially O(N) sublists simultaneously could be burdensome.

In any case, I'm most interested in adaptive sorting, since partial order so very often exists in the real world (as you noted that Orson Peters noted, this is often due to "low diversity" as a consequence of sorting on one field of a complex record - but it's more than just that). Thanks to your good explanations, I'm persuaded that JesseSort isn't much more promising than patience sorting in that specific respect (yes, looks like JesseSort can handle ascending order in linear time too). So thanks again!