
Sorting Algorithm Cheat Sheet

229 points | gameguy43 | 6 years ago | interviewcake.com

49 comments


swiftcoder|6 years ago

> Radix sort looks fast, with its O(n) worst-case time complexity. But, if you're using it to sort binary numbers, then there's a hidden constant factor that's usually 32 or 64 (depending on how many bits your numbers are). That's often way bigger than O(lg(n)), meaning radix sort tends to be slow in practice.

The constant factor in radix sort isn't the number of bits, it's the number of digits. Since one typically sorts on a computer using 8-bit 'digits', that's a constant factor of 4 or 8 (not 32 or 64).

ggggtez|6 years ago

It's the number of bits. You can change to bytes, but you are just hiding another constant factor of 8 by doing that.
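The digits-vs-bits tradeoff being debated can be made concrete. A minimal sketch of LSD radix sort over 8-bit digits, assuming non-negative integers (the function name and parameters are illustrative, not from the thread):

```python
def radix_sort_lsd(nums, key_bits=32, digit_bits=8):
    """LSD radix sort on non-negative ints using 8-bit 'digits'.

    With 32-bit keys and 8-bit digits this is 4 stable bucket passes:
    the "constant factor of 4" discussed above. With 1-bit digits it
    would be 32 passes instead.
    """
    mask = (1 << digit_bits) - 1
    for shift in range(0, key_bits, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for x in nums:
            buckets[(x >> shift) & mask].append(x)
        nums = [x for bucket in buckets for x in bucket]
    return nums
```

Widening the digit shrinks the pass count but grows each pass's bucket table, which is exactly the hidden constant being argued about.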

ComputerGuru|6 years ago

Of course in the real world for problems where sorting is actually the bottleneck and not just something you want not to kill your app's performance, you end up with things like timsort that destroy all these in practice.

bjoli|6 years ago

I'd say it's the other way around: you use whatever your language provides as default (which is probably something like Timsort) until it becomes a bottleneck and then you analyse the data and write something that is specific to your use case that will blow Timsort out of the water.

Timsort is your first choice.

bjourne|6 years ago

Is there any evidence of timsort being superior in practice? I've been looking for benchmarks, but on synthetic ones with randomized data, both mergesort and quicksort handily beat timsort. The belief in timsort's superiority seems to me to be more about Tim Peters being a very well-respected developer than about performance data.

civility|6 years ago

Also no mention of introsort (C++ std::sort), which uses insertion sort for small arrays, detects when quicksort has fallen too far toward O(n^2), and punts to heapsort when it needs to.
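The strategy described here can be sketched as follows. This is a simplified, not-in-place illustration of the introsort idea (real std::sort partitions in place; the size threshold and depth limit are illustrative):

```python
import heapq
import math

def introsort(a):
    """Sketch of the introsort strategy: quicksort by default,
    insertion sort for tiny inputs, heapsort once recursion depth
    exceeds roughly 2*log2(n)."""
    def insertion_sort(a):
        for i in range(1, len(a)):
            x, j = a[i], i - 1
            while j >= 0 and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x
        return a

    def heapsort(a):
        heapq.heapify(a)
        return [heapq.heappop(a) for _ in range(len(a))]

    def sort(a, depth_limit):
        if len(a) <= 16:          # small array: insertion sort
            return insertion_sort(a)
        if depth_limit == 0:      # too deep: quicksort is degenerating
            return heapsort(a)
        pivot = a[len(a) // 2]
        less = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        greater = [x for x in a if x > pivot]
        return sort(less, depth_limit - 1) + equal + sort(greater, depth_limit - 1)

    n = max(len(a), 2)
    return sort(list(a), 2 * int(math.log2(n)))
```

The depth limit is the key trick: it caps the worst case at O(n log n) without paying heapsort's constants on typical inputs.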

saagarjha|6 years ago

I think it's important to mention that counting sort and radix sort are not "true" comparison-based sorts: they have additional requirements on the input data.

why-el|6 years ago

Indeed. Usually if you get a question such as "you have an array of 1 through n", the immediate thought should be: "why is it just up to n and not arbitrary numbers?" From there, you know you can get a linear sort by leveraging the known relationships between these numbers.
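For the "values known to lie in 1 through n" case, the linear-time idea is counting sort: the known range replaces comparisons entirely. A hedged sketch (names are illustrative):

```python
def counting_sort_range(nums, max_value):
    """Sort ints known to lie in 1..max_value in O(n + k) time,
    where k = max_value. No comparisons between elements: the
    known range does all the work."""
    counts = [0] * (max_value + 1)
    for x in nums:
        counts[x] += 1            # tally each value
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)   # emit each value count times
    return out
```

This is also why such sorts aren't "true" comparison sorts, as saagarjha notes: they only work because of the extra promise about the input.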

northisup|6 years ago

Who is still asking candidates to talk about sorting algorithms?

It is just trivia and has little to no bearing on whether the candidate can actually do the job (unless the job relates to the runtime of sorting algorithms, of course).

hinkley|6 years ago

There are two situations where I wouldn't harass my coworkers for asking sort related questions.

One, I've asked. Not implementing sort, but implementing complex compare() functions for sortable values. There are many things I can accept as flaws in a coworker, but inability to deal with 3 conditionals at once isn't one of them. And I've had people (interviewed, or worse, worked with) who can't sort by last name, first name, age.

The other is the "I gave you code, now tell me what's wrong with it" question, but that's not writing a sort algorithm; that's detecting that someone needs a better one.
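A sketch of the kind of multi-key comparator being asked about, in Python (the `Person` type and its fields are illustrative, not from the comment):

```python
from dataclasses import dataclass
from functools import cmp_to_key

@dataclass
class Person:
    last: str
    first: str
    age: int

def compare(a, b):
    """Explicit three-way compare: last name, then first name, then age.
    These are the "3 conditionals at once" the interview probes."""
    if a.last != b.last:
        return -1 if a.last < b.last else 1
    if a.first != b.first:
        return -1 if a.first < b.first else 1
    return (a.age > b.age) - (a.age < b.age)

def sort_people(people):
    # In Python a key tuple expresses the same ordering more directly.
    return sorted(people, key=lambda p: (p.last, p.first, p.age))
```

The key-tuple form is idiomatic in Python, but the explicit comparator is what generalizes to languages where you hand the sort a compare() function.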

Personally, I'd be all too happy if we stopped interviewing people expecting answers that would never pass a code review. I'd say it's hypocritical, but it's not even that. It's just wrong-headed, and gives bad information to the candidate.

why-el|6 years ago

I see your point, and usually these work best if the interviewer can prove that such a thing is needed. For instance, many jobs might require sorting of data that cannot fit on disk (think for instance of loading a massive file that you want to parse, and sorting might help you in processing it), in which case knowing merge sort (the disk variety) can help. So yes, the onus is on the interviewer I think.

jaunkst|6 years ago

I'm less worried about understanding of sorting algorithms than I am about understanding of patterns. With any team at scale my hardest challenge is about patterns. More importantly, why a pattern is used. The efficiency of a sorting algorithm can be refactored. Is the implementation of the sorting composable? I'll hire without question someone who understands patterns and composability over someone who can cite an ideal solution. Seriously, at what point does choosing complexity over readability really add anything to consumer value? I understand that in very limited scopes computational efficiency is imperative, but let's be pragmatic and say good programming and understanding of best practices trump perfect solutions.

adrianmonk|6 years ago

I think the intention is that the job will relate to the running time of other algorithms and that sorting things is a toy problem you can use to test whether they know anything about that subject in general.

There may be better ways to assess that, though. For example, you could ask them to give one or two examples each of algorithms that are O(1), O(log N), O(N), O(N log N), O(N^2), and O(something worse than N^2).
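One possible answer sheet for that exercise, sketched in Python (the function names are just labels for each complexity class):

```python
import bisect

def constant(d, k):            # O(1): hash-table lookup
    return d.get(k)

def logarithmic(sorted_a, x):  # O(log N): binary search
    return bisect.bisect_left(sorted_a, x)

def linear(a):                 # O(N): single scan for the maximum
    return max(a)

def linearithmic(a):           # O(N log N): comparison sort
    return sorted(a)

def quadratic(a):              # O(N^2): enumerate all ordered pairs
    return [(x, y) for x in a for y in a]

def exponential(a):            # O(2^N): enumerate all subsets
    subsets = [[]]
    for x in a:
        subsets += [s + [x] for s in subsets]
    return subsets
```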

saagarjha|6 years ago

Sorting is such a foundational problem in computer science that pretty much every job will end up relying on the runtime of sorting algorithms. Not every job requires being able to write a bulletproof dual-partition QuickSort, of course, but knowing complexity bounds is important, even if you’re just calling your standard library’s implementation: it places fundamental bounds on what things you can improve.

ggggtez|6 years ago

Disagree. If you think that, what do you ask? "How would you Google or Stack Overflow for the answer?" That's hardly going to provide any useful signal. At the end of the day you need to ask something that shows evidence the candidate knows something about efficiency tradeoffs.

CzechTech|6 years ago

Some people like that sort of thing. Mostly the "academic" types. The memorizers. There are people who got through their higher education by using memory instead of wit. I know some of those people, I went to school with some of them, they were always studying and ramming as much "data" into their brains as possible... while I was getting through with a minimum of effort by just coding a lot and trying to think about things in my own way.

The memorizers in this industry often get jobs higher up the tree and therefore they are highly represented in the hiring practices.

Anyways, I completely agree with you. In my opinion it is essential to KNOW that you DON'T KNOW (i.e. you don't remember how exactly an RB tree works, you just know there is such a thing). Everything else is just a quick Google session away.

gameguy43|6 years ago

Original author here, happy to answer questions about data structures, algorithms, and coding interviews!

strainer|6 years ago

It might be notable that on very small lists (up to around 50, maybe 100 depending on CPU/cache) insertion sort is fastest of all, even on difficult input distributions. It's so simple that the CPU can skip through it rapidly. I noticed this myself and have also read other developers mention it in sort design discussions. A popular general-purpose sort called 'Timsort' defaults to it for small sequences. It's possible to tweak the insert algorithm to sum the distance elements have moved so far and escape to a more substantial algorithm when that gets excessive. If little rearrangement is required, insertion sort is about as fast as scanning through memory can be.

I think in practice the answer to the question "which should I use?" (rather than "which should ideally be used?") is the best-developed hybrid sort library for the platform. Beyond the suitable case for insertion sort, it's a really substantial job for an individual to develop, test, and optimize a high-performance sort library that can handle a range of distribution types.
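The move-counting escape described above might look like this sketch (the budget heuristic is made up, and the built-in `sorted` stands in for the "more substantial algorithm"):

```python
def insertion_sort_with_escape(a, move_budget=None, fallback=sorted):
    """Insertion sort that sums element move distances and escapes to a
    heavier algorithm once the total exceeds a budget. If the input is
    nearly sorted, it finishes in close to a single linear scan."""
    a = list(a)
    if move_budget is None:
        move_budget = 8 * len(a)   # made-up heuristic: ~8 moves/element
    moves = 0
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]        # shift larger element right
            j -= 1
            moves += 1
            if moves > move_budget:
                a[j + 1] = x       # plug the hole before bailing out
                return fallback(a)
        a[j + 1] = x
    return a
```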

gfdhnk|6 years ago

LSB radix sort can be slow, but MSB radix sort is the fastest sorting algorithm for an array of machine words.

pstuart|6 years ago

"No CS degree necessary." lured me in. I'm subscribed and looking forward to learning more.

gameguy43|6 years ago

Curious: did people notice that you can click on each algorithm and click the blue button to get a detailed write up of how it works? (Or is that too hard to find?)

strainer|6 years ago

The links to decent writeups with example code (excellent to have) are a nice surprise when a row is expanded.

On my Firefox browser, the paragraphs and elements in the table and writeups are somewhat excessively spaced vertically.

hopscotch|6 years ago

The best real sorting algorithms are often hybrid.

Would be nice to see mention of parallel sorting algorithms. These have basically linear speedup and you can do them on GPUs.

Obvious disclaimer that you should never care about your sorting algorithm choice until you need to. Performance is more often IO or memory related than compute. Tech interviews are daft, etc.

marvinjames|6 years ago

Best case for heapsort is actually O(n). Build-heap always takes O(n). Then, e.g., when all keys are the same, each max-heapify call will take O(1) instead of O(log n).

CobrastanJorji|6 years ago

feature request: make O(k) bright red, like O(n^2). As it stands, it looks nice and pleasant like O(1) but it's honkin' terrifying.

gameguy43|6 years ago

Oh yeah, looks like we messed that up. That space complexity also doesn't even agree with the final space complexity we describe in the write up.

I'm gonna actually change the space complexity in the table for both counting sort and radix sort to O(n)--that at least matches the amount of hand-waviness we used for the time costs for those two in the table.