vprcic|17 days ago
Each time a discussion about sorting starts, I'm reminded of a "lively debate" I had with my boss/friend about the optimal sorting approach. He claimed it's O(n), pointing to counting sort as an example. This didn't sit well with me. A sorting algorithm, I insisted, should be defined something like "a function taking an unsorted array of elements and returning a sorted one". But it seems there is no agreed-upon definition, and you can have something like counting sort where you get an array in O(n) but still need O(m) to get the index of a specific element. I argued that then the best sorting algorithm is the do-nothing algorithm: it returns the initial array in O(1), but needs O(m log m) to get you the index of an element (it uses merge sort to do it).
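For context, a minimal counting sort sketch (my own illustration, not from the thread): it runs in O(n + k) for n integers drawn from a range of size k, which is how it sidesteps the O(n log n) comparison-sort lower bound, but only when the keys are small integers.

```python
def counting_sort(arr, k):
    # Sort n integers in the range [0, k) in O(n + k) time.
    counts = [0] * k
    for x in arr:
        counts[x] += 1
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)
    return out

print(counting_sort([3, 1, 4, 1, 5], 6))  # [1, 1, 3, 4, 5]
```

Note it never compares two elements, which is exactly why the comparison-sort bound doesn't apply.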
Epa095|17 days ago
1: https://en.wikipedia.org/wiki/Comparison_sort
NooneAtAll3|17 days ago
if you don't care about W or it is essentially constant - then it can be dropped
but it is an input parameter that will change execution time
marcosdumay|17 days ago
> if you don't care about W or it is essentially constant - then it can be dropped
Also, every algorithm that ends in a real computer is bound to a constant time. That's still not a practical thing to do.
geocar|17 days ago
Gosh. Let me try to convince you.
I use permutation arrays all the time: lists of indexes that can be used across multiple vectors.
This is much faster than the pattern of scanning rows, constructing tuples of (thingToSort . thingIWantInThatOrder) and making a custom sort function, and destructuring those tuples...
And really, not having to write custom sort functions is really really nice.
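As an illustration of the pattern (a hypothetical sketch, not geocar's actual code): compute the permutation once, then apply it to any parallel vector.

```python
# Permutation-array sketch: sort once, then reuse the index order
# across several parallel vectors (column-oriented data).
names  = ["carol", "alice", "bob"]
ages   = [35, 30, 25]
scores = [9.1, 7.5, 8.2]

# perm holds the indexes that would sort `ages` ascending.
perm = sorted(range(len(ages)), key=ages.__getitem__)

print([ages[i] for i in perm])    # [25, 30, 35]
print([names[i] for i in perm])   # ['bob', 'alice', 'carol']
print([scores[i] for i in perm])  # [8.2, 7.5, 9.1]

# The median of `ages` falls straight out of the permutation:
print(ages[perm[len(ages) // 2]])  # 30
```

No tuples are constructed and no custom comparator is written; the same `perm` reorders every column.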
> Especially in telemetry, where mean is easy and median is not.
Funny. Yes, median is obvious with a permutation array, and maybe mean is less so.
When your data is really big and not very variable, mean of x is roughly the same as the mean of any sufficient sample of x, and that sample can be meaningfully represented as a permutation array!
You can get such an array with reservoir sampling and some maths, and (depending on what you know of your data and variance) sometimes even simpler tricks.
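A sketch of the reservoir idea (this is Algorithm R; the function name and signature here are my own):

```python
import random

def reservoir_sample(stream, k, rng=random):
    # Algorithm R: uniform random k-sample from a stream of unknown length.
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = x
    return sample

# Sample indexes rather than values, and the result doubles as a small
# permutation-array-style view into the original data.
idx_sample = reservoir_sample(range(1_000_000), 100)
```

Each element ends up in the sample with probability k/n regardless of stream length, so statistics of the sample estimate statistics of the whole.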
That's kind of how the "faster than Dijkstra" trick referred to in the article works: data sets with small variance have this same property, that the min of x is roughly the same as the min of a sufficient sample of x (where the size of "sufficient" has to do with the variance). And so on.
Another big use-case in my code is trees: Apter trees have a flat memory layout which is convenient for permutation arrays which can simultaneously represent index, rotation, tombstones, and all sorts of other things you might need to do with a tree.
Give it a dig. There's good stuff in there.
pvillano|17 days ago
Also, if you're taking an average of floating point numbers, you might want to sort them first and add from smallest to largest, to better preserve precision
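A toy demonstration of the effect (my own sketch; `math.fsum` is included only as an exactly-rounded reference value):

```python
import math

# With a large cancelling pair, naive left-to-right summation loses
# the small terms; summing by increasing magnitude keeps them.
xs = [1e16, 1.0, 1.0, 1.0, 1.0, -1e16]

print(sum(xs))                   # 0.0  (the four 1.0s vanished)
print(sum(sorted(xs, key=abs)))  # 4.0  (small-to-large order)
print(math.fsum(xs))             # 4.0  (exactly rounded reference)
```

In general, magnitude-sorted summation reduces rounding error rather than eliminating it; Kahan summation or `fsum` are the stronger tools when it matters.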
vlovich123|17 days ago
But sorting by arbitrary strings like names can’t avoid comparison sort.
myhf|17 days ago
"sorting" means assigning things into bins (which are usually ordered).
tpoacher|17 days ago
[0] https://news.ycombinator.com/item?id=2657277
onecommentman|17 days ago
https://en.wikipedia.org/wiki/Spaghetti_sort
https://every-algorithm.github.io/2023/11/25/spaghetti_sort....
The equivalent for positive link weight shortest paths is the “string algorithm” — build the network out of string and pull taut between the two knots/nodes, the resulting set of taut links is the shortest path.