It is not the sorting per se that was improved here, but sorting (particularly of short sequences) on modern CPUs; the real complexity lies in predicting what will run quickly on these CPUs.
What's well studied is theoretical algorithmic sorting.
Part of it is because hardware properties are always changing: the instructions available, the relative speed of the CPU versus memory, and how big, numerous, and fast the various caches are, etc.
I am curious why things can't just keep improving on a base that doesn't change, until the base changes because the improvements on the new base are just that much better...
zzbn00|2 years ago
Doing an empirical algorithm search to find which algorithms fit well on modern CPUs/memory systems is pretty common, see e.g. FFTW, ATLAS, https://halide-lang.org/
dist-epoch|2 years ago
For practical sorting, for a particular CPU architecture, there is still plenty of low hanging fruit:
> Today we're sharing open source code that can sort arrays of numbers about ten times as fast as the C++ std::sort, and outperforms state of the art architecture-specific algorithms, while being portable across all modern CPU architectures
https://opensource.googleblog.com/2022/06/Vectorized%20and%2...
jackmott42|2 years ago
GravityLabs|2 years ago
Or is that why hardware properties change so much?