Good question. I added a test for sorted runs in addition to random data, and added pdqsort for comparison.
avx512_qsort benefits from sorted runs while vqsort does not, but vqsort is still 20% faster in this case as well.
The full results are in the README, but a short version:
--------------------------------------------------
Benchmark Time CPU
--------------------------------------------------
TbbSort/random 3.17 s 16.7 s
HwySort/random 2.27 s 2.27 s
StdPartitionHwySort/random 2.06 s 4.01 s
PdqSort/random 5.67 s 5.67 s
IntelX86SIMDSort/random 3.73 s 3.73 s
TbbSort/sorted_runs 1.58 s 8.02 s
HwySort/sorted_runs 2.38 s 2.38 s
StdPartitionHwySort/sorted_runs 1.11 s 3.21 s
PdqSort/sorted_runs 5.30 s 5.30 s
IntelX86SIMDSort/sorted_runs 2.90 s 2.90 s
Thanks for sharing. I've posted bench_sort results in another thread.
vqsort is about 1.8 times as fast for uniform random, a bit more of an advantage than your results here.
Note that there appears to be some bug because our benchmark's verification fails.
sagarm|3 years ago
avx512_qsort benefits from sorted runs while vqsort does not, but vqsort is still 20% faster in this case as well.
The full results are in the README, but a short version:
janwas|3 years ago
Note that there appears to be some bug because our benchmark's verification fails.