top | item 34816083

(no title)

sigtstp | 3 years ago

Interesting! The benchmark appears to be using only random data though. Any measurements for partially sorted or reverse sorted data?

discuss

sagarm|3 years ago

Good question. I added a test for sorted runs in addition to random data, and added pdqsort for comparison.

avx512_qsort benefits from sorted runs while vqsort does not, but vqsort is still 20% faster in this case as well.

The full results are in the README, but a short version:

  --------------------------------------------------                                                                   
  Benchmark                            Time      CPU                                                                   
  --------------------------------------------------                                                                   
  TbbSort/random                     3.17 s  16.7  s                                                                   
  HwySort/random                     2.27 s   2.27 s                                                                   
  StdPartitionHwySort/random         2.06 s   4.01 s                                              
  PdqSort/random                     5.67 s   5.67 s                                              
  IntelX86SIMDSort/random            3.73 s   3.73 s                                              
  TbbSort/sorted_runs                1.58 s   8.02 s                                              
  HwySort/sorted_runs                2.38 s   2.38 s                                              
  StdPartitionHwySort/sorted_runs    1.11 s   3.21 s                                              
  PdqSort/sorted_runs                5.30 s   5.30 s                                                                   
  IntelX86SIMDSort/sorted_runs       2.90 s   2.90 s

janwas|3 years ago

Thanks for sharing. I've posted bench_sort results in another thread. vqsort is about 1.8 times as fast for uniform random, a bit more of an advantage than your results here.

Note that there appears to be some bug because our benchmark's verification fails.