
Fast Sorting, Branchless by Design

29 points | jedisct1 | 12 days ago | 00f.net

9 comments


jstrieb|12 days ago

Wow, this is jam-packed with interesting information. Thanks for writing it! (Also thanks for all of your other great open source work!)

Are there plans to upstream this into the Zig std library? Seems like it could be useful for more than just the cryptography package, since the benchmarks at the end have it often being faster than std pdqsort. I just checked the issue trackers on Codeberg and GitHub, and didn't see anything mentioning djbsort or NTRU Prime, which leads me to believe there aren't (official) plans to upstream this (yet).

tialaramex|9 days ago

> often being faster than std pdqsort.

pdqsort is a generic comparison sort. Want to sort employee names, customer email addresses, JSON blobs, or Zebras? No problem, pdqsort just needs an ordering; in both Zig and C++ you write this as a single boolean "less" predicate.

DJB's speed-up relies on vectorization, which works great for integers or things you can squint at and see an integer - but obviously can't sort your employee names, customer email addresses, JSON blobs or Zebras. You could write these branchless network designs anyway but I'm pretty sure they'd be markedly slower, at least for some common inputs.
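To make the integer-only point concrete, here is a minimal sketch of the compare-exchange step that a sorting network is built from. The names `compareExchange`, `sort4` and `compareExchangeVec` are hypothetical and not taken from the article or from the Zig standard library. The sketch assumes a Zig version with result-type builtins (0.11 or later), and whether `@min`/`@max` actually compile to branch-free instructions depends on the target and optimizer.

```zig
const std = @import("std");

// Hypothetical helper: put the smaller value in a.* and the larger in b.*
// without an explicit `if`. This relies on the element type having
// hardware-friendly min/max, which is the integer-only restriction.
fn compareExchange(a: *i32, b: *i32) void {
    const lo = @min(a.*, b.*);
    const hi = @max(a.*, b.*);
    a.* = lo;
    b.* = hi;
}

// Fixed 5-comparator network for 4 elements. The sequence of comparisons
// never depends on the data being sorted.
fn sort4(v: *[4]i32) void {
    compareExchange(&v[0], &v[1]);
    compareExchange(&v[2], &v[3]);
    compareExchange(&v[0], &v[2]);
    compareExchange(&v[1], &v[3]);
    compareExchange(&v[1], &v[2]);
}

// The same step applied lane-wise to two vectors: four independent
// compare-exchanges in one call.
const V = @Vector(4, i32);

fn compareExchangeVec(a: *V, b: *V) void {
    const lo = @min(a.*, b.*);
    const hi = @max(a.*, b.*);
    a.* = lo;
    b.* = hi;
}

test "sort4 sorts a small array" {
    var v = [4]i32{ 3, -1, 4, 1 };
    sort4(&v);
    try std.testing.expectEqualSlices(i32, &[_]i32{ -1, 1, 3, 4 }, &v);
}

test "vector compare-exchange works lane by lane" {
    var a = V{ 5, 2, 9, 0 };
    var b = V{ 1, 7, 3, 8 };
    compareExchangeVec(&a, &b);
    try std.testing.expectEqual(@as(i32, 1), a[0]);
    try std.testing.expectEqual(@as(i32, 5), b[0]);
    try std.testing.expectEqual(@as(i32, 0), a[3]);
    try std.testing.expectEqual(@as(i32, 8), b[3]);
}
```

Running `zig test` on a file containing this block exercises both variants. A string or struct key has no `@min`/`@max` equivalent of this kind, which is why a generic "less" predicate cannot be dropped into the same network and stay vectorizable.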

ozgrakkurt|9 days ago

The blocksort in stdlib can be faster than pdqsort too, in my experience.

rep_lodsb|9 days ago

    const diff = if (order == .asc) b_int - a_int else a_int - b_int;
    const sign_bit: u1 = @truncate(@as(UWInt, @bitCast(diff)) >> @intCast(bits));
    var mask_word = @as(usize, 0) -% @as(usize, sign_bit);

This code in the fallback path (when no constant-time @min/@max is available) will only work if the subtraction doesn't overflow. Or is this not a problem for some reason?

jedisct1|9 days ago

a_int and b_int are signed values.
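The overflow question can be illustrated with a small sketch. The three quoted lines do not show how `a_int`, `b_int`, `UWInt` or `bits` are defined, so this is not a reconstruction of the article's code. The functions `lessMaskNarrow` and `lessMaskWide` are hypothetical, and the widening to `i64` is an assumption about one way the sign-bit trick can be made safe. It assumes Zig 0.11 or later.

```zig
const std = @import("std");

// Hypothetical narrow variant: the difference is computed in the operand
// width (i32). Wrapping subtraction (-%) is used so that overflow is
// observable instead of tripping a safety check.
// Intended result: all-ones when a < b, zero otherwise.
fn lessMaskNarrow(a: i32, b: i32) u32 {
    const diff = a -% b;
    const sign_bit: u1 = @truncate(@as(u32, @bitCast(diff)) >> 31);
    return @as(u32, 0) -% @as(u32, sign_bit);
}

// Hypothetical wide variant: both operands are sign-extended to i64 first,
// so the difference of any two i32 values fits and its sign is reliable.
fn lessMaskWide(a: i32, b: i32) u32 {
    const diff = @as(i64, a) - @as(i64, b);
    const sign_bit: u1 = @truncate(@as(u64, @bitCast(diff)) >> 63);
    return @as(u32, 0) -% @as(u32, sign_bit);
}

test "both variants agree when the difference fits" {
    try std.testing.expectEqual(@as(u32, 0xFFFF_FFFF), lessMaskNarrow(1, 2));
    try std.testing.expectEqual(@as(u32, 0xFFFF_FFFF), lessMaskWide(1, 2));
    try std.testing.expectEqual(@as(u32, 0), lessMaskNarrow(2, 1));
    try std.testing.expectEqual(@as(u32, 0), lessMaskWide(2, 1));
}

test "narrow variant gives the wrong mask on overflow" {
    const lo = std.math.minInt(i32);
    // minInt(i32) < 1 is true, but minInt -% 1 wraps to a positive value.
    try std.testing.expectEqual(@as(u32, 0), lessMaskNarrow(lo, 1));
    try std.testing.expectEqual(@as(u32, 0xFFFF_FFFF), lessMaskWide(lo, 1));
}
```

With same-width arithmetic the mask is wrong whenever the true difference falls outside the operand range. Computing the difference in a type at least one bit wider than the operands removes that case.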

user____name|9 days ago

Radix sort also seems to fit the bill?