SIMD Binary Heap Operations

Sesse__|6 months ago

The fastest way of doing a heap I've found is generally: Don't. For many of the relevant operations (graph search, merging streams, etc.), you can do just as well with a winner-tree; it can usually be updated branch-free with min/max operations, whereas with a heap you'll usually have completely unpredictable branches for every single operation.

A winner-tree, or its counterpart the loser-tree (for min instead of max), is a very simple binary tree: Bottom layer is all your values (2^N of them). The layer above that is the highest of pairs of values. The layer above that is the highest of pairs of pairs. And so on, until you get to the top of the tree, which contains the largest value. Updating a value is trivial; you overwrite the relevant one at the bottom, and then run exactly log2(n) max operations upwards until the you hit the root. Inserting and deleting may, of course, be more complicated.

DennisL123|6 months ago

A winner tree uses extra space, doesn't it? That might exclude it from certain applications to be an alternative. Four-ary heaps are roughly (ymmv) twice as fast as binary heaps by exploiting cache locality in a better way for small key/value types. And it seems to be a sweet spot since eight-ary heaps don’t deliver additional improvements.

RossBencina|6 months ago

For running min/max there is an efficient O(1) algorithm by none other than Daniel Lemire: https://lemire.me/en/publication/arxiv0610046/

dooglius|6 months ago

is_heap doesn't seem like a particularly useful operation though, generally a heap is intentionally constructed as such via push_heap/pop_heap

mananaysiempre|6 months ago

Indeed that part seems more like an artist’s study than an attempt at actual usefulness, but that’s okay when figuring out if anything at all in the neighbourhood of what you’re trying to do with SIMD is even possible. As far as constructing heaps, don’t forget about (linear-time) heapify, which can be significantly faster if you have a bunch of elements and want to construct a heap with all of them in it. (This doesn’t get you a linear-time heap sort because you’ll still pay the full linearithmic price for the subsequent pop_heaps.)

simonask|6 months ago

I think it's fairly useful. It means that you can convert a contiguous array to a heap with a fast O(n) read-only check instead of O(n log n) writes, so if you know that's the common case, you can detect it up front and only revert to normal binary heap insertion if it returns false.

camel-cdr|6 months ago

make_heap should be vectorizable, that would be more useful. I can also see a path to vectorize bulk insert, but that seems harder.

superjan|6 months ago

i have wasted several weeks worth of evenings on vectorizing heaps (4ary heaps: with SIMD, you’re not limited to binary heaps). It did not provide any speedup. I’d expect that halving heap depth would help but no. Still don’t know why.

CalChris|6 months ago

It's 2025 and the simd operations in this aren't that obscure. So I wonder how close std::simd gets to the instrinsics' performance with clang and gcc.

pjmlp|6 months ago

I would suggest adding concepts to the template definition, or at least a mix of enable_if and static_assert, if to be used on versions prior to C++20.

_ache_|6 months ago

Can't see the Website.

I'm the only one with a HTTPS problem here? Bad domain, `art.mahajana.net` instead of 0x80.pl.

19 comments