top | item 45828561

(no title)

raphlinus | 3 months ago

Right. This is the binary tree version of the algorithm, and is nice and concise, very readable. What would take it to the next level for me is the version in the stack monoid paper, which chunks things up into workgroups. I haven't done benchmarks against the Pareas version (unfortunately it's not that easy), but I would expect the workgroup optimized version to be quite a bit faster.

discuss

order

Munksgaard|3 months ago

To be clear, you can express workgroup parallelism in Futhark, or rather, if the compiler sees that you've programmed your problem in such a way that it can take advantage of workgroup parallelism, it will.

But you're right, it would be interesting to see how the different approaches stack up to each other. The Pareas project linked above also includes an implementation using radix sort.

convolvatron|3 months ago

I've been playing with one using scans. too bad that's not really on the map for architectural reasons, it opens up a lot of uses.

kragen|3 months ago

Yeah, monoid prefix sum is a surprisingly powerful tool for parallel and incremental algorithm design!