why_only_15|5 months ago
This trick is very useful on Nvidia GPUs for computing mins and maxes in some cases, e.g. atomic mins (u32 has better support than f32) or warp-wide mins with `redux.sync` (which only supports u32, not f32).
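The float-as-unsigned-int reinterpretation this relies on can be sketched in Rust. This is my illustration of the standard order-preserving mapping, not code from the thread; a plain `u32` min stands in for the u32-only GPU primitives (atomic min, `redux.sync`) mentioned above:

```rust
// Sketch: an order-preserving f32 -> u32 mapping. Negative floats get
// all bits flipped; non-negative floats get only the sign bit flipped.
// Unsigned integer comparison on the results then matches the usual
// float order, which is what makes u32-only primitives usable for f32.
fn f32_to_ordered_u32(x: f32) -> u32 {
    let bits = x.to_bits();
    if bits & 0x8000_0000 != 0 {
        !bits               // negative: flip everything
    } else {
        bits | 0x8000_0000  // non-negative: flip the sign bit
    }
}

fn ordered_u32_to_f32(bits: u32) -> f32 {
    if bits & 0x8000_0000 != 0 {
        f32::from_bits(bits & 0x7FFF_FFFF) // was non-negative
    } else {
        f32::from_bits(!bits)              // was negative
    }
}

fn main() {
    let xs = [3.5f32, -1.0, 0.0, -7.25, 2.0];
    // A u32 min (think: atomicMin on unsigned) over the mapped values...
    let m = xs.iter().map(|&x| f32_to_ordered_u32(x)).min().unwrap();
    // ...recovers the float min after mapping back.
    println!("{}", ordered_u32_to_f32(m)); // -7.25
}
```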
teo_zero|5 months ago
I'd hazard a guess that it would be the same: the compiler would produce a loop from `.iter()`, expose the loop index via `.enumerate()`, and keep track of that index in `.min_by()`. I suppose the closure would be inlined, maybe even along with the comparisons.
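For concreteness, a minimal sketch (my code, not from the thread) of the iterator chain described above:

```rust
// Sketch of the iterator-based argmin under discussion: .iter() walks
// the slice, .enumerate() exposes the index, .min_by() tracks the
// minimum. Iterator::min_by returns the first of equal minima, so
// ties resolve to the lowest index.
fn argmin(xs: &[u32]) -> Option<usize> {
    xs.iter()
        .enumerate()
        .min_by(|&(_, a), &(_, b)| a.cmp(b))
        .map(|(i, _)| i)
}

fn main() {
    println!("{:?}", argmin(&[5, 2, 9, 1, 7])); // Some(3)
}
```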
nine_k|5 months ago
I wonder whether that could be made faster using AVX instructions; they can find the minimum value among several u32 values, but not immediately its index.
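One standard way around the missing-index problem is two passes, which keeps the reduction itself free of index bookkeeping. A scalar Rust sketch of that shape (my code; the first pass is the part SIMD handles well, left here to the autovectorizer):

```rust
// Sketch of a two-pass argmin: pass 1 computes only the minimum
// (no index tracking, so it vectorizes well); pass 2 rescans for
// the first position holding that value.
fn argmin_two_pass(xs: &[u32]) -> Option<usize> {
    let min = *xs.iter().min()?;       // pass 1: vector-friendly reduction
    xs.iter().position(|&x| x == min)  // pass 2: cheap linear rescan
}

fn main() {
    println!("{:?}", argmin_two_pass(&[5, 2, 9, 1, 7])); // Some(3)
}
```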
TheDudeMan|5 months ago
Another speedup here would be using SIMD, although it would be interesting to look at the assembly to see whether it was auto-vectorized already.
meisel|5 months ago
This reminds me of a trick to sort floats faster, even when they include negatives, NaNs, and infinities: map each float to a sortable integer version of itself so they can be compared as ints (the precise mapping depends on how you want to order things like NaN). The one-time conversion is fast and pays for itself over the O(n log n) comparisons. Then, after sorting, map them back.
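The mapping the comment leaves open can be made concrete. The sketch below picks the IEEE total-order-style mapping (negative NaNs first, positive NaNs last, matching `f32::total_cmp`); that choice is mine, not something the comment prescribes:

```rust
// Sketch of the sort-floats-as-ints trick: convert each f32 to a u32
// whose unsigned order matches the desired float order, sort the
// integers, then convert back.
fn to_key(x: f32) -> u32 {
    let b = x.to_bits();
    if b >> 31 == 1 { !b } else { b | 0x8000_0000 }
}

fn from_key(b: u32) -> f32 {
    if b >> 31 == 1 { f32::from_bits(b & 0x7FFF_FFFF) } else { f32::from_bits(!b) }
}

fn sort_floats(xs: &mut Vec<f32>) {
    // One-time O(n) conversion...
    let mut keys: Vec<u32> = xs.iter().map(|&x| to_key(x)).collect();
    // ...so all O(n log n) comparisons are plain integer compares...
    keys.sort_unstable();
    // ...then one O(n) conversion back.
    *xs = keys.into_iter().map(from_key).collect();
}

fn main() {
    let mut v = vec![2.0f32, -1.5, f32::INFINITY, 0.0, f32::NEG_INFINITY];
    sort_floats(&mut v);
    println!("{:?}", v); // [-inf, -1.5, 0.0, 2.0, inf]
}
```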