I enjoyed reading the article, but I'm pretty thrown by the benchmarks and conclusion. All of the times are reported to a single digit of precision, but then the summary is claiming that one function shows an improvement while the other two are described as negligible. When all the numbers presented are "~5ms" or "~6ms", it doesn't leave me confident that small changes to the benchmarking might have substantially changed that conclusion.
gizmo686|5 months ago
At a 5ms baseline with millisecond precision, the smallest improvement you can measure is 20%. And you cannot distinguish a 20% speedup from a 20% slowdown that happened to get lucky with clock ticks.
For what it is worth, I ran the provided test code on my machine with a 100x increase in iterations and got the following:
Which is not exactly encouraging (gcc 13.3.0, -ffast-math -march=native; I did not use the -fomit-this-entire-function flag, which my compiler does not understand). I had to drop down to -O0 to see branchless come out faster in any case:
Roxxik|5 months ago
https://gist.github.com/Stefan-JLU/3925c6a73836ce841860b55c8...
Someone|5 months ago
Did you check whether your branchy code actually still was branchy after the compiler processed it at higher optimization levels?
Joel_Mckay|5 months ago
Most code should focus on readability; then profile for busy areas under real use, and finally refactor the hot spots through hand optimization or register hints as required.
If one creates something that looks suspect (an inline assembly macro), a peer or an llvm build will come along and ruin it later for sure. Have a great day =3
hinkley|5 months ago