You're right -- I got nerd-sniped by this, and ended up rewriting the benchmarks in Rust. The compiler successfully inlines the square root operation, simplifies the exponentiation into multiplication and then does common-subexpression elimination over it, but no induction variable analysis, and therefore also no loop-invariant code motion. Here's it is in Godbolt:https://godbolt.org/z/Pa6cqd1fb
But the funny thing is, I tried it on my machine, and guess what, the "unoptimised" version is consistently between 4 and 5 times faster than the hand-optimised variants:
https://gist.github.com/amnn/4be5c05975c250d0ea88b62c03f1719...
So that's fun.
ColinWright|2 years ago
I'm getting this comment a lot, but so far the reason has been that people are using smallish numbers, and implementing the square root in floats.
In the context where this exploration is taking place, you can't do that. We're working with exact huge integers, and while a square root on floats can get you in the ballpark, you then need to refine the answer to get the exact value, and that takes order of magnitude the same time.
asQuirreL|2 years ago