item 27990839

Switching from GMP to GCC's __int128 reduced run time by 95% (2016)

71 points | optimalsolver | 4 years ago | nu42.com | reply

9 comments

[+] nanis|4 years ago|reply
Note that soon after I posted this, someone noticed [1,2] that it is possible to avoid a whole lot of work, but I am still glad I realized how much performance one can gain by dropping GMP in favor of `__int128`.

It seems that for numbers on this side of 1024 bits, writing purpose-built routines using `__int128` or SSE/AVX instructions rather than relying on general-purpose library functions is likely to pay off.

[1]: https://news.ycombinator.com/item?id=10915461

[2]: https://www.nu42.com/2016/01/excellent-numbers-explicit-solu...

[+] olliej|4 years ago|reply
This is presumably due to eliminating many/most of the function calls (I can't imagine division being inlined, but maybe I'm wrong) and to fixed-size allocation.

My guess is that the rest of the overhead comes from things like GMP needing to allocate heap storage, since it doesn't get the benefit of fixed-size storage.

[+] madars|4 years ago|reply
Yep, the program referenced in the blog post https://github.com/briandfoy/excellent_numbers/blob/master/c... is using GMP's variable-width numbers (mpz_* routines). If you know the bit lengths in advance, you can provide pre-allocated C arrays to the mpn_* routines: https://gmplib.org/manual/Low_002dlevel-Functions . This is likely to be much faster (you avoid the bookkeeping and malloc inside mpz_*), and it is what e.g. libsnark does for its general Fp arithmetic. (E.g. addition becomes https://github.com/scipr-lab/libff/blob/develop/libff/algebr... ) Of course, the mpn_* routines still loop where __int128 code would presumably be unrolled, but the bulk of the gain should come from shedding the mpz_* overhead.
[+] anthk|4 years ago|reply
What about MPI?
[+] borramakot|4 years ago|reply
I don't know much about MPI, but I thought it related to message passing/distributed workloads rather than arithmetic performance?