top | item 40250519

How hard can generating 1024-bit primes be?

271 points | techedlaksh | 1 year ago | glitchcomet.com

86 comments


dgacmu|1 year ago

Related, there are a few cryptocurrencies that used things related to finding large primes as part of their proof of work functions. It turns out that ~8 years ago, a really fast primality test implementation could make you a lot of money. (For some period of time I was the author and maintainer of mining software for riecoin. Why, I have no idea, except that I like prime numbers.)

This article omits the number one optimization for fast primality testing: Montgomery multiplication

https://en.m.wikipedia.org/wiki/Montgomery_modular_multiplic...

It forms the basis of fast practical modular exponentiation implementations.

Niall Emmart, then in academia and now I believe at Nvidia, released a really, spectacularly fast GPU bigint library, CGBN: https://github.com/NVlabs/CGBN

It's still the fastest batch modexp implementation that I'm aware of. It's breathtaking, if I can gush in geek for a moment.

(Someday I should write up the story of how that helped me dominate production of a small cryptocurrency for about 5 years. Thanks, Niall - owe you a beer if we ever cross paths!)

Also worth noting that Python includes an entirely fine modexp in the three-argument form of pow(x, y, m), which computes x^y % m.

With that, you can very easily implement a Fermat or Miller-Rabin primality test if you want to roll your own, which is quite fun. If you don't, the GMP library provides a very nice mpz_probab_prime_p() function. GMP is obviously faster, but it's hard to beat the fun of a two-liner Fermat test for playing with big primes.
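For instance, a Fermat test really is only a few lines once you have three-argument pow (a toy sketch, not constant-time or cryptographic-quality code):

```python
import random

def fermat_is_probably_prime(n, rounds=20):
    """Fermat test: if n is prime, pow(a, n - 1, n) == 1 for every base a
    coprime to n. Composite n usually fails for a random base (Carmichael
    numbers being the notorious exception)."""
    return all(pow(random.randrange(2, n - 1), n - 1, n) == 1
               for _ in range(rounds))

assert fermat_is_probably_prime(2**89 - 1)  # a Mersenne prime
assert not fermat_is_probably_prime(100)
```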

kwantam|1 year ago

Great stuff dga :)

Turns out, Niall was also involved in one of the winning ZPrize submissions for fast multi-scalar multiplication (closely related to batch modexp, although over an elliptic curve rather than mod a prime); I assume it inherits from his work on CGBN.

He gave a very nice talk about it last year at a Stanford crypto lunch, and it turns out the slides and recording are online!

https://cbr.stanford.edu/seminarTalks/slides_20230526_niall_...

https://www.youtube.com/watch?v=KAWlySN7Hm8

mxwsn|1 year ago

I'd love to read that story. Please do write it up!

codeflo|1 year ago

Do you know why said cryptocurrency would use such a bespoke proof-of-work function? I.e., was it just someone ignorant who had only a vague idea that cryptography somehow uses prime numbers and didn’t know when or why, or was there a deeper reason?

Perepiska|1 year ago

pow(x,e,mod) was the reason I switched from Perl to Python :)

kadoban|1 year ago

> This is where things start to get interesting. At first, I found the concept of probabilistic primality tests strange and tried to look for deterministic algorithms that could handle huge numbers. I did find two - APR-CL and ECPP. Both of these are so mathematically complex that I could not make sense of their research papers at all, and there isn't much accessible information about them on the internet for someone like me who is bad at math.

> After taking a look at discussions online, OpenSSL's source code and recommendations by NIST, I realized that almost everyone including RSA uses probabilistic algorithms. The catch is that if implemented properly, these algorithms have an extremely low error rate which is negligible.

For a given maximum number range, it's trivial to make Miller-Rabin actually deterministic. You just choose bases that have been proven to together exclude all pseudoprimes in the given range.

(It doesn't even end up being a long list, Miller-Rabin kicks ass)
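As a sketch of the idea: the first twelve primes are known to be sufficient witnesses for every n below 2^64 (no such proven base set exists yet for 1024-bit ranges).

```python
def is_prime_u64(n):
    """Deterministic Miller-Rabin for n < 2**64: the bases below are proven
    to expose every composite in that range."""
    BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in BASES:
        if n % p == 0:
            return n == p
    # write n - 1 as d * 2**s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

assert is_prime_u64(2**61 - 1)
assert not is_prime_u64(3215031751)  # strong pseudoprime to bases 2, 3, 5, 7
```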

GaggiX|1 year ago

>For a given maximum number range, it's trivial to make Miller-Rabin actually deterministic. You just choose bases that have been proven to together exclude all pseudoprimes in the given range.

What are the bases for the range of 1024-bit numbers? I couldn't find an answer online.

mateo1|1 year ago

Besides, when you're just looking for a prime, you can spot things that look like a prime and test them deterministically.

jcalvinowens|1 year ago

One line of inline assembler makes the bignum school multiplication trivial: https://github.com/jcalvinowens/toy-rsa/blob/master/bfi.c#L4...

If I could go back in time and change one thing about the C language, I would add some notion of expanding multiplication. It's a shame Rust doesn't have it either. Hardware support is everywhere: hell, the Cortex M0 doesn't do division, but it has an expanding multiply!

This is from a very ugly little toy implementation of RSA I wrote a long time ago: https://github.com/jcalvinowens/toy-rsa

I found I could get away with the Fermat test, because the algorithm doesn't work if the primes aren't actually prime: the Fermat test is fast, and an encrypt/decrypt round trip eliminates the extremely minuscule chance either prime is a Fermat liar.

But I don't know if it can be proven there do not exist non-prime P/Q values which produce an RSA keypair which can successfully encrypt/decrypt messages. I'm sure this isn't kosher for a real implementation, but I've never found an answer.

AndriyKunitsyn|1 year ago

Curiously, C actually has bignums. Now. In C23, they added a _BitInt(N) type (e.g., "_BitInt(1024)" for a 128-byte type).

The compiler support for that is limited, though. In Clang, the -fexperimental-max-bitint-width=N flag can be passed to allow N > 128. If N > 128 and a _BitInt(N) is divided by something, the compiler will just crash, but +, -, and * all work as expected.

samatman|1 year ago

> If I could go back in time and change one thing about the C language, I would add some notion of expanding multiplication.

Zig makes this relatively easy: it has a @mulWithOverflow intrinsic, which returns an overflow bit with the result, and it has integers up to (u|i)65535. So depending on what you're doing, you can either detect overflow and then upcast, or upcast first and then optionally truncate.

It also has saturating multiply as a separate operator *|, or wrapping with *%, for when those are the semantics you want. Otherwise overflow is safety-checked undefined behavior, which will panic in Debug and ReleaseSafe build modes.

cbright|1 year ago

> But I don't know if it can be proven there do not exist non-prime P/Q values which produce an RSA keypair which can successfully encrypt/decrypt messages. I'm sure this isn't kosher for a real implementation, but I've never found an answer.

If p and q are coprime Carmichael numbers then RSA will still successfully encrypt and decrypt messages, though this will be less secure as p*q will have smaller prime factors and thus be easier to factor.
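A quick sketch confirming this with two coprime Carmichael numbers (561 = 3·11·17 and 1729 = 7·13·19):

```python
from math import gcd

# 561 and 1729 are Carmichael numbers with no common prime factor
p, q = 561, 1729
assert gcd(p, q) == 1

n = p * q
e = 11                             # coprime to (p-1)*(q-1)
d = pow(e, -1, (p - 1) * (q - 1))  # textbook RSA private exponent

m = 424242 % n
c = pow(m, e, n)                   # "encrypt"
assert pow(c, d, n) == m           # decryption still round-trips
```

It round-trips because Korselt's criterion guarantees r - 1 divides p - 1 for every prime r dividing the Carmichael number p, so the usual RSA correctness argument goes through prime factor by prime factor.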

lifthrasiir|1 year ago

I believe that both in most C compilers and in Rust, casting to a bigger type and multiplying does produce the expected widening-multiply opcode.

winternewt|1 year ago

The original Pretty Good Privacy (PGP) by Philip Zimmermann in 1994 used only a sieve that divided by all known 16-bit primes (this table was produced using the Sieve of Eratosthenes), followed by the Fermat test.
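That scheme is easy to reconstruct in miniature (a sketch of the idea, not PGP's actual code):

```python
def small_primes(limit=1 << 16):
    """Sieve of Eratosthenes: every prime below 2**16."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i in range(limit) if sieve[i]]

PRIMES_16BIT = small_primes()  # 6542 primes

def pgp_style_check(n):
    """Trial division by all 16-bit primes, then a few Fermat rounds."""
    if any(n % p == 0 for p in PRIMES_16BIT):
        return n in PRIMES_16BIT  # n is a small prime itself, or composite
    return all(pow(a, n - 1, n) == 1 for a in (2, 3, 5, 7))

assert pgp_style_check(2**127 - 1)      # a Mersenne prime
assert not pgp_style_check(2**127 + 1)  # divisible by 3
```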

AceJohnny2|1 year ago

> I would add some notion of expanding multiplication.

C type promotion is complicated enough!

Intrinsics expose this adequately, don't you think?

Ditiris|1 year ago

How long did this take you? Curious as I did an undergrad research project on multiplying large integers that basically took two semesters. I implemented Karatsuba, Toom-Cook, the complex FFT, a few NTTs, and Schonhage-Strassen. Primes are basically math magic. A Friendly Introduction to Number Theory by Silverman is an amazing math book for those interested.

FYI the link on your page reads 4025051 instead of 40250519.

LegionMammal978|1 year ago

Nice article! I've also rolled some of my own bigint code recently (for earlier versions of [0]), and I can recall how frustrating it was to translate high-level descriptions in math papers into actual operations.

I do have a small quibble, though:

> At this point, the BigInt is using base-(2^64-1) or base-18446744073709551615 and it only needs 16 "digits" to represent a number that uses 309 digits in base-10!

If you use the full range of a u64, then your number will be in base 2^64, with each word ranging from 0 to 2^64-1, in the same way that base-10 digits range from 0 to 9.

[0] https://github.com/LegionMammal978/bigfoot-sim
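In other words, each u64 word is one digit in base 2^64. A quick sketch of the limb decomposition the article describes:

```python
BASE = 1 << 64  # base 2**64: each "digit" ranges over 0 .. 2**64 - 1

def to_limbs(n):
    """Split n into little-endian base-2**64 digits."""
    limbs = []
    while n:
        n, digit = divmod(n, BASE)
        limbs.append(digit)
    return limbs or [0]

def from_limbs(limbs):
    return sum(digit << (64 * i) for i, digit in enumerate(limbs))

n = 2**1024 - 189                # 309 digits in base 10
assert len(to_limbs(n)) == 16    # only 16 "digits" in base 2**64
assert from_limbs(to_limbs(n)) == n
```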

adgjlsfhk1|1 year ago

note that your last optimization of increasing the number by 2 if it fails rather than generating a new random number actually breaks the security slightly. Primes aren't evenly distributed, so doing this biases you towards primes that are directly after large prime gaps.
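The bias is easy to see in a toy simulation (using +1 steps over small numbers rather than the article's +2 over 1024-bit candidates, but the effect is the same: each prime is hit with probability proportional to the gap below it):

```python
import random

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def incremental_search(start):
    """Walk upward from a random start until a prime is hit."""
    while not is_prime(start):
        start += 1
    return start

random.seed(0)
counts = {}
for _ in range(20000):
    p = incremental_search(random.randrange(2, 1000))
    counts[p] = counts.get(p, 0) + 1

# 127 follows a gap of 14 (113 -> 127); 103 follows a gap of 2 (101 -> 103),
# so incremental search lands on 127 far more often than on 103
assert counts.get(127, 0) > counts.get(103, 0)
```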

glitchcomet|1 year ago

Yeah, I read about this in my research. It's a tradeoff between execution speed and randomness of primes; I chose to go with speed, assuming that 16 threads all starting from a random number and competing to find the prime would add enough randomness. If someone preferred more randomness over speed, it's an easy change to replace the += 2 with an rng() call.

aaplok|1 year ago

Nice article, and nicely written.

> My code from attempt #3 was effectively using base-255, with each byte acting as a single "digit".

I think the author means base-256, not base-255.

SAI_Peregrinus|1 year ago

There are famously 3 hard problems in computer science.

    1) Naming things 
    2) Cach3) Concurr invalidation
    ency 
    4) Off-by-one errors

Dylan16807|1 year ago

> My binary implementation was probably spending almost all of its time waiting to read or write numbers from RAM due to L1 cache misses. I did not know at the time how to actually test this, but I thought let's try a more memory efficient version anyway and see if it improves things.

The text never comes back to this, but the answer is that a handful of 1-2KB numbers fit into L1 just fine, and even if they didn't there's a megabyte or more of L2 that takes about 3ns to access.

> It goes without saying that probably none of this is actually cryptographically secure, but that was never the point anyway.

Hmm. This is just the prime generation, so it avoids most RSA pitfalls, and urandom had better be secure. So if this code works at all then there's not much that could go wrong. RSA has a few issues with weak primes to avoid but I don't know if they're likely enough to matter here.

AceJohnny2|1 year ago

This sends me back.

My first-year college project, a few decades ago, was to implement 4096-bit RSA encryption, the idea of my project-mate and friend (and later valedictorian) who implemented the core math.

I recall how slow prime number generation was in our final implementation, taking ~20 minutes to generate on PA-RISC workstations. My friend, a math nerd, took it upon themselves to continue optimizing the code long after the project was over. I remember them reading papers on prime number detection, and bignum math implementations. For example, one huge improvement came when the code would detect if a number in a component multiplication was 0, then skip the multiplication and give a 0 result!

kevin_thibedeau|1 year ago

On slow hardware you're much better off generating elliptic curve keys. Otherwise you wait a long time or compromise on future proof security.

opticfluorine|1 year ago

> The random number returned is OR-ed with 0b1000000000000001 to set its first and last bit to 1. The last bit set to 1 makes it an odd number and the first bit set to 1 ensures that it is a sufficiently large number which covers the entire range of bits I need.

I can understand setting the low bit to 1 since an even number will never be a prime (edit: obviously except 2). But why set the high bit to 1 as well? Admittedly I don't know much about prime numbers or crypto, but it seems to me like this is just giving up a bit of entropy unnecessarily. What am I missing here?
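For context, the mechanic being asked about looks like this when generalized from the article's 16-bit mask to a full-width candidate (a sketch; `random_candidate` is my name for it, not the article's):

```python
import os

def random_candidate(bits=1024):
    """Draw `bits` random bits, then force the top bit (guarantees the full
    bit length) and the bottom bit (guarantees oddness) to 1."""
    n = int.from_bytes(os.urandom(bits // 8), "big")
    return n | (1 << (bits - 1)) | 1

c = random_candidate()
assert c.bit_length() == 1024 and c % 2 == 1
```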

toast0|1 year ago

Another factor is that if your high bit is always set, and you encode the prime with that bit, your prime always takes the same number of bytes to encode.

Variable byte encoding can lead to problems, if you need to exchange the data between different software, unless the specifications are very clear, and well tested. (See problems with RSA based DHE if the server public key has leading zeros)

lordnacho|1 year ago

Same as generating a 2 digit number. If the first digit is a zero, it is not a 2 digit number.

glitchcomet|1 year ago

As the other comments have mentioned, setting the first bit to one loses a bit of entropy but ensures that the prime is large enough. Another thing to add is that in RSA two primes are multiplied together. If one of them is 1024 bits, the other can be ~200 bits (if I remember correctly) and still reach the required number of entropy bits for the key. So having both primes be 1024-bit adds a bit of wiggle room too.

kadoban|1 year ago

You are giving up a bit of entropy, yeah, but you still have 1022, and it's probably safer than wondering if a 1020-bit prime is fine when they asked for a 1024-bit one. E.g. we usually don't consider 00042 a 5-digit number.

Which choice is optimal technically depends on exactly what you're using it for, but I'd think the one in the article is the safer default.

bobbylarrybobby|1 year ago

Losing one bit of entropy to generate a prime that's definitely not only 50 bits long seems like a worthy tradeoff.

LoganDark|1 year ago

I would recommend keeping one handle open to /dev/urandom rather than opening and closing a new one every single time you generate a new random number.

IHateScrabble|1 year ago

In the section on Fermat's little theorem there is a small mistake in the first statement of the theorem. You've written: "If p is prime and a is any integer not divisible by p, then the number a^(p-1) is divisible by p" this should read "a^(p-1) - 1 is divisible by p". The second statement of the theorem in terms of modular arithmetic is correct though.

tails4e|1 year ago

Really nice writeup. For the test, could Fermat's method be used to find candidates, and then trial division to weed out the false positives? Given they are rare, this shouldn't cost much extra time, as the trial division (with some precomputed divisors) is rarely used.

pclmulqdq|1 year ago

One quick thing to add to this: /dev/urandom does not generate "true" random numbers. TRNGs generate 1 bit out per bit of entropy they collect from the environment, while /dev/urandom will not stop generating random bits when it runs out of entropy. That makes it a CSPRNG that is seeded by a TRNG.

For all practical purposes, a CSPRNG seeded by a TRNG is almost as good as a TRNG, but it isn't quite the same.

Linux used to recommend /dev/random which actually was a TRNG (although its entropy collection would sometimes overestimate how much entropy it got, particularly on servers), but it wasn't practical to use as your primary cryptographic RNG because it was very slow.

SAI_Peregrinus|1 year ago

A HWRNG is not necessarily a TRNG. As you say, a TRNG has one bit of entropy per bit of output. There's no way to prove this property is even possible in this physical universe, since it requires perfect unpredictability. Urandom is a CSPRNG seeded by a HWRNG.

Jerrrry|1 year ago

[deleted]

shanurrahman|1 year ago

That was an excellent article. Particularly enjoyed the iterative improvements throughout. It clearly shows the significant performance boost that can be achieved through good engineering and optimization.

m1keil|1 year ago

Unrelated to crypto: trying to select the word "entropy" (with the cool colour & wave effect) grinds Chrome to a halt on Android.

Jerrrry|1 year ago

Range/Selection at user touch/click, seven times, 60 times a second.

gigatexal|1 year ago

What a really fascinating article. I especially liked the humility wrt to being lacking in advanced math skills.

thehappypm|1 year ago

I’m surprised dev/urandom doesn’t count as a dependency!

pelagicAustral|1 year ago

I love that the code panics when trying to divide by zero. Reminds me I should catch stuff even if it's for my own sake, I generally opt to completely overreact.

cschmittiey|1 year ago

What are you using for your site? Loved the entropy styling!

glitchcomet|1 year ago

It is custom html/css built with my own simple static site generator. The entropy styling is css animations delayed for each letter such that it matches up. Glad that you liked it!

throw0101b|1 year ago

I remember first trying out PGP in the early 1990s on a 80386 or 80486 on Linux and it took forever to generate a new key.

sva_|1 year ago

If you try to generate a PGP key using GPG on a fresh Linux system, it may flat-out refuse to do so, claiming there is not sufficient entropy. Or at least it was like that some years ago.

fguerraz|1 year ago

Very good write up ! Thanks