top | item 37236099

My favorite prime number generator

195 points| mfrw | 2 years ago |eli.thegreenplace.net

83 comments

order
[+] tromp|2 years ago|reply
My favorite prime number generator is

    ┬─┬───────────────────────────────────┬──── ────┬
    │ │ ┬─┬ ────┬─┬────────────────────── ┼───┬ ┬───┼
    │ │ └─┤ ────┼─┼───────────────────┬── ┼───┼ │ ┬ │
    │ │   │ ┬───┼─┼───────────────────┼── ┼─┬─┼ │ ┼ │
    │ │   │ │ ─ ┼─┼─┬─────┬──── ──────┼─┬ │ ├─┘ └─┤ │
    │ │   │ │ ┬ └─┤ │ ┬─┬ ┼─┬─┬ ──┬───┼─┼ ├─┘     ├─┘
    │ │   │ └─┤   └─┤ └─┤ │ ├─┘ ──┼─┬─┼─┼ │       │
    │ │   │   │     │   │ ├─┘   ┬─┼─┼─┼─┼ │       │
    │ │   │   │     │   ├─┘     └─┤ │ ├─┘ │       │
    │ │   │   │     └───┤         │ ├─┘   │       │
    │ │   │   │         │         ├─┘     │       │
    │ │   │   │         ├─────────┘       │       │
    │ │   │   ├─────────┘                 │       │
    │ │   └───┤                           │       │
    │ │       ├───────────────────────────┘       │
    │ ├───────┘                                   │
    └─┤                                           │
      └───────────────────────────────────────────┘
which graphically depicts the 167-bit lambda term

    λ 1 (1 ((λ 1 1) (λ λ λ 1 (λ λ 1) ((λ 4 4 1 ((λ 1 1) (λ 2 (1 1)))) (λ λ λ λ 1 3 (2 (6 4))))) (λ λ λ 4 (1 3)))) (λ λ 1 (λ λ 2) 2)
which generates the characteristic sequence of primes [1] [2].

[1] https://tromp.github.io/cl/cl.html

[2] https://github.com/tromp/AIT/blob/master/characteristic_sequ...

[+] cobaltoxide|2 years ago|reply
Could you explain it? Those two links are Greek to me.
[+] pixelpoet|2 years ago|reply
Super cool to see one of the authors of the most popular Go scoring methods (particularly for MCTS) posting here :) Respect and thanks!
[+] hgsgm|2 years ago|reply
What's the "characteristic sequence"?
[+] svat|2 years ago|reply
This is really elegant! To make the explanation more concrete, here's a snapshot of the algorithm's state (the dict D) after it has processed numbers up to 10 (inclusive):

    12: [3, 2]
    25: [5]
    49: [7]
So each prime up to 10 is present exactly once. When we process 12, we move 2 to the list for 14, and 3 to the list for 15.

The Sorenson paper linked in the post is also a great read, I remember reading it a couple of years ago and it is very clear. It adds many further optimizations on top of standard sieve of Eratosthenes (e.g. wheel factorization is a generalization of the idea of considering only numbers in {1, 5} mod 6, or {1, 7, 11, 13, 17, 19, 23, 29} mod 30, or …), and segmenting is what you need to do to avoid running out of memory.

The fastest such generator (using sieve-based methods) seems to be https://github.com/kimwalisch/primesieve — it implements all those optimizations and more.

However, if you just want the nth prime, it's even faster to compute π(n) by combinatorial/analytic methods, instead of a sieve: the same author has https://github.com/kimwalisch/primecount for that. (Try the ten-trillionth prime number.)

[+] tzs|2 years ago|reply
My favorite is this FRACTRAN program: (17/91, 78/85, 19/51, 23/38, 29/33, 77/29, 1/17, 11/13, 13/11, 15/2, 1/7, 55/1) with input 2.

It doesn't quite produce primes directly. It gives a sequence of integers, some of which are powers of 2. Those powers of 2 are 2^2, 2^3, 2^5, 2^7, 2^11, ..., i.e., the sequence of 2 to the power of primes.

For those who have never seen FRACTRAN it is an esoteric programming language invented by John Conway. It is Turing-complete.

Here's how you execute a FRACTRAN program, with input N.

1. Step through the list of fractions in order until you find a fraction f such that fN is an integer or you run out of fractions.

2. If you run out of fractions halt.

3. Output fN.

4. Replace N with fN.

5. Goto step 1.

Wikipedia gives some sample programs and explains in detail how the heck FRACTRAN can compute: https://en.wikipedia.org/wiki/FRACTRAN

[+] SeanLuke|2 years ago|reply
Believe it or not, there exists a closed-form equation which computes the nth prime number. It's Willans's formula:

p_n = 1 + Sum_{i=1}^{2^n} Floor((n/(Sum_{j=1}^{i} Floor((Cos((j-1)!+1)/j) Pi)^2)))^(1/n))

https://en.wikipedia.org/wiki/Formula_for_primes

https://www.youtube.com/watch?v=j5s0h42GfvM

It's not fast but it really does work.

[+] svat|2 years ago|reply
That video is actually great!

The last time this formula came up, I added some references to the Wikipedia article (current version: https://en.wikipedia.org/w/index.php?title=Formula_for_prime... about the "worthlessness" of such formulas, but it doesn't deter people from being interested in that formula… as long as they don't imagine it's useful in any way, it's fine to enjoy its perverse cleverness, I suppose.

The video does well to quote Wilf on “the disease was preferable to the cure” and to end with “Could it [C. P. Willans] be a pseudonym for someone who didn't want to tarnish their reputation by writing about useless prime-generating formulas?” :-) Dudley (who mentions a large list of other such formulas) also has a nice conclusion:

> The conclusion to be drawn from all this, I think, is that formulas for formulas' sake do not advance the mathematical enterprise. Formulas should be useful. If not, they should be astounding, elegant, enlightening, simple, or have some other redeeming value.

[+] scythe|2 years ago|reply
The memory usage can in fact be optimized further with a small change:

https://gist.github.com/scythe/5fb962722934c58c60430180beab8...

In the spirit of the blog post, I'll let you guess how it works :p

[+] svat|2 years ago|reply
Wow this is a great optimization! You're under-selling it :) It changes the memory requirement from O(π(n)) to O(π(√n)), e.g. after processing numbers up to 100, the internal state in the OP's algorithm is:

    q = 101
    D = {'102': [3, 2], '105': [7, 5], '121': [11], '169': [13], '289': [17], '361': [19], '529': [23], '841': [29], '961': [31], '1369': [37], '1681': [41], '1849': [43], '2209': [47], '2809': [53], '3481': [59], '3721': [61], '4489': [67], '5041': [71], '5329': [73], '6241': [79], '6889': [83], '7921': [89], '9409': [97]}
while with your "small change", the state is just:

    q = 101
    S = 121
    r = 11
    D = {'102': [3, 2], '105': [7, 5]}
[+] max_|2 years ago|reply
In J generating prime numbers is simply p:21, that gives you 79 which is the 21st prime number.

It is so fast, I wonder what the algorithm behind it is.

[0]: https://code.jsoftware.com/wiki/Vocabulary/pco

[+] jonahx|2 years ago|reply
> I wonder what the algorithm behind it is.

Noted in the details section of the link you provided:

"Primality testing on numbers larger than (2^31) uses the probabilistic Miller-Rabin algorithm."

[+] noman-land|2 years ago|reply
How high can you go with this?
[+] philipLutz|2 years ago|reply
I enjoyed this, but I am a little stuck on how stuck the author is on yield. If one ends up creating a large list of prime numbers with this generator, why not just return a large list of prime numbers? The author clearly notes how optimizing Python code is "odd" when you can more easily switch languages, but languages like C/C++/Rust would not have pretty code because it does not have yield. It is obvious that returning an array of prime numbers is what one would do in C/C++/Rust.

I understand that not using yield would mean it is not strictly a "generator", but the spirit of this problem is generating the sequence of primes (unless I am missing something).

[+] btilly|2 years ago|reply
Try solving some Project Euler problems. You'll see the win pretty quickly when you have to search a large range for primes.
[+] masklinn|2 years ago|reply
> If one ends up creating a large list of prime numbers with this generator

But one does not always end up creating a large list of prime numbers, do they?

> why not just return a large list of prime numbers?

Because that requires upfront knowledge of the filtering factor or absence thereof.

A generator means you don't care, you generate an infinite sequence and the consumer is free to do whatever they want.

> It is obvious that returning an array of prime numbers is what one would do in C/C++/Rust.

It's certainly not obvious for Rust.

[+] jameshart|2 years ago|reply
If the problem is outputting the primes, then storing them in a list is not a necessary part of that problem.

When writing ‘fizz buzz’, would you expect to return an array of 100 strings, some of which are numbers, some ‘fizz’es and some ‘buzz’es?

[+] Vexs|2 years ago|reply
It's a bit of a convolution, but let's say we've got some iterator of unknown size and we want to have a prime each iteration. We can do the following in python, which saves us from having to pre-compute the primes _or_ know how long `some_iterator` is.

primes = gen_primes() for item, prime in zip(some_iterator, primes):

[+] sp332|2 years ago|reply
This implementation runs until it crashes. So you need a way to output the values as you go, or they will be lost when you run out of memory. It also lets you control how many values are generated if you just want to see a few without actually using all the RAM.
[+] nemo8551|2 years ago|reply
Can I just say how surprised I am the many people have a favourite prime number generator.

I can’t say I have a favourite but I do enjoy multiplying two random prime numbers with each other and using the random result when I need a random number with added random.

[+] cobaltoxide|2 years ago|reply
That doesn't seem like a good way to generate "random" numbers, since it will only produce numbers with exactly two prime factors.
[+] andrelaszlo|2 years ago|reply
A way to add even more random is to generate a list of N random primes, then shuffling it M times before multiplying the elements. N and M are of course random positive integers.

To add maximum random, sometimes I like to return a string with a random Gertrude Stein quote, segfault, or make the PC speaker beep instead of returning a number. So random!

[+] naillo|2 years ago|reply
This person has an incredible work ethic
[+] jacknews|2 years ago|reply
Nice, I like the optimization replacing the list, and just hunting for the next 'free' composite for the prime.

Teaching my kids coding, we also arrived at most of the other optimizations, in Janet:

  (defn primes-sieve []
    (fiber/new (fn []
      (yield 2) # the only even prime, yield it explicitly.
  
      (let [sieve (table/new (math/pow 2 12))]
        (loop [n :range [3 nil 2]]  # only odd numbers
          (if-let [current-composite (sieve n)]
            (do 
              (each prime-factor current-composite
                (let [next-composite-key (+ n (* 2 prime-factor))] # add 2*prime-factor, else composite would be even
                  (if-let [next-composite (sieve next-composite-key)]
                    (array/push next-composite prime-factor)
                    (put sieve next-composite-key @[prime-factor]))))
              (put sieve n nil))
            (do
              (yield n)
              (put sieve (* n n) @[n])) # next composite is n^2. Anything less is composite a smaller prime, already in the sieve
            ))))))
[+] tzot|2 years ago|reply
The fastest variant I know of this algorithm is the one found at https://stackoverflow.com/a/3796442/6899 :

    import itertools as it
    def erat3( ):
        D = { 9: 3, 25: 5 }
        yield 2
        yield 3
        yield 5
        MASK= 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0,
        MODULOS= frozenset( (1, 7, 11, 13, 17, 19, 23, 29) )

        for q in it.compress(
                it.islice(it.count(7), 0, None, 2),
                it.cycle(MASK)):
            p = D.pop(q, None)
            if p is None:
                D[q*q] = q
                yield q
            else:
                x = q + 2*p
                while x in D or (x%30) not in MODULOS:
                    x += 2*p
                D[x] = p
[+] svat|2 years ago|reply
This adds two things over the OP's initial version, both mentioned in the post: wheel factorization (with 30), and the "linear probing" to avoid lists. It's possible to keep improving (up to a point) by using e.g. wheel with 210 instead of 30, and so on, if one doesn't mind hard-coding more and more.
[+] simlevesque|2 years ago|reply
isn't hardcoding 2 3 and 5 a bit of a cheat ? It seems like you could hardcode it to yield the X next primes and it'd be faster.
[+] rullelito|2 years ago|reply
My favorite one:

    d = defaultdict(list)
    
    # Magic
    for i in range(2, 10000):
            for y in i in d and d.pop(i) or [i]:
                d[i+y] += (y,)
    
    # Not sure how to print this in a succinct way
    print(sorted(list(set().union(*d.values()))))
[+] paulddraper|2 years ago|reply
Beautiful

  from collections import defaultdict

  d = defaultdict(list)
  for i in range(2, 10000):
      for y in d.pop(i, [i]):
          d[i + y].append(y)
  for y in sorted(y for x in d.values() for y in x):
      print(y)
[+] anaphor|2 years ago|reply
There's a functional / lazy version I like that uses a priority queue in order to get the next prime number https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

It basically uses the priority queue to store all the composites up to a certain point and then you can assume the next one is prime.

I had some (admittedly not well written) code for it too https://gist.github.com/weskerfoot/4699275

It can also incorporates the wheel factorization optimization mentioned in this article.

[+] Exuma|2 years ago|reply
He said he has "used this code many times"... have you ever needed to find prime numbers in your daily programming?