I went back to the root page of this site to see what PCG is, and I don't understand it. What's the point of an RNG that is "more secure" than an insecure RNG, but less secure than a real CSPRNG? What does "Predictability: Challenging" actually mean?
It means exactly what it sounds like: the author tried to attack the generator, failed, and thus concluded that it must be a Very Hard Problem.
Note that the field of pseudorandom number generation for non-cryptographic purposes is much less rigorous than cryptography. Typically, for a new generator to be accepted as "OK" all it needs to do is pass a number of fixed statistical tests, usually one of the TestU01 batteries [1]. This is usually the only falsifiable claim you get, and if you're familiar with diffusion and how to achieve it this is easy to work out. Other falsifiable claims include equidistribution, but that is not a very useful guarantee---simply incrementing by 1 will achieve that. The notion of indistinguishability against computationally-bounded adversaries does not exist. All of this contributes to this being a field where crackpottery abounds, and is hard to distinguish good from bad.
For example, here's a generator that uses 1/4 of an AES round per output word, and that passes the Crush battery of statistical tests (which uses ~2^35 samples). Is it a good generator? I would not bet on it.
That PCG is ostensibly less predictable is not a prime selling point, more a small nice-to-have in the scheme of things.
Of course, PCG doesn't exist to secure secrets. That's for cryptographic primitives, not any old RNG. However, there are downsides for using trivially predictable RNGs. A randomized algorithm might degrade performance with carefully-crafted inputs. This may expose you to DOS attacks. Hashes are a better known issue, where several languages have moved to make hashing less predictable, and thus less game-able.
It would surely be better for an implementation to avoid exposing any of this altogether, but an RNG intended for general use should have the worst-case scenario in mind.
The reason not to go full-on cryptographic is simply that the alternatives are faster and have more features. A list of some of the cool stuff PCG has is on the front-page, but here goes:
It's extremely fast, pretty much the best non-cryptographic RNG out there distribution wise, of those tested, and a tiny dependency. The RNG is reproducible and supports multiple streams, including streams dependent on the address space. This also allows emulating Haskell's RNG splitting (where one RNG creates two new independent ones), which is neat if you don't want to share state between multiple parts of your program but still want global reproducibility, or even if you're just a functional programmer. You can skip it backwards and forwards, which is little used but a ton of help when you do want it. Each instantiated RNG is a few bytes big. Something like the Mersenne Twister is a bad thing to instantiate wildly, especially on memory-constrained systems. Not a massive issue, but it's still silly any RNG fails at this. Last but certainly not least, it's a tiny dependency.
Only cryptographers care about "unpredictability." Outside of cryptography, the interesting properties of a PRNG are speed, period length, and state size.
But a good CSPRNG has excellent statistical properties. So if you make "unpredictability" a requirement, their slow PRNG becomes very attractive. In other words, they made up a problem for their solution.
If you need a statistical PRNG, use a statistical PRNG. (The best are xorshift1024* and xorshift128+.) If you need a CSPRNG, use a CSPRNG.
An interesting case would be something like hash flooding prevention. The cost of a full CSPRNG is probably not worth it--no security is broken if we lose the secrecy of one generated seed--but at the same time we want something that is at least nontrivial for an attacker to predict.
I agree that most cases you either don't care at all or need a real CSPRNG.
What does the "zero will be produced once less often than every other output" for negative qualities of XorShift and XorShift* means? That they are not able to generate "0"?
Also, does anyone know if PCG is in use somewhere today?
I mentored a gsoc student this summer who worked on rngs, and the only two algorithms that passed the big crush statistical suite were pcg random and split mix.
> No facility for a user-provided seed, preventing programs from getting reproducible results
> Periodically “stirs” the generator using kernel-provided entropy; this code must be removed if reproducible results desired (in testing its speed, I deleted this code)
seem like exactly the kinds of foot guns you really want removed from an RNG you're using for real live code.
Sometimes you want well-mixed numbers, but in a sequence that you can rerun. If you are simulating a process or testing games, having a fully deterministic setup if you know a seed is a great help to reproduce unusual behavior sighted.
Reproducibility and seed-ability isn't a footgun unless it's easy to do wrong accidentally, or hard to do right accidentally.
The former really isn't a problem except for really poor APIs, and the later isn't made worse by it... although the PCG random author does like to make a point of just how bad it is in C++ (see [1]).
Having encountered a number of VPS setups relying on haveged or rng-tools/virtio-rng, has anyone observed "specific problems" with the misconfiguration/misuse of haveged on VPS?
Not quite that, but here's someone's approximation of this approach for an extremely fast hash:
So this was a fun hash that I came up with when I was looking
for a 128-bit hash for falkervisor's code coverage database.
This hash passes all of smhasher's tests, faster than any other
hash that does the same.
I made this up by randomly testing things until it passed
all of smhashers tests. It is likely that this hash is heavily
flawed and smhasher just doesn't stress its flaws.
Zilong Tan's fast-hash contains a generator which
genetically optimizes over a combination of the cheap ops mul, xorshift left, xorshift right and rotate right, and also does genetic mutations over the initial magic states.
[+] [-] tptacek|10 years ago|reply
[+] [-] pbsd|10 years ago|reply
Note that the field of pseudorandom number generation for non-cryptographic purposes is much less rigorous than cryptography. Typically, for a new generator to be accepted as "OK" all it needs to do is pass a number of fixed statistical tests, usually one of the TestU01 batteries [1]. This is usually the only falsifiable claim you get, and if you're familiar with diffusion and how to achieve it this is easy to work out. Other falsifiable claims include equidistribution, but that is not a very useful guarantee---simply incrementing by 1 will achieve that. The notion of indistinguishability against computationally-bounded adversaries does not exist. All of this contributes to this being a field where crackpottery abounds, and is hard to distinguish good from bad.
For example, here's a generator that uses 1/4 of an AES round per output word, and that passes the Crush battery of statistical tests (which uses ~2^35 samples). Is it a good generator? I would not bet on it.
[1] http://simul.iro.umontreal.ca/testu01/tu01.html[+] [-] Veedrac|10 years ago|reply
Of course, PCG doesn't exist to secure secrets. That's for cryptographic primitives, not any old RNG. However, there are downsides for using trivially predictable RNGs. A randomized algorithm might degrade performance with carefully-crafted inputs. This may expose you to DOS attacks. Hashes are a better known issue, where several languages have moved to make hashing less predictable, and thus less game-able.
It would surely be better for an implementation to avoid exposing any of this altogether, but an RNG intended for general use should have the worst-case scenario in mind.
The reason not to go full-on cryptographic is simply that the alternatives are faster and have more features. A list of some of the cool stuff PCG has is on the front-page, but here goes:
It's extremely fast, pretty much the best non-cryptographic RNG out there distribution wise, of those tested, and a tiny dependency. The RNG is reproducible and supports multiple streams, including streams dependent on the address space. This also allows emulating Haskell's RNG splitting (where one RNG creates two new independent ones), which is neat if you don't want to share state between multiple parts of your program but still want global reproducibility, or even if you're just a functional programmer. You can skip it backwards and forwards, which is little used but a ton of help when you do want it. Each instantiated RNG is a few bytes big. Something like the Mersenne Twister is a bad thing to instantiate wildly, especially on memory-constrained systems. Not a massive issue, but it's still silly any RNG fails at this. Last but certainly not least, it's a tiny dependency.
[+] [-] DanBC|10 years ago|reply
There's a bit of discussion here: https://news.ycombinator.com/item?id=9887548
Slavik81 in that thread mentions raytracing as an example use.
[+] [-] eliteraspberrie|10 years ago|reply
But a good CSPRNG has excellent statistical properties. So if you make "unpredictability" a requirement, their slow PRNG becomes very attractive. In other words, they made up a problem for their solution.
If you need a statistical PRNG, use a statistical PRNG. (The best are xorshift1024* and xorshift128+.) If you need a CSPRNG, use a CSPRNG.
[+] [-] ahh|10 years ago|reply
I agree that most cases you either don't care at all or need a real CSPRNG.
[+] [-] dminor|10 years ago|reply
[+] [-] pettou|10 years ago|reply
Also, does anyone know if PCG is in use somewhere today?
[+] [-] dminor|10 years ago|reply
[+] [-] thesz|10 years ago|reply
Not mentioning speed variability of ChaCha family is a flaw in analysis.
[+] [-] carterschonwald|10 years ago|reply
[+] [-] cwmma|10 years ago|reply
> No facility for a user-provided seed, preventing programs from getting reproducible results
> Periodically “stirs” the generator using kernel-provided entropy; this code must be removed if reproducible results desired (in testing its speed, I deleted this code)
seem like exactly the kinds of foot guns you really want removed from an RNG you're using for real live code.
[+] [-] kryptiskt|10 years ago|reply
[+] [-] DanBC|10 years ago|reply
[+] [-] tedunangst|10 years ago|reply
[+] [-] Veedrac|10 years ago|reply
The former really isn't a problem except for really poor APIs, and the later isn't made worse by it... although the PCG random author does like to make a point of just how bad it is in C++ (see [1]).
[1] https://www.reddit.com/r/cpp/comments/31857s/random_number_g...
[+] [-] thrownaway2424|10 years ago|reply
[+] [-] justcommenting|10 years ago|reply
[+] [-] cpeterso|10 years ago|reply
[+] [-] nkurz|10 years ago|reply
[+] [-] rurban|10 years ago|reply
https://code.google.com/p/fast-hash/source/browse/trunk/hash...
The fit function is only the avalanche test, but that's easily exchangable.