How to test a random number generator (2010)

[+] Veedrac|8 years ago|reply

One of the most important things that came out of The PCG Paper[1] is that tests are stronger as you reduce the number of state bits, and measuring this continuum is much more powerful than a single test. (see Figure 7)

It seems to me that any new non-cryptographic PRNG should be required to test itself in this manner, or else provide a very good excuse.

[1] http://www.pcg-random.org/paper.html

[+] Veedrac|8 years ago|reply

Addendum: For what it's worth, it seems some people aren't fond of Diehard any more. To quote Chris Doty-Humphrey, the author of PractRand:

> In empirical testing, a classic is the Diehard program by Marsaglia. That program is badly outdated today and should not be used anymore, but it was, so far as I know, the first real standardized battery of such tests.

More information is available on their website: http://pracrand.sourceforge.net/PractRand.txt.

[+] clarry|8 years ago|reply

PCG seems to be designed by intuition and experimentation, with practically no theory in its support. I'm not saying it's bad, and I certainly see some potential merit in the idea that a PRNG passing tests with fewer bits is likely more effective than one that needs more state.

The problem is that these test suites are not very comprehensive and while they're OK for discerning an obviously broken RNG, they're not good to vouch for an RNG in general. That is to say, it is fairly easy to write an RNG that passes these test suites despite having some very predictable and easily detectable patterns that you'd find if only you wrote a test that looks for that particular pattern. Kinda like these suites have hyperplane tests to detect LCGs.

There's an old saying about anyone being smart enough to invent crypto they are not smart enough to break themselves. As far as designing RNGs goes, doing one against existing test suites is pretty low bar. I think we can afford a little more theory backing non-cryptographic PRNGs too.

I wouldn't give these test suites so much credit; they're tools for detecting known flaws or broken implementations. That's all.

[+] Someone|8 years ago|reply

I think you’re better of reading the relevant chapter in TAOCP. It has more detail, and this doesn’t add anything from past 1970 or so. For example, it doesn’t even _mention_ the diehard and dieharder tests, let alone discuss their merits. (CORRECTION: it does. Thanks to marvy for pointing that out)

Also, I think the final line in this text (”Correct generators usually pass tests, and buggy generators usually fail.”) is at least dangerously misleading.

Firstly, I would use good/bad, not correct/buggy, but that is minor. My major issue is the idea that bad generators usually fail. Only really bad generators usually fail. Somewhat bad generators fail tests less often, and only slightly bad generators can pass many, many tests, only to show their weakness in some particular setting.

For example, if I take the output r1,r2,r3… of a good generator and swap r1 and r2 if r1>r2, r3 and r4 if r3>r4, etc. the result isn’t truly random, but will pass many tests.

If you suspect your generator to have this problem, it is extremely easy to test for it, but if you don’t, that may be hard.

A good array of tests will test subsequences, and find this easily (the average of the odd samples will typically be less than expected, and the average of the even samples will typically be larger)

[+] gamegoblin|8 years ago|reply

Your example will fail the "Overlapping permutations" test and "Runs test" from the diehard tests.

I don't think it's really meaningful to point out that a bad RNG will pass many of the constituent tests of a particular test battery (e.g. diehard), because each constituent test is generally looking for one particular property.

[+] brians|8 years ago|reply

You think a length two repeat in the most significant bits will hide from statistical tests? That doesn’t sound plausible to me. Can you show a good RNG for which this mutation passes tests?

[+] marvy|8 years ago|reply

> For example, it doesn’t even _mention_ the diehard and dieharder tests, let alone discuss their merits

It mentions the DIEHARD test near the beginning. Try ctrl-F.

[+] pletnes|8 years ago|reply

This was disappointing. Not a word about correlations between subsequent numbers, period length, all the things that matter if you’re actually generating significant amounts of random numbers.

[+] pvg|8 years ago|reply

The chapter is not about writing and then testing new PRNGs. A uniform PRNG of sufficient quality for one's purpose is assumed. The key bit:

"Most testers will be concerned with the second step of the grand strategy: testing code that transforms uniform random sequences into sequences with other distributions. Although very few people develop their own uniform generators, many people have reasons to develop custom nonuniform generators."

[+] bhickey|8 years ago|reply

Since the discussion seems to have already degenerated into waxing poetic about this RNG or that I'd like to throw in: Do not use pseudo-random number generators. Use cryptographically secure random number generators.

From a performance perspective there are vanishingly few applications where the difference between MT or PCG and AES-CTR is going to make an ounce of difference. Once the question of performance is settled, there simply don't remain any arguments in favor of using PRNGs.

[+] FabHK|8 years ago|reply

Monte Carlo integration, one of the most important application of PRNG, needs the speed of non-CSPRNG.

However, I do agree that programming languages these day should probably make a CSPRNG the default, and let you switch to a fast PRNG when needed. (Now it's by and large the other way around – you get a fast PRNG and need to switch to a secure one.)

If you know what you're doing, the outcome is the same, but if you don't, in the first case you're unnecessarily slow, while in the second you're unnecessarily insecure.

[+] tomatsu|8 years ago|reply

https://github.com/mohae/bench-rng

csprng: 909 ns/Op

Xorhift128+: 2 ns/Op

Xorhift128+ is what all major JS engines use for Math.random().

I don't think it's a good idea if you use a CSPRNG if there isn't any reason for that. Especially if you need a lot of random numbers. E.g. a game might drop some additional frames here and there and your particle effects won't look any better and your AI won't behave any smarter either.

Not to use PRNGs ever is bad advice.

[+] AlexCoventry|8 years ago|reply

Any good CUDA implementations and benchmarks for CSPRNG's?

[+] aerioux|8 years ago|reply

2/10 would not recommend for technical merit.

8/10 is nice if you want a handwavy human explanation of some rudimentary expectations in crypto

- author spends a few pages thinking about randomness - we have formal definitions of a RNG/PRNG.

- author does basic statistical tests, these seem very unconcerned with actual randomness - i'm not sure what the point is

- author really shouldn't be writing about crypto randomness if they don't do crypto

+1 to higher comment saying reading TAOCP instead.

[+] krab|8 years ago|reply

The author specifically mentions he doesn't write about cryptography.

> The idea of what constitutes a good RNG depends on how the RNG is applied. That is why, for example, a generator may be considered high-quality for simulation while being considered unacceptable for cryptography. This chapter looks at tests of statistical quality and does not address tests for cryptographical security.

[+] pvg|8 years ago|reply

He’s writing with the assumption you have a sufficiently good uniform PRNG. Then explains how to test whether the code you wrote to achieve a particular distribution does what you think it does.

[+] Fomite|8 years ago|reply

"i'm not sure what the point is" - people use random numbers for things besides crypto.

[+] bmh_ca|8 years ago|reply

This is a great read. The principles are also applicable to other problems such as evaluating a constant time comparison function, symmetrical cypher, or hash.

[+] olegkikin|8 years ago|reply

So is this a good PRNG?

    f(i) = SHA256(i + salt)

[+] __s|8 years ago|reply

Discussion: https://crypto.stackexchange.com/questions/9076/using-a-hash...

& http://xoroshiro.di.unimi.it makes reference to how on x86 cpus with AES one could get a very fast prng similar to how you're implying

As a change of 1 bit in an input ideally flips half the bits of the output for a cryptographic hash, this scheme should work

[+] marvy|8 years ago|reply

I don't know much about this stuff, but I think that this would be considered overkill (and thus too slow) if you don't need crypto-level randomness. But other than that it should be really good.

[+] grenoire|8 years ago|reply

Random, sure. Uniformly distributed? Not sure.

[+] pvg|8 years ago|reply

A PRNG usually doesn’t have parameters once seeded. What’s i in your scheme?

If you’re asking if you can make a good PRNG out of cryptographic primitives, yes, that’s what happens in the bowels of OS CSPRNGs. But a bit more involved.

[+] Javantea_|8 years ago|reply

No. A feature of a good PRNG is that it should not be predictable. If the salt is weak enough to be cracked, that PRNG can be reversed and predicted.

[+] known|8 years ago|reply

https://en.wikipedia.org/wiki/Diehard_tests

[+] FabHK|8 years ago|reply

Marsaglia's DieHard, DieHarder, and NIST 800-22 A are historically important, but outdated.

See my related stack overflow answer:

https://stackoverflow.com/questions/4720822/best-pseudo-rand...

[+] clarry|8 years ago|reply

They suck. At least use TestU01 if you're going to work with an existing test suite. It's still very easy to write a (bad) prng that passes all the testu01 batteries.

[+] TekMol|8 years ago|reply

    Software random number generators are technically
    pseudorandom number generators because the output
    of a deterministic program cannot really be random.

On the other hand, everything we build is made of quantum particles which exhibit truly random behavior. Including the computers that run those programs. So programs cannot really be deterministic.

[+] pvg|8 years ago|reply

That's an argument that's persuasive in very specialized regions of spacetime with particular chemistries. Like dorm rooms equipped with bongs.

[+] adrianN|8 years ago|reply

Programs are abstract objects like numbers. The computers executing them are indeed not deterministic (e.g. random bit flips occur occasionally), but pretty smart people do their best to make them appear so, so it's unlikely that you see behavior that is not explainable by a deterministic model.

42 comments