Specific Problems with Other RNGs

[+] tptacek|10 years ago|reply

I went back to the root page of this site to see what PCG is, and I don't understand it. What's the point of an RNG that is "more secure" than an insecure RNG, but less secure than a real CSPRNG? What does "Predictability: Challenging" actually mean?

[+] pbsd|10 years ago|reply

It means exactly what it sounds like: the author tried to attack the generator, failed, and thus concluded that it must be a Very Hard Problem.

Note that the field of pseudorandom number generation for non-cryptographic purposes is much less rigorous than cryptography. Typically, for a new generator to be accepted as "OK" all it needs to do is pass a number of fixed statistical tests, usually one of the TestU01 batteries [1]. This is usually the only falsifiable claim you get, and if you're familiar with diffusion and how to achieve it this is easy to work out. Other falsifiable claims include equidistribution, but that is not a very useful guarantee---simply incrementing by 1 will achieve that. The notion of indistinguishability against computationally-bounded adversaries does not exist. All of this contributes to this being a field where crackpottery abounds, and is hard to distinguish good from bad.

For example, here's a generator that uses 1/4 of an AES round per output word, and that passes the Crush battery of statistical tests (which uses ~2^35 samples). Is it a good generator? I would not bet on it.

  #include <stdint.h>
  #include <immintrin.h>

  struct S {
    static const unsigned kAesRounds = 1;
    union {
      uint32_t state_[4 * 4];
      __m128i  words_[4];
    } u_;
    unsigned counter_;
    const __m128i key_;

    S(uint64_t seed)
    : u_{{0}}, key_{_mm_set_epi64x(seed, -seed)}, counter_{0} {
      for(unsigned i = 0; i < 16; ++i) (void)next();
    }

    uint32_t next() {
      const uint32_t output = u_.state_[counter_];
      __m128i t = u_.words_[counter_ & 3];
      counter_ = (counter_ + 1) & 15;
      for(unsigned i = 0; i < kAesRounds; ++i)
        t = _mm_aesenc_si128(t, key_);
      u_.words_[counter_ & 3] = t;
      return output;
    }
  };

[1] http://simul.iro.umontreal.ca/testu01/tu01.html

[+] Veedrac|10 years ago|reply

That PCG is ostensibly less predictable is not a prime selling point, more a small nice-to-have in the scheme of things.

Of course, PCG doesn't exist to secure secrets. That's for cryptographic primitives, not any old RNG. However, there are downsides for using trivially predictable RNGs. A randomized algorithm might degrade performance with carefully-crafted inputs. This may expose you to DOS attacks. Hashes are a better known issue, where several languages have moved to make hashing less predictable, and thus less game-able.

It would surely be better for an implementation to avoid exposing any of this altogether, but an RNG intended for general use should have the worst-case scenario in mind.

The reason not to go full-on cryptographic is simply that the alternatives are faster and have more features. A list of some of the cool stuff PCG has is on the front-page, but here goes:

It's extremely fast, pretty much the best non-cryptographic RNG out there distribution wise, of those tested, and a tiny dependency. The RNG is reproducible and supports multiple streams, including streams dependent on the address space. This also allows emulating Haskell's RNG splitting (where one RNG creates two new independent ones), which is neat if you don't want to share state between multiple parts of your program but still want global reproducibility, or even if you're just a functional programmer. You can skip it backwards and forwards, which is little used but a ton of help when you do want it. Each instantiated RNG is a few bytes big. Something like the Mersenne Twister is a bad thing to instantiate wildly, especially on memory-constrained systems. Not a massive issue, but it's still silly any RNG fails at this. Last but certainly not least, it's a tiny dependency.

[+] DanBC|10 years ago|reply

It's definitely not a cryptographically secure PRN. I suspect it's used for people who want a lot of random data for modelling.

There's a bit of discussion here: https://news.ycombinator.com/item?id=9887548

Slavik81 in that thread mentions raytracing as an example use.

[+] eliteraspberrie|10 years ago|reply

Only cryptographers care about "unpredictability." Outside of cryptography, the interesting properties of a PRNG are speed, period length, and state size.

But a good CSPRNG has excellent statistical properties. So if you make "unpredictability" a requirement, their slow PRNG becomes very attractive. In other words, they made up a problem for their solution.

If you need a statistical PRNG, use a statistical PRNG. (The best are xorshift1024* and xorshift128+.) If you need a CSPRNG, use a CSPRNG.

[+] ahh|10 years ago|reply

An interesting case would be something like hash flooding prevention. The cost of a full CSPRNG is probably not worth it--no security is broken if we lose the secrecy of one generated seed--but at the same time we want something that is at least nontrivial for an attacker to predict.

I agree that most cases you either don't care at all or need a real CSPRNG.

[+] dminor|10 years ago|reply

I believe it means it's more difficult to guess it's internal state than other commonly used PRNGs.

[+] pettou|10 years ago|reply

What does the "zero will be produced once less often than every other output" for negative qualities of XorShift and XorShift* means? That they are not able to generate "0"?

Also, does anyone know if PCG is in use somewhere today?

[+] dminor|10 years ago|reply

I believe go has switched to it or its considering switching. Someone linked to a code review awhile back.

[+] thesz|10 years ago|reply

ChaCha20 is (relatively) slow, but! ChaCha8 does not have known attacks and 2.5 times faster than ChaCha20.

Not mentioning speed variability of ChaCha family is a flaw in analysis.

[+] carterschonwald|10 years ago|reply

I mentored a gsoc student this summer who worked on rngs, and the only two algorithms that passed the big crush statistical suite were pcg random and split mix.

[+] cwmma|10 years ago|reply

the knocks against the openbsd arc4random, namely

> No facility for a user-provided seed, preventing programs from getting reproducible results

> Periodically “stirs” the generator using kernel-provided entropy; this code must be removed if reproducible results desired (in testing its speed, I deleted this code)

seem like exactly the kinds of foot guns you really want removed from an RNG you're using for real live code.

[+] kryptiskt|10 years ago|reply

Sometimes you want well-mixed numbers, but in a sequence that you can rerun. If you are simulating a process or testing games, having a fully deterministic setup if you know a seed is a great help to reproduce unusual behavior sighted.

[+] DanBC|10 years ago|reply

Some people need reproducible "randomness". They're testing a model, for example. I'm not sure if this RNG is good for them or not.

[+] tedunangst|10 years ago|reply

Some people seem forever confused about the difference between random and deterministic.

[+] Veedrac|10 years ago|reply

Reproducibility and seed-ability isn't a footgun unless it's easy to do wrong accidentally, or hard to do right accidentally.

The former really isn't a problem except for really poor APIs, and the later isn't made worse by it... although the PCG random author does like to make a point of just how bad it is in C++ (see [1]).

[1] https://www.reddit.com/r/cpp/comments/31857s/random_number_g...

[+] thrownaway2424|10 years ago|reply

What about unit tests?

[+] justcommenting|10 years ago|reply

Having encountered a number of VPS setups relying on haveged or rng-tools/virtio-rng, has anyone observed "specific problems" with the misconfiguration/misuse of haveged on VPS?

[+] cpeterso|10 years ago|reply

Has there been any research into generating a PRNG using a genetic algorithm whose fitness function is the Crush and Diehard test results?

[+] nkurz|10 years ago|reply

Not quite that, but here's someone's approximation of this approach for an extremely fast hash:

  So this was a fun hash that I came up with when I was looking
  for a 128-bit hash for falkervisor's code coverage database.
  This hash passes all of smhasher's tests, faster than any other
  hash that does the same.

  I made this up by randomly testing things until it passed
  all of smhashers tests. It is likely that this hash is heavily
  flawed and smhasher just doesn't stress its flaws.

https://github.com/gamozolabs/falkhash

[+] rurban|10 years ago|reply

Zilong Tan's fast-hash contains a generator which genetically optimizes over a combination of the cheap ops mul, xorshift left, xorshift right and rotate right, and also does genetic mutations over the initial magic states.

https://code.google.com/p/fast-hash/source/browse/trunk/hash...

The fit function is only the avalanche test, but that's easily exchangable.

38 comments