It's a good introductory write-up, but three things about it frustrated me to no end.
> "As a human, I can do this very easily. 100101011010010110001101 There, I just did it.
No, you didn't. That number is most certainly not random, there are biases in it, you just don't know there are. Your mind is not random. This is why we use dice and not people to generate random bits.
> What do you mean by “kind of random number”? Aren’t all random numbers the same. Not really. There are two primary types of random number generators.
I growled audibly at this one. Random numbers should all be the same in quality (not quantity, of course); there should be no discernible difference. There is only one kind of random number, just different types of generators. With a PRNG, if the seed is provably destroyed, there should be absolutely no way to distinguish a number generated by a PRNG from one generated by a TRNG.
> The computer hardware isn’t the only source of entropy. The user’s own mouse and keyboard movements can be used as well.
No. These movements are not random. As with my first gripe, your brain is not random. We used to use this approach to generate entropy, and now we don't, because this is understood: "random" user inputs should absolutely never be used to generate randomness in anything security-related.
> Despite the benefits of CSPRNGs, as with everything else in the tech industry, security can never be guaranteed.
I'm glad you pointed this out. Everything in cryptography is based on unproven assumptions, and this is an important point that people should understand: it could be that P=NP, it could be that an algorithm exists to factor numbers to primes. We think not, but really we don't know.
> That number is most certainly not random, there are biases in it, you just don't know there are.
Indeed. In that specific example, the author generated 24 binary digits, but only on 7 occasions was the digit the same as the previous digit. There were 23 opportunities to do so, and in a truly random sequence of digits the chance of any digit being the same as the previous one is 50%.
Applying the binomial distribution, the probability of observing 7 or fewer repeats in 23 trials with a 50% chance on each trial is about 4.7%. So with p ≈ 0.047, just under the conventional 0.05 threshold, we can reject the null hypothesis that the author's generated sequence is random, and conclude that it is biased.
There are online demos where you can try this yourself: https://people.ischool.berkeley.edu/~nick/aaronson-oracle/in...
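That tail probability is easy to check with a few lines of Python (standard library only; it comes out just under 5%):

```python
from math import comb

def binom_lower_tail(k, n):
    # P(X <= k) for X ~ Binomial(n, 1/2): count the favourable
    # outcomes among the 2**n equally likely bit strings.
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

p = binom_lower_tail(7, 23)
print(round(p, 4))  # 0.0466
```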
Yes, I'm aware that there's an ongoing debate about whether a human (or anything, for that matter) can truly be random. But I wasn't going to bring that up in this article, or add caveats to every single point, or else it would've been unbearable to read.
I used artistic license in certain instances. But I agree some things could've been worded better and I'll edit those.
> it could be that P=NP, it could be that an algorithm exists to factor numbers to primes
Nits, but: 1) even if P=NP, it is possible that the problems we've chosen have a lower bound on their polynomial-time algorithms that still leaves them secure for the indefinite future (O(n^Graham's number) is polynomial). Of course, we also don't have any proof of that, so your general point stands. Relatedly, 2) an algorithm certainly exists to find prime factors (quite a few, actually!), but none that we know of runs in polynomial time without quantum computing.
>No, you didn't. That number is most certainly not random, there are biases in it, you just don't know there are.
Isn't that true of CSPRNGs as well, though? One of them may have some huge flaw that allows it to be predicted, we just don't know what it is yet.
(I think you basically agree with this in your last paragraph?)
More generally, when we ask for a random number, what we're usually looking for is a number uncorrelated with some other number. For example, when you're running an experiment, the purpose of the randomization is to ensure that you're not really capturing some other effect that determined which parameters were grouped together.
That is, if you're deciding to run the control vs. test case because of subtle fluctuations in the room's lighting (which you're unaware are influencing you), then you may have no real effect at all, but accidentally pick up an effect that was really due to the lighting, not the thing you're trying to test. Using a dice roll prevents this, not because of randomization per se, but because it reliably breaks other correlations that might seep into the experiment.
To the extent that a human randomizer meets that in one particular domain, it may be good enough (not that you were saying otherwise). It's just ... you'd rather not bet on that being true in any given case.
I don't know what is actually done, but I would think you could do things like take the milliseconds between keystrokes, or the distance the mouse moved modulo 2, to generate randomness. That said, I agree with your point that humans are not random.
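A toy sketch of that idea (the intervals below are made up for illustration; as the comments above stress, nothing like this should be the sole entropy source for anything security-related):

```python
import hashlib

# Hypothetical inter-keystroke intervals in milliseconds.
intervals_ms = [183, 142, 97, 210, 178, 155, 131, 164, 149, 202]

# Keep only the least significant bit of each interval, where the
# hard-to-predict jitter lives.
raw_bits = [t & 1 for t in intervals_ms]
print(raw_bits)  # [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# The raw bits are still biased and correlated, so condition them
# through a cryptographic hash before using them as seed material.
seed = hashlib.sha256(bytes(raw_bits)).digest()
print(len(seed))  # 32 bytes
```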
> No, you didn't. That number is most certainly not random, there are biases in it, you just don't know there are. Your mind is not random. This is why we use dice and not people to generate random bits.
Can't you say that any single (shortish) number is random? How can you demonstrate that 99999999999999999999 isn't random?
Writing your own random number generator can be a lot of fun. Lots of sources of entropy out there. Inspired by lavarand[0], I wrote an RNG in Python based on the output of the Global Consciousness Dot[1] (which is a ridiculous project in its own right). There are also a lot of ways to visualize and ascertain how "random" your numbers are, whether plotting Pearson's with matplotlib or using a command-line tool like ent[2] to calculate the degree of entropy.
[0] https://en.wikipedia.org/wiki/Lavarand
[1] https://gcpdot.com/
[2] https://manpages.ubuntu.com/manpages/bionic/man1/ent.1.html
> The identification of events and the times at which they occur are specified case by case, as are the statistical recipes. The approach explicitly preserves some latitude of choice, as is appropriate for an experiment exploring new territory. Accepting loose criteria for event identification allows exploration of a variety of categories, while the specification of a rigorous, simple hypothesis test for each event in the formal series assures valid statistics.
I’ve never seen someone so blatantly spell out that they are cherry-picking, but then also argue that the cherry-picking is good science ;) This is the sort of thing that gives real scientists a bad name.
That's a good callout, and I realize that was an error on my part. I meant to put in the Mersenne Twister, but when I went back through my notes while writing, I misread that linear congruential generators were popular decades ago.
I've been working with the NIST 800-22 evaluation suite [1] and yes, it is very hard. There are an infinite number of tests to determine whether a number is random. Ultimately it comes down to the probability that it is a good RNG/PRNG, and to the application. Ironically, if a PRNG is good enough, it can often be BETTER than a raw random source (which is why RNGs have a conditioning phase).
[1] https://csrc.nist.gov/Projects/Random-Bit-Generation/Documen...
An infinite number of tests, yes; it is like trying to find the best possible predictor of a source. No statistical test can prove that a source has full entropy and is uniformly distributed; it can only disprove it. These tests are too often misused and are usually not very useful: they tell us nothing about how much entropy might be in the sample, entropy sources are always biased anyway and fail the tests, and if you condition the output it will always pass the tests even if it is not random at all. A better method is to develop a stochastic model of the entropy source and attempt to estimate bounds on its entropy, which is still generally not feasible. The statistical tests can then be used as a sanity check on your entropy estimates.
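A first-order version of such an estimate is the most-common-value bound, in the spirit of (but much cruder than) the estimators in NIST SP 800-90B:

```python
from collections import Counter
from math import log2

def min_entropy_per_symbol(samples):
    # H_min = -log2(p_max): if the most common symbol appears with
    # frequency p_max, no more than -log2(p_max) bits of entropy
    # per symbol can be claimed. Optimistic for correlated data.
    p_max = max(Counter(samples).values()) / len(samples)
    return -log2(p_max)

# A biased bit stream: 11 ones out of 16 samples.
sample = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
print(round(min_entropy_per_symbol(sample), 3))  # 0.541
```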
The RNG needs a conditioning phase, or entropy extractor, to transform the output into a distribution that is indistinguishable from uniform, as is needed in cryptographic applications. You should always include this stage, because virtually no physical entropy source has a uniform distribution.
The best thing to do is to gather as much entropy as you can from your sources, maybe 10x what you think you need, put it through an entropy extractor (such as a cryptographic hash function) to generate a PRNG seed, and then use the PRNG.
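A sketch of that pipeline (the "pool" here is a placeholder byte pattern standing in for oversampled raw noise, and random.Random merely stands in for a proper DRBG; it is NOT a CSPRNG):

```python
import hashlib
import random

# Placeholder entropy pool: pretend this is ~10x oversampled raw
# noise from your sources rather than a predictable byte pattern.
pool = bytes(range(256)) * 10

# Extractor: compress the whole pool into a 256-bit seed.
seed = hashlib.sha256(pool).digest()

# Use the seed to drive a PRNG.
prng = random.Random(int.from_bytes(seed, "big"))
print([prng.randrange(100) for _ in range(5)])
```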
http://www.nrbook.com/devroye/
An undergraduate algorithms + statistics class is sufficient to get a lot out of this book, even if it's just exposure to the wide variety of techniques for generating random numbers on a computer.
Computed:
NOT truly random. Way back in the late '70s, I had a BASIC program that would output the very same set of 8-digit 'random' numbers every time the program was run.
(Unless you did a special thing the first time, to obtain a random seed to generate a different sequence of pseudo-random numbers. IIRC, it was the number of machine cycles since the last time the floppy disk was accessed.)
Hardware:
Truly random: The machine counts machine cycles, with a small maximum before the counter overflows and restarts at zero. It looks at the time difference between things like key presses, disk spins, or something else that varies in time.
No matter how good you are, the variation in time between your key presses is never the same at the very small timescale. That number is used as the seed to a pseudo-random generator.
A modern "pseudo random number" is simply a sequence of numbers that visits the state-space in an order that's difficult to detect with modern statistical tests. (Chi-squared, among others). See PractRand or TestU01 as two packages for testing these sequences. That is to say: instead of the sequence "0, 1, 2, 3, 4, 5... 4294967296... 0", your RNG will do some other sequence.
Yes, "0, 3, 6, 9... 4294967295, 2, 5, 8... 4294967294, 1, 4, 7... 4294967293, 0, 3..." is a RNG of sorts, but an example of a really, really bad one that would instantly fail most statistical tests. :-) But conceptually, the Mersenne Twister, LCGRNG, and LSFR all accomplish this. Its a "reordering" of the sequence. That's it.
A "cryptographic random number" just adds a few additional tests to the pool of statistical tests. In particular: differential cryptography gets into the nitty gritty about which bits can predict the results of other bits. You have to assume that the "opponent" is willing to use extraordinary amounts of computing power to detect patterns.
If bit #25 has a 51% correlation with bit #30, you fail as a cryptographic random number generator. You need at least 2^128 worth of security, which means the correlation between current and future bits must be vanishingly close to 50% (50.00000000001% is maybe fine).
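The sort of check being described can be sketched in a few lines (a toy version; real batteries like PractRand or TestU01 are far more elaborate):

```python
def correlation(bits_a, bits_b):
    # Fraction of positions where the two streams agree; an ideal
    # generator keeps this indistinguishable from 0.5.
    return sum(a == b for a, b in zip(bits_a, bits_b)) / len(bits_a)

# Toy data standing in for "bit #25" and "bit #30" sampled across
# many outputs of a generator.
a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [1, 0, 0, 1, 0, 1, 1, 0]
print(correlation(a, b))  # 0.75, far from 0.5: a red flag
```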
For example: the sequence: {AES(0, key), AES(1, key), AES(2, key)... AES(2^128, key), AES(0, key)...} is a cryptographically secure random number generator. The sequence will loop after its 128 bits of state are exhausted. If the "opponent" doesn't know the key, the bitwise correlations are cryptographically sound (thanks to the hard work of the engineers behind AES).
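That counter construction can be sketched with SHA-256 standing in for AES (an illustrative substitution so it runs with only the standard library; the structure, not the cipher, is the point):

```python
import hashlib

class CounterModeRNG:
    # "Encrypt" an incrementing counter under a fixed key:
    # output block i is SHA-256(key || i), mimicking AES(i, key).
    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0

    def next_block(self) -> bytes:
        block = hashlib.sha256(
            self.key + self.counter.to_bytes(16, "big")
        ).digest()
        self.counter += 1
        return block

rng = CounterModeRNG(b"a fixed secret key")
print(rng.next_block().hex()[:16])
```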
A true random number generator is just a cryptographic number generator applied to a truly random seed. White-noise generators are a well-known electrical-engineering trick: resistor noise is everywhere but rather small (you can build a white-noise generator from Johnson-Nyquist noise if you really want to). More likely, you'd use shot noise from a transistor junction, at least at the hobbyist level.
Intel / AMD have true random number generators from some kind of electrical noise generator on every CPU, which feeds into cryptographic random number generators.
There are other sources of noise: radiation is a well-known one, but I'm not sure it's practical.
There's a "speed limit" to white noise generators. You only can extract so much entropy from them in a given time (ex: Remember: CPUs operate at 4GHz, or 0.25 nanoseconds per clock tick). Cryptographic random number generators "stretch" the true seed of randomness, while the white-noise generator continues to grab more entropy.
Technically a PRNG state sequence could also have distinct loops, returning a reordering of only a subset of the sequence for any given seed. For example, `return state += 2;` would return two separate sequences for odd or even seeds. We know that a PRNG with N bits of state must repeat after at most 2^N invocations, but it requires further proof that a given PRNG will not repeat at smaller intervals.
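The `state += 2` example is easy to see with a 4-bit state: each seed parity gets its own cycle of length 2^(N-1), not 2^N:

```python
def step(state, nbits=4):
    # The toy PRNG from above: add 2 modulo 2**nbits.
    return (state + 2) % (1 << nbits)

def cycle(seed, nbits=4):
    # Follow the state until it returns to the seed.
    states, s = [], seed
    while True:
        states.append(s)
        s = step(s, nbits)
        if s == seed:
            return states

print(cycle(0))  # [0, 2, 4, 6, 8, 10, 12, 14]
print(cycle(1))  # [1, 3, 5, 7, 9, 11, 13, 15]
```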
> For example: the sequence: {AES(0, key), AES(1, key), AES(2, key)... AES(2^128, key), AES(0, key)...} is a cryptographically secure random number generator. The sequence will loop after its 128 bits of state are exhausted.
In light of the above, is it possible to prove this (through some property of AES)?
(Also, we can't really prove that AES, or any other current algorithm, is cryptographically secure. Tomorrow, a new cryptanalysis technique could be revealed which breaks it.)
PRNGs can actually be better than true RNGs, which is why RNGs generally just provide a seed for a DRBG. RNG conditioning has to do things like remove extremely long runs of 1s or 0s, and RNGs often cannot generate material fast enough for today's demands (e.g., under 1 Kbps).
I'm trying to remember a technique based on listening to static or white noise and taking the least significant bit. You then take that stream of bits and postprocess it by looking for bit changes rather than using the bits directly.
That's a vague and I'm sure inaccurate description, but it isn't enough for me to google it ... anyone know what I'm referring to?
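That description sounds like it could be the von Neumann extractor (just a guess on my part): pair up the sampled bits and keep a bit only when the pair shows a transition:

```python
def von_neumann(bits):
    # Walk non-overlapping pairs: (0, 1) emits 0, (1, 0) emits 1,
    # and (0, 0) / (1, 1) pairs are discarded. For independent but
    # biased bits, the surviving bits are unbiased.
    return [a for a, b in zip(bits[::2], bits[1::2]) if a != b]

# A biased sample, roughly 70% ones.
sample = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print(von_neumann(sample))  # [0, 1, 1, 0]
```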
There are much cheaper, and easier, sources of white noise.
Anyone actually interested in the electronics of this should build a white-noise generator out of an Op-Amp + your favorite PN junction in reverse-bias mode (diode, BJT transistor, or whatever).
Shot noise from reverse-biased PN junctions is white noise at the quantum level. You're physically seeing random electrons cross a junction they weren't supposed to, and then amplifying that signal up to levels we can detect (well... not with our fingers, but a decent op-amp amplifier plus an Arduino can).
https://www.maximintegrated.com/en/design/technical-document...
EDIT: This circuit from Maxim uses reverse-breakdown noise from a Zener diode, which is more vigorous than shot noise and therefore easier to amplify. It's still white noise, and therefore "truly random", up into the MHz range. The circuit uses a Maxim voltage amplifier (I mean, the article is a big advertisement for how simple the MAX2650 is to use...).
I hope you were allowed to use some sort of reference material. Unless your previous work involved intimate familiarity with PRNGs, I would never expect anybody to be able to implement one on the spot, off the top of their heads. Even with reference material, I don't think I'd expect anything more involved than a linear congruential generator.
Without reference material, probably the best I could do would be to read from one of the /dev/*random devices. I'm guessing that, whatever the actual intention behind that question, writing a function that essentially reads from a file doesn't match the interviewer's expectations. :/
https://en.wikipedia.org/wiki/Linear_congruential_generator
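For reference, the linear congruential generator mentioned above fits in a few lines (using the well-known Numerical Recipes constants):

```python
class LCG:
    # x_{n+1} = (a * x_n + c) mod m, with the Numerical Recipes
    # constants a = 1664525, c = 1013904223, m = 2**32.
    def __init__(self, seed: int):
        self.state = seed % (1 << 32)

    def next(self) -> int:
        self.state = (1664525 * self.state + 1013904223) % (1 << 32)
        return self.state

rng = LCG(42)
print(rng.next())  # 1083814273
print(rng.next() % 100)
```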
It's preferred to use RDSEED and use that as a seed, instead of implicitly trusting the output of RDRAND. Mix it with user entropy (or system entropy).
Some say RDRAND is a backdoor (and after Dual EC, it would not particularly surprise me). This is why RDRAND doesn't exist in Linux (or maybe just isn't the only source of entropy?): https://www.linux-magazine.com/Online/News/Linus-Says-No-Bac...
Not sure of the security implications of using poisoned seeds in addition to truly random seeds in an RNG. Some say it's okay, because it cannot reduce security, only fail to increase it. But if the CPU's RDRAND instruction is backdoored, couldn't the RNG instructions be intercepted and replaced so that RDRAND is the only seed? Then again, if your CPU is backdoored, why bother with anything? Etc., etc. This discussion could go on for a while.
Jerrrry|4 years ago
It is crucial to inform the user of this, as software has.
"Generating random numbers, please move your mouse around randomly" is suitable for literally all use cases.
Someone|4 years ago
I doubt that is still true for ‘modern’ languages. Let’s google a few.
https://docs.microsoft.com/en-us/dotnet/api/system.random?vi... says “The current implementation of the Random class is based on a modified version of Donald E. Knuth's subtractive random number generator algorithm”. So, that’s a no.
https://rust-random.github.io/book/guide-rngs.html doesn’t appear to mention multiplicative algorithms, either.
https://developer.apple.com/documentation/swift/systemrandom... says “SystemRandomNumberGenerator is automatically seeded, is safe to use in multiple threads, and uses a cryptographically secure algorithm whenever possible.”
sunny--tech|4 years ago
I've edited that section.
Borrible|4 years ago
https://www.researchgate.net/publication/283762433_The_Cosmi...
hdivider|4 years ago
https://www.idquantique.com/random-number-generation/product...
abnry|4 years ago
And also Dilbert: https://dilbert.com/search_results?terms=Random+Number+Gener...