item 1803284

Understanding "randomness"

60 points | giu | 15 years ago | stackoverflow.com | reply

21 comments

[+] drblast|15 years ago|reply
The top answer is an excellent demonstration of the central limit theorem and why the question's assumption that rand() and rand() * rand() are "equally random" is wrong.

The central limit theorem is one of the most initially surprising and fascinating things in all of math. Please support the central limit theorem by learning about it on Wikipedia, or wherever fine theorems are discussed.

Edit to add: If you haven't learned about this enough to know that implementing systems that depend on randomness, like cryptographic systems, is really, really difficult and error-prone, please just use a standard library instead.

It's the most counter-intuitive thing in the world that rand() * rand() is less secure than rand(); shouldn't it be twice as unpredictable? But it's not. Cryptography and related fields are rife with counter-intuitive gotchas.
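A quick simulation makes the skew concrete (a sketch in Python; `random.random()` stands in for rand() here, which is an assumption, not the thread's library). Since P(X*Y <= t) = t - t*ln(t) for two independent uniforms, about 84.7% of products land below 0.5, versus 50% for a single uniform:

```python
import random

random.seed(0)
N = 100_000

# For a uniform variable, about half the samples fall below 0.5.
# For rand() * rand(), P(X*Y <= t) = t - t*ln(t), so roughly 84.7% of
# products fall below 0.5 -- the mass piles up near zero.
single = sum(random.random() < 0.5 for _ in range(N)) / N
product = sum(random.random() * random.random() < 0.5 for _ in range(N)) / N

print(single, product)
```

Anything relying on the product being "more random" would be badly surprised by that pile-up near zero.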

[+] hvs|15 years ago|reply
My favorite quote from that page:

Don't take Statistics and Intuition to the same party ....

[+] onedognight|15 years ago|reply
Here's some intuition about why rand() * rand() is different from rand().

Let's say you are doing rand(2), which produces 0 or 1 with equal probability. Then rand(2) * rand(2) will produce 0 much more often than 1.

    0 * 0 = 0
    0 * 1 = 0
    1 * 0 = 0
    1 * 1 = 1
Addition has a similar problem.

    0 + 0 = 0
    0 + 1 = 1
    1 + 0 = 1
    1 + 1 = 2
If you use the right range and wrap the result, e.g. mod(rand(2^n) + rand(2^n), 2^n), then addition will not affect the distribution of your numbers.
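The tables above can be checked by brute-force enumeration (a Python sketch; `vals` models the hypothetical rand(2)):

```python
from itertools import product
from collections import Counter

vals = [0, 1]  # outcomes of a hypothetical rand(2)

mult = Counter(a * b for a, b in product(vals, repeat=2))
add = Counter(a + b for a, b in product(vals, repeat=2))
wrap = Counter((a + b) % 2 for a, b in product(vals, repeat=2))

print(dict(mult))  # {0: 3, 1: 1}: multiplication is biased toward 0
print(dict(add))   # {0: 1, 1: 2, 2: 1}: addition peaks in the middle
print(dict(wrap))  # {0: 2, 1: 2}: wrapping restores uniformity
```

Only the wrapped sum keeps both outcomes equally likely, which is the point of the mod trick.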
[+] pjscott|15 years ago|reply
I remember when I first learned about the central limit theorem, having this sudden realization:

So that's where all those normal distributions come from!

In the statistics classes that most people take, they just dump this weird equation on you, and say that the normal distribution often arises in the real world. But they don't explain why, or in what kinds of situations you're likely to see things being normally distributed. It's like suddenly finding a walrus in your living room: everybody assures you that it's supposed to be there, but where the hell did it come from?

The central limit theorem de-mystifies the normal distribution and makes the world easier to understand.
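You can watch the walrus arrive. A small simulation (Python, my own sketch, not from the thread) uses the classic sum-of-12-uniforms trick: by the central limit theorem the sum is approximately normal, and after centering, roughly 68.3% of samples land within one standard deviation, just as a true normal would:

```python
import random

random.seed(1)

def approx_normal():
    # Sum of 12 uniforms on [0, 1) has mean 6 and variance 12 * (1/12) = 1,
    # so subtracting 6 gives an approximately standard normal sample.
    return sum(random.random() for _ in range(12)) - 6.0

N = 100_000
samples = [approx_normal() for _ in range(N)]
within_one_sigma = sum(-1.0 <= s <= 1.0 for s in samples) / N
print(within_one_sigma)  # close to 0.683, the normal "one sigma" mass
```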

[+] cdavidcash|15 years ago|reply
> It's the most counter-intuitive thing in the world that rand() * rand() is less secure than rand(); shouldn't it be twice as unpredictable?

Under some reasonable assumptions, (rand() + rand()) mod 1 is twice as random! If the two calls return independent samples, and at least one of them is uniform on [0,1), then the wrapped sum will be uniform on [0,1).

So if you assume you get at least one good shot and that the other is not specifically correlated to it, you'll be improving your randomness.
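A simulation of that claim (a Python sketch; the `biased` generator is a deliberately bad source I made up for illustration): adding one uniform sample to one badly skewed sample and wrapping mod 1 still yields a uniform result.

```python
import random

random.seed(2)
N = 100_000

def biased():
    # A deliberately non-uniform source: squaring piles samples near 0.
    return random.random() ** 2

# Wrap the sum of one uniform and one biased sample back into [0, 1).
mixed = [(random.random() + biased()) % 1.0 for _ in range(N)]

# Crude uniformity check: each tenth of [0, 1) should hold about 10%.
buckets = [0] * 10
for x in mixed:
    buckets[int(x * 10)] += 1
print([round(b / N, 3) for b in buckets])
```

Each bucket comes out near 0.1, so the uniform input "washes out" the bias of the other.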

[+] samd|15 years ago|reply
Anyone interested in understanding randomness should endeavor to understand what probability actually is. As a Bayesian I think that probability is not a property of any object but rather a subjective measure of your own confidence, your own knowledge. To say something is random is just to say that you lack enough information to predict any one outcome over another. But everything could be predicted perfectly (barring quantum events) with enough knowledge. If I knew that a coin was weighted to favor heads my degree of belief in the outcome of the toss would become stronger. The probability for me would become, say, 75% heads, while for the observer without such knowledge it would be 50%.

Here's a good article on probability theory for those interested: http://plato.stanford.edu/entries/probability-interpret/

[+] iskander|15 years ago|reply
>@Thilo No, it doesn't... ? If a random variable is uniformly distributed in the range (0,1), and you sample the variable n times, and take the sum, it will just be uniformly distributed in the range (0,n). – user359996 12 hours ago

It's so easy to sound certain and be wildly wrong.

[+] danger|15 years ago|reply
Why has nobody in this thread or the stack overflow thread linked to the Wikipedia page on entropy?

http://en.wikipedia.org/wiki/Entropy_(information_theory)
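For concreteness, the rand(2) * rand(2) example earlier in the thread has measurably less Shannon entropy than a fair bit (a quick Python check, not from the thread):

```python
import math

def entropy_bits(probs):
    # Shannon entropy in bits: H = -sum p * log2(p), skipping zero terms.
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair = entropy_bits([0.5, 0.5])      # a fair bit: exactly 1 bit
skewed = entropy_bits([0.75, 0.25])  # rand(2) * rand(2): 0 with prob 3/4
print(fair, skewed)  # 1.0 and about 0.811
```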

[+] cdavidcash|15 years ago|reply
Probably because Shannon entropy is basically useless when it comes to proving anything about randomized algorithms or cryptography.
[+] GavinB|15 years ago|reply
Even more interesting to grasp: rand() * rand() / rand() is also less evenly distributed than rand().

It does seem like it should be possible to remap the numbers onto an even distribution if you knew how uneven to expect it to be. That would probably lead to some loss of fidelity at the limits of accuracy of the variable, however.
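That remapping idea is exactly the probability integral transform: push each sample through the distribution's own CDF and it comes out uniform. For the product of two uniforms the CDF is P(X*Y <= t) = t - t*ln(t), so a sketch (Python, my own illustration) looks like:

```python
import math
import random

random.seed(3)
N = 100_000

def product_cdf(t):
    # CDF of the product of two independent uniforms on (0, 1):
    # P(X * Y <= t) = t - t * ln(t).
    return t - t * math.log(t)

# Probability integral transform: feeding each skewed sample through its
# own CDF flattens the product back onto a uniform distribution.
remapped = [product_cdf(random.random() * random.random()) for _ in range(N)]

buckets = [0] * 10
for x in remapped:
    buckets[int(x * 10)] += 1
print([round(b / N, 3) for b in buckets])
```

The buckets all come out near 0.1, though as the parent notes, the transform does cost some floating-point fidelity near the ends of the range.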

[+] chrisaycock|15 years ago|reply
I like the comments about dice. With one die, an outcome of 4 or 2 is equally likely. With two dice, an outcome of 4 is three times more likely than 2.

So while one and two dice games are both random (as in unpredictable), only the one-die game has uniformly distributed outcomes.
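The dice counts can be verified by enumerating all 36 equally likely rolls (a short Python check):

```python
from itertools import product
from collections import Counter

# Enumerate all 36 equally likely two-dice rolls and count each sum.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
print(sums[2], sums[4], sums[7])  # 1, 3, and 6 ways respectively
```

There is one way to roll a 2 (1+1) but three ways to roll a 4 (1+3, 2+2, 3+1), matching the three-times-more-likely claim.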