top | item 9941364

A call to PHP's mt_rand generates only odd numbers

212 points| ComputerGuru | 10 years ago |3v4l.org | reply

119 comments

order
[+] Stormcaller|10 years ago|reply
Because max value should be "mt_getrandmax()" instead of "PHP_INT_MAX", it just gets a 32 bit number then scales it up.

see: http://php.net/manual/en/function.mt-rand.php

Under caution:

The distribution of mt_rand() return values is biased towards even numbers on 64-bit builds of PHP when max is beyond 2^32. This is because if max is greater than the value returned by mt_getrandmax(), the output of the random number generator must be scaled up.

edit: this post went from 5 points to 1, which I don't care about(in ~500 days I posted less than 10 times and I have ~35 points), but who downvotes documentation, seriously? -_-

[+] _asummers|10 years ago|reply
While documented, that is surprising behavior. If it takes in an int, shouldn't it be able to take in PHP_INT_MAX? And shouldn't it yell at you instead of just silently going about its day?
[+] jrgv|10 years ago|reply
There is a caution box in the description, which contains a different warning, and then the warning you have cited is in a second caution box in the "Notes" section, after the changelog and the examples.

It doesn't surprise me that people might not notice the existence of that second warning. I believe that most developers wouldn't scroll down to read the changelog and the example if they think they understood what the function does from its description.

[+] 3JPLW|10 years ago|reply
Why even accept ranges that span more than 2^32? That seems like an easy solution to a broken function.

Also fun:

    echo "<?php echo mt_rand(-mt_getrandmax(), mt_getrandmax());" | php
is always even on my PHP 5.5.20.
[+] eonw|10 years ago|reply
no surprised about the down votes, i've seen a lot of logical explanations and correct responses get them. its almost like they would rather read drama then thoughtful answers.
[+] smegel|10 years ago|reply
> Because max value should be "mt_getrandmax()" instead of "PHP_INT_MAX", it just gets a 32 bit number then scales it up.

How does that answer anything?

[+] _yy|10 years ago|reply
Quoting a Reddit comment (https://www.reddit.com/r/lolphp/comments/3eaw98/mt_rand1_php...):

The problem is way worse than you think. Check out what this looks like when printed in hexadecimal: http://3v4l.org/XVTgS

Basically, what is going on is that PHP_INT_MAX is 2^63 - 1. mt_getrandmax() is 2^31 - 1. The way mt_rand() makes a random number when the limit is too large is that it makes a random number in the range [0,2^(31)), then it scales it to be a number in the range [0,MAX-MIN), and finally adds MIN.

So in your case, it scales everything by 2^32 and adds 1. Which is why the numbers are* extremely non-random. [See my other comment in this thread for a more detailed explanation and some more test scripts that prove this is what is happening.](https://www.reddit.com/r/lolphp/comments/3eaw98/mt_rand1_php...

[+] sarciszewski|10 years ago|reply
And that's another reason why it's a good thing that PHP 7 (coming out soon!) has a new function called random_int() which provides an unbiased distribution of integers powered by a CSPRNG (yes, it uses urandom, in case anyone asks).

My employer is leading the effort to expose a compatible interface in PHP 5 applications so developers can add one line to their composer.json file and start writing code for PHP 7. It's MIT licensed and should nearing its 1.0.0 release soon.

https://github.com/paragonie/random_compat

[+] rurban|10 years ago|reply
Using /dev/urandom is not a good thing. It's only needed to get secure random numbers (CSPRGs) very slowly. You'll drain it so that the apps which really need it will be out of seeds. To get random numbers fast, you need to use the good bits of a PRG based on an LCG.

Everybody should know by now that the mersenne twister is not only bad, but also slow.

Everybody should know by now that the first bits are good, and the last bits horrible. That's why you should not use modulo %, but rshift or better use Melissa E. O'Neill's PCG, which uses the first good bits to improve the latter worse bits. http://www.pcg-random.org/

[+] beefhash|10 years ago|reply
Using a loop of that length (for ($i=0;$i<10000;$i++)) on a site allowing people to execute code, and then linking it to HN effectively amounts to a do-it-yourself DDoS. I don't think I wanna be the host of that site right now.
[+] laumars|10 years ago|reply
The site itself picked up on this aswell. The following message appeared for me:

    "Abusive script
    This script was stopped while abusing our resources"
edit: Too slow: 504 Gateway Time-out
[+] Buge|10 years ago|reply
Shouldn't they cache the result?
[+] the-dude|10 years ago|reply
Which we could call 'suiciDDOS'
[+] _asummers|10 years ago|reply
It's also being linked on Reddit for a double whammy DDoS.
[+] deanclatworthy|10 years ago|reply
... and it's down :) I guess they get this all the time.
[+] Xlab|10 years ago|reply
They don't even caution about it.
[+] dantillberg|10 years ago|reply
I believe it's also generally an anti-pattern to do things like "num = rand() % max" or "num = rand() & bit_mask" where rand() returns an integer from a pseudo-random number generator, right?

PHP may not do a very good job at ensuring an even distribution throughout the space of possible integers, but for PRNGs in general (especially the quick & dirty ones), the worst place to grab bits from is the least-significant bits.

(my source is that I hung out with a copy of Numerical Recipes in college; Numerical Recipes has a nice chapter for learning about PRNGs, along with example code for a number of implementations)

[+] foobar2020|10 years ago|reply
It is OK to do (rand() % modulus) when (modulus + 1) divides (MAX_RAND + 1). Otherwise you will end up with non-uniform distribution.

But yes, in general the rand-modulo pair is an anti-pattern.

[+] MrDosu|10 years ago|reply
This is why you read the documentation. Don't deduce what a method does purely on its name and your common sense, you are probably wrong...
[+] TelmoMenezes|10 years ago|reply
Yes. In fact I propose that giving methods pronounceable names is an anti-pattern. Method names should be random strings of characters. Then you don't fall into the temptation of not reading the documentation and assuming things. In fact, make them long enough so that you can't memorize them.

someObj.dbrdCfj34uW31U289u(x)

This avoids a lot of frustration. Imagine if this was called someObj.sqrt(x). People could jump to the conclusion that this takes the square root of a value, when in reality it only does so on weekdays. On weekends it returns the CPU temperature.

[+] pogden|10 years ago|reply
At least in PHP. Unfortunately in my experience the PHP documentation is not frequently read by PHP developers or even PHP internals developers. It seems to exist mainly for PHP hatebloggers, for whom it's a gold mine.

This illustrates some problems with the PHP way of doing things. For some reason the PHP internals community prefers to have this function continue to spit out garbage when given invalid inputs instead of throwing an error, especially if spitting out garbage is what was done in the past. The answer is always, "Don't do that. RTFM." but most developers don't read the documentation for each and every function that they use, but rather rely on the name of and common sense until there's a error. If there's no error, the documentation never gets read.

[+] noir_lord|10 years ago|reply
I use PHP a lot (primary product is written in it) and I actually mostly like using it but you can get really burnt applying the "principle of least surprise" to the PHP standard library.

I really wish there was more support behind the idea of creating a new standard library and gradually deprecating the old one, it's been tried but never gets any traction.

Meanwhile frameworks and libraries continue to add their own (often very good) helper libraries to smooth the edges.

[+] deckiedan|10 years ago|reply
It's because there simply are more odd numbers than even numbers...

(j/k!)

[+] raverbashing|10 years ago|reply
I think if you consider Z rather than N the opposite is true

Proof:

For ever number k there exists -k which has the same parity as k (that is, if k is even, -k is even as well)

For every k there is (k + 1) with opposite parity (same thing with -k and -(k + 1)

Now, if we count the numbers, from 1 to infinite, the number of even and odd numbers is the same

from 1 to -infinite the number of even and odd numbers is the same as well

And you get an extra zero, an even number

[+] unknown|10 years ago|reply

[deleted]

[+] Retr0spectrum|10 years ago|reply
The bug works for me on PHP5. Are you on a 32-bit machine? Your values would suggest that you are. See Stormcaller's comment for an explanation.
[+] 3JPLW|10 years ago|reply
It is a PHP bug. You're on a 32-bit system. The bug only occurs on 64 bit systems.
[+] mercuti0|10 years ago|reply
You on a 32-bit system?