The distribution of mt_rand() return values is biased towards even numbers on 64-bit builds of PHP when max is beyond 2^32. This is because if max is greater than the value returned by mt_getrandmax(), the output of the random number generator must be scaled up.
edit: this post went from 5 points to 1, which I don't care about(in ~500 days I posted less than 10 times and I have ~35 points), but who downvotes documentation, seriously? -_-
While documented, that is surprising behavior. If it takes in an int, shouldn't it be able to take in PHP_INT_MAX? And shouldn't it yell at you instead of just silently going about its day?
There is a caution box in the description, which contains a different warning, and then the warning you have cited is in a second caution box in the "Notes" section, after the changelog and the examples.
It doesn't surprise me that people might not notice the existence of that second warning. I believe that most developers wouldn't scroll down to read the changelog and the example if they think they understood what the function does from its description.
no surprised about the down votes, i've seen a lot of logical explanations and correct responses get them. its almost like they would rather read drama then thoughtful answers.
The problem is way worse than you think. Check out what this looks like when printed in hexadecimal: http://3v4l.org/XVTgS
Basically, what is going on is that PHP_INT_MAX is 2^63 - 1. mt_getrandmax() is 2^31 - 1. The way mt_rand() makes a random number when the limit is too large is that it makes a random number in the range [0,2^(31)), then it scales it to be a number in the range [0,MAX-MIN), and finally adds MIN.
So in your case, it scales everything by 2^32 and adds 1. Which is why the numbers are* extremely non-random. [See my other comment in this thread for a more detailed explanation and some more test scripts that prove this is what is happening.](https://www.reddit.com/r/lolphp/comments/3eaw98/mt_rand1_php...
And that's another reason why it's a good thing that PHP 7 (coming out soon!) has a new function called random_int() which provides an unbiased distribution of integers powered by a CSPRNG (yes, it uses urandom, in case anyone asks).
My employer is leading the effort to expose a compatible interface in PHP 5 applications so developers can add one line to their composer.json file and start writing code for PHP 7. It's MIT licensed and should nearing its 1.0.0 release soon.
Using /dev/urandom is not a good thing.
It's only needed to get secure random numbers (CSPRGs) very slowly. You'll drain it so that the apps which really need it will be out of seeds.
To get random numbers fast, you need to use the good bits of a PRG based on an LCG.
Everybody should know by now that the mersenne twister is not only bad, but also slow.
Everybody should know by now that the first bits are good, and the last bits horrible. That's why you should not use modulo %, but rshift or better use Melissa E. O'Neill's PCG, which uses the first good bits to improve the latter worse bits.
http://www.pcg-random.org/
Using a loop of that length (for ($i=0;$i<10000;$i++)) on a site allowing people to execute code, and then linking it to HN effectively amounts to a do-it-yourself DDoS. I don't think I wanna be the host of that site right now.
I believe it's also generally an anti-pattern to do things like "num = rand() % max" or "num = rand() & bit_mask" where rand() returns an integer from a pseudo-random number generator, right?
PHP may not do a very good job at ensuring an even distribution throughout the space of possible integers, but for PRNGs in general (especially the quick & dirty ones), the worst place to grab bits from is the least-significant bits.
(my source is that I hung out with a copy of Numerical Recipes in college; Numerical Recipes has a nice chapter for learning about PRNGs, along with example code for a number of implementations)
Yes. In fact I propose that giving methods pronounceable names is an anti-pattern. Method names should be random strings of characters. Then you don't fall into the temptation of not reading the documentation and assuming things. In fact, make them long enough so that you can't memorize them.
someObj.dbrdCfj34uW31U289u(x)
This avoids a lot of frustration. Imagine if this was called someObj.sqrt(x). People could jump to the conclusion that this takes the square root of a value, when in reality it only does so on weekdays. On weekends it returns the CPU temperature.
At least in PHP. Unfortunately in my experience the PHP documentation is not frequently read by PHP developers or even PHP internals developers. It seems to exist mainly for PHP hatebloggers, for whom it's a gold mine.
This illustrates some problems with the PHP way of doing things. For some reason the PHP internals community prefers to have this function continue to spit out garbage when given invalid inputs instead of throwing an error, especially if spitting out garbage is what was done in the past. The answer is always, "Don't do that. RTFM." but most developers don't read the documentation for each and every function that they use, but rather rely on the name of and common sense until there's a error. If there's no error, the documentation never gets read.
I use PHP a lot (primary product is written in it) and I actually mostly like using it but you can get really burnt applying the "principle of least surprise" to the PHP standard library.
I really wish there was more support behind the idea of creating a new standard library and gradually deprecating the old one, it's been tried but never gets any traction.
Meanwhile frameworks and libraries continue to add their own (often very good) helper libraries to smooth the edges.
[+] [-] Stormcaller|10 years ago|reply
see: http://php.net/manual/en/function.mt-rand.php
Under caution:
The distribution of mt_rand() return values is biased towards even numbers on 64-bit builds of PHP when max is beyond 2^32. This is because if max is greater than the value returned by mt_getrandmax(), the output of the random number generator must be scaled up.
edit: this post went from 5 points to 1, which I don't care about(in ~500 days I posted less than 10 times and I have ~35 points), but who downvotes documentation, seriously? -_-
[+] [-] _asummers|10 years ago|reply
[+] [-] jrgv|10 years ago|reply
It doesn't surprise me that people might not notice the existence of that second warning. I believe that most developers wouldn't scroll down to read the changelog and the example if they think they understood what the function does from its description.
[+] [-] 3JPLW|10 years ago|reply
Also fun:
is always even on my PHP 5.5.20.[+] [-] eonw|10 years ago|reply
[+] [-] unknown|10 years ago|reply
[deleted]
[+] [-] smegel|10 years ago|reply
How does that answer anything?
[+] [-] _yy|10 years ago|reply
The problem is way worse than you think. Check out what this looks like when printed in hexadecimal: http://3v4l.org/XVTgS
Basically, what is going on is that PHP_INT_MAX is 2^63 - 1. mt_getrandmax() is 2^31 - 1. The way mt_rand() makes a random number when the limit is too large is that it makes a random number in the range [0,2^(31)), then it scales it to be a number in the range [0,MAX-MIN), and finally adds MIN.
So in your case, it scales everything by 2^32 and adds 1. Which is why the numbers are* extremely non-random. [See my other comment in this thread for a more detailed explanation and some more test scripts that prove this is what is happening.](https://www.reddit.com/r/lolphp/comments/3eaw98/mt_rand1_php...
[+] [-] sarciszewski|10 years ago|reply
My employer is leading the effort to expose a compatible interface in PHP 5 applications so developers can add one line to their composer.json file and start writing code for PHP 7. It's MIT licensed and should nearing its 1.0.0 release soon.
https://github.com/paragonie/random_compat
[+] [-] rurban|10 years ago|reply
Everybody should know by now that the mersenne twister is not only bad, but also slow.
Everybody should know by now that the first bits are good, and the last bits horrible. That's why you should not use modulo %, but rshift or better use Melissa E. O'Neill's PCG, which uses the first good bits to improve the latter worse bits. http://www.pcg-random.org/
[+] [-] beefhash|10 years ago|reply
[+] [-] laumars|10 years ago|reply
[+] [-] Buge|10 years ago|reply
[+] [-] the-dude|10 years ago|reply
[+] [-] _asummers|10 years ago|reply
[+] [-] deanclatworthy|10 years ago|reply
[+] [-] Xlab|10 years ago|reply
[+] [-] dantillberg|10 years ago|reply
PHP may not do a very good job at ensuring an even distribution throughout the space of possible integers, but for PRNGs in general (especially the quick & dirty ones), the worst place to grab bits from is the least-significant bits.
(my source is that I hung out with a copy of Numerical Recipes in college; Numerical Recipes has a nice chapter for learning about PRNGs, along with example code for a number of implementations)
[+] [-] foobar2020|10 years ago|reply
But yes, in general the rand-modulo pair is an anti-pattern.
[+] [-] sarciszewski|10 years ago|reply
https://stackoverflow.com/a/31374501/2224584
[+] [-] MrDosu|10 years ago|reply
[+] [-] TelmoMenezes|10 years ago|reply
someObj.dbrdCfj34uW31U289u(x)
This avoids a lot of frustration. Imagine if this was called someObj.sqrt(x). People could jump to the conclusion that this takes the square root of a value, when in reality it only does so on weekdays. On weekends it returns the CPU temperature.
[+] [-] pogden|10 years ago|reply
This illustrates some problems with the PHP way of doing things. For some reason the PHP internals community prefers to have this function continue to spit out garbage when given invalid inputs instead of throwing an error, especially if spitting out garbage is what was done in the past. The answer is always, "Don't do that. RTFM." but most developers don't read the documentation for each and every function that they use, but rather rely on the name of and common sense until there's a error. If there's no error, the documentation never gets read.
[+] [-] noir_lord|10 years ago|reply
I really wish there was more support behind the idea of creating a new standard library and gradually deprecating the old one, it's been tried but never gets any traction.
Meanwhile frameworks and libraries continue to add their own (often very good) helper libraries to smooth the edges.
[+] [-] unknown|10 years ago|reply
[deleted]
[+] [-] deckiedan|10 years ago|reply
(j/k!)
[+] [-] raverbashing|10 years ago|reply
Proof:
For ever number k there exists -k which has the same parity as k (that is, if k is even, -k is even as well)
For every k there is (k + 1) with opposite parity (same thing with -k and -(k + 1)
Now, if we count the numbers, from 1 to infinite, the number of even and odd numbers is the same
from 1 to -infinite the number of even and odd numbers is the same as well
And you get an extra zero, an even number
[+] [-] unknown|10 years ago|reply
[deleted]
[+] [-] T3RMINATED|10 years ago|reply
[deleted]
[+] [-] T3RMINATED|10 years ago|reply
[deleted]
[+] [-] unknown|10 years ago|reply
[deleted]
[+] [-] Retr0spectrum|10 years ago|reply
[+] [-] 3JPLW|10 years ago|reply
[+] [-] mercuti0|10 years ago|reply
[+] [-] InfiniteEntropy|10 years ago|reply