(no title)
BigTTYGothGF|11 days ago
The first sentence is obviously true, but I'm going to need to see some evidence for "enormous bias" and "total nonsense". Let's leave aside lousy/little/badly-seeded PRNGs. Are there any non-cryptographic examples in which a well-designed PRNG with 256 bits of well-seeded random state produces results different enough from a TRNG to be visible to a user?
lilyball|11 days ago
I imagine you could change the p-value test to randomly sample assignments generated via the exact same process that was used to generate the assignment used by the experiment. As you run more and more iterations, the calculated p-value should converge to the correct value. But then the question becomes: is the p-value calculated this way the same as the p-value you'd get if you had actually used equiprobable assignment to begin with?
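A minimal sketch of that re-randomization idea (my own illustration; assign_groups is a stand-in for whatever PRNG-based procedure the experiment actually used):

    # Randomization test: re-run the *same* assignment procedure many times and see
    # how often it produces a between-group difference at least as extreme as the
    # one actually observed.
    import random

    def assign_groups(n, rng):
        # stand-in for the experiment's real assignment process
        labels = [0] * (n // 2) + [1] * (n - n // 2)
        rng.shuffle(labels)
        return labels

    def mean_diff(outcomes, labels):
        treated = [y for y, g in zip(outcomes, labels) if g == 1]
        control = [y for y, g in zip(outcomes, labels) if g == 0]
        return sum(treated) / len(treated) - sum(control) / len(control)

    def randomization_p_value(outcomes, observed_labels, iterations=100_000):
        rng = random.Random()  # same PRNG family the real assignment used
        observed = abs(mean_diff(outcomes, observed_labels))
        hits = sum(
            abs(mean_diff(outcomes, assign_groups(len(outcomes), rng))) >= observed
            for _ in range(iterations)
        )
        return hits / iterations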
Ultimately, this all comes down to the fact that it's not hard to use true randomness for the whole thing, and true randomness produces statistically valid results. If you use true randomness for assignment then you can't screw up the p-value test, so there's no reason at all to even consider how to safely use a PRNG here; all that does is open the door to messing up.
gzread|11 days ago
Of course a PRNG generates the same sequence every time with the same seed, but that's true of every RNG, even a TRNG where the "seed" is your current space and time coordinates. To get more results from the distribution you have to use more seeds. You can't just run an RNG once, get some value, and then declare the RNG is biased towards the value you got. That's not a useful definition of bias.
freehorse|11 days ago
As far as p-values are concerned, I assume the issue raised could very well be measured with simulations and permutations. I really doubt, though, that the distribution of p-values from pseudorandom assignments with gaps would fail to converge very fast to the "real" distribution you would get from all permutations, due to some version of a law of large numbers. A lot of resampling/permutation techniques work by permuting only a negligible fraction of all possible permutations, and the distribution of the statistics extracted converges pretty fast. As long as the way the gaps are formed is independent of the effects measured, it sounds implausible that the p-values one gets are problematic because of them.
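To illustrate how fast that convergence is (a toy example of my own, not from the thread): for a tiny dataset you can enumerate every balanced assignment exactly and compare that p-value with one estimated from a small random subset of assignments.

    # Exact vs. subsampled permutation p-value on a made-up toy dataset.
    from itertools import combinations
    import random

    outcomes = [3.1, 2.4, 4.0, 3.3, 2.9, 3.8, 2.2, 3.5, 2.7, 3.0, 3.6, 2.5]
    observed_group = set(range(6))  # pretend the first six units were "treated"

    def abs_diff(group):
        treated = [outcomes[i] for i in group]
        control = [outcomes[i] for i in range(len(outcomes)) if i not in group]
        return abs(sum(treated) / len(treated) - sum(control) / len(control))

    observed = abs_diff(observed_group)

    all_groups = [set(c) for c in combinations(range(len(outcomes)), 6)]  # all 924
    exact_p = sum(abs_diff(g) >= observed for g in all_groups) / len(all_groups)

    sampled = random.sample(all_groups, 200)  # a small fraction of all assignments
    approx_p = sum(abs_diff(g) >= observed for g in sampled) / len(sampled)

    print(exact_p, approx_p)  # the two typically agree to within a few percentage points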
hansvm|10 days ago
BigTTYGothGF|10 days ago
Suppose I'm doing something where I need N(0,1) random variates. I sample from U(0,1), being sure to use a TRNG, do my transformations, and everything's good, right? But my sample isn't really U(0,1), since I'm only able to get float64s (or float32s), and my transform isn't really N(0,1), as there's going to be some value x above which P(z>x)=0. The theory behind what I'm trying to do assumes N(0,1), and so all my p-values are invalid.
Nobody cares about that because we know that our methods are robust to this kind of discretization. Similarly I think nobody (most people) should care (too much) about having "only" 256 bits of entropy in their PRNG because our methods appear to be robust to that.
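A rough sketch of how far that truncation reaches (my own numbers, assuming the common scheme of generating uniforms as multiples of 2^-53 and applying the inverse normal CDF):

    # How far into the tail can float64 uniforms pushed through the inverse CDF reach?
    # Assumes uniforms of the form k / 2**53, a common generation scheme.
    from statistics import NormalDist

    std_normal = NormalDist(0.0, 1.0)
    smallest_u = 2.0 ** -53       # smallest nonzero uniform under this scheme
    largest_u = 1.0 - 2.0 ** -53  # largest representable uniform below 1

    print(std_normal.inv_cdf(smallest_u), std_normal.inv_cdf(largest_u))
    # roughly -8.2 and +8.2 standard deviations; nothing beyond that is reachable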
kqr|10 days ago
Bakker and Wicherts (2011) would like to disagree! Apparently 15 % screw up the calculation of the p-value.
labcomputer|11 days ago
We can create a balanced partitioning of the 300 turkeys with a 300-bit random number having an equal number of 1's and 0's.
Now suppose I randomly pick a 300-bit number, still with equal 0's and 1's, but this time the first 20 bits are always 0's and the last 20 bits are always 1's. In this scenario, only the middle 260 bits (turkeys) are randomly assigned, and the remaining 40 are deterministic.
We can quibble over what constitutes an "enormous" bias, but the scenario above feels like an inadequate experiment design to me.
As it happens, log2(260 choose 130) ~= 256.
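A quick check of that count (my own verification, not part of the original comment):

    # Bits needed to index every balanced assignment, full vs. "gapped" scenario.
    from math import comb, log2

    full = log2(comb(300, 150))    # all balanced partitions of 300 turkeys
    gapped = log2(comb(260, 130))  # only the middle 260 turkeys vary
    print(round(full, 1), round(gapped, 1))  # ~295.6 and ~255.7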
> Are there any non-cryptographic examples in which a well-designed PRNG with 256 bits of well-seeded random state produces results different enough from a TRNG to be visible to a user?
One example that comes to mind is shuffling a deck of playing cards. You need approximately 225 bits of entropy (log2(52!) ≈ 225.6) to ensure that every possible 52-card ordering can be represented. Suppose you wanted to simulate a game of blackjack with more than one deck, or some other card game with 58 or more cards: 256 bits is not enough there (log2(58!) ≈ 260).
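A quick back-of-the-envelope for a few deck sizes (my own arithmetic):

    # Bits of state needed to reach every ordering of an n-card deck.
    from math import factorial, log2

    for n in (52, 57, 58, 104):
        print(n, round(log2(factorial(n)), 1))
    # 52 -> ~225.6, 57 -> ~254.5, 58 -> ~260.3 (already past 256), 104 -> ~551.5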
fc417fc802|11 days ago
For example. Suppose I have 2^128 unique playing cards. I randomly select 2^64 of them and place them in a deck. Someone proceeds to draw 2^8 cards from that deck, replacing and reshuffling between each draw. Does it really matter that those draws weren't technically independent with respect to the larger set? In a sense they are independent so long as you view what happened as a single instance of a procedure that has multiple phases as opposed to multiple independent instances. And in practice with a state space so much larger than the sample set the theoretical aspect simply doesn't matter one way or the other.
We can take this even farther. Don't replace and reshuffle after each card is drawn. Since we are only drawing 2^8 of 2^64 total cards this lack of independence won't actually matter in practice. You would need to replicate the experiment a truly absurd number of times in order to notice the issue.
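One rough way to put a number on "won't actually matter" (my own back-of-the-envelope using the birthday bound): drawing without replacement differs from drawing with replacement only when a repeat would have occurred.

    # Union/birthday bound on a repeated card across the draws.
    from math import comb

    deck_size = 2 ** 64
    draws = 2 ** 8
    p_any_repeat = comb(draws, 2) / deck_size
    print(p_any_repeat)  # about 1.8e-15, i.e. roughly 2**-49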
BigTTYGothGF|10 days ago
Yeah, but the question is: who cares?
Suppose you and I are both simulating card shuffling. We have the exact same setup, and use a 256-bit well-behaved PRNG for randomness. We both re-seed every game from a TRNG. The difference is that you use all 256 bits for your seed, while I use just 128 and zero-pad the rest. The set of all shuffles that can be generated by your method is obviously much larger than the set that can be generated by mine.
But again: who cares? What observable effect could there possibly be for anybody to take action if they know they're in a 128-bit world vs a 256-bit one?
The analogy obviously doesn't generalize downwards; I'd be singing a different tune if it were, say, 32 bits instead of 128.
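One way to make that concrete (my own illustration): with only 2^32 possible seeds, two games would share a byte-identical shuffle surprisingly soon, and that's something a user could actually observe.

    # Birthday estimate: roughly how many games until two of them share a seed
    # (and therefore an identical shuffle), at even odds.
    from math import sqrt, log

    def games_for_even_odds(seed_space):
        return sqrt(2 * seed_space * log(2))

    print(games_for_even_odds(2.0 ** 32))   # ~77,000 games
    print(games_for_even_odds(2.0 ** 128))  # ~2.2e19 games -- never in practice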
DevelopingElk|11 days ago
topaz0|10 days ago
Also worth noting that the situations where this matters are usually where your effect size is fairly small compared to the unexplained variation, so a few percent error in your p-value can make a difference.
tripletao|10 days ago
Your numbers don't make sense. Your number of assignments is far smaller than 2^256, so the problem the author is (mistakenly) concerned about doesn't arise--no sane method would result in any measurable deviation from equiprobable, certainly not "twice as likely".
With a larger number of turkeys and thus assignments, the author is correct that some assignments must be impossible by a counting argument. They are incorrect that it matters--as long as the process of winnowing our set to 2^256 candidates isn't measurably biased (i.e., correlated with turkey weight ex television effects), it changes nothing. There is no difference between discarding a possible assignment because the CSPRNG algorithm choice excludes it (as we do for all but 2^256) and discarding it because the seed excludes it (as we do for all but one), as long as both processes are unbiased.
unknown|11 days ago
[deleted]
jmalicki|11 days ago
If you can't generate all possible assignments, you care about second and third order properties etc. of the sequence.
coppsilgold|11 days ago
mananaysiempre|11 days ago
Huh? If you can chew through however many gigabytes of the supposed CSPRNG’s output, do some statistics, and with a non-negligible probability tell if the bytes in fact came from the CSPRNG in question or an actual iid random source, then you’ve got a distinguisher and the CSPRNG is broken.
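For what it's worth, a toy version of "do some statistics" (my own sketch; a real break would need a far sharper test, but it shows the shape of a distinguisher):

    # Toy distinguisher: chi-square test on byte frequencies of a candidate stream.
    # Any decent CSPRNG sails through this; a test that separated its output from
    # iid-uniform bytes with non-negligible advantage would constitute a break.
    import os
    from collections import Counter

    def chi_square_bytes(stream: bytes) -> float:
        expected = len(stream) / 256
        counts = Counter(stream)
        return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))

    # Under truly uniform bytes this statistic follows a chi-square distribution with
    # 255 degrees of freedom (mean about 255); wildly larger values flag the stream.
    print(chi_square_bytes(os.urandom(1 << 20)))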