> But, as Dr Aimone’s colleague Darby Smith notes, the real world that computer modellers are trying to model does not work like this. For example, the temperature in London in December may vary between -7°C and 17°C, but is most likely to be in the range 3°C to 8°C. [...] Distorting uniform distributions of random numbers to take account of these realities is tedious and unsatisfactory. As Dr Smith observes, it would be more efficient if the random numbers used corresponded to the natural distribution in the first place.
I have a hard time understanding how this is a problem. Using a uniform random number to pick a value on the y-axis of an integrated probability distribution function (the CDF) and mapping it back to the x-axis is pretty garsh darn simple and normal. This is the crux of Monte Carlo methods and is stupidly fast these days.
What are they talking about?
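For the unconvinced, here's a minimal sketch of that mapping in Python (the exponential distribution is just a stand-in; any distribution with an invertible CDF works the same way):

    import random
    import math

    def sample_exponential(rate):
        # Inverse transform sampling: draw u ~ Uniform(0, 1) and push it
        # through the inverse CDF. For Exp(rate), F^-1(u) = -ln(1 - u) / rate.
        u = random.random()
        return -math.log(1.0 - u) / rate

    samples = [sample_exponential(0.5) for _ in range(10)]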
Though I agree we have good solutions to this, I also think it's a more nuanced problem than it seems at first glance. See for example this Stack Overflow post: https://stackoverflow.com/questions/75677/converting-a-unifo.... To convert a uniform distribution to a Gaussian there are at least four competing methods, all with subtle trade-offs. Besides that, there are certainly other continuous distributions we might be interested in transforming to, which are useful but not as nice as a Gaussian.
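For instance, the classic Box-Muller transform turns two uniforms into two independent standard normals. A rough sketch (note that it needs log, sqrt and trig, which is exactly the kind of cost that can bite on constrained hardware):

    import random
    import math

    def box_muller():
        # Two independent U(0,1) draws become two independent N(0,1) draws.
        u1 = random.random()
        u2 = random.random()
        r = math.sqrt(-2.0 * math.log(1.0 - u1))  # 1 - u1 avoids log(0)
        theta = 2.0 * math.pi * u2
        return r * math.cos(theta), r * math.sin(theta)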
For what it's worth, I once struggled to implement random walks on custom hardware because its ISA lacked trig support and I couldn't generate N(0,1)s quickly enough. For that application, spending tens of fp32 multiply-accumulates to transform the uniform data was a non-starter.
Producing large samples of uniform pseudo-random numbers in high dimensions is very tricky. If COINFLIPS can solve that problem, that would be very useful. That said, I don't know if there are many computational workloads where the basic random number generation step is the performance bottleneck.
Random scalars definitely don't seem like something that needs radical technical solutions, no. I wonder if the tricky part is in choosing random vectors?
Just doing simple things like choosing a uniformly distributed random point in a disk or on the surface of a sphere are awkward enough to do efficiently; choosing a bunch of random points drawn from the London temperature/pressure/rainfall distribution for December is much worse. I wonder if that might be part of the complexity that is being elided to bring this article to a popular-science-reader level?
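To illustrate the awkwardness, here's one common way to do both (rejection sampling for the disk, normalized Gaussians for the sphere surface); faster tricks exist, but each has its own subtleties:

    import random
    import math

    def point_in_unit_disk():
        # Rejection sampling: draw from the bounding square until the
        # point lands inside the disk (about 21% of draws are wasted).
        while True:
            x, y = random.uniform(-1, 1), random.uniform(-1, 1)
            if x * x + y * y <= 1.0:
                return x, y

    def point_on_unit_sphere():
        # Normalize a 3D standard-normal vector; the result is uniform on
        # the sphere because the Gaussian is rotationally symmetric.
        x, y, z = (random.gauss(0, 1) for _ in range(3))
        n = math.sqrt(x * x + y * y + z * z)
        return x / n, y / n, z / n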
Yes, you can use an inverse CDF to implement a fairly efficient generator for pretty much any probability distribution. (I'm sure there's some pathological, and uncommon, counterexample.) It's just a small/quick lookup table, and you can interpolate if you need extra resolution.
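Something like the following, where the table is entirely made up for illustration (a real one would be built from whatever empirical or analytic distribution you care about):

    import bisect
    import random

    # Hypothetical tabulated CDF for December temperatures in degrees C:
    # cdf[i] is P(T <= temps[i]).
    temps = [-7, -2, 0, 3, 5, 8, 11, 17]
    cdf   = [0.00, 0.05, 0.15, 0.35, 0.60, 0.85, 0.97, 1.00]

    def sample_from_table():
        u = random.random()
        i = bisect.bisect_left(cdf, u)   # first index with cdf[i] >= u
        if i == 0:
            return temps[0]
        # Linear interpolation between table entries for extra resolution.
        t = (u - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
        return temps[i - 1] + t * (temps[i] - temps[i - 1])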
Whitening or mixing or whatever of pseudorandom or actual "TRNG" sources like static or radiation discharges is pretty well known. I.e. take some source of randomness for 10,000 bits, XOR them with each other, use that to seed a cryptographically secure generator, and top up the seed as more bits become available; that's pretty much the gold standard for "I need a 50 Gbit bitstream of 'random'".
The other part seems to be the binning/bucketing/curve fitting, or "massaging" the output of that so it only gives numbers in the range under scrutiny, which sounds like an implementation detail to me rather than something that you can configure at... runtime? Compile time?
I'm sure there are real HRNGs (or sensors for TRNGs) that can approach ludicrous bitrates, but I've never had the luxury of one, and I'm not entirely sure I could be convinced the output was any "better" than the vetted arrangement described above.
All of this is to say: okay, but we already do that, and massaging randomness to give us the results we want is what all this modelling stuff... is. So other than some off-in-the-weeds magnetic film manipulation, I'm assuming that's modelled too. It's just models all the way down?
Even though the article is about true random numbers, pseudo-random is more valuable for scientific simulations. In those simulations you don't typically want true randomness; you want a generator that passes a suite of randomness tests while having the ability to repeat that stream of numbers to replicate your experiment.
Sometimes you even discard numbers so that different simulation runs with different parameters consume the same number of random draws. A simplistic example:
    if rand() < 0.5:
        doSomethingWithRand(rand())
    else:
        doSomethingWithoutRand()
        rand()  # discarded, so both branches consume two random numbers
> xor with each other, and use that to seed a cryptographically secure generator
Do not try to cook your moderate entropy data source. You can end up cooking what little entropy it had out of it.
Many CSPRNGs allow seeds of arbitrary size. Just push all of your sketchy data into it, and let the hash function handle scavenging the entropy for you.
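In Python terms, a minimal sketch of "let the hash do the scavenging" (random.Random is just a stand-in for whatever downstream generator you seed; for anything cryptographic you'd use the OS CSPRNG instead):

    import hashlib
    import random

    def seed_from_sketchy_sources(*chunks):
        # Hash all the raw, possibly-biased entropy together and use the
        # digest as the seed; the hash does the mixing, no hand-rolled XOR.
        h = hashlib.sha256()
        for chunk in chunks:
            h.update(chunk)
        return int.from_bytes(h.digest(), "big")

    rng = random.Random(seed_from_sketchy_sources(b"geiger ticks...", b"adc noise..."))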
This is one of those articles where they have someone who doesn't really know anything about math or computers interview somebody technical in the field, I think. So most of what the journalist gets, unfortunately, is that PRNGs are a thing but not perfect.
I'm pretty sure the project is looking at physical devices to add to the entropy pool, and they talked about two examples (which do seem pretty neat).
> One relies on the patterns magnetic films make when disturbed, the other on how electrons travel through the barrier of a quantum-tunnelling diode. Both of these things are truly random.
The use of the von Neumann quote, "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin", is pretty annoying, given that he said it in defense of his PRNG. Sort of a tongue-in-cheek "yes, this is fundamentally wrong (sinful), but I'm going to do it anyway" sort of thing.
> Finding random phenomena in nature that can be transformed into computer bits is not easy...Otherwise, specialist, expensive hardware needs to be used to do things such as measuring heat flux through a silicon chip.
What happened to the good old days of taking a picture of a lava lamp and using that as your random number until you take another picture?
Quoting von Neumann about software RNGs is ignoring most of a century of development in the field. There is a plethora of high-quality CSPRNGs that are almost certainly totally unbreakable if you mix in a few bits of hardware randomness per megabyte of random numbers.
Also this was in a paper where he was working on a PRNG, and later he says "If the [pseudorandomly generated] digits work well on one problem, they seem usually to be successful with others of the same type." So this is more of a "Yeah this is technically wrong but I did it and it worked" sort of thing.
Tangentially related, I had a fun time hooking the audio jack of my Geiger counter up to a digital input pin of an ESP8266 the other day to generate random numbers from radioactive decay [1]. Not the fastest way to make random numbers, but fun nonetheless.
[1] https://partofthething.com/thoughts/making-true-random-numbe...
PRNGs and CPUs are stupidly fast these days. I'm not really convinced that the supply of pseudo-random bits is the problem, at least not for simulations. A well-written PRNG should in fact execute at about the same speed as a memset (at least for producing the raw PRNG bits, before any post-processing).
Massaging the bits into proper form is the expensive part: the modulus operator costs roughly 80 cycles (i.e. a division) on a five-year-old CPU and maybe 20 cycles on a very modern core (within the past three years, Intel seems to have put a lot of work into optimizing division). So the bulk of the time actually goes to this massaging process.
--------
Even if the bulk of the CPU time is spent converting the uniformly random bits into usable integers (division/modulus is really slow!!), or usable floats / normal distributions or whatever... I still expect a typical CPU to handle 100 million pseudo-random numbers per second... __per core__.
Seeding your PRNG with a true random seed every now and then will ensure large-scale randomness.
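Roughly like this (os.urandom stands in for whatever hardware entropy source you trust):

    import os
    import random

    class ReseedingRandom(random.Random):
        # Pull a fresh OS/hardware seed after every RESEED_INTERVAL calls
        # to random(); in between it behaves like a normal deterministic PRNG.
        RESEED_INTERVAL = 1_000_000

        def __init__(self):
            super().__init__(int.from_bytes(os.urandom(16), "big"))
            self._draws = 0

        def random(self):
            self._draws += 1
            if self._draws >= self.RESEED_INTERVAL:
                self.seed(int.from_bytes(os.urandom(16), "big"))
                self._draws = 0
            return super().random()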
I think cryptographers might need something higher quality, but key-generation is quite uncommon, so you wouldn't need to be doing millions-and-millions of keygens per second, would you?
Simulations need lots of random numbers, but those numbers don't need entropy guarantees; they have weirder requirements instead (not only speed of generation, but also the ability to rerun your simulation... you actually want deterministic random number generators for simulations so you can verify your results). So in these cases, PRNGs are not only needed, but superior to a true RNG.
There's also the weird case of quasi-random numbers, a term I learned from the graphics community. You want your rays to be random-ish, but actually "more uniform" than truly random. It's a more pleasant random-ish pattern to look at, leading to pleasant and artistic images (https://en.wikipedia.org/wiki/Halton_sequence). Quasi-random raytracing is a so-called "biased" estimator, because the physical light models are probably closer to truly random. However, quasi-random sequences lead to far less noise in far less time / far fewer samples, so they're more useful in practice.
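A Halton generator is only a few lines, for the curious (bases 2 and 3 for the first two dimensions is the usual starting point):

    def halton(index, base):
        # Radical inverse of `index` in the given base: digit-reverse it
        # across the radix point to get a low-discrepancy value in [0, 1).
        result, f = 0.0, 1.0
        while index > 0:
            f /= base
            result += f * (index % base)
            index //= base
        return result

    # First few 2D quasi-random points:
    points = [(halton(i, 2), halton(i, 3)) for i in range(1, 9)]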
--------
Ultimately, our x86 CPUs today have "true" RNGs built in behind the RDSEED instruction, pulling entropy from some kind of physical circuit that's highly sensitive to temperature (thermal noise generators inside the circuit; after all, heat is quantum-mechanically random as electrons jump between states at very small scales).
> Massaging the bits into proper form: using modulus operator, requires like 80-cycles (aka: a division)
It's not uncommon to instead treat the RNG data as a floating point number between 0 and 1 and multiply it by the modulus (adding or subtracting 1 as appropriate depending on whether you want, say, 0-20 or 1-20 as the output). So that's a float multiplication, an addition, and a conversion to integer. Still probably cheaper than 80 cycles.
But we're also burning 32 bits of data to get 4.4 bits of output. When randomness was more expensive than CPU cycles, they would throw a linear congruential generator on top, which is a function that can transform an input and achieve similar levels of entropy in the output. It might take 5 bits or 8 bits of data instead of 32, at least quadrupling the utility of your source. But that's mostly of historical interest. As you say, the ratio of CPU to random decisions has shifted many orders of magnitude in the last 20 years.
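A sketch of that float trick for a 1-20 roll (whether the tiny bias from going through a float matters depends entirely on the application):

    import random

    def roll_d20():
        # Treat the generator output as a float in [0, 1), scale by the
        # range, and truncate: no division or modulus needed.
        return 1 + int(random.random() * 20)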
I own several electron-tunneling-based RNGs. I will say they are not all the same; there are differences in security and other esoteric things that HN will probably downvote me for, for lack of being in the know.
I will say that random numbers as a service is a viable business opportunity. And I have heard that micro QRNG chips are being put into new phones.
I'm a bit of an rng nerd.
How do you compete with the free CSPRNGs provided by the operating system?
I'd say the real issue here is the quality of the article. It failed to adequately define the problem being solved by this group. I'd expect better from the Economist.