
Getting randomness from an Apple device with particle physics, thermal entropy

62 points | lordmax | 8 years ago | vault12.com | reply

32 comments

[+] cozzyd|8 years ago|reply
I'm not familiar with how the video is encoded, but maybe you have to be careful to make sure that the encoding doesn't introduce correlations between adjacent frames (you can imagine how P or B frames might have correlated noise, for example).

Also, I don't think you'll ever see cosmic ray interactions last more than one frame... the time scales for cosmic ray interactions are nanoseconds, not milliseconds. I suppose electrons could get trapped somewhere and diffuse slowly (that does happen in some CCDs; I don't know enough about CMOS sensors to know if that can happen there), but that would happen with bright images in addition to cosmic rays.

In such a small sensor, cosmic rays should be relatively rare. To first order, you can model the number of cosmic rays you see as a Poisson process, which should be a pretty good source of entropy. You can even estimate the rate based on altitude / sensor orientation, etc. However, nearby cosmic ray detectors will detect correlated cosmic rays (from extensive air showers from the same primaries), which might hurt.
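As a rough illustration of that first-order Poisson model (all rates below are ballpark assumptions for illustration, not measured values from the article), hit counts per frame can be simulated like this:

```python
import math
import random

def poisson_sample(lam, rng=random.random):
    # Knuth's algorithm: multiply uniforms until the product drops below e^-lam.
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng()
        if p <= l:
            return k
        k += 1

# Assumed numbers: ~1 muon/cm^2/minute at sea level, ~0.3 cm^2 sensor, 30 fps.
lam_per_frame = (1.0 / 60.0) * 0.3 * (1.0 / 30.0)
print(f"expected hits per frame: {lam_per_frame:.6f}")  # a rare event per frame

# Sanity-check the sampler with a larger rate.
mean = sum(poisson_sample(3.0) for _ in range(20000)) / 20000
print(f"sample mean for lam=3: {mean:.2f}")
```

At these assumed rates a hit lands only every few hundred frames, which matches the "few and far between" characterization below.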

[+] lordmax|8 years ago|reply
The key property for crypto randomness here is that these high-energy particle events (be that cosmic rays, background radiation, etc.) are not just random, but independent from thermal noise. They are few and far between, but they affect each sample somewhere. One way or another all that entropy will get hashed, and having even a few bits contributed by independent phenomena makes the final hash extremely hard to attack.

Considering all the sources that contribute noise to sensors (thermal, light photon count, high-energy particles, shot/RTS noise, and I'm probably missing a few), all with unique distributions and characteristics, makes each sample readout very hard to predict.

[+] AceJohnny2|8 years ago|reply
Tangentially, I wonder if most modern smartphone chips include a hardware random number generator, and if that is exposed to userspace?

The iPhone has a hardware random number generator, at least: "The Secure Enclave is a coprocessor fabricated in the Apple S2, Apple A7, and later A-series processors. It uses encrypted memory and includes a hardware random number generator."

https://www.apple.com/business/docs/iOS_Security_Guide.pdf

I couldn't immediately find if that functionality was exposed in an API.

[+] lordmax|8 years ago|reply
I checked on that a while back as well. As far as I can find out, the SE HRNG is not exposed to the user at all. It is used internally in the quite complicated process of securely booting and unlocking an iOS device (there is an interesting presentation floating around with all the details of reverse engineering that process, and the amount of security Apple designed into their own hardware-to-hardware protocols is at a very respectable level of insane). I think it's likely the SE HRNG is included in seeding /dev/urandom on iOS, making it one of the most secure CSPRNGs around.
[+] lordmax|8 years ago|reply
worth mentioning (that's sort of the main premise of the article that gets a little bit unnoticed in all the methodology discussion): all existing HWRNGs are relatively low bandwidth - because they are bound by a physical process, rather than the endless spinning of /dev/urandom. They all have to wait for physics to produce each bit, and existing chips don't have that much "physics" in them.

The main novelty factor of a "camera noise HRNG" is that we're effectively leveraging 12M micro-HRNGs in parallel - that's where that firehose of entropy is coming from.
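As a rough back-of-envelope for that firehose claim (every number below is an assumption for illustration, not a figure from the article):

```python
# Back-of-envelope throughput estimate for a camera-noise HRNG.
pixels = 12_000_000      # assumed 12 MP sensor
fps = 30                 # assumed video frame rate
bits_per_pixel = 1       # conservative guess: ~1 bit of true entropy per readout
vn_yield = 0.25          # Von Neumann extraction keeps at most ~25% of input bits

raw_bits_per_s = pixels * fps * bits_per_pixel
out_bits_per_s = raw_bits_per_s * vn_yield
print(f"{out_bits_per_s / 8 / 1e6} MB/s of extracted entropy")  # ~11 MB/s
```

Even with these conservative guesses the result is several orders of magnitude above dedicated USB HRNG sticks.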

[+] mceachen|8 years ago|reply
Really approachable writing. Fun read!

For people that want to read about other entropy sources: http://www.fourmilab.ch/hotbits/how3.html

[+] lordmax|8 years ago|reply
Matthew, here is my personal collection of entropy sources for your reference. That stuff is addictive!

- True quantum source: http://whitewoodsecurity.com/products/entropy-engine/

- Atmospheric noise: https://www.random.org/randomness/

- Avalanche Effect Generators http://holdenc.altervista.org/avalanche/ http://ubld.it/truerng_v3

- Great deck about overall entropy engineering https://www.blackhat.com/docs/us-15/materials/us-15-Potter-U...

And of course the "lava wall" we mentioned just takes the cake: https://sploid.gizmodo.com/one-of-the-secrets-guarding-the-s...

[+] saagarjha|8 years ago|reply
Interesting fact: on Darwin, /dev/random and /dev/urandom behave identically; both are nonblocking: https://developer.apple.com/legacy/library/documentation/Dar...
[+] gerdesj|8 years ago|reply
RLY? How on earth can a read from /dev/random be non-blocking if it does not have sufficient entropy?

My understanding is that /dev/random "promises" to return decent random stuff, while /dev/urandom returns decent algorithmically generated random stuff. urandom is good; random is better but finicky and can sulk for a while.
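For what it's worth, the non-blocking behavior is easy to observe from userspace; Python's `os.urandom` reads the kernel CSPRNG (`/dev/urandom` on Linux, `/dev/random` on Darwin, where the two are the same device) and returns immediately once the pool is initialized:

```python
import os
import time

# Read 256 bits from the kernel CSPRNG; after early boot this never blocks.
start = time.monotonic()
key = os.urandom(32)  # e.g. material for an AES-256 key
elapsed = time.monotonic() - start

print(len(key), f"{elapsed:.4f}s")  # 32 bytes, effectively instant
```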

[+] CPAhem|8 years ago|reply
There can be a few pitfalls here, such as assuming that the dark-field image of the Apple camera is actually random noise and not some property of the sensor that can vary between images.

Getting the distribution of the randomness can be hard.

[+] lordmax|8 years ago|reply
you are quite correct about this - "lens closed" mode has the least amount of entropy by our measure. However, entropy is clearly proportional to the intensity of light hitting the sensor (and besides the natural photon variance of the light itself, the shot noise in the sensor increases with more light). So in what we call "optimal" conditions - enough light to generate noise, not enough to oversaturate - there is plenty of natural entropy coming from the sensor.
[+] rudolfwinestock|8 years ago|reply
This reminds me of SID, the custom sound chip on the Commodore 64. It was a hybrid digital/analog chip, so the sound which it produced wasn't consistent between chips due to thermal entropy effects. El33t h4x0rz took advantage of this by making the SID output a bit of white noise into a CPU register rather than the speakers in order to get true random numbers.
[+] mrkoot|8 years ago|reply
Alexandre Anzala-Yamajako posted interesting comments on this to [Cryptography] (@metzdowd.com):

> IMO a statistical approach based on taking a bunch of data and saying essentially "I don't see any signs that it's not random" is not a good approach for entropy seeding. The example is old, but I could give you the output of AES in counter mode with a null key and a null IV, and no standard statistical test would ever show you any defects while you have absolutely no entropy.

> Your case is particularly worrisome for several reasons: 1) you use a von Neumann-like extractor, but you have also shown that your data is not only biased but also correlated; 2) you don't seem to have a model of your hardware source from which you could derive the output distribution; 3) you do some wizardry to remove some correlation, but nowhere show or prove that there isn't more correlation to be taken care of, or how; 4) I didn't see in your document a justification of the fact that the manufacturer of the camera (software and hardware) doesn't have more information than you and could therefore target defects in your entropy management procedure.

> You should have a look at the work of Viktor Fischer, David Lubicz, Florent Bernard and Patrick Haddad. They invested quite a bit of effort to give entropy guarantees when using very specific hardware devices.

Skibinsky subsequently responded:

> Alexandre, thanks for reading and suggestions! I will certainly check out your references.

> As is probably obvious from the essay-style narrative, this is not intended to be a tight scientific paper, just our research log of first-order ideas we coded up for a minimal working prototype. You are correct on #1 and #3 - the current codebase doesn't address these issues. #2 is interesting, because besides the wide variety of camera hardware that model should reflect, iOS camera parameters present us with an opportunity to create an optimal hardware source. This is far from our area of expertise, so I hope somebody in the open source community will pick it up from here and figure out both a formal model and what physical settings will optimize the source.

> Thanks again for the great suggestions; I will further emphasize the impact of correlations & VN sensitivity to non-IID input in the final section.

> Most likely practical direction of course is simply use universal hash extractor instead of VN, since it relaxes a lot of requirements.

[+] atoponce|8 years ago|reply
This is really cool. The fact that I can, basically, carry around a TRNG in my pocket is, like, the ultimate nerd dream. Other than reseeding /dev/urandom, I don't really have any personal need for it, but the discussions that can be generated from it could be very interesting.
[+] lordmax|8 years ago|reply
my motivation from a while back was that it would be so useful to have something that lets you dump a few megs of independent entropy into your /dev/urandom right before generating BTC/SSH keys. The TrueRNG stick is also good, but just 40 KB/sec (I have TrueRNG on a cron job reseeding every hour anyway)
[+] iridium|8 years ago|reply
Wouldn't a lossy JPEG (or HEIC) compression reduce the entropy?
[+] lordmax|8 years ago|reply
That would be correct! However, we're getting values from the raw camera video buffer before any compression takes place.
[+] et2o|8 years ago|reply
Couldn’t you have used more formal methods for spatial autocorrelation?
[+] lordmax|8 years ago|reply
Do you have any references to existing code or research that deals with correlation issues? We considered a few home-grown ideas (like measuring the correlation level in each sample and then compacting the sample by that % before quantizing), but all of them were pretty computationally heavy...
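One cheap proxy for such a measurement (not the article's method, and no substitute for a formal source model) is the sample lag-1 autocorrelation, which is linear-time per sample:

```python
import random

def lag_autocorr(xs, lag=1):
    """Sample autocorrelation at a given lag: covariance of the series with
    a shifted copy of itself, normalized by the variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var

rng = random.Random(42)
iid = [rng.gauss(0, 1) for _ in range(5000)]
# Correlated stand-in for smeared sensor noise: mix each value with its neighbor.
smooth = [iid[0]]
for x in iid[1:]:
    smooth.append(0.7 * smooth[-1] + 0.3 * x)

print(f"iid lag-1:      {lag_autocorr(iid):+.3f}")    # near zero
print(f"smoothed lag-1: {lag_autocorr(smooth):+.3f}") # clearly positive
```

Applied along rows/columns of a noise frame (and at larger lags), this gives a quick red flag for the spatial correlation the parent comment asks about.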
[+] zkms|8 years ago|reply
> Take two video frames as big arrays of RGB values.

> Subtract one frame from another, that leaves us with samples of raw thermal noise values.

> Calculate the mean of a noise frame. If it is outside of ±0.1 range, we assume the camera has moved between frames, and reject this noise frame.

> Delete improbably long sequences of zeroes produced by oversaturated areas. For our 1920x1080=2Mb samples and a natural zero probability of 8.69%, any sequence longer than 7 zeros will be removed from the raw data.

> Quantize raw values from ±40 range into 1,2 or 3 bits: raw_value % 2^bits.

> Group quantized values into batches sampled from different R,G,B channels, at big pixel distances from each other and in different frames to minimize the impact of space and time correlations in that batch.

> Process a batch of 6–8 values with the Von Neumann algorithm to generate a few uniform bits of output.

> Collect the uniform bits into a new entropy block.

> Check the new block with a chi-square test. Reject blocks that score too high and therefore are too improbable to come from a uniform entropy source.
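The debiasing and testing steps in the quoted pipeline can be sketched roughly like this, using synthetic Gaussian values as a stand-in for sensor noise (a sketch of the idea, not the article's codebase):

```python
import random

def von_neumann(bits):
    """Von Neumann extractor: for each pair, 01 -> 0, 10 -> 1, 00/11 discarded."""
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

def chi_square_bits(bits):
    """Chi-square statistic against a fair-coin model (1 degree of freedom)."""
    n = len(bits)
    ones = sum(bits)
    exp = n / 2
    return (ones - exp) ** 2 / exp + ((n - ones) - exp) ** 2 / exp

rng = random.Random(7)
# Stand-in for raw noise values in roughly the +/-40 range, with some bias.
raw = [int(rng.gauss(3, 12)) for _ in range(20000)]
quantized = [v % 2 for v in raw]        # 1-bit quantization: raw_value % 2^bits
uniform = von_neumann(quantized)        # debiased output bits
stat = chi_square_bits(uniform)
print(len(uniform), f"chi2={stat:.2f}")  # stat typically below the 6.63 cutoff
```

Note that Von Neumann only removes *bias*, and only under the independence assumption the quoted batching step tries to buy; it does nothing against residual correlation, which is exactly the criticism below.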

This reads like a highly ad-hoc process with nothing resembling a formal justification for any of its steps, its general outline, or any of the magic numbers used in it. It's unclear what properties are being achieved and how exactly the steps guarantee them. There is no analysis of the predictability of the data by an adversary, either.

What does this get you that SHA512'ing the entire raw image bitmap doesn't? Using statistical tests makes sense to verify that the camera data isn't pathologically anomalous (say, all zeroes or all 255), but I don't understand why this sort of procedure is preferable to using a strong hash function to extract randomness from an image sensor's output.
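For comparison, the hash-based extraction described here is only a few lines; the stand-in frames below come from `os.urandom` rather than a real sensor:

```python
import hashlib
import os

# Stand-in for raw camera frames: in the real pipeline these would be the
# sensor's noise bytes, not os.urandom.
frame_a = os.urandom(1920 * 1080)
frame_b = os.urandom(1920 * 1080)

# Hash the whole frame down to 512 bits. The intuition: as long as the frame
# contains well over 512 bits of true min-entropy, the digest is
# computationally indistinguishable from uniform.
digest = hashlib.sha512(frame_a).digest()
print(len(digest))  # 64 bytes

# Any change to the input yields an unrelated digest.
assert hashlib.sha512(frame_b).digest() != digest
```

The statistical tests then serve only as a sanity check on the input, not as part of the extractor itself.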

[+] lordmax|8 years ago|reply
you are completely correct - as I mentioned in the end this would be easier extractor: "Want to relax IID assumptions and avoid careful setup of a non-correlated quantizer? Easy — use any good universal hash function. The only new requirement is that the universal hash will require a one-time seed. We can make an easy assumption that the iPhone local state of /dev/urandom is totally independent from thermal noise camera is seeing, and simply pick that seed from everybody's favorite CSPRNG."

The main reason we went with VN instead of SHA or a universal hash is just that it was a more fun thing to build and experiment with. SHA is like a flamethrower that will deal with anything you throw at it. VN is far more brittle, and you see all your mistakes in the scene or the generation. Of course you then hash the output anyway before use!
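A toy version of such a seeded extractor (a Carter-Wegman-style `(a*x + b) mod p` family, purely illustrative and not the article's implementation) might look like:

```python
import os

# Toy seeded extractor in the spirit of a universal hash family, truncated
# to a fixed output. Illustrative only.
P = (1 << 127) - 1  # a Mersenne prime

def extract(noise: bytes, seed: bytes, out_bytes: int = 16) -> bytes:
    a = int.from_bytes(seed[:16], "big") | 1   # nonzero multiplier
    b = int.from_bytes(seed[16:32], "big")
    x = int.from_bytes(noise, "big")
    y = (a * x + b) % P
    return y.to_bytes(16, "big")[:out_bytes]

seed = os.urandom(32)          # the one-time seed, here from the local CSPRNG
noisy_sample = os.urandom(64)  # stand-in for camera noise bytes
out = extract(noisy_sample, seed)
print(len(out))  # 16
```

The leftover hash lemma is what makes this attractive: the output is close to uniform given enough input min-entropy, with no IID assumption, provided the seed is independent of the noise source (the /dev/urandom independence assumption quoted above).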