top | item 29743084


garmaine | 4 years ago

Why is it not SHA2 which would be faster still with hardware support?


cperciva|4 years ago

Better question, why not use whatever hash is fastest on the system in question? We're talking about a RNG here -- it's not like we need to make sure that two different systems produce the same random numbers!

str4d|4 years ago

My answer to that is "maintainability". The more flexibility and moving parts we add to the RNG, the more things that can break. The delta from "one hash function" to "two hash functions" would involve not just adding a second hash function, but also a bunch of configuration code, target-specific logic, fallback handling, etc. There are plenty of places in the kernel that need to care about this kind of logic, but I don't believe the RNG is one of them, and I don't particularly want to require the RNG maintainers to spend their time caring about it.

Additionally, it's not just the hash function that would matter for speed here, but also the expansion. Linux RNG uses ChaCha20 for that, so if you were going all-in on target-specific speed, you'd need additional logic for swapping that out for a hardware-accelerated cipher (probably AES, which would introduce even more considerations given that it has a 16-byte block size, vs ChaCha20's 64-byte blocks).
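The block-size point can be put in numbers with a trivial sketch (the 256-byte request size is just an example, not anything from the kernel):

```python
def blocks_needed(nbytes: int, block_size: int) -> int:
    """Ceiling division: how many cipher blocks cover a request."""
    return -(-nbytes // block_size)

# Filling a 256-byte request takes 4 ChaCha20 blocks but 16 AES blocks.
assert blocks_needed(256, 64) == 4   # ChaCha20: 64-byte blocks
assert blocks_needed(256, 16) == 16  # AES: 16-byte blocks
```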

espadrine|4 years ago

The title mentions performance, but it is not the primary motivation AFAICT. It is only mentioned to say “it is not slower”.

The main concern was security, so it makes sense to use BLAKE2, which benefits from existing cryptanalysis of the ChaCha20 permutation, which is already used in the RNG for number generation.

(And it makes sense to use BLAKE2s in particular, to support non-64-bit systems without penalty.)

Using a single hash (instead of picking one at runtime) simplifies the attack surface IMO.

nerdponx|4 years ago

Maybe this is a silly question, but why should RNG even be part of the kernel in the first place? It's convenient having it in a device file, but why couldn't that be provided by some userspace program or daemon?

Mehdi2277|4 years ago

I normally care about reproducible RNG results and do try to seed all RNGs used. There are lots of applications where randomness is used but you still want repeatable behavior. ML experiments are one situation where randomness is common but runs are also intended to be repeatable.

I think the Python standard library RNG and several data-science library RNGs are platform-independent.

wahern|4 years ago

AFAICT, BLAKE2s (previously SHA-1) is only being used for the forward secrecy element, in this case mixing a hash of the pool back into the core state, which is actually still using ChaCha20 for expansion. From quick inspection (never read this code before) I think the number of bytes in the pool is 416 (104 * 4). (See poolwords defined at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...) For such relatively small messages and based on the cycles-per-byte performance numbers from https://w3.lasca.ic.unicamp.br/media/publications/p9-faz-her... (SHA-NI benchmarks), https://www.blake2.net/blake2_20130129.pdf (BLAKE2 paper), and https://bench.cr.yp.to/results-hash.html (comprehensive table of measurements), I don't see any performance reasons for choosing BLAKE2s over SHA-256. Rather, software SHA-256 and BLAKE2s seem comparable (and that's being charitable to BLAKE2s), and SHA-NI is definitely faster.

Perhaps there were other considerations at play. Maybe something as simple as the author's preference. One thing that probably wasn't a consideration is FIPS compliance: the core function is ChaCha20, so FIPS kernels require a completely different CSPRNG anyhow.
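As a rough sketch of the extract-and-mix-back pattern described above (the pool size and function names here are illustrative, not the kernel's; Python's hashlib.blake2s stands in for the kernel implementation):

```python
import hashlib

POOL_BYTES = 64  # illustrative only; the real pool discussed above is larger

def extract_and_mix(pool: bytearray) -> bytes:
    """Hash the pool, mix the digest back in, return the extract."""
    digest = hashlib.blake2s(bytes(pool)).digest()  # 32-byte output
    # Mixing the digest back into the pool is the forward-secrecy step:
    # a later compromise of the pool doesn't reveal earlier extracts.
    for i, b in enumerate(digest):
        pool[i % POOL_BYTES] ^= b
    return digest  # in the real design this feeds the ChaCha20 expansion
```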

maqp|4 years ago

https://github.com/BLAKE3-team/BLAKE3/blob/master/media/spee... argues BLAKE2s is about twice as fast as SHA-256.

One thing switching from SHA-1 to BLAKE2s does is increase the total entropy a single compression operation adds to ChaCha20. This increases speed: folded BLAKE2s adds 128 bits per operation instead of folded SHA-1's 80 bits, so that's two calls instead of four (I'm assuming they kept the folding). Another speedup comes from the fact that the hash function's constants are no longer being filled with RDRAND inputs on every call.

Finally, I'm not completely sure whether increasing the hash size itself adds computational security against an attack where the internal state is compromised once and the attacker tries to brute-force the new state from new output. My conjecture is that the reseeding operation is atomic, i.e. that ChaCha20 won't yield anything until the reseed is complete, so there shouldn't be any difference in this regard. I'd appreciate clarification on this.
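The 80-vs-128-bit folding arithmetic above is easy to check, assuming "folding" means XORing the two halves of the digest together:

```python
import hashlib

def fold(digest: bytes) -> bytes:
    """XOR the two halves of a digest together."""
    half = len(digest) // 2
    return bytes(a ^ b for a, b in zip(digest[:half], digest[half:]))

# SHA-1's 160-bit digest folds to 80 bits; BLAKE2s's 256 bits fold to 128,
# so mixing in 256 bits takes two calls instead of four.
assert len(fold(hashlib.sha1(b"pool").digest())) * 8 == 80
assert len(fold(hashlib.blake2s(b"pool").digest())) * 8 == 128
```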

normaler|4 years ago

Because hardware support is not available in a lot of devices, and there BLAKE2 beats SHA-2.

willis936|4 years ago

I wonder if there are more virtual machines than bare metal hosts. There must be, right?

Even with paravirtualization as an option I feel like many users of VMs would opt for the advantages of full virtualization.

pedrocr|4 years ago

Speed was probably not the criterion; otherwise BLAKE3 seems like an even better choice:

https://github.com/BLAKE3-team/BLAKE3

tptacek|4 years ago

I think speed is definitely a big part of this, though the speedup comes primarily from getting rid of superfluous calls to RDRAND. Blake2s is already in the kernel (after a fashion) for WireGuard itself; I don't think Blake3 is. An additional nerdy point here is that the extraction phase of the LKRNG is already using ChaCha (there's a sense in which a CSPRNG is really just the keystream of a stream cipher), and Blake2s and ChaCha are closely related.
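The "a CSPRNG is really just the keystream of a stream cipher" point can be sketched in a few lines. Keyed BLAKE2s in counter mode stands in for ChaCha20 here purely to keep the toy self-contained; this is not the kernel's construction:

```python
import hashlib

def toy_keystream(key: bytes, nblocks: int) -> bytes:
    """Deterministically expand a key into 'random' output blocks."""
    out = bytearray()
    for counter in range(nblocks):
        # Each block is PRF(key, counter); a stream cipher does the same.
        block = hashlib.blake2s(counter.to_bytes(8, "little"), key=key)
        out += block.digest()
    return bytes(out)

# Same key -> same stream (it's a keystream); a fresh key -> a new stream.
```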

Kubuxu|4 years ago

BLAKE3 would only give a small incremental perf improvement here, from the reduced number of rounds (10 down to 7).

The majority of BLAKE3's perf benefit comes from its Merkle-tree structure and SIMD processing of multiple input streams at the same time.