UUID v5 is quite useful if you want to deterministically convert external identifiers into UUIDs: define a namespace UUID for each potential identifier source (to keep them separate), then use that to derive a v5 UUID from the external identifier. It's very useful for idempotent data imports.
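A rough sketch of that pattern in Python, using the standard library's `uuid.uuid5`. The root name `imports.example.com`, the source labels, and the helper names are made up for illustration; any fixed namespace UUID per source would do.

```python
import uuid

# Hypothetical root namespace for this import pipeline (assumption: any
# fixed UUID works; deriving it from a DNS-style name is just one option).
ROOT_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "imports.example.com")


def source_namespace(source: str) -> uuid.UUID:
    """Derive one namespace UUID per identifier source to keep them separate."""
    return uuid.uuid5(ROOT_NS, source)


def external_to_uuid(source: str, external_id: str) -> uuid.UUID:
    """Deterministically map (source, external_id) to a v5 UUID."""
    return uuid.uuid5(source_namespace(source), external_id)


# Re-running an import yields the same key, so an upsert stays idempotent.
first = external_to_uuid("crm", "customer-42")
again = external_to_uuid("crm", "customer-42")
other = external_to_uuid("billing", "customer-42")
print(first == again, first == other)
```

The same external ID from two different sources lands on different UUIDs, which is the point of giving each source its own namespace.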
jandrewrogers|1 year ago
kbolino|1 year ago
I doubt that the quality of the hash function is the real issue. The problem with MD5 and SHA1 is that it's easy (for MD5) and technically possible (for SHA1) to generate collisions. That makes them broken for enforcing message integrity. But a UUID is not an integrity check. Both MD5 and SHA1 are still very good as non-cryptographic hash functions. While a hash-based UUID provides obfuscation, it isn't really a security mechanism.
Even the existence of UUIDv5 feels like a knee-jerk reaction from when MD5 was "bad" but SHA1 was still "good". No hash function will protect you against de-obfuscation of low-entropy inputs. I can feed your social security number through SHA3-512 but it's not going to make it any less guessable than if I fed it through MD5.
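To make the low-entropy point concrete, here is a toy sketch: if the input space is small enough to enumerate, the strength of the hash doesn't matter. The 4-digit "secret" and the `find_preimage` helper are hypothetical stand-ins; a real SSN space is larger but still enumerable.

```python
import hashlib
from typing import Iterable, Optional


def find_preimage(target_hex: str, hash_name: str,
                  candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate whose digest matches target_hex, if any."""
    for candidate in candidates:
        digest = hashlib.new(hash_name, candidate.encode("utf-8")).hexdigest()
        if digest == target_hex:
            return candidate
    return None


# Toy input space: every 4-digit string (assumption, for illustration only).
candidates = [f"{n:04d}" for n in range(10_000)]
secret = "4821"

for hash_name in ("md5", "sha3_512"):
    leaked = hashlib.new(hash_name, secret.encode("utf-8")).hexdigest()
    recovered = find_preimage(leaked, hash_name, candidates)
    print(hash_name, recovered == secret)
```

Both hashes fall to the same exhaustive search; the only difference is a slightly slower loop for SHA3-512.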
Moreover, a UUID only has 122 bits of usable space. Even if we defined a new SHA2- or SHA3-based UUID version, it's still going to have to truncate the hash output to less than half of its full size. This significantly alters the security properties of the hash function, though I'm not sure if much cryptanalysis has been done on the shorter forms to see if they're more practically breakable yet.
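For a sense of what that truncation looks like, here is a sketch of a hypothetical SHA-256-based name UUID. This is not a standardized version; it borrows the v5 construction (hash the namespace bytes followed by the name) and stamps the result as version 8, which is my own choice for illustration. Only 122 of the 256 digest bits survive.

```python
import hashlib
import uuid


def sha256_uuid(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical SHA-256 name-based UUID (not a standard UUID version)."""
    digest = hashlib.sha256(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])        # keep 128 of the 256 digest bits
    raw[6] = (raw[6] & 0x0F) | 0x80     # overwrite 4 bits with version 8
    raw[8] = (raw[8] & 0x3F) | 0x80     # overwrite 2 bits with the variant
    return uuid.UUID(bytes=bytes(raw))  # 128 - 4 - 2 = 122 hash bits remain


u = sha256_uuid(uuid.NAMESPACE_URL, "https://example.com/item/1")
print(u, u.version)
```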
There is one area where the collision resistance of the hash function could be a concern, though. If all of the inputs to the hash are under the control of a potential attacker, then two maliciously constructed inputs could produce the same UUID. I wouldn't think this would be a major issue, since most databases will fail to insert a duplicate key, but it might allow for various denial of service attacks. It feels like quite a niche risk, and very circumstance-dependent.
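A small sketch of the duplicate-key behaviour, using an in-memory SQLite table. The table name, columns, and `insert_record` helper are hypothetical, and the "collision" is simulated by reusing the same UUID rather than by constructing colliding inputs.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, payload TEXT)")


def insert_record(conn: sqlite3.Connection, key: uuid.UUID, payload: str) -> bool:
    """Insert a row; return False if the primary key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO records (id, payload) VALUES (?, ?)",
                (str(key), payload),
            )
        return True
    except sqlite3.IntegrityError:
        return False


key = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/item/1")
print(insert_record(conn, key, "legitimate row"))  # first insert succeeds
print(insert_record(conn, key, "colliding row"))   # duplicate key is rejected
```

The database keeps its integrity, but whichever party inserts first wins, which is where the denial-of-service angle comes from.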