zigzag312 | 1 month ago

From the SMHasher test results, the quality of xxhash seems higher. It has less bias / higher uniformity than CRC.

What bothers me about probability calculations is that they always assume perfect uniformity. I've never seen any estimates of how bias affects collision probability, or of how to modify the probability formula to account for a hash function's imperfect uniformity.
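One simple way to put numbers on this, as a sketch rather than a definitive answer: assume each hash value is an independent draw from a fixed distribution `probs` over the output buckets (an idealization, not something SMHasher measures directly). Two inputs then collide with probability equal to the sum of the squared bucket probabilities, which is 1/N for a uniform hash over N buckets and strictly larger for any biased one. The function names below are hypothetical, and the "any collision" estimate is the usual birthday-style approximation, reasonable only while collisions are rare.

```python
import math

def pair_collision_probability(probs):
    """Probability that two independent draws land in the same bucket.

    `probs` is an assumed per-bucket output distribution (sums to 1).
    For a uniform hash over N buckets this is exactly 1/N.
    """
    return sum(p * p for p in probs)

def approx_any_collision(n_items, probs):
    """Birthday-style estimate of at least one collision among n_items.

    Uses 1 - exp(-C(n, 2) * sum(p_i^2)), a Poisson approximation that is
    only meaningful when collisions are rare.
    """
    pairs = n_items * (n_items - 1) / 2
    return -math.expm1(-pairs * pair_collision_probability(probs))

n_buckets = 1024
uniform = [1 / n_buckets] * n_buckets
# Hypothetical bias: half the buckets are 1.5x as likely, half are 0.5x.
biased = [1.5 / n_buckets] * (n_buckets // 2) + [0.5 / n_buckets] * (n_buckets // 2)

print(pair_collision_probability(uniform))   # 1/N
print(pair_collision_probability(biased))    # 1.25/N for this bias
print(approx_any_collision(20, uniform), approx_any_collision(20, biased))
```

In this model the bias enters the formula only through the sum of squares, so replacing 1/N with that sum is the whole adjustment. It says nothing about error detection, where the inputs are related rather than independent.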

jmillikin|1 month ago

It doesn't matter, though. xxhash is better than crc32 for hashing keys in a hash table, but both of them are inappropriate for file checksums -- especially as part of a data archival/durability strategy.

It's not obvious to me that per-page checksums in an archive format for comic books are useful at all, but if you really wanted them for some reason then crc32 (fast, common, should detect bad RAM or a decoder bug) or sha256 (slower, common, should detect any change to the bitstream) seem like reasonable choices and xxhash/xxh3 seems like LARPing.

wyldfire|1 month ago

> both of them are inappropriate for file checksums

CRCs like CRC32 were born for this kind of work. CRCs detect corruption when transmitting/storing data. What do you mean when you say that it's inappropriate for file checksums? It's ideal for file checksums.

minitech|1 month ago

Uniformity isn’t directly important for error detection. CRC-32 has the nice property that it’s guaranteed to detect all burst errors up to 32 bits in size, while a b-bit hash can only do that probabilistically, missing such an error with probability about 2^−b at best, of course. (But it’s valid to care about detecting larger errors with higher probability, yes.)
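The burst-error guarantee can be spot-checked with a short sketch using Python's standard `zlib.crc32`. The helper names are hypothetical. It assumes a "burst" means bits that are contiguous in the order the CRC processes them; zlib's CRC-32 is the reflected variant, so that order is least-significant bit first within each byte. Random sampling is not a proof, it only illustrates the property.

```python
import random
import zlib

def flip_burst(data: bytes, offset: int, pattern: int) -> bytes:
    """XOR `pattern` into `data` starting at bit `offset`.

    Bits are numbered LSB-first within each byte, which matches the
    processing order of zlib's reflected CRC-32.
    """
    as_int = int.from_bytes(data, "little")
    return (as_int ^ (pattern << offset)).to_bytes(len(data), "little")

def count_missed_bursts(data: bytes, trials: int, rng: random.Random) -> int:
    """Count random bursts of at most 32 bits that leave the CRC unchanged."""
    total_bits = len(data) * 8
    good = zlib.crc32(data)
    missed = 0
    for _ in range(trials):
        length = rng.randint(1, 32)
        # A burst of this length has its first and last bits flipped;
        # the bits in between are arbitrary.
        pattern = rng.getrandbits(length) | 1 | (1 << (length - 1))
        offset = rng.randrange(total_bits - length + 1)
        if zlib.crc32(flip_burst(data, offset, pattern)) == good:
            missed += 1
    return missed

rng = random.Random(0)
page = bytes(rng.getrandbits(8) for _ in range(4096))  # stand-in for file data
print(count_missed_bursts(page, 2000, rng))
```

The count should be zero for any seed, since a nonzero error polynomial of degree below 32, shifted to any position, is never divisible by the degree-32 generator. Errors spread wider than 32 bits carry no such guarantee.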

zigzag312|1 month ago

> Uniformity isn’t directly important for error detection.

Is there any proof of this? I'm interested in reading more about it.

> detect all burst errors up to 32 bits in size

What if errors are not consecutive bits?