zigzag312 | 1 month ago
The README in the SMHasher test suite also seems to indicate that 32 bits might be too few for file checksums:
"Hash functions for symbol tables or hash tables typically use 32 bit hashes, for databases, file systems and file checksums typically 64 or 128bit, for crypto now starting with 256 bit."
fc417fc802 | 1 month ago
For detecting file corruption, the amount of data alone isn't the issue. Rather, what matters is the rate at which corruption events occur. If I have 20 TiB of data and experience corruption at a rate of only 1 event per TiB per year (for simplicity, assume each event occurs in a separate file), that's only 20 events per year. I don't know about you, but I'm not worried about the false-negative rate on that at 32 bits. And from personal experience, that hypothetical is a gross overestimate of real-world corruption rates.
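To make the argument concrete, here's a back-of-the-envelope sketch. It assumes each corruption event independently leaves the file with an effectively random checksum value, so a k-bit checksum misses an event with probability 2^-k; the event rate (20/year) is the hypothetical from the comment above.

```python
# Expected undetected corruption events per year, assuming a corrupted
# file's checksum is uniformly random, so a k-bit checksum fails to
# detect a given event with probability 2**-k.

def expected_misses(events_per_year: float, checksum_bits: int) -> float:
    """Expected number of corruption events per year that slip past
    a checksum of the given width."""
    return events_per_year * 2.0 ** -checksum_bits

# Hypothetical from above: 20 TiB at 1 event/TiB/year -> 20 events/year.
events = 20
for bits in (32, 64, 128):
    print(f"{bits:3d}-bit checksum: "
          f"{expected_misses(events, bits):.3g} missed events/year")
```

Even at 32 bits, 20 events/year works out to roughly 5e-9 expected misses per year, i.e. on the order of one undetected corruption every couple hundred million years at that rate.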
zigzag312 | 1 month ago