top | item 42410835

(no title)

dan_manges | 1 year ago

Checksums are a good use case, where you’re not worried about any sort of forged collisions and you want to calculate the checksum as fast as possible. Although cryptographic hashes are fast enough that there likely aren’t too many use cases where the difference in speed is meaningful.

discuss

order

loeg|1 year ago

If you don't care about forged collisions, what makes a secure-ish checksum bette than a faster insecure checksum? Both are likely to result in a different value for bit-flipped input, which is the point.

cstrahan|1 year ago

Note that you said “likely” and not “equally likely”. Therein lies your answer.

There are applications where hashes are used as identifiers, where it would be impossible to use the original data to resolve possible collisions. One example would be in RTTI (Run Time Type Information), when you want to check if two objects (of unknown-at-compile-time type) are instances of the same type. The only performant way to achieve this is if the compiler emits each type’s RTTI with a unique integer key to efficiently compare against, and this key must be unique across all object files and shared objects. In practice, this is done using hashing. If there is a collision, then the program behavior is undefined, so it is ideal to minimize probability of collision. You can see this issue in the Rust compiler project where there is ongoing discussion about how to handle this hashing, trying to balance compiler speed and collision likeliness:

https://github.com/rust-lang/rust/issues/129014

Another scenario would be like what Backtrace does: a hash is used to identify certain classes of errors in application code. The cost of analyzing each error is quite high (you have to load objects into memory, source maps (if available), perform symbolification, etc), so comparing any two errors every time you want to perform any metrics or retrieve associated data (like issue tracker identifiers) would reduce the performance to a point that any such application would be too slow to be useful. So a lot of thought was given to producing fast, non-cryptographic hashes (these hashes aren’t shared across tenants, and no customer is going to go out of their way to work out initialization parameters so they can be their own adversary) with a known likely hood of collision. You can read about the hash (UMASH) here:

https://engineering.backtrace.io/2020-08-24-umash-fast-enoug...

nepthar|1 year ago

That’s a good question. A cryptographic hash must not leak any information other than “yes” or “no”. A checksum-type hash might have an additional property where data that only has a single bit flipped might hash to something close to the original hash, allowing you to refine your collision attack by knowing if you’re hot or cold.

For a checksum, this isnt a bad thing and actually could be good.