anonova | 4 years ago
It's not the highest-quality hash function (see the SMHasher benchmarks), but it is fast. A great alternative is XXH3 (https://cyan4973.github.io/xxHash/), which has seen far more real-world use.
thomasahle|4 years ago
On the other hand a hash like UMash guarantees low collisions on any input: https://engineering.backtrace.io/2020-08-24-umash-fast-enoug...
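"Guarantees low collisions on any input" means the guarantee is probabilistic over a randomly chosen key, not over the inputs. Below is a minimal sketch of the textbook idea behind that kind of bound: polynomial hashing modulo the Mersenne prime 2^61 - 1 with a random key. UMash's actual construction is more involved, and the function names here are hypothetical.

```python
import secrets

P = (1 << 61) - 1  # Mersenne prime 2^61 - 1


def make_poly_hash():
    """Draw a random key once and return a keyed hash for byte strings.

    Each byte b contributes the coefficient b + 1 (never zero), so two
    distinct inputs of length at most n give a nonzero difference
    polynomial of degree < n in the key. It has fewer than n roots, so
    the collision probability over the random key is at most n / (P - 1).
    """
    key = secrets.randbelow(P - 1) + 1  # uniform in [1, P - 1]

    def h(data: bytes) -> int:
        acc = 0
        for b in data:
            # Horner's rule: evaluate the polynomial at `key` mod P.
            acc = (acc * key + b + 1) % P
        return acc

    return h
```

The bound only holds if whoever chooses the inputs never learns the key, so the key is drawn once (for example at process start) and kept private.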
jdcarter|4 years ago
Aside: when storing hashes, be sure to store the hash type as well so that you can change it later if needed, e.g. "xxh3-[hash value]". RFC 6920 also covers naming hash values together with their algorithm, although I haven't seen its format in common use.
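A minimal sketch of the "type-value" convention, assuming a simple `name-hexdigest` format and using `hashlib` algorithm names as stand-ins (XXH3 is not in Python's standard library; it would need the third-party `xxhash` package). The helper names are hypothetical.

```python
import hashlib


def tag_digest(algo: str, data: bytes) -> str:
    """Return a self-describing digest string such as 'sha256-<hex>'."""
    return f"{algo}-{hashlib.new(algo, data).hexdigest()}"


def verify_tagged(tagged: str, data: bytes) -> bool:
    """Recompute the digest with whichever algorithm the tag names."""
    # Split on the first '-' only; hashlib names use '_' (e.g. sha3_256).
    algo, sep, expected = tagged.partition("-")
    if not sep:
        raise ValueError(f"untagged digest: {tagged!r}")
    # hashlib.new raises ValueError for an unknown algorithm name.
    return hashlib.new(algo, data).hexdigest() == expected.lower()
```

For example, `tag_digest("sha256", b"hello")` returns a string starting with `"sha256-2cf24dba5fb0a30e"`. Because the verifier reads the algorithm from the stored value, old and new tags can coexist while data is migrated to a different hash, and an unknown tag fails loudly instead of silently mismatching.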
njt|4 years ago
Thanks for sharing this. I'd been doing this for my own stuff (e.g. foo.txt-xxh32-ea79e094), but it's good to know someone else has thought it through.
I once ran into a problem where someone had named some files foo-fb490c or something similar without any annotation, and when something went wrong, it took a while to figure out they were using truncated SHA-256 hashes.
DeathArrow|4 years ago
"we wanted a fast, non-cryptographic hash for use in change detection and deduplication"
>A great alternative is XXH3
Meow Hash is twice as fast.
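For the change-detection and deduplication use case quoted above, a common pattern is to let a fast non-cryptographic hash pick candidate duplicates and then confirm them byte for byte, since such hashes can collide. A sketch, using `zlib.crc32` from the standard library as a stand-in for XXH3 or Meow Hash (neither ships with Python); `find_duplicates` is a hypothetical helper.

```python
import zlib
from collections import defaultdict


def find_duplicates(blobs: dict[str, bytes]) -> list[list[str]]:
    """Group names whose contents are byte-for-byte identical.

    The (length, checksum) pair only narrows down candidates; every
    candidate group is then split by actual content, so a checksum
    collision can never merge two different blobs.
    """
    buckets: dict[tuple[int, int], list[str]] = defaultdict(list)
    for name, data in blobs.items():
        buckets[(len(data), zlib.crc32(data))].append(name)

    groups: list[list[str]] = []
    for names in buckets.values():
        # Keying a dict by the bytes compares full contents on lookup.
        by_content: dict[bytes, list[str]] = defaultdict(list)
        for name in names:
            by_content[blobs[name]].append(name)
        groups.extend(g for g in by_content.values() if len(g) > 1)
    return groups
```

With a faster hash the structure stays the same; only the checksum call changes. The confirmation step is what makes a non-cryptographic hash acceptable here when inputs are not adversarial.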
MauranKilom|4 years ago
The readme has since been updated. I didn't check whether any algorithmic changes were made on top, but the discussion of the analysis on GitHub didn't point to a lot of low-hanging fruit.
LeoPanthera|4 years ago
IncRnd|4 years ago
The article's author claims the hash is not cryptographic but then goes on to make security claims about it. People who do not understand cryptography should be careful about making such claims. The author appears to understand this better than your comment suggests.
For example, a claim about change detection is a cryptographic claim about detecting preimage attacks. In a threat model, a security professional would determine whether first-preimage or second-preimage attacks are what must be guarded against in the attack scenarios. The professional would then help with the analysis, determining mitigations, defense in depth, and prioritizing fixes for the vulnerabilities exposed by how the hash is used.
A hash cannot be evaluated in isolation. The architecture and use case determine which of the hash's security properties the application relies on, and therefore which security properties the application actually has.
So, if the author is correct, which seems to be the case, then meowhash should not be used in a production environment outside of the most simplistic checks. For its intended use case, it seems faster to simply compare the two images directly for any single-bit difference - no hash required.
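To make the preimage point concrete: the sketch below is not an attack on Meow Hash. It uses CRC-32 (available as `zlib.crc32`) to show how a non-cryptographic checksum used for change detection fails once an attacker controls the input. Because CRC-32 is linear, appending four chosen bytes to arbitrary modified data forces its checksum to any target value, i.e. a second preimage of the original. The helper names are hypothetical; the technique is the well-known table-based CRC reversal.

```python
import zlib


def _crc32_table() -> list[int]:
    """Reflected CRC-32 table (polynomial 0xEDB88320), as used by zlib."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _crc32_table()
# Every table entry has a distinct top byte, which lets us run CRC backwards.
TOP_BYTE_TO_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}
assert len(TOP_BYTE_TO_INDEX) == 256


def forge_suffix(data: bytes, target_crc: int) -> bytes:
    """Return 4 bytes such that zlib.crc32(data + suffix) == target_crc."""
    # Backwards pass: from the desired final register, recover which table
    # index each of the 4 appended bytes must select.
    reg = target_crc ^ 0xFFFFFFFF
    indices = []
    for _ in range(4):
        i = TOP_BYTE_TO_INDEX[reg >> 24]
        indices.append(i)
        reg = ((reg ^ TABLE[i]) << 8) & 0xFFFFFFFF
    indices.reverse()

    # Forwards pass: starting from the register state after `data`, pick
    # each byte so that it selects the required table index.
    reg = zlib.crc32(data) ^ 0xFFFFFFFF
    suffix = bytearray()
    for i in indices:
        suffix.append((reg ^ i) & 0xFF)
        reg = TABLE[i] ^ (reg >> 8)
    return bytes(suffix)
```

For example, `tampered + forge_suffix(tampered, zlib.crc32(original))` passes a CRC-32 comparison against `original` even though the contents differ. A checksum like this is fine for catching accidental corruption, which is exactly the distinction the threat model has to draw.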
jrochkind1|4 years ago
ncann|4 years ago
iratewizard|4 years ago
ilitirit|4 years ago